
Published in
PRESENCE: Teleoperators and Virtual Environments
Special Issue on Augmented Reality
Vol. 6, No. 4, August 1997, pp. 433-451

Confluence of Computer Vision and Interactive Graphics for Augmented Reality

Gudrun J. Klinker, Klaus H. Ahlers, David E. Breen, Pierre-Yves Chevalier, Chris Crampton, Douglas S. Greer, Dieter Koller, Andre Kramer, Eric Rose, Mihran Tuceryan, Ross T. Whitaker

European Computer-Industry Research Centre (ECRC)
Arabellastraße 17, 81925 Munich, Germany

Abstract

Augmented reality (AR) is a technology in which a user's view of the real world is enhanced or augmented with additional information generated from a computer model. Using AR technology, users can interact with a combination of real and virtual objects in a natural way. This paradigm constitutes the core of a very promising new technology for many applications. However, before it can be applied successfully, AR has to fulfill very strong requirements including precise calibration, registration and tracking of sensors and objects in the scene, as well as a detailed overall understanding of the scene.

We see computer vision and image processing technology play an increasing role in acquiring appropriate sensor and scene models. To balance robustness with automation, we integrate automatic image analysis with both interactive user assistance and input from magnetic trackers and CAD models. Also, in order to meet the requirements of the emerging global information society, future human-computer interaction will be highly collaborative and distributed. We thus conduct research pertaining to distributed and collaborative use of AR technology. We have demonstrated our work in several prototype applications, such as collaborative interior design and collaborative mechanical repair. This paper describes our approach to AR with examples from applications, as well as the underlying technology.

    1. Introduction

Augmented reality (AR) is a technology in which a user's view of the real world is enhanced or augmented with additional information generated from a computer model. The enhancement may consist of virtual artifacts to be fitted into the environment, or a display of non-geometric information about existing real objects. AR allows a user to work with and examine real 3D objects while receiving additional information about those objects or the task at hand. By exploiting people's visual and spatial skills, AR brings information into the user's real world rather than pulling the user into the computer's virtual world. Using AR technology, users can thus interact with a mixed virtual and real world in a natural way. This paradigm for user interaction and information visualization constitutes the core of a very promising new technology for many applications. However, real applications impose very strong demands on AR technology that cannot yet be met. Some of these demands are listed below.

In order to combine real and virtual worlds seamlessly so that the virtual objects align well with the real ones, we need very precise models of the user's environment and how it is sensed. It is essential to determine the location and the optical properties of the viewer (or camera) and the display, i.e., we need to calibrate all devices, register them and all objects in a global coordinate system, and track them over time when the user moves and interacts with the scene.

Realistic merging of virtual objects with a real scene requires that objects behave in physically plausible manners when they are manipulated, i.e., they occlude or are occluded by real objects, they are not able to move through other objects, and they are shadowed or indirectly illuminated by other objects while also casting shadows themselves. To enforce such physical interaction constraints between real and virtual objects, the AR system needs to have a very detailed description of the physical scene.

In order to create the illusion of an AR interface it is required to present the virtual objects with a high degree of realism, and to build user interfaces with a high degree of immersion. Convincing interaction and information visualization techniques are still very much a research issue. On top of that, for multi-user applications in the context of AR it is necessary to address the distribution and sharing of virtual environments, the support for user collaboration and awareness, and the connection between local and remote AR installations.

We see computer vision and image processing technology, although still relatively brittle and slow, play an increasing role in acquiring appropriate sensor and scene models. Rather than using the video signal merely as a backdrop on which virtual objects are shown, we explore the use of image understanding techniques to calibrate, register and track cameras and objects and to extract the three-dimensional structure of the scene. To balance robustness with automation, we integrate automatic image analysis with interactive user assistance and with input from magnetic trackers and CAD models.

In our approach to AR we combine computer-generated graphics with a live video signal from a camera to produce an enhanced view of a real scene, which is then displayed on a standard video monitor. We track user motion and provide basic pointing capabilities in the form of a 3D pointing device with an attached magnetic tracker, as shown in Figure 6. This suffices in our application scenarios to demonstrate how AR can be used to query information about objects in the real world. For the manipulation of virtual objects, we use mouse-based interaction in several related 2D views of the scene on the screen.

We conduct research pertaining to distributed and collaborative use of AR technology. Considering the growing global information society, we expect an increasing demand for collaborative use of highly interactive computer technology over networks. Our emphasis lies on providing interaction concepts and distribution technology for users who collaboratively explore augmented realities, both locally immersed and remotely in the form of a telepresence.

We have demonstrated our work in several prototype applications, such as collaborative interior design and collaborative mechanical repair. This paper describes our approach to AR with examples from applications, as well as the underlying technology.

2. Previous Work

Research in augmented reality is a recent but expanding area of research. We briefly summarize the research conducted to date. Baudel and Beaudouin-Lafon have looked at the problem of controlling certain objects (e.g., cursors on a presentation screen) through the use of free-hand gestures (Baudel & Beaudouin-Lafon, 1993). Feiner et al. have used augmented reality in a laser printer maintenance task. In this example, the augmented reality system aids the user in the steps required to open the printer and replace various parts (Feiner, MacIntyre & Seligmann, 1993). Wellner has demonstrated an augmented reality system for office work in the form of a virtual desktop on a physical desk (Wellner, 1993). He interacts on this physical desk both with real and virtual documents. Bajura et al. have used augmented reality in medical applications in which the ultrasound imagery of a patient is superimposed on the patient's video image (Bajura, Fuchs & Ohbuchi, 1992). Lorensen et al. use an augmented reality system in surgical planning applications (Lorensen, Cline, Nafis, Kikinis, Altobelli & Gleason, 1993). Milgram and Drascic et al. use augmented reality with computer-generated stereo graphics to perform telerobotics tasks (Milgram, Zhai, Drascic & Grodski, 1993; Drascic, Grodski, Milgram, Ruffo, Wong & Zhai, 1993). Caudell and Mizell describe the application of augmented reality to manual manufacturing processes (Caudell & Mizell, 1992). Fournier has posed the problems associated with illumination in combining synthetic images with images of real scenes (Fournier, 1994).

The utilization of computer vision in AR has depended upon the requirements of particular applications. Deering has explored the methods required to produce accurate high resolution head-tracked stereo display in order to achieve sub-centimeter virtual to physical registration (Deering, 1992). Azuma and Bishop, and Janin et al. describe techniques for calibrating a see-through head-mounted display (Azuma & Bishop, 1994; Janin, Mizell & Caudell, 1993). Gottschalk and Hughes present a method for auto-calibrating tracking equipment used in AR and VR (Gottschalk & Hughes, 1993). Gleicher and Witkin state that their through-the-lens controls may be used to register 3D models with objects in images (Gleicher & Witkin, 1992). More recently, Bajura and Neumann have addressed the issue of dynamic calibration and registration in augmented reality systems (Bajura & Neumann, 1995). They use a closed-loop system which measures the registration error in the combined images and tries to correct the 3D pose errors. Grimson et al. have explored vision techniques to automate the process of registering medical data to a patient's head using segmented CT or MRI data and range data (Grimson, Lozano-Perez, Wells, Ettinger, White & Kikinis, 1994; Grimson, Ettinger, White, Gleason, Lozano-Perez, Wells & Kikinis, 1995). In a related project, Mellor recently developed a real-time object and camera calibration algorithm that calculates the relationship between the coordinate systems of an object, a geometric model, and the image plane of a camera (Mellor, 1995). Uenohara and Kanade have developed techniques for tracking 2D image features, such as fiducial marks on a patient's leg, in real time using special hardware to correlate affine projections of small image areas between images (Uenohara & Kanade, 1995). Peria et al. use specialized optical tracking devices (calibrated plates with LEDs attached to medical equipment) to track an ultrasound probe and register it with SPECT data (Peria, Chevalier, François-Joubert, Caravel, Dalsoglio, Lavallee & Cinquin, 1995). Betting et al. as well as Henri et al. use stereo data to align a patient's head with MRI or CT data (Betting, Feldmar, Ayache & Devernay, 1995; Henri, Colchester, Zhao, Hawkes, Hill & Evans, 1995).

Some researchers have studied the calibration issues relevant to head-mounted displays (Bajura, Fuchs & Ohbuchi, 1992; Caudell & Mizell, 1992; Azuma & Bishop, 1994; Holloway, 1994; Kancherla, Rolland, Wright & Burdea, 1995). Others have focused on monitor-based approaches (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995; Betting, Feldmar, Ayache & Devernay, 1995; Grimson, Ettinger, White, Gleason, Lozano-Perez, Wells & Kikinis, 1995; Henri, Colchester, Zhao, Hawkes, Hill & Evans, 1995; Mellor, 1995; Peria, Chevalier, François-Joubert, Caravel, Dalsoglio, Lavallee & Cinquin, 1995; Uenohara & Kanade, 1995). Both approaches can be suitable depending on the demands of the particular application.

3. Application Scenarios

We have developed a comprehensive system, GRASP, which we have used as the basis for our application demonstrations. This section discusses two examples. The next sections describe in detail the GRASP system and the research issues that we focus on.

3.1 Collaborative Interior Design

Figure 1. Augmented room showing a real table with a real telephone and a virtual lamp, surrounded by two virtual chairs. Note that the chairs are partially occluded by the real table while the virtual lamp occludes the table.

The scenario for the interior design application assumes an office manager who is working with an interior designer on the layout of a room (Ahlers, Kramer, Breen, Chevalier, Crampton, Rose, Tuceryan, Whitaker & Greer, 1995). The office manager intends to order furniture for the room. On a computer monitor they both see a picture of the real room from the viewpoint of the camera. By interacting with various manufacturers over a network, they select furniture by querying databases using a graphical paradigm. The system provides descriptions and pictures of furniture that is available from the various manufacturers who have made models available in their databases. Pieces or groups of furniture that meet certain requirements, such as color, manufacturer, or price, may be requested. The users choose pieces from this "electronic catalogue" and 3D renderings of this furniture appear on the monitor along with the view of the room. The furniture is positioned using a 3D mouse. Furniture can be deleted, added, and rearranged until the users are satisfied with the result; they view these pieces on the monitor as they would appear in the actual room. As they move the camera they can see the furnished room from different points of view.

The users can consult with colleagues at remote sites who are running the same system. Users at remote sites manipulate the same set of furniture using a static picture of the room that is being designed. Changes by one user are seen instantaneously by all of the others, and a distributed locking mechanism ensures that a piece of furniture is moved by only one user at a time. In this way groups of users at different sites can work together on the layout of the room (see Figure 1). The group can record a list of furniture and the layout of that furniture in the room for future reference.

3.2 Collaborative Mechanical Repair

Figure 2. Augmented engine.

In the mechanical maintenance and repair scenario, a mechanic is assisted by an AR system while examining and repairing a complex engine (Kramer & Chevalier, 1996). The system presents a variety of information to the mechanic, as shown in Figure 2. Annotations identify the name of parts, describe their function, or present other important information like maintenance or manufacturing records. The user interacts with the real object in its natural setting with a pointing device monitored by the computer. As the mechanic points to a specific part of the engine, the AR system displays computer-generated lines and text (annotations) that describe the visible components or give the user hints about the object. Queries with the pointing device on the real-world object may be used to add and delete annotation tags. Since we also track the engine, the annotations move with the engine as its orientation changes. The lines attaching the annotation tags to the engine follow the appropriate visible components, allowing the user to easily identify the different parts as the view of the engine changes. The mechanic can also benefit from the assistance of a remote expert who can control what information is displayed on the mechanic's AR system.

4. System Infrastructure

Figure 3. The GRASP system hardware configuration.

Figure 4. The GRASP system software configuration.

The GRASP system forms the central core of our efforts to keep the graphics and visual scene in alignment and to provide an interactive three-dimensional interface (Ahlers, Crampton, Greer, Rose & Tuceryan, 1994). Figure 3 shows a schematic of the GRASP hardware configuration. The workstation hardware generates the graphical image and displays it on a high resolution monitor. A scan converter transforms the graphics displayed on the monitor into a standard video resolution and format. The scan converter also mixes this generated video signal with the video signal input from the camera via luminance keying. A 6DOF magnetic tracker, which is capable of sensing the three translational and the three rotational degrees of freedom, provides the workstation with continually updated values for the position and orientation of the tracked objects, including the video camera and the pointing device. A frame grabber digitizes video images for processing within the computer during certain operations. The software has been implemented using the C++ programming language. A schematic diagram of the software architecture is shown in Figure 4.

5. Specification and Alignment of Coordinate Spaces

In order to align the virtual and real objects seamlessly, we need very precise models of the user's environment and how it is sensed. It is essential to calibrate sensors and display devices (i.e., to determine their locations and optical properties), to register all objects and interaction devices in a global coordinate system, and to track them while the user operates in the scene.

5.1 Calibration of Sensors and Video Equipment

During the initial setup, the camera characteristics, the location of the 6D tracker and the effects of scan conversion and video mixing must be determined. These procedures are referred to as the image, camera, and tracking calibration (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995). We now describe several such techniques that mix computer vision algorithms with varying amounts of model-based information and interactive input from the user.

5.1.1 Image Calibration

One of the essential steps of our AR system is the mixing of live video input with synthetically generated geometric data. While the live input is captured as an analog video signal by the camera system, the synthetic data is rendered digitally and then scan converted into a video signal. In order to align the two signals, we need to determine the horizontal and vertical positioning of the rendered, scan converted image with respect to the camera image, as well as the relationship between the two aspect ratios.

We use a synthetic test image that has two markers in known positions to compute four distortion parameters (2D translation and scaling). The test image is scan converted into a video signal. For image calibration purposes, we redigitize it and determine the location of the markers in the grabbed image. The discrepancy between the original location of the markers and their position in the grabbed image determines the translational and scaling distortions induced by the scan converter. This interactive image calibration method asks the user to identify the two markers in the grabbed image.

The GRASP system also provides an alternative, automatic routine to compute the distortion parameters. Algorithmically, it is easier to find a large, homogeneously colored area in an image than the thin lines of a marker. Accordingly, the automatic algorithm uses a different test image which contains one black square. It finds the dark area, fits four lines to its boundaries and thus determines the corners of the square. Two of the corners suffice to determine the distortion parameters of the scan converter.

The comparison of the two approaches illustrates an important distinction between interactive and automatic algorithms: while humans work best with sharp line patterns to provide precise interactive input, automatic algorithms need to accommodate imprecision due to noise and digitization effects and thus work better on thicker patterns. On the other hand, automatic algorithms can determine geometric properties of extended areas, such as the center, an edge or a corner of an area, more precisely than humans. In conclusion, it is essential to the design of a system and to its use in an application that visual calibration aids be chosen according to their intended use. This is a recurring theme in our work.
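To make the arithmetic concrete, the following minimal sketch (Python/NumPy rather than the GRASP C++ implementation; the marker coordinates are made up for the example) recovers the four distortion parameters from two marker correspondences, assuming the simple model u' = s_u*u + t_u, v' = s_v*v + t_v implied by "2D translation and scaling".

```python
import numpy as np

def image_calibration(ref_markers, grabbed_markers):
    """Estimate the 2D scale and translation introduced by scan conversion
    and re-digitization from two marker correspondences.

    ref_markers, grabbed_markers: (2, 2) arrays holding the two marker
    positions (u, v) in the synthetic test image and in the grabbed video
    image, respectively."""
    (u1, v1), (u2, v2) = np.asarray(ref_markers, dtype=float)
    (U1, V1), (U2, V2) = np.asarray(grabbed_markers, dtype=float)

    s_u = (U2 - U1) / (u2 - u1)   # horizontal scale
    s_v = (V2 - V1) / (v2 - v1)   # vertical scale
    t_u = U1 - s_u * u1           # horizontal offset
    t_v = V1 - s_v * v1           # vertical offset
    return s_u, s_v, t_u, t_v

# Example: markers placed at (100, 100) and (500, 400) in the test image,
# found at (92, 108) and (488, 404) after scan conversion and grabbing.
print(image_calibration([(100, 100), (500, 400)], [(92, 108), (488, 404)]))
```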

5.1.2 Camera Calibration

Figure 5. The camera calibration grid.

Camera calibration is the process which calculates the extrinsic (position and orientation) and intrinsic parameters (focal length, image center, and pixel size) of the camera. We assume that the intrinsic parameters of the camera remain fixed during the augmented reality session. The camera's extrinsic parameters may be tracked and updated.

To compute the camera's intrinsic and extrinsic parameters, we point the camera at a known object in the scene, the calibration grid shown in Figure 5. The position of the grid and, in particular, the position of the centers of the butterfly markers on the grid are known within the 3D world coordinate system. We use the mapping from these 3D object features to 2D image features to calculate the current vantage point of the camera and its intrinsic image distortion properties. In principle, each mapping from a 3D point to 2D image coordinates determines a ray in the scene that aligns the object point with the focal point of the camera. According to the pinhole camera model, several such rays from different object points intersect at the focal point and thus uniquely determine the pose of the camera, as well as its imaging properties. Accordingly, we can define a system of equations to compute the intrinsic and extrinsic camera parameters using a mapping of object points to image points and minimizing measurement errors. The details are described in (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995).

The GRASP system provides an interactive camera calibration routine: A user indicates the center of all butterfly patterns with a mouse and labels them by typing the appropriate code name on the keyboard.

We also use an automatic, computer vision based camera calibration algorithm. In this approach, we use a calibration board that shows an arrangement of 42 black squares on a white background. Processing the image at a coarse scale, we quickly determine the positions and extents of black blobs in the image. By fitting rectangles to the blob outlines at finer scales and matching them left to right and top to bottom to the squares of the calibration board, we determine the calibration parameters of the camera.
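For readers who want to experiment with this kind of grid-based calibration, a common textbook way to realize the system of equations described above is the direct linear transform (DLT), sketched below in Python/NumPy under the assumption of at least six known 3D-2D correspondences; the exact formulation used in GRASP is given in (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995). The intrinsic and extrinsic parameters can subsequently be recovered by decomposing the estimated projection matrix.

```python
import numpy as np

def calibrate_camera_dlt(points_3d, points_2d):
    """Estimate a 3x4 projection matrix from >= 6 correspondences between
    3D calibration-grid points and their 2D image positions (DLT).
    points_3d: (N, 3) world coordinates; points_2d: (N, 2) pixel coordinates."""
    rows = []
    for (X, Y, Z), (u, v) in zip(np.asarray(points_3d, float),
                                 np.asarray(points_2d, float)):
        P = [X, Y, Z, 1.0]
        rows.append([*P, 0, 0, 0, 0, *(-u * np.array(P))])
        rows.append([0, 0, 0, 0, *P, *(-v * np.array(P))])
    A = np.asarray(rows)
    # The projection matrix is the right singular vector of A belonging to
    # the smallest singular value (the least-squares null vector).
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)

def project(P, point_3d):
    """Apply the estimated projection to a 3D point, returning pixel (u, v)."""
    ph = P @ np.append(point_3d, 1.0)
    return ph[:2] / ph[2]
```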

5.1.3 Magnetic Tracker Calibration

Although we emphasize in this paper the use of computer vision techniques for AR, we do not rely exclusively on optical information. Complementarily, we also exploit magnetic tracking technology, as well as other interactive or model-based input. The tracking system consists of a transmitter and several receivers (trackers) that can be attached to objects, cameras and pointers in the scene. The tracking system automatically relates the 3D position and orientation of each tracker to a tracking coordinate system in the transmitter box. It is the task of the tracker calibration procedure to determine where the tracking coordinate system resides with respect to the world coordinate system of the AR application. This is a critical issue that usually does not arise in VR applications since such systems only need to track relative motion. Yet, the absolute positioning and tracking of objects and devices within a real world coordinate frame is of greatest importance in AR scenarios where reality is augmented with virtual information.

At the beginning of each session, we calibrate the magnetic tracking system, relating its local coordinate system to the world coordinate system. This process is currently performed interactively, using the same calibration grid as for camera calibration. We do this by determining the location of at least three points on the calibration grid with magnetic trackers. Since these points are also known in the world coordinate system, we can establish a system of linear equations, relating the tracked coordinates to the world coordinates and thus determining the unknown position and orientation parameters of the tracker (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995).
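One standard way to solve this kind of point-correspondence problem is the SVD-based absolute-orientation (Kabsch) method, sketched below in Python/NumPy; the paper's own linear-equation formulation may differ in detail, but the inputs and outputs are the same: at least three grid points measured in tracker coordinates and known in world coordinates, yielding the rotation and translation of the tracking coordinate system.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) with dst ~= R @ src + t,
    from >= 3 non-collinear point correspondences.
    src: points measured in tracker coordinates, dst: the same points in
    world coordinates (both (N, 3) arrays)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)                 # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                                  # proper rotation
    t = c_dst - R @ c_src
    return R, t
```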

5.2 Registration of Interaction Devices and Real Objects

In addition to the sensing devices that were calibrated in the previous section, scenes also contain physical objects that the user wants to interact with using 3D interaction devices. Such objects and gadgets need to be registered with respect to the world coordinate system.

5.2.1 Pointer Registration

Figure 6. 3D pointing device.

Currently, we use the magnetic tracking system to register and track the position of a 3D pointer in our system (see Figure 6).

For the pointer registration, we need to determine the position (offset) of the tip of a pointer in relationship to an attached magnetic tracker. Our procedure requires the user to point to the same point in 3D space several times, using a different orientation each time for a pointer that has been attached to one of the trackers. For each pick, the position and the orientation of the tracker mark within the tracker coordinate system are recorded. The result of this procedure is a set of points and directions with the common property that the points are all the same distance from the single, picked point in 3D space and all of the directions associated with the points are oriented toward the picked point. From this information, we can compute six parameters defining the position and orientation of the pointing device, using a least-squares approach to solve an overdetermined system of linear equations.
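The sketch below shows one common way to set up this overdetermined linear system (often called pivot calibration): the six unknowns are the tip offset in the tracker frame and the picked point in the world frame. The variable names and the exact parameterization are illustrative, not taken from the GRASP code.

```python
import numpy as np

def pivot_calibration(rotations, positions):
    """Estimate the pointer-tip offset from pivot motion about a fixed point.

    rotations: list of 3x3 tracker orientation matrices R_i,
    positions: list of tracker positions p_i (world coordinates),
    all recorded while the tip rests on the same 3D point.
    Solves  R_i @ d - q = -p_i  for the tip offset d (tracker frame) and
    the pivot point q (world frame) in the least-squares sense."""
    A, b = [], []
    for R, p in zip(rotations, positions):
        A.append(np.hstack([np.asarray(R, float), -np.eye(3)]))  # 3 equations per pick
        b.append(-np.asarray(p, float))
    x, *_ = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)
    d, q = x[:3], x[3:]
    return d, q
```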

5.2.2 Object Registration

Object registration is the process of finding the six parameters that define the 3D position and orientation, i.e. pose, of an object relative to some known coordinate system. This step is necessary, even when tracking objects magnetically, in order to establish the 3D relationship between a magnetic receiver and the object to which it is fastened.

We have studied two strategies for determining the 3D pose of an object (Whitaker, Crampton, Breen, Tuceryan & Rose, 1995). The first is a camera-based approach, which relies on a calibrated camera to match 3D landmarks ("calibration points") on the object to their projection in the image plane. The second method uses the 3D coordinates of the calibration points, as indicated manually using the 3D pointer with magnetic tracking, in order to infer the 3D pose of the object.

There has been extensive research on pose determination in the computer vision literature (Lowe, 1985; Grimson, 1990), but most of these techniques apply to only limited classes of models and scenes. The focus of the computer vision research is typically automation and recognition, features that are interesting, but not essential to augmented vision. In our work, the locations of landmark points in the image are found manually by a user with a mouse. We assume that the points are mapped from known locations in 3-space to the image via a rigid 3D transformation and a projection.

We represent the orientation of the object as a 3x3 rotation matrix R, which creates a linear system with 12 unknowns. Each point gives 2 equations, and 6 points are necessary for a unique solution. In practice we assume noise in the input data and use an overdetermined system with a least-squares solution in order to get reliable results. However, because we use a 3x3 rotation matrix and treat each element as an independent parameter, this linear system does not guarantee an orthonormal solution for this matrix, and it can produce "non-rigid" rotation matrices. Such non-rigidities can produce undesirable artifacts when these transformations are combined with others in the graphics system.

Orthonormality is enforced by adding a penalty term to the least-squares objective that measures the deviation of R from an orthonormal matrix, ||R R^T - I||^2. This creates a nonlinear optimization problem which we solve through gradient descent. The gradient descent is initialized with the unconstrained (linear) solution, and constrained solutions are typically found in 10-15 iterations.

Figure 7. Calibration and tracking of an engine model: A wireframe engine model registered to a real model engine using an image-based calibration (a), but when the model is turned and its movements tracked (b), the graphics show the misalignment in the camera's z direction.

Despite good pointwise alignment in the image plane, the image-based calibration can produce significant error in the depth term which is not seen in the reprojected solutions. For instance, in the case of the engine model shown in Figure 7(a), the image-based approach can produce a rigid transformation which matches landmark points in the image to within about 2 pixels. Yet the error in the z direction (distance from the camera) can be as much as 2-3 centimeters. This error becomes evident as the object is turned as in Figure 7(b). We attribute this error primarily to error in the camera calibration, and better camera models and calibration procedures are a topic of ongoing research. Because of such error we have developed the procedure described in the next section for calibrating objects with a 3D pointing device.

The problem here is to compute the rigid transformation between a set of 3D point pairs. Using the 3D pointer and several keystrokes, the user indicates the world coordinates (or those of some other known coordinate system) of landmark points on the object. This also gives rise to a linear system with 12 unknowns. For a unique solution 4 points are needed, but in most cases we use more than 4 points and solve for the least-squares error. As with the image-based object calibration, error in the measurements can produce solutions that represent non-rigid transformations. Thus, the same nonlinear penalty term can be introduced in order to produce constrained solutions.
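The following sketch illustrates the pointer-based formulation just described: the 12-unknown linear system is solved by least squares and then refined with the orthonormality penalty. For brevity it uses SciPy's general-purpose quasi-Newton minimizer instead of the hand-written gradient descent mentioned above, and the weight on the penalty term is an illustrative choice, not a value from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def register_object_3d(model_pts, world_pts, weight=1.0):
    """Estimate the pose (R, t) mapping object-model landmark points to their
    world positions measured with the 3D pointer.

    Each correspondence gives three linear equations in the 12 unknowns of
    [R | t]; the linear least-squares solution is then refined with a
    penalty on the deviation of R from an orthonormal matrix."""
    M = np.asarray(model_pts, float)
    W = np.asarray(world_pts, float)

    # Linear system A x = b with x holding [R | t] row by row.
    A = np.zeros((3 * len(M), 12))
    b = W.reshape(-1)
    for i, p in enumerate(M):
        for k in range(3):
            A[3 * i + k, 4 * k:4 * k + 3] = p      # row k of R dotted with p
            A[3 * i + k, 4 * k + 3] = 1.0          # component k of t
    x0, *_ = np.linalg.lstsq(A, b, rcond=None)     # unconstrained solution

    def cost(x):
        R = x.reshape(3, 4)[:, :3]
        data = np.sum((A @ x - b) ** 2)                        # fit to measurements
        rigidity = np.sum((R @ R.T - np.eye(3)) ** 2)          # orthonormality penalty
        return data + weight * rigidity

    T = minimize(cost, x0).x.reshape(3, 4)         # quasi-Newton refinement
    return T[:, :3], T[:, 3]                       # R, t
```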

5.3 Tracking of Objects and Sensors

Calibration and registration refer to stationary aspects of a scene. In a general AR scenario, however, we have to deal with dynamic scene changes. With tracking we denote the ability of our system to cope with those dynamic scene changes. Thus, while the calculation of the external camera parameters and of the pose of an object are the results of calibration and registration, tracking can be regarded as a continuous update of those parameters. We are currently exploring and using two approaches to tracking: magnetic tracking and optical tracking.

5.3.1 Magnetic Tracking

As a magnetic tracking device we use the 6D tracker "Flock of Birds" from Ascension Technology Corporation. Receivers are attached to the camera and each potentially moving object. These receivers sense the six degrees of freedom (three translational and three rotational) with respect to a transmitter, whose location is kept fixed in world coordinates.

Initially, we have relied exclusively on this magnetic technology since the trackers provide positional and orientational updates at nearly real-time speeds and operate well in a laboratory setup. However, magnetic tracking is not practicable in large scale, realistic setups, because the tracking data can easily be corrupted by ferromagnetic materials in the vicinity of the receiver and because the trackers operate only in a limited range. Another drawback is the limited accuracy of the sensor readings.

5.3.2 Optical Tracking

Optical tracking methods are based on detecting and tracking certain features in the image. These can be lines, corners or any other salient features which are easy and reliable to detect in the image and can uniquely be associated with features of the 3D world. Our tracking approach currently uses the corners of squares attached to objects or walls (see Figure 8) to track a moving camera. Once the camera parameters are recovered, the scene can be augmented with virtual objects, such as shelves and chairs (see Figure 9).

Figure 8. Our optical tracking approach currently tracks the corners of squares. The left figure shows a corner of a room with eight squares. The right figure shows the detected squares only.

Figure 9. Augmented scene with a virtual chair and shelf that were rendered using the automatically tracked camera parameters.

This scenario is relevant to many AR applications where a user moves in the scene and thus continuously changes his (the camera's) viewpoint. We use a fixed world coordinate system, thus recomputing the camera parameters relative to the world frame in each step. Conversely, we could also recompute the position of the world system relative to the camera frame, thus using an egocentric frame of reference. The advantage of the former approach is that we can thus exploit certain motion invariants which make the tracking problem much simpler.

We assume that a model of the scene exists and that we are able to add "fiducial marks", such as black squares, to the scene to aid the tracking process. The squares are registered in the 3D scene model. Thus, in principle, the same camera calibration techniques described in section 5.1.2 can be used to determine, at any point in time, the position of the camera in the scene. Yet, during the tracking phase, we need to pay particular attention to the speed and robustness of the algorithms. To our advantage, we can exploit the time coherence of user actions: users move in continuous motions. We can benefit from processing results of previous images and from an adaptive model of the user motion to predict where the tracked features will appear in the next frame. We thus do not need to perform the full camera calibration procedure on every new incoming image.

It is well known that reasoning about three-dimensional information from two-dimensional images is error prone and sensitive to noise, a fact which has to be taken into account in any image processing method using real video data. In order to cope with this noise sensitivity we exploit physical constraints on moving objects. Since we do not have any a priori knowledge about forces changing the motion of the camera or the objects, we assume no forces (accelerations) and hence a constant velocity. In this case a general motion can be decomposed into a constant translational velocity of the center of mass of the object, and a rotation with constant angular velocity around an axis through the center of mass (e.g., Goldstein, 1980). This constitutes our so-called motion model (see Figure 10). So we do not only measure (estimate) the position and orientation of the camera and moving objects, as in the case of magnetic tracking, but also their change in time with respect to a stationary world frame, i.e., their translational and angular velocity. This is also referred to as motion estimation.

Figure 10. Each 3D motion can be decomposed into a translation t and a rotation. We choose a rotation about an axis through the center of mass of the objects, which is constant in the absence of any forces. The figure also indicates the world coordinate frame and the camera coordinate frame.

The motion parameters (translational and angular velocity according to the motion model) are estimated using time-recursive filtering based on Kalman filter techniques (e.g., Bar-Shalom & Fortmann, 1988; Gelb, 1974), where the unknown accelerations are modeled as so-called process noise in order to allow for changes of the velocities. The time-recursive filtering process enables smooth motion even in the presence of noisy image measurements, and provides a prediction and measurement-update step for each video frame. The prediction allows a reduction of the search space for features in the next video image and hence speeds up the process.
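A minimal constant-velocity Kalman filter for a single tracked position is sketched below; the GRASP tracker additionally estimates orientation and angular velocity and derives its measurements from the detected square corners, which this illustration omits, and the noise parameters here are placeholders.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal linear Kalman filter with a constant-velocity motion model.
    State: [position (3), velocity (3)]; measurement: position only.
    Unknown accelerations are absorbed by the process noise q."""

    def __init__(self, dt, q=1e-2, r=1e-1):
        self.x = np.zeros(6)                               # state estimate
        self.P = np.eye(6)                                 # state covariance
        self.F = np.eye(6)                                 # state transition
        self.F[:3, 3:] = dt * np.eye(3)
        self.Q = q * np.eye(6)                             # process noise
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])  # measurement model
        self.R = r * np.eye(3)                             # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]     # predicted position: restricts the feature search

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ (np.asarray(z, float) - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
```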

A typical drawback of optical methods stems from the fact that we want to reason about three-dimensional information from two-dimensional image measurements, which can lead to numerical instabilities if not performed carefully. On the other hand, there is the advantage that the image of a real object is almost perfectly aligned with its rendered counterpart, since the alignment error can be minimized in the image. Optical tracking approaches can hence be very accurate. Another advantage of optical tracking is that it is a non-intrusive approach, since it operates just on visual information, and it is basically not limited to any spatial range. It is furthermore quite natural, since it is the way most humans track objects and navigate within an environment.

6. Object Interaction

Realistic immersion of virtual objects into a real scene requires that the virtual objects behave in physically plausible manners when they are manipulated, i.e., they occlude or are occluded by real objects, they are not able to move through other objects, and they are shadowed or indirectly illuminated by other objects while also casting shadows themselves. To enforce such physical interaction constraints between real and virtual objects, the Augmented Reality system needs to have a very detailed description of the physical scene.

6.1 Acquisition of 3D Scene Descriptions

Figure 11. Modified engine. The fact that the user has removed the air cleaner is not yet detected by the AR system. The virtual model thus does not align with its real position.

The most straightforward approach to acquiring scene descriptions would suggest the use of geometric models, e.g., CAD data. Given such models, the AR system needs to align them with their physical counterparts in the real scene, as described in section 5.2.2. The advantage of using such models is that they can easily serve as starting points for accessing high-level, semantic information about the objects, as is demonstrated in the mechanical repair application.

However, there are some problems with this approach. First, geometric models are not available in all cases. For example, interior restoration of old buildings typically needs to operate without CAD data. Second, available models are not complete. Since models are abstractions of reality, real physical objects typically show more detail than is represented in the models. In particular, generic scene models cannot fully anticipate the occurrence of new objects, such as coffee mugs on tables, cars or cranes on construction sites, users' hands, or human collaborators. Furthermore, the system needs to account for the changing appearances of existing objects, such as buildings under construction or engines that are partially disassembled (see Figure 11). When users see such new or changed objects in the scene, they expect the virtual objects to interact with these as they do with the rest of the (modeled) scene.

Computer vision techniques can be used to acquire additional information from the particular scene under inspection. Although such information generally lacks semantic descriptions of the scene and thus cannot be used directly to augment reality with higher-level information, such as the electric wiring within a wall, it provides the essential environmental context for the realistic immersion of virtual objects into the scene. Thus, we expect future AR systems to use hybrid solutions, using model data to provide the necessary high-level understanding of the objects that are most relevant to the tasks performed, and enriching the models with automatically acquired further information about the scene.

We are investigating how state-of-the-art image understanding techniques can be used in AR applications. One particular paradigm in computer vision, shape extraction, determines depth information as so-called 2½-D sketches from images. These are not full 3D descriptions of the scene but rather provide distance (depth) estimates, with respect to the camera, for some or all pixels in an image. Ongoing research develops techniques to determine object shape from stereo images, from motion sequences, from object shading, from shadow casting, from highlights and gloss, and more. It is important to consider whether and how such algorithms can be used continuously, i.e., while the user is working in the scene. Alternatively, the algorithms could be used during the initial setup phase, gathering 3D scene information once and compiling a rough sketch of the scene that then needs to be updated with other techniques during the AR session. Yet other options involve the use of other sensing modalities besides cameras, such as laser range scanners or sonar sensors.

This section discusses two approaches we are investigating.

6.1.1 Dense Shape Estimates from Stereo Data

Stereo is a classical method of building three-dimensional shape from visual cues. It uses two calibrated cameras with two images of the scene from different vantage points. Using stereo triangulation, the 3D location of dominant object features that are seen in both images can be determined: if the same point on an object is seen in both images, rays cast from the focal points of both cameras through the feature positions in the images intersect in 3D space, determining the distance of the object point from the cameras.
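Assuming both cameras have been calibrated to 3x4 projection matrices as in section 5.1.2, the ray-intersection step can be written as a small linear triangulation, sketched below; this is a generic textbook formulation, not the specific implementation used with the stereo matcher described next.

```python
import numpy as np

def triangulate(P_left, P_right, uv_left, uv_right):
    """Recover the 3D point seen at pixel uv_left in the left image and
    uv_right in the right image, given the two 3x4 camera projection
    matrices. The two rays intersect only approximately in practice, so the
    result is a least-squares estimate (homogeneous DLT triangulation)."""
    (ul, vl), (ur, vr) = uv_left, uv_right
    A = np.vstack([
        ul * P_left[2] - P_left[0],
        vl * P_left[2] - P_left[1],
        ur * P_right[2] - P_right[0],
        vr * P_right[2] - P_right[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # from homogeneous to Euclidean coordinates
```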

Shape from stereo has been studied extensively in the computer vision literature. The choice of image feature detection algorithms and of feature matching algorithms between images is of critical importance. Depending on the type of methods and algorithms one uses, shape from stereo may result in sparse depth maps or dense depth maps. For our research, the goal is to use the computed 3D shape information in the AR applications. In most if not all such scenarios, dense depth maps are needed. Therefore, we have taken an existing algorithm (Weng, Huang & Ahuja, 1989) to compute a dense depth map which is used in the AR context. The camera geometry is obtained by calibrating both cameras independently using one of the camera calibration methods described in section 5.1.

The details of the stereo algorithm are given in the paper (Weng, Huang & Ahuja, 1989). In summary, the heart of the algorithm lies in the computation of the disparity map (du, dv) which describes the distance between matched points in both images. This is accomplished by computing matches between four kinds of image features derived from the original images: smoothed intensity images, edge magnitudes, positive corners, and negative corners. The positive and negative corners separate the contrast direction at a corner. Distinguishing between these four feature types improves the matching results by preventing incompatible image features, such as positive and negative corners, from being matched between the images.

The overall algorithm iteratively determines the (locally) best match between the image features that have been computed in both images. Starting with an initial hypothetical match, the matches are iteratively changed and improved, minimizing an energy function which integrates over the entire image the influence of several error terms related to the quality of the edge matches between the left and right image, as well as a smoothness term which ensures that the recovered surface is not exceedingly rough and noisy.

Figure 12 shows a pair of stereo images. The disparity maps computed from these images are shown in Figure 13 and the depth map is shown in Figure 14(a). Finally, Figure 14(b) shows how the computed depth map is used to occlude three virtual floating cubes.

Figure 12. An example pair of stereo images: (a) Left image and (b) Right image.

Figure 13. The disparities computed on the stereo pair in Figure 12: (a) disparities in rows (du) and (b) disparities in columns (dv). The brighter points have larger disparities.

Figure 14. (a) The computed depth map from the pair of images in Figure 12. The brighter points are farther away from the camera. (b) The computed depth map in (a) is used to occlude the virtual object (in this case a cube) which has been added to the scene.

6.1.2 Shape from Shading

Complementary to geometric shape extraction methods, some approaches exploit the photometric reflection properties of objects. An image of a smooth object with uniform surface reflectance properties exhibits smooth variations in the intensity of the reflected light, referred to as shading. This information is used by human and other natural vision systems to determine the shape of the object. The goal in shape from shading is to replicate this process to the point of being able to design an algorithm that will automatically determine the shape of a smooth object from its image (Horn & Brooks, 1989).

This shape information can be used in a number of application areas where knowledge of the spatial characteristics of a scene is important. In particular, shape from shading information can fill the gaps in sparse depth maps that are left by geometry-based shape extraction methods. Geometric extraction works best on highly textured objects where many features can be matched between images. Shape from shading, on the other hand, can propagate shape information into homogeneous areas.

We are investigating how the second derivative, or Hessian, of a smooth object surface can be determined directly from shading information. The method of characteristic strips, which is often used for calculating shape from shading (Horn, 1986), is set in the framework of modern differential geometry. We extend this method to compute the second derivative of the object's surface, independently from the standard surface orientation calculation. This independently derived information can be used to help classify critical points, verify assumptions about the reflectance function and identify effectively impossible images (Greer & Tuceryan, 1995).

6.2 Mixing of Real and Virtual Worlds

Once appropriate scene descriptions have been obtained interactively or automatically, they form the basis for mixing real and virtual worlds. Since the mixing must be performed at interactive rates, great emphasis has to be placed on efficiency. Depending on the representation of the scene descriptions, different options can be pursued.

If the scene description is available as a geometric model, we can hand the combined list of real and virtual models to the geometric renderer which will then compute the interactions between real and virtual objects for us. By rendering models of real objects in black, we can use the luminance keying feature of the video mixer to substitute the respective area with live video data. As a result, the user sees a picture on the monitor that blends virtual objects with live video, while respecting 3D occlusion relationships between real and virtual objects.
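In GRASP this keying is done by the video mixer hardware; a software analogue of the per-pixel decision, with an illustrative luminance threshold and made-up array names, looks roughly as follows.

```python
import numpy as np

def luminance_key(graphics_rgb, video_rgb, threshold=10):
    """Software analogue of the scan converter's luminance keying:
    wherever the rendered graphics frame is (near) black, show the live
    video pixel; elsewhere show the graphics. Both frames are HxWx3 uint8."""
    luminance = graphics_rgb.astype(float) @ np.array([0.299, 0.587, 0.114])
    mask = luminance < threshold          # "key" on dark pixels
    out = graphics_rgb.copy()
    out[mask] = video_rgb[mask]
    return out
```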

This is a straightforward approach in applications where geometric, polygonal scene descriptions are available. If the descriptions are computed as depth maps, as described in section 6.1, the depth maps still need to be converted into a geometric representation, by tessellating and decimating the data (Schroeder, Zarge & Lorensen, 1992; Turk, 1992).

Alternatively, we can sidestep the tessellation and re-rendering phases for real objects by initializing the Z-buffer of the graphics hardware with the depth map (Wloka & Anderson, 1995). Occlusion of the virtual objects is then performed automatically. When the virtual object is rendered, pixels that are further away from the camera than the Z values in the depth map are not drawn. By setting the background color to black, the real objects present in the original video are displayed in these unmodified pixels. Figure 14(b) presents three virtual cubes occluded by a wooden stand with an engine and occluding the other objects in a real room, using the depth-based approach.
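The per-pixel depth test that this approach relies on can be illustrated in software as follows (array names are hypothetical); in the actual system the comparison happens in the graphics hardware after the depth map has been loaded into the Z-buffer.

```python
import numpy as np

def composite_with_depth(video_rgb, virtual_rgb, virtual_depth, real_depth):
    """Per-pixel occlusion test between virtual and real scene: draw a
    virtual pixel only where it is closer to the camera than the
    reconstructed depth of the real scene. Pixels where nothing virtual is
    rendered should carry virtual_depth = np.inf. All arrays share H x W."""
    visible = virtual_depth < real_depth
    out = video_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out
```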

These approaches have advantages and disadvantages, depending on the application. Full 3D geometric models are best for real-time movement of cameras. Polygonal approximations to depth maps can be used over a certain range of camera positions since the synthesized scene model is re-rendered when the camera moves. Copying the depth maps directly into the Z-buffer is the hardest approach: the map needs to be recomputed after each camera motion because the new projective transformation "shifts" all depth values in the depth map. Thus, this approach only works with stationary cameras or with shape extraction algorithms that perform at interactive speeds.

On the other hand, the geometric modeling approach suffers from an inherent dependence on scene complexity. If the scene needs to be represented by a very large polygonal model, the rendering technology may not be able to process it in real time. In contrast, the size of a depth map does not depend on scene complexity. Which approach to use in an application depends on the overall requirements and the system design.

7. Collaborative Use of AR

So far we have discussed techniques and solutions that make AR "work" for the single user. Object modeling, object interaction, realistic display and immersive interfaces all serve to present the user with a consistent and coherent world of real and virtual objects.

When we consider the application scenarios described above, we are reminded of the fact that in any virtual or real environment it appears natural to encounter other persons and to interact with them. Virtual environments are a promising platform for research in the CSCW area, and distributed multi-user interfaces are a challenge for many VE systems (e.g., the efforts related to the VRML proposal (Bell, Parisi & Pesce, 1995)). In the context of the GRASP system, we are interested in the problem and the paradigms of distributed AR. We are investigating solutions in the area of distributed computing and experimenting with system architectures for collaborative interfaces to shared virtual worlds.

7.1 Architecture for Shared AR

Each system supporting multi-user virtual environments can be characterized by the degree or type of concurrency, distribution, and replication in the system architecture (Dewan, 1995). Sharing between users has to be based on separability in the user interface: we call the database of shared logical objects the "model", and create "views" as a specific interpretation of the model in each interface. The need for rapid feedback in the user interface makes a replicated architecture attractive for AR. This in turn leads to object-level sharing where each user can view and manipulate objects independently. It is necessary to manage the shared information so that simultaneous and conflicting updates do not lead to inconsistent interfaces. This is guaranteed by the distribution component in our applications.

The model replication and distribution support allow the user interfaces of one application to execute as different processes on different host computers. GRASP interfaces are not multi-threaded, so the degree of distribution corresponds to the degree of concurrency in the system. The resulting architecture was implemented and successfully used in the interior design demonstration.

7.2 Providing Distribution

The replicated architecture is directly supported by the Forecast library of the GRASP system. Based on a message bus abstraction, Forecast provides an easy, reliable, and dynamic approach to constructing distributed AR applications.

Central to this support is a one-to-many reliable communication facility which can be described as a distributed extension of a hardware system bus. Components, situated on different machines, can dynamically connect to the same distributed bus and send and receive messages over it. This analogy has been used before for group communication or broadcast systems, and its messaging and selection capability are common to systems such as Linda and Sun's ToolTalk (Sunsoft, 1991).

The Forecast message bus implements a one-to-many FIFO (first in, first out) multicast transport protocol. A special sequencer process is used to impose a unique global ordering on messages. In the simpler form of the protocol, nodes that wish to broadcast send their message to the sequencer which then uses the one-to-many reliable protocol to disseminate the message. A unique global order is imposed on the message streams since all messages pass through the sequencer. Nodes can detect how their messages were scheduled by listening to the global message stream. The protocol is similar to the Amoeba reliable multicast protocol (Kaashoek & Tanenbaum, 1992), except that it uses reliable buffered transmission between nodes and the sequencer node at the expense of extra acknowledgments.
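The sequencer idea can be illustrated with a small in-process sketch: every message passes through a sequencer thread that stamps a global sequence number and forwards it to all subscribers, so all replicas apply updates in the same order. The in-memory queues stand in for Forecast's reliable one-to-many transport, and the payload strings are made up.

```python
import itertools
import queue
import threading

class Sequencer:
    """Toy model of sequencer-based total ordering: every message goes
    through the sequencer, which stamps a global sequence number and
    forwards it to all subscribed nodes."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.subscribers = []
        self.counter = itertools.count()
        threading.Thread(target=self._run, daemon=True).start()

    def subscribe(self):
        q = queue.Queue()
        self.subscribers.append(q)
        return q

    def broadcast(self, sender, payload):
        self.inbox.put((sender, payload))

    def _run(self):
        while True:
            sender, payload = self.inbox.get()
            msg = (next(self.counter), sender, payload)   # global order stamp
            for q in self.subscribers:                    # disseminate to all
                q.put(msg)

# Usage: two replicas of a shared model receive identical message streams.
bus = Sequencer()
site_a, site_b = bus.subscribe(), bus.subscribe()
bus.broadcast("A", "move chair-1 to (2.0, 0.0, 1.5)")
bus.broadcast("B", "lock chair-1")
print(site_a.get(), site_a.get())   # same order as seen by site_b
```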

We chose the message bus abstraction because it provides location, invocation and replication transparency for applications (Architecture Projects Management, 1989), which makes the programming of these applications easier. GRASP programmers are familiar with the concept of multiple local views and events, both of which we have extended to our distributed setting.

The Forecast message bus is used within our two collaborative AR demonstrators to implement model replication, direct interaction between components (e.g., to send pointer tracking information to remote participants), as well as generic functions like floor control and locking, state transfer, shared manipulators, video transmission (based on the MBONE audio and video library (Macedonia & Brutzman, 1994)), and synchronization between video and tracking events (using RTP-style timestamps).

8. Discussion

Using Augmented Reality in realistic applications requires the computer to be very well informed about the 3D world in which users perform their tasks. To this effect, AR systems use various different approaches to obtain, register and track object and scene models. Of particular importance are different sensing devices, such as cameras or magnetic trackers. They provide the essential real-time link between the computer's internal, "virtual" understanding of the world and reality. All such sensors need to be calibrated carefully so that the incoming information is in alignment with the physical world.

Sensor input is not used to its full potential in current AR systems due to real-time constraints, as well as due to the lack of algorithms that interpret signals or combine information from several sensors. Research fields such as computer vision, signal processing, pattern recognition, speech processing, etc. have investigated such topics for some time. Some algorithms are maturing so that, considering the projected annual increases in computer speed, it should soon become feasible to consider their use in AR applications. In particular, many applications operate under simplified (engineered) conditions so that scene understanding becomes an easier task than the general Computer Vision Problem (see, for example, (Marr, 1980)).

We operate at this borderline between computer vision and AR, injecting as much automation into the process as feasible while using an engineering approach towards simplifying the tasks of the algorithms. In this respect, we emphasize the hybrid use of various different techniques, including interactive user input where convenient, as well as other sensing modalities (magnetic trackers). This paper has shown how we have developed and explored different techniques to address some of the important AR issues. Our pragmatic approach has allowed us to build several realistic demonstrations. Conversely, these applications influence our research focus, indicating clearly the discrepancy between the state of the art and what is needed. Trade-offs between automation and assistance need to be further explored. User interaction should be reserved as much as possible for the high-level control of the scene and its augmentation with synthetic information from multimedia databases. More sensing modalities need to be explored which will allow the user to interact with the computer via more channels, such as gesture and sound. Experimentation with head-mounted, see-through displays is crucial as well, especially in regard to the question of whether and how the AR system can obtain optical input similar to what the user sees so that computer vision techniques can still be used. The foremost concern, however, remains the provision of fast, real-time interaction capabilities with real and virtual objects integrated seamlessly in an augmented world. To this end, the accurate modeling, tracking and prediction of user or camera motion is essential.

A related research direction leads us to investigate the collaborative use of Augmented Reality. As reported in this paper, we have developed a distributed infrastructure so that all our demonstrations can operate in a collaborative setting. We consider the collaborative use of AR technology to be a key interaction paradigm in the emerging global information society. The highly interactive, visual nature of AR imposes hard requirements on the distributed infrastructure, and demands the development of appropriate collaboration styles.

Augmented Reality, especially in a collaborative setting, has the potential to provide much easier and more efficient use of human and computer skills by merging the best capabilities of both. Considering the rapid research progress in this field, we expect futuristic scenarios like collaborative interior design, or joint maintenance and repair of complex mechanical devices, to soon become reality for the professional user.

    Acknowledgments

This work was financially supported by Bull SA, ICL PLC, and Siemens AG. We would like to thank the director of ECRC, Alessandro Giacalone, for many stimulating discussions regarding potential application scenarios for distributed, collaborative Augmented Reality. Many colleagues at ECRC, especially Stéphane Bressan and Philippe Bonnet, contributed significantly to the successful implementation and presentation of the Interior Design and Mechanical Repair demonstrations, providing other key pieces of technology (database access) that were not discussed in this paper.

    References

Ahlers, K.H., Crampton, C., Greer, D., Rose, E., & Tuceryan, M. (1994). Augmented vision: A technical introduction to the GRASP 1.2 system. Technical Report ECRC-94-14, http://www.ecrc.de.

Ahlers, K.H., Kramer, A., Breen, D.E., Chevalier, P.Y., Crampton, C., Rose, E., Tuceryan, M., Whitaker, R.T., & Greer, D. (1995). Distributed augmented reality for collaborative design applications. Proc. Eurographics 95.

Architecture Projects Management. (1989). ANSA: An Engineer's Introduction to the Architecture. APM Limited, Poseidon House, Cambridge CB3 0RD, United Kingdom, Nov.

Azuma, R., & Bishop, G. (1994). Improving static and dynamic registration in an optical see-through display. Computer Graphics, July, 194-204.

Bajura, M., Fuchs, H., & Ohbuchi, R. (1992). Merging virtual objects with the real world: Seeing ultrasound imagery within the patient. Computer Graphics, July, 203-210.

Bajura, M., & Neumann, U. (1995). Dynamic registration correction in augmented reality systems. Proc. of the Virtual Reality Annual International Symposium (VRAIS 95), 189-196.

Bar-Shalom, Y., & Fortmann, T.E. (1988). Tracking and Data Association. Academic Press, New York.

Baudel, M., & Beaudouin-Lafon, M. (1993). Charade: Remote control of objects using free-hand gestures. Communications of the ACM, 37(7), 28-35.

Bell, G., Parisi, A., & Pesce, M. (1995). The virtual reality modeling language, version 1.0 specification. http://vrml.wired.com/vrml.tech/

Betting, F., Feldmar, J., Ayache, N., & Devernay, F. (1995). A framework for fusing stereo images with volumetric medical images. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 30-39.

Caudell, T., & Mizell, D. (1992). Augmented reality: An application of heads-up display technology to manual manufacturing processes. Proc. of the Hawaii International Conference on System Sciences, 659-669.

Deering, M. (1992). High resolution virtual reality. Computer Graphics, 26(2), 195-202.

Dewan, P. (1995). Multiuser architectures. Proc. EHCI 95.

Drascic, D., Grodski, J.J., Milgram, P., Ruffo, K., Wong, P., & Zhai, S. (1993). Argos: A display system for augmenting reality. Formal video program and proc. of the Conference on Human Factors in Computing Systems (INTERCHI 93), 521.

Feiner, S., MacIntyre, B., & Seligmann, D. (1993). Knowledge-based augmented reality. Communications of the ACM, 36(7), 53-62.

Fournier, A. (1994). Illumination problems in computer augmented reality. Journée INRIA, Analyse/Synthèse d'Images, Jan., 1-21.

Gelb, A. (ed.) (1974). Applied Optimal Estimation. MIT Press, Cambridge, MA.

Gleicher, M., & Witkin, A. (1992). Through-the-lens camera control. Computer Graphics, July, 331-340.

Goldstein, H. (1980). Classical Mechanics. Addison-Wesley, Reading, MA.

Gottschalk, S., & Hughes, J. (1993). Autocalibration for virtual environments tracking hardware. Computer Graphics, Aug., 65-72.

Greer, D.S., & Tuceryan, M. (1995). Computing the Hessian of object shape from shading. Technical Report ECRC-95-30, http://www.ecrc.de.

Grimson, W.E.L., Ettinger, G.J., White, S.J., Gleason, P.L., Lozano-Perez, T., Wells, W.M. III, & Kikinis, R. (1995). Evaluating and validating an automated registration system for enhanced reality visualization in surgery. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 3-12.

Grimson, W.E.L., Lozano-Perez, T., Wells, W.M. III, Ettinger, G.J., White, S.J., & Kikinis, R. (1995). An automatic registration method for frameless stereotaxy, image guided surgery, and enhanced reality visualization. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 430-436.

Grimson, W.E.L. (1990). Object Recognition by Computer. MIT Press, Cambridge, MA.

Henri, C.J., Colchester, A.C.F., Zhao, J., Hawkes, D.J., Hill, D.L.G., & Evans, R.L. (1995). Registration of 3D surface data for intra-operative guidance and visualization in frameless stereotactic neurosurgery. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 47-58.

Holloway, R. (1994). An Analysis of Registration Errors in a See-Through Head-Mounted Display System for Craniofacial Surgery Planning. Ph.D. thesis, University of North Carolina at Chapel Hill.

Horn, B.K.P. (1986). Robot Vision. MIT Press, Cambridge, MA.

Horn, B.K.P., & Brooks, M.J. (1989). Shape from Shading. MIT Press, Cambridge, MA.

Janin, A., Mizell, D., & Caudell, T. (1993). Calibration of head-mounted displays for augmented reality applications. Proc. of the Virtual Reality Annual International Symposium (VRAIS 93), 246-255.

Kaashoek, M.F., & Tanenbaum, A.S. (1992). Fault tolerance using group communication. Operating Systems Review.

Kancherla, A.R., Rolland, J.P., Wright, D.L., & Burdea, G. (1995). A novel virtual reality tool for teaching dynamic 3D anatomy. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 163-169.

Kramer, A., & Chevalier, P.Y. (1996). Distributing augmented reality. Submitted to the Virtual Reality Annual International Symposium (VRAIS 96).

Lorensen, W., Cline, H., Nafis, C., Kikinis, R., Altobelli, D., & Gleason, L. (1993). Enhancing reality in the operating room. Proc. of the IEEE Conference on Visualization, 410-415.

Lowe, D. (1985). Perceptual Organization and Visual Recognition. Kluwer Academic, Norwell, MA.

Macedonia, M.R., & Brutzman, D.P. (1994). MBONE provides audio and video across the internet. IEEE Computer, April.

Marr, D. (1980). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman, San Francisco.

Mellor, J.P. (1995). Realtime camera calibration for enhanced reality visualizations. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 471-475.

Milgram, P., Zhai, S., Drascic, D., & Grodski, J.J. (1993). Applications of augmented reality for human-robot communication. Proc. of the International Conference on Intelligent Robots and Systems (IROS 93), 1467-1472.

Peria, O., Chevalier, L., François-Joubert, A., Caravel, J.P., Dalsoglio, S., Lavallee, S., & Cinquin, P. (1995). Using a 3D position sensor for registration of SPECT and US images of the kidney. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 23-29.

Schroeder, W., Zarge, J., & Lorensen, W. (1992). Decimation of triangle meshes. Computer Graphics, 26(2), 65-70.

SunSoft (1991). The ToolTalk Service. Technical report, SunSoft, June.

Tuceryan, M., Greer, D., Whitaker, R., Breen, D., Crampton, C., Rose, E., & Ahlers, K. (1995). Calibration requirements and procedures for a monitor-based augmented reality system. IEEE Transactions on Visualization and Computer Graphics, 1, 255-273.

Turk, G. (1992). Re-tiling polygonal surfaces. Computer Graphics, 26(2), 55-64.

Uenohara, M., & Kanade, T. (1995). Vision-based object registration for real-time image overlay. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 13-22.

Wellner, P. (1993). Interacting with paper on the DigitalDesk. Communications of the ACM, 36(7), 87-96.

Weng, J., Huang, T.S., & Ahuja, N. (1989). Motion and structure from two perspective views: Algorithms, error analysis, and error estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(5), 451-476.

Whitaker, R., Crampton, C., Breen, D., Tuceryan, M., & Rose, E. (1995). Object calibration for augmented reality. Proc. Eurographics 95.

Wloka, M., & Anderson, B. (1995). Resolving occlusion in augmented reality. Proc. of the ACM Symposium on Interactive 3D Graphics, 5-12.

Table of Contents

1. Introduction
2. Previous Work
3. Application Scenarios
3.1 Collaborative Interior Design
3.2 Collaborative Mechanical Repair
4. System Infrastructure
5. Specification and Alignment of Coordinate Spaces
5.1 Calibration of Sensors and Video Equipment
5.1.1 Image Calibration
5.1.2 Camera Calibration
5.1.3 Magnetic Tracker Calibration
5.2 Registration of Interaction Devices and Real Objects
5.2.1 Pointer Registration
5.2.2 Object Registration
5.3 Tracking of Objects and Sensors
5.3.1 Magnetic Tracking
5.3.2 Optical Tracking
6. Object Interaction
6.1 Acquisition of 3D Scene Descriptions
6.1.1 Dense Shape Estimates from Stereo Data
6.1.2 Shape from Shading
6.2 Mixing of Real and Virtual Worlds
7. Collaborative Use of AR
7.1 Architecture for Shared AR
7.2 Providing Distribution
8. Discussion

List of Figures

Figure 1. Augmented room showing a real table with a real telephone and a virtual lamp, surrounded by two virtual chairs. Note that the chairs are partially occluded by the real table while the virtual lamp occludes the table.

Figure 2. Augmented engine.

Figure 3. The GRASP system hardware configuration.

Figure 4. The GRASP system software configuration.

Figure 5. The camera calibration grid.

Figure 6. 3D pointing device.

Figure 7. Calibration and tracking of an engine model: A wireframe engine model registered to a real model engine using an image-based calibration (a), but when the model is turned and its movements tracked (b), the graphics show the misalignment in the camera's z direction.

Figure 8. Our optical tracking approach currently tracks the corners of squares. The left figure shows a corner of a room with eight squares. The right figure shows the detected squares only.

Figure 9. Augmented scene with a virtual chair and shelf that were rendered using the automatically tracked camera parameters.

Figure 10. Each 3D motion can be decomposed into a translation t and a rotation. We choose a rotation about an axis through the center of mass of the objects, which is constant in the absence of any forces. The figure also indicates the world coordinate frame and the camera coordinate frame.

Figure 11. Modified engine. The fact that the user has removed the air cleaner is not yet detected by the AR system. The virtual model thus does not align with its real position.

Figure 12. An example pair of stereo images: (a) Left image and (b) Right image.

Figure 13. The disparities computed on the stereo pair in Figure 12: (a) disparities in rows (du) and (b) disparities in columns (dv). The brighter points have larger disparities.

Figure 14. (a) The computed depth map from the pair of images in Figure 12. The brighter points are farther away from the camera. (b) The computed depth map in (a) is used to occlude the virtual object (in this case a cube) which has been added to the scene.