
On-World Computing
Enabling Interaction on Everyday Surfaces

Robert Xiao

Dissertation Proposal

Human-Computer Interaction Institute
Carnegie Mellon University
Pittsburgh, PA

March 29th, 2017

Abstract

Computers are now ubiquitous. However, computers and digital content have remained largely separate from the physical world – users explicitly interact with computers through small screens and input devices, and the "virtual world" of digital content has had very little overlap with the practical, physical world. My thesis work is concerned with helping computing escape the confines of screens and devices, to spill digital content out into the physical world around us. In this way, I aim to help bridge the gap between the information-rich digital world and the familiar environment of the physical world, and to allow users to interact with digital content as they would ordinary physical content. I approach this problem from many facets: from the low-level work of providing high-fidelity touch interaction on everyday surfaces, easily transforming these surfaces into enormous touchscreens; to the high-level questions surrounding the interaction design between physical and virtual realms. To achieve this end, building on my prior work, I propose two physical embodiments of this new mixed-reality design: a lightbulb-sized "infobulb" capable of projecting an interaction zone onto everyday environments, and an augmented-reality head-mounted display modified to support touch interaction on arbitrary surfaces.


Table of Contents

Chapter 1. Introduction
1.1 Overview
1.2 Input Sensing
1.3 Interaction Techniques for On-World Interfaces
1.4 Document Structure

Chapter 2. Background
2.1 Computing on the World
2.1.1 Projected Interfaces in and on the World
2.1.2 User-Defined Interfaces
2.2 Augmented Desktops
2.3 On-World Display
2.3.1 Spatial Augmented Reality
2.3.2 Mobile Augmented Reality
2.3.3 Head-Mounted Augmented Reality
2.4 Touch Tracking on Large Surfaces

Chapter 3. Initial Explorations
3.1 Accessing and Interacting with Infrastructure in the World: EM-Sense
3.2 Facilitating Across-Device Interaction with Large Displays in the World: CapCam
3.3 Facilitating Across-World, Large-Display Interaction: UbiCursor
3.4 Ad Hoc Touch Sensing on the World: Toffee

Chapter 4. On-World Projection and Touch Sensing
4.1 Introduction
4.2 Interaction
4.2.1 Triggering Interfaces and Interface Design
4.3 System Implementation
4.3.1 Hardware and Software Basics
4.3.2 One-Time Projector/Depth Camera Calibration
4.3.3 Basic Contact Sensing
4.3.4 Software Structures
4.3.5 Instantiating Interactors
4.3.6 Geometry Rectification for Input and Output
4.3.7 Interactor Library
4.4 Example Applications
4.4.1 Living Room
4.4.2 Office Door
4.4.3 Office Desk
4.4.4 Kitchen
4.5 Limitations
4.6 Discussion
4.7 Future Work
4.8 Conclusion

Chapter 5. Enabling Responsive On-World Interfaces
5.1 Introduction
5.2 Elicitation Study
5.3 Distilling Interactive Behaviors
5.3.1 Application Lifecycle
5.3.2 Layout Control
5.3.3 Cohabitation Behaviors
5.4 Interactive Behavior Implementations
5.4.1 Conventional Interactions
5.4.2 Summoning
5.4.3 Resizing
5.4.4 Deleting
5.4.5 Repositioning and Reorienting
5.4.6 Snapping
5.4.7 Following
5.4.8 Detaching
5.4.9 Evading
5.4.10 Collapsing
5.5 Technical Implementation
5.5.1 Hardware
5.5.2 Disambiguating Object and Human Movement
5.5.3 Touch Tracking
5.5.4 Handling Irregular Surfaces
5.5.5 Edge Finding
5.5.6 Optimization-Based Layout
5.6 Conclusion

Chapter 6. Refining On-World Touch Input
6.1 Summary
6.2 Implementation
6.2.1 Background Modeling
6.2.2 Infrared Edge Detection
6.2.3 Iterative Flood-Fill Segmentation
6.2.4 Touch Point Extraction
6.2.5 Touch Contact Detection
6.2.6 Touch Tracking Properties
6.3 Comparative Techniques
6.3.1 Single-Frame Background Model
6.3.2 Maximum Distance Background Model
6.3.3 Statistical Background Model
6.3.4 Slice Finding and Merging
6.4 Evaluation
6.4.1 Tasks
6.5 Results and Discussion
6.5.1 Crosshair
6.5.2 Multitouch Segmentation
6.5.3 Shape Tracing
6.6 Conclusion

Chapter 7. Proposed Work
7.1 Exploring Embodiments
7.1.1 InfoBulb
7.1.2 Worn AR
7.2 Robust Input Sensing
7.2.1 Host-Surface-Based Touch Tracking
7.2.2 Improved Hover Disambiguation
7.2.3 Reducing Latency
7.2.4 Hand Gestures
7.3 Developing On-World Applications
7.3.1 APIs & SDKs
7.3.2 App Management
7.3.3 Lab Deployment

Chapter 8. References


List of Figures

Figure 3.1. EM-Sense Phone "Print Document" contextual charm.
Figure 3.2. Example full-screen applications.
Figure 3.3. CapCam pairing and interaction process.
Figure 3.4. CapCam air hockey.
Figure 3.5. UbiCursor low-resolution full-coverage display (LRFC).
Figure 3.6. Toffee tap tracking process.
Figure 3.7. Toffee-enabled music player.
Figure 4.1. Sample interaction in WorldKit.
Figure 4.2. A user defines an interactor in WorldKit.
Figure 4.3. A short-throw projector with mounted Kinect.
Figure 4.4. WorldKit touch event processing.
Figure 4.5. Sample interactor classes supported by WorldKit.
Figure 4.6. A user sets up a simple office status application on his door.
Figure 4.7. Example code for a single-button WorldKit application.
Figure 4.8. Simple WorldKit office application.
Figure 4.9. WorldKit kitchen application.
Figure 5.1. Various digitally augmented desks from the academic literature.
Figure 5.2. Example real-world desks of our participants.
Figure 5.3. Sample arrangement of the paper prototypes in the elicitation study.
Figure 5.4. Summoning an application.
Figure 5.5. Resizing and deleting interactions.
Figure 5.6. Moving and snapping interactions.
Figure 5.7. Following and detaching interactions.
Figure 5.8. Evading and collapsing interactions.
Figure 5.9. Our proof-of-concept projector-camera system fitted into a lampshade.
Figure 5.10. Touch tracking steps in Desktopography.
Figure 5.11. Desktopography tracking fingertips on the desk.
Figure 6.1. Comparison of depth-camera-based touch tracking methods.
Figure 6.2. DIRECT system setup.
Figure 6.3. Touch tracking process for five fingers laid flat on the table.
Figure 6.4. Touch tracking process for a finger angled at 60º vertically.
Figure 6.5. Canny edge detection on the IR image.
Figure 6.6. Tasks performed by users in the DIRECT study.
Figure 6.7. Touch error and detection rate for DIRECT and competing methods.
Figure 6.8. Touch error after post hoc offset correction.
Figure 6.9. 95% confidence ellipses for crosshair task.
Figure 7.1. An early prototype of the InfoBulb concept.


List of Tables

Table 4.1. WorldKit input-oriented interactor types.


CHAPTER 1. INTRODUCTION

1.1 Overview

Computing is finally ubiquitous, after decades of work in computer science and engineering. Today, this ubiquity comes in the form of highly sophisticated mobile computing devices like smartphones and laptops. However, interactions with these devices are confined to their small screens, limiting the size and sophistication of possible interactions. More critically, their computational power is strictly limited to the digital realm, and cannot be applied to the world even immediately around them.

In contrast, the physical environment offers an expansive canvas for interactions, enabling highly expressive, comfortable, contextual and natural means to interact with content and other people. Augmenting the physical environment with computational capabilities offers a wide range of possibilities. The goal, then, is to allow computing to escape the narrow confines of the present-day small screens and devices, and spill interaction out onto the surfaces and spaces around us, so that it may augment our everyday activities.

One approach to achieve this goal is to replace every surface in the environment with an interactive computer screen. Although this would certainly bring ubiquitous computing into the environment, it would also be prohibitively expensive to install and intrusive, both socially and aesthetically.

In this thesis, I describe my efforts to enable "world-scale" computation by projecting interactive content onto everyday surfaces in the environment, allowing users to interact with relevant, contextual information situated directly in their world, without the need to physically replace parts of the environment. I will focus on two primary challenges that need to be addressed to achieve this vision: 1) input sensing and 2) interaction techniques for on-world interfaces.

1.2 Input Sensing

First, the world-scale computer must be able to sense user input. Classically, special-purpose controller devices such as mice and keyboards sensed user input using dedicated circuitry. However, these devices can be inappropriate for ad hoc on-world input: when any surface can be interactive, it is not ideal to ask the user to find their physical keyboard whenever they need to enter text. More recently, the advent of natural user interaction has promoted the use of hand gestures, body movements and speech as controller-free input techniques. Using a fixed sensor observing the user, these input techniques allow a user to provide input without needing to hold or use a separate device. However, these techniques suffer from a lack of haptic (physical) feedback, as well as limited precision and high fatigue, all of which preclude their prolonged use for on-world interaction.


Touch interfaces have become ubiquitous for small screens due to the popularity of touchscreen-based smartphones and tablets, and because touch is a natural, expressive modality for computer input. Most touch interfaces use specially designed panels to sense the electrical or physical characteristics of a touch contact, but augmenting surfaces with touch panels remains expensive and can be intrusive to install in some environments.

The introduction of digital projectors and low-cost depth camera technologies raises the possibility of transforming these everyday surfaces into large, touch-sensitive computing experiences. While free-space hand and finger tracking research spans several decades, starting with seminal work by Krueger in the VideoPlace system [Krueger 1985], comparatively little research has examined finger and touch tracking on ordinary, unmodified surfaces. This can be attributed to the difficult challenge of first segmenting a finger from the background to extract its spatial position, and then undertaking the even more challenging task of sensing when a finger has physically contacted a surface (vs. merely hovering close to it).

The advent of inexpensive depth cameras offered a promising potential solution for addressing this challenge. Early work by Wilson et al. [Wilson 2010a] demonstrated the potential of this approach for detecting touches on arbitrary surfaces. While prior approaches have demonstrated that depth-based touch tracking should be viable, full exploration of the design space requires input sensing which exhibits high stability and positional accuracy, as well as reliable touch segmentation/detection with both low false positives and low false negatives.

Unfortunately, attaining this very high accuracy strains the capabilities of even the latest generation of depth cameras. The depth resolution and noise characteristics of current-generation sensors mean that fingertips simply merge into the surface at the distances needed to cover reasonably sized work surfaces, making precise touch tracking extremely difficult. This has made it challenging to move beyond the first proof-of-concept research systems to practical use in real deployments. In Chapter 4, I present a first proof-of-concept system which implements depth-based touch tracking, while in Chapter 6, I refine this earlier system into a practical, accurate touch tracking system for on-world computation.

Providing solid, accurate touch tracking on everyday surfaces makes it possible to transform any surface into a large touchscreen, marking the first step towards a true on-world computational system. Now, the second major challenge is for users to interact with computational content on these surfaces, and for the system to interact with physical objects in the environment.

1.3 Interaction Techniques for On-World Interfaces

Interaction with on-world content is markedly different from interaction with a typical desktop computer. One difference is that there can be many different surfaces in the environment, each of which might host interactive content. The system must be responsible for sensing these surfaces and determining which surfaces may be suitable for content (e.g. large enough, flat enough, oriented correctly). An appropriate surface then needs to be selected for an interactive element (e.g. application interface), either manually by the user using a summoning, launching or instantiation interaction technique, or automatically by the system using a layout algorithm which may account for prior positioning, user preference, surface characteristics, interface size and shape needs, and so on. In Chapter 4, I explore manual selection and instantiation techniques in the context of a projected on-world interactive system, while in Chapter 5, I explore optimization-based techniques for automatically positioning interfaces in the environment.
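To make the idea of automatic placement more concrete, the following is a minimal illustrative sketch of how candidate surfaces might be scored against an interface's needs. It is not the algorithm used in any of the systems described later; the Surface and Interface structures, feature names and weights are my own assumptions.

```python
from dataclasses import dataclass

@dataclass
class Surface:
    area: float        # usable area in m^2
    flatness: float    # 0..1, higher is flatter
    tilt_deg: float    # deviation from the preferred orientation
    last_used: bool    # user previously placed content here

@dataclass
class Interface:
    min_area: float    # smallest area the interface can occupy
    max_tilt_deg: float

def score(surface: Surface, iface: Interface) -> float:
    """Score a candidate surface for hosting an interface; higher is better."""
    if surface.area < iface.min_area or surface.tilt_deg > iface.max_tilt_deg:
        return float("-inf")                        # hard constraints violated
    s = 1.0 * surface.flatness                      # prefer flat surfaces
    s += 0.5 * (surface.area / iface.min_area)      # prefer roomier placements
    s += 2.0 if surface.last_used else 0.0          # respect prior positioning
    return s

def place(surfaces: list[Surface], iface: Interface) -> Surface:
    return max(surfaces, key=lambda s: score(s, iface))
```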

Next, the system must respond to and coexist with artifacts in the physical environment. Instead of the clean, perfect world of the rectangular display, an on-world computational system must contend with a wide variety of surfaces and objects, and with surfaces that are often cluttered and messy. This is often overlooked in other work – in the future interfaces envisioned in many prior systems, desks, tables and workspaces are clear of physical artifacts like keyboards, mice, mugs, papers, knickknacks, and other contemporary and commonplace items. Furthermore, these spaces are constantly in flux, with items moving, stacking, appearing and disappearing as users interact with them.

Omitting these items often simplifies the implementation, but makes the resulting systems less practically applicable to real-world situations. Failing to account for these objects makes applications and systems brittle – for example, a mug placed atop a virtual, projected interface might inject spurious touch input, or a book placed over an interface might occlude and disable the system. Further, because physical objects cannot move or change size on their own, the burden of responsiveness falls to the digital elements. Thus, digital applications must employ a variety of strategies to successfully cohabit a work surface with physical artifacts. In Chapter 5, I contemplate strategies that systems can use to responsively handle changes in the environment, including techniques for coexisting with physical desk objects.

1.4 Document Structure

In the following chapters, I will introduce my research into solving each of these areas, showing that it is indeed possible to develop a system that supports touch interaction with projected content on surfaces in the environment. Chapter 2 provides an overview of the literature and prior work domains that intersect with my proposed work. Chapter 3 charts my initial explorations into on-world interfaces, starting with my explorations into environment sensing and interaction (3.1) and large-display interaction (3.2), and finishing with more relevant explorations into on-world projection (3.3) and touch sensing (3.4). Chapter 4 describes my first complete on-world interactive system, WorldKit, designed to explore the design and implementation of projected sensors and interfaces. WorldKit examines both input sensing and interaction techniques, and serves as a starting point for further exploration. In Chapter 5, I explore on-world interaction techniques in greater depth, seeking to build a system that could naturally interact with and respond to the physical environment in a more nuanced way. Chapter 6 addresses the input sensing problem with the DIRECT touch tracking system, which provides touch tracking on everyday surfaces that is on par with physical touchscreens.

Finally, in Chapter 7, I propose the development of two physical embodiments of my work. The first embodiment is based on an augmented-reality headset in which physical surfaces are virtually augmented in the head-mounted display. Users, wearing a head-mounted display, see virtual content overlaid on physical surfaces, and can reach out and interact with the content using the DIRECT input sensing pipeline. This project explores touch sensing from the perspective of a dynamic, moving head-mounted display, and interaction techniques that cross the boundary between on-surface and in-air interactions, and between 2D and 3D content. The second embodiment, a physical "information lightbulb", or "infobulb", is a lightbulb-sized package incorporating a projector, camera and computer, which illuminates the space with "interactive light" – interactive elements that respond to touch input. Using these embodiments, I propose further explorations into improved touch sensing and on-world application development, to create two complete and practically usable on-world computing systems.


CHAPTER 2. BACKGROUND

My present work intersects with many diverse areas of human-computer interaction research, which I now review. First, I will review work related to my proposed interaction style and approach, specifically prior on-world computing literature and work on augmented desktops. Next, I cover work that intersects with my technical approach, specifically covering projection mapping and on-world touch contact tracking.

2.1 Computing on the World

The notion of having computing everywhere has been a grand challenge for the HCI community for decades [Underkoffler 1999, Weiser 1999]. Today, users have achieved ubiquitous computing not through ubiquity of computing infrastructure, but rather by carrying sophisticated mobile devices everywhere they go (e.g., laptops, smartphones) [Harrison 2010a]. This strategy is both cost effective and guarantees a minimum level of computing quality. However, by virtue of being portable, these devices are also small, which immediately precludes a wide range of applications.

In contrast, the environment around us is expansive, allowing for large and comfortable interactions. Moreover, applications can be multifaceted, supporting complex tasks and allowing for multiple users. And perhaps most importantly, the environment is already present – we do not need to carry it around. These significant benefits have spawned numerous research systems for interacting beyond the confines of a device and on the actual surfaces of the world.

A broad range of technical approaches has been considered for appropriating the environment for interactive use, including acoustic [Harrison 2008] and electromagnetic sensing [Cohn 2011]. A popular alternative has been camera/projector systems. By using light, both for input (sensing) and output (projection), system components can be placed out of the way, yet provide distributed functionality. This is both minimally invasive and potentially reduces the cost of installation (i.e. not requiring substantial wiring for sensors).

2.1.1 Projected Interfaces in and on the World

Seminal work on such systems was initiated in the late 1990s. An early project, The Intelligent Room [Brooks 1997], eloquently described its objective as: "Rather than pull people into the virtual world of the computer, we are trying to pull the computer out into the real world of people." The system used cameras to track users, from which a room's geometry could be estimated. A pair of projectors allowed one wall to be illuminated with interactive applications. Additional cameras were installed on this wall at oblique angles, allowing finger touches to be digitized.

Of note, The Intelligent Room required all interactive surfaces to be pre-selected and calibrated. The goal of the Office of the Future [Raskar 1998] was to enable users to designate any surface as a "spatially immersive display." This was achieved by capturing the 3D geometry of surfaces through structured light. With this data, interfaces could be rectified appropriately, and potentially updated if the environment was dynamic (as could [Jones 2010]). The authors also experimented with head tracking (via a separate magnetically-driven system) to provide interfaces that appeared correct from the user's viewpoint (i.e. egocentrically correct), even when projecting on irregular surfaces. Although calibration to a surface would be automatic, the work does not describe any user mechanisms for defining surfaces. Once applications were running on surfaces, the system relied on conventional means of input (e.g., keyboard and mouse).

The Luminous Room [Underkoffler 1999] was another early exploration of projector/camera-driven interaction. It was unique in that it enabled simple input on everyday projected surfaces. Through computer vision, objects could be recognized through fiducial markers. It also suggested that the silhouette of hands could be extracted and incorporated into interactive applications. The system used a single conventional camera, so presumably hover/occlusion could not be easily disambiguated from touch. Like The Intelligent Room, this system also required pre-calibration and configuration of surfaces before they could be used.

The Everywhere Displays project [Pinhanez 2001] used a steerable mirror in conjunction with a projector to output dynamic graphics on a variety of office surfaces. To correct for distortion, a camera was used in concert with a known projected pattern. Using this method, a 3D scene could be constructed for desired projection surfaces. The authors speculate that touch sensing could be added by stereo camera sensing or examining shadows.

More recently, Bonfire [Kane 2009] – a laptop-mounted camera/projection system – enabled interactive areas on either side of the laptop. Because the geometry of the setup is known a priori, the system can be calibrated once, allowing graphics to be rendered without distortion despite oblique projection. Touch interaction was achieved by segmenting the fingers based on color information and performing a contour analysis. The desk-bound nature of laptops means Bonfire also intersects with "smart desk" systems (see e.g., [Koike 2001, Wellner 1993]), which tend to be static, controlled infrastructure (i.e., a special desk). Although both setups provide opportunities for interactive customization, the context is significantly different.

The advent of low-cost depth sensing led to a resurgence of interactive environment projects. A single depth camera can view a large area and be used to detect touch events on everyday surfaces [Wilson 2010a, Wilson 2007]. LightSpace [Wilson 2010b] used an array of calibrated depth cameras and projectors to create a live 3D model of the environment. This can be used to track users and enable interaction with objects and menus in 3D space. LightSpace can also create virtual orthographic cameras in a specified volume, which can be used, for example, to create thin planar volumes that act as multitouch sensors. A similar project called OASIS [Intel 2012] describes similar capabilities and goals, although many details have not been made public.

OmniTouch [Harrison 2011] is a worn depth camera and projection system that enables multitouch finger interaction on ad hoc surfaces, including fixed infrastructure (e.g., walls), handheld objects (e.g., books), and even users' bodies. Applicable surfaces within the system's field of view (~2 m) are tracked and an approximate real-world size is calculated, allowing interfaces to be automatically scaled to fit. Orientation is estimated by calculating an object's average surface normal and second moment, allowing for rectified graphics. Users can "click and drag" on surfaces with a finger, which sets an interface's location and physical dimensions – a very simple example of user-defined interfaces, as we discuss in the next section. Finally, [Jones 2010] allowed for interactive projections onto physical setups constructed from passive blocks; user interaction is achieved with an IR stylus.

2.1.2 User-Defined Interfaces

An important commonality of the aforementioned systems is a lack of end-user mechanisms for defining interactive areas or functionality. There is, however, a large literature regarding user-defined interactions, going back as far as command line interfaces [Good 1984] and extending to the present, with e.g., unistroke characters [Wobbrock 2005] and touchscreen gestures [Nielsen 2004, Wobbrock 2009]. More closely related to our work is user-driven interface layout and composition. For example, end-users can author interfaces by sketching widgets and applications [Landay 2001], which is far more approachable than traditional GUI design tools. Research has also examined run-time generation of user interfaces based on available I/O devices, the task at hand, and user preferences and skill [Gajos 2010].

When moving out into the physical world, easily defining interfaces is only half the problem. Equally challenging is providing easy-to-use end-user tools that allow instrumentation of the physical world. Sensors, microprocessors and similar components require a degree of skill (and patience) beyond that of the typical user. Hardware toolkits [Arduino, Greenberg 2001, Hartmann 2006, Lee 2004] were born out of the desire to lower the barrier to entry for building sensor-driven applications. There have also been efforts to enable end users to easily create custom, physical, interactive objects with off-the-shelf materials, for example, Styrofoam and cardboard [Akaoka 2010, Avrahami 2002, Hudson 2006].

Most closely related to our technical approach are virtual sensing techniques – specifically, approaches that can sense the environment, but need not instrument it. This largely implies the use of cameras (though not exclusively; see e.g., [Cohn 2011, Harrison 2008]). By remote sensing on e.g., a video feed, these systems can sidestep many of the complexities of building interfaces physically onto the environment.

One such project is Eyepatch [Maynes-Aminzade 2007], which provides a suite of computer vision based sensors, including the ability to recognize and track objects, as well as respond to user gestures. These complex events can be streamed to user-written applications, greatly simplifying development. Slit-Tear Visualizations [Tang 2008], although not used as inputs per se, are conceptually related. The interface allows users to draw sensing regions onto a video stream, which, through a simple visualization, allows users to readily distinguish environmental events, such as cars passing. Similarly, Light Widgets [Fails 2002] allows users to select regions on everyday surfaces using a live video feed for sensing. Regions can instantiate one of three different widget types: buttons, linear sliders and radial dials. Sensing is achieved with a pair of cameras set apart (to disambiguate occlusion from touch on surfaces), with user fingers detected by finding skin-colored blobs.

2.2 Augmented Desktops

Much of the work relating to on-world interaction can be found in the augmented desktop literature. The concept of an augmented desktop – either projected [Wellner 1993] or through AR/VR technologies [Mulder 2003] – goes back to at least the 1970s with Knowlton's optically superimposed button array [Knowlton 1977]. However, the full vision of the augmented desk did not appear until the early 90s, with seminal systems such as Xerox PARC's DigitalDesk [Newman 1992, Wellner 1993] and Ishii's TeamWorkStation [Ishii 1990]. The concept was rapidly expanded upon in the 90s with key systems including InteractiveDESK [Arai 1995], EnhancedDesk [Koike 2001], I/O Bulb [Underkoffler 1999], Illuminating Light [Underkoffler 1998], Office of the Future [Raskar 1998] and the Everywhere Displays projector [Pinhanez 2001].

Recent advances in electronics and networking have created opportunities to refine the augmented desk experience. Magic Desk [Bi 2011] prototyped an augmented desk interface using a physical touchscreen as the desk surface. IllumiShare [Junuzovic 2012] offers a sophisticated, end-to-end desktop remote collaboration experience, while Bonfire [Kane 2009] takes the concept mobile with cameras and projectors operating behind the lid of a laptop. MirageTable [Benko 2012] merges physical and virtual objects on a tabletop, with a simple physics-based interaction approach. Newer projects, such as LuminAR [Linder 2010] and AR Lamp [Kim 2014], put forward light-bulb-like implementations in attempts to achieve the technical vision proposed in the I/O Bulb [Underkoffler 1999].

Systems that superimpose rectified information onto physical artifacts (e.g., [Newman 1992, Underkoffler 1998, Wellner 1993]) might be described as using "following" or "snapping". However, there is an important difference between projected content that merely tracks with physical objects, and an interface that attaches to an object's 3D geometry and follows its movements. The closest related work is the "binding" behavior described in LivePaper [Robertson 1999]. Other efforts, such as WorldKit [Xiao 2013], aim to bootstrap on-world application development by offering an SDK to abstract away many of the complexities of operating on everyday surfaces (e.g., touch tracking, rectified projected output). There have also been recent design-oriented efforts to study e.g., augmented desk usage in the wild [Hardy 2012], as well as superior desk form factors [Wimmer 2010].

Finally, ObjecTop [Khalilbeigi 2013] proposes a set of interactive behaviors that can be used to interact with occluded interfaces underneath physical objects, including highlighting, repositioning and grouping occluded objects. Although ObjecTop used an optical multitouch table and fiducially-tracked planar objects, the interactions are still applicable to projected interactions. In this work, we describe the technical approach needed for virtual-physical cohabitation in a true 3D setting without any instrumentation of the objects or surfaces, and further extend ObjecTop with interactions for summoning interactive applications and automatically evading physical obstacles.

Behaviors surrounding physical desks and workplaces have long intrigued researchers from fields including management sciences, cultural anthropology and ergonomics (e.g., [Malone 1983, Sellen 2003, Vyas 2012]). More recently, HCI researchers have studied desk practice to better understand how to support and integrate digital workflows (e.g., [Bondarenko 2005, Gebhardt 2014, Hardy 2012, Malone 1983, Steimle 2010]). While this prior work provides great insight into the culture of desk practice, it tends to overlook the minutiae of small-scale, desk-level interactions.


2.3 On-World Display

To provide a complete interactive experience, some form of display is needed to overlay virtual content onto ordinary surfaces in the environment. This is generally known as augmented reality (AR) – the augmentation of the physical reality with virtual imagery. There are several possible approaches, although the most commonly seen methods are spatial AR, mobile AR and immersive (head-mounted display) AR.

2.3.1 Spatial Augmented Reality

A common path to on-world display is to project content onto the surfaces using data projectors, placing the virtual content directly on the physical objects. This is also commonly known as projection mapping within the visual effects, art and advertising domains, enabling complex visual shows and interactive exhibits to be implemented using simple materials and projection units. Spatial augmented reality has a long history, and the interested reader is directed to [Bimber 2005] for a more complete discussion of this topic.

A few approaches for projecting onto the complex and often irregular geometry of everyday environments are especially relevant to my proposed work. For example, The Office of the Future [Raskar 1998] proposed using 3D head tracking and office-wide depth sensing to project imagery onto irregular surfaces, such that it would appear perspectively correct from the user's view. iLamps [Raskar 2003] used structured light to sense the 3D geometry of a projection surface (e.g. multiple walls or curved surfaces) and used this data to minimize visual distortion of projection output. Finally, depth cameras have made it easier to sense the geometry of an environment and perform projection mapping. This enables real-time rectification onto moving targets, as shown in e.g., OmniTouch [Harrison 2011].
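To illustrate the core operation behind planar projection mapping, the sketch below warps a flat interface image so that it lands undistorted on a known surface patch seen by the projector. This is a minimal sketch of the general homography-based idea, not the method of any particular system cited above; the corner coordinates are hypothetical and would normally come from projector/depth-camera calibration.

```python
import numpy as np
import cv2

def rectify_onto_surface(interface_img, projector_quad, projector_size):
    """Warp an interface image so it appears undistorted on a planar surface patch.

    interface_img: HxWx3 image of the UI to display.
    projector_quad: 4x2 projector-pixel coordinates of the patch corners
                    (top-left, top-right, bottom-right, bottom-left).
    projector_size: (width, height) of the projector framebuffer.
    """
    h, w = interface_img.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])   # interface corners
    dst = np.float32(projector_quad)                      # where they should land
    H = cv2.getPerspectiveTransform(src, dst)             # 3x3 homography
    return cv2.warpPerspective(interface_img, H, projector_size)

# Example (hypothetical coordinates): place a 400x300 UI on a tilted wall patch.
ui = np.full((300, 400, 3), 255, dtype=np.uint8)
frame = rectify_onto_surface(ui, [[520, 180], [900, 230], [880, 560], [500, 500]], (1280, 800))
```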

2.3.2 Mobile Augmented Reality

Another approach for augmented reality is the use of handheld devices to display virtual content overlaid on live camera imagery, via video see-through. Users may interact with the virtual content through the handheld's touchscreen or by physically moving the handheld. This approach has recently gained prominence due to the popularity and ubiquity of mobile phones.

An early implementation of mobile augmented reality was explored in [Wagner 2003], in which a PDA handheld was modified with a camera add-on to support augmented video. Common application areas for such handheld augmented reality systems include e.g., supporting physical navigation [Mulloni 2011], mobile gaming (e.g. augmented tabletop games), or providing low-cost shared virtual experiences [Wagner 2005]. However, the need to constantly hold the handheld up to employ the camera precludes convenient use of this approach in everyday contexts, in which interactions with both hands are desirable. In my present work, I mainly focus on interactions with the surfaces directly, without the indirection introduced by mobile augmented reality systems.


2.3.3 Head-Mounted Augmented Reality

Finally, recent technological developments have made head-mounted augmented reality devices, or head-mounted displays (HMDs), possible. Such devices typically take the shape of glasses or helmet-like devices, with translucent displays overlaying virtual content on top of the physical world. While these systems were previously available as heads-up displays for specialized applications (e.g. military pilot interfaces), technological improvements have brought such devices in range of ordinary consumers. Augmented reality glasses (e.g. the Epson Moverio) and headsets (e.g. the Microsoft HoloLens) have recently become available to consumers.

HMD-based augmented reality makes it possible to place virtual content within a physical space, as well as to overlay existing surfaces in an environment, creating immersive experiences. For this reason, HMD-based augmented reality systems are often said to provide immersive augmented reality, or IAR. While the field of IAR interactions is relatively new, explorations such as 3D collaborative video chat [Chen 2015] suggest powerful capabilities and promise for this platform in the future.

2.4 Touch Tracking on Large Surfaces

There are many different approaches for touch tracking on large surfaces. The simplest approach is to create a special-purpose surface, using e.g. cameras [Han 2005, Matsushita 1997] or capacitive sensors [Lee 1985]. Alternatively, existing surfaces can be retrofitted with sensors, such as acoustic sensors to detect the sound of a tap [Paradiso 2002], or infrared emitters and receivers to detect occlusion from a finger. There are also methods that can operate on ad hoc, uninstrumented surfaces. These systems most often use optical sensors (e.g., cameras [Koike 2001] or LIDAR [Paradiso 2000]).

Detecting whether a finger has contacted a surface is challenging with conventional RGB or infrared cameras, which has inspired several approaches. PlayAnywhere [Wilson 2005] demonstrated a touch tracking approach based on analyzing the shadows cast by a finger near the surface. Sugita et al. [Sugita 2008] detect touches by tracking the visual change in the fingernail when it is pressed against a surface. TouchLight [Wilson 2004] uses a stereo pair of cameras, detecting touches when the finger images pass beyond a virtual plane. Many other systems use finger dwell time or an external sensor (accelerometer [Kane 2009], microphone, or acoustic sensor [Paradiso 2002, Xiao 2014]) to detect touch events.

Most related to our present technique are depth-camera-based touch-tracking systems. Depth cameras sense the physical distance from the sensor to each point in the field of view, making it possible (in concept) to innately sense whether a finger has contacted a surface or not. Broadly, these systems can be placed into two categories:

Background modeling approaches compute and store a model or snapshot of the depth background. Touches are detected where the live depth data differs from the background depth map in specific ways. Wilson [Wilson 2010a] uses a background snapshot computed from the maximum depth point observed at each pixel over a small window of time. KinectFusion [Izadi 2011] uses a background model developed by analyzing the 3D structure of the scene from multiple angles (SLAM), effectively producing a statistically derived background map. WorldKit [Xiao 2013] also uses a statistical approach, computing the mean and standard deviation for each pixel, modeling both the background and noise. Finally, MirageTable [Benko 2012] captures a background mesh and renders foreground hands as particles.
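As a concrete illustration of the statistical variant of this idea, the sketch below builds a per-pixel mean/standard-deviation background model from a stack of depth frames and flags pixels that rise a small distance above the surface as touch candidates. It is a minimal sketch of the general approach rather than WorldKit's actual implementation; the thresholds and array shapes are assumptions.

```python
import numpy as np

def build_background(depth_frames):
    """Per-pixel statistical background model from N depth frames (N x H x W, in mm)."""
    stack = np.asarray(depth_frames, dtype=np.float32)
    return stack.mean(axis=0), stack.std(axis=0)

def touch_candidates(depth, bg_mean, bg_std, min_mm=4.0, max_mm=20.0, noise_k=3.0):
    """Flag pixels slightly above the background surface as potential touches.

    A pixel is a candidate if it is closer to the camera than the background by
    more than the per-pixel noise (noise_k * std) and by at least min_mm, but by
    no more than max_mm (beyond that it is likely a hovering hand, not a touch).
    """
    diff = bg_mean - depth                       # positive = closer than background
    above_noise = diff > np.maximum(noise_k * bg_std, min_mm)
    near_surface = diff < max_mm
    return above_noise & near_surface            # boolean H x W candidate mask
```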

Finger modeling approaches attempt to segment fingers based on their physical characteristics, and generally do not require background data. OmniTouch [Harrison 2011] used a template-finding approach to label finger-like cylindrical slices within depth images. The slices are then merged into fingers, and finally touch contacts. FlexPad [Steimle 2013] detected and removed hands by analyzing the subsurface scattering of the Kinect's structured infrared light pattern, allowing the background to be uniquely segmented.
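For contrast with the background-model sketch above, the following is a rough illustrative sketch of the slice-finding idea: scanning one row of a depth image for finger-width runs of pixels that stand proud of their immediate surroundings. The width bounds and height threshold are assumptions, and real systems such as OmniTouch apply considerably more filtering and merge slices across rows into fingers.

```python
import numpy as np

def find_finger_slices(depth_row, min_width=5, max_width=25, min_height_mm=8.0):
    """Return (start, end) pixel spans in one depth row that look like finger cross-sections."""
    slices = []
    n = len(depth_row)
    i = 0
    while i < n:
        j = i
        # Grow a run while neighboring pixels stay close in depth (no discontinuity).
        while j + 1 < n and abs(depth_row[j + 1] - depth_row[j]) < 3.0:
            j += 1
        width = j - i + 1
        if min_width <= width <= max_width:
            left = depth_row[i - 1] if i > 0 else np.inf
            right = depth_row[j + 1] if j + 1 < n else np.inf
            inside = float(np.mean(depth_row[i:j + 1]))
            # The run must be noticeably closer to the camera than both neighbors.
            if min(left, right) - inside > min_height_mm:
                slices.append((i, j))
        i = j + 1
    return slices
```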

Surprisingly few systems attempt to fuse depth sensing with other sensing modalities for touch tracking. Of the existing literature on sensor-fusion depth-sensing systems, only the DanteVision project [Saba 2012] uses a multisensory approach for touch tracking, combining depth sensing with a thermal-imaging infrared camera. This is used to improve touch contact detection accuracy (a thermal imprint is left on the surface upon physical touch), though at the expense of continuous tracking and with high contact latency (~200 ms).


CHAPTER 3. INITIAL EXPLORATIONS

My interest in on-world interaction was informed and shaped by my initial forays into human-computer interfaces. In this chapter, I describe a few of the key projects along this research trajectory. I start by exploring the general topic of interactions with objects in the real world with EM-Sense, then narrow my focus to interactions across displays in the world with CapCam and UbiCursor. To complement interactions with objects, I also consider interactions with surfaces in Toffee, setting the stage for true on-world interactions in the remainder of this proposal.

3.1 Accessing and Interacting with Infrastructure in the World: EM-Sense

Figure 3.1. EM-Sense Phone “Print Document” contextual charm.

While the user is reading a document (a), they can tap the phone on a printer to bring up a “print” charm (b). Activating the charm spools the document to the printer (c), which prints it immediately (d).

One approach to developing on-world interaction is to interact with intelligent devices already embedded in the environment. We are surrounded by an ever-growing ecosystem of connected and computationally-enhanced appliances, from smart thermostats and lightbulbs, to coffee makers and refrigerators. The much-lauded Internet of Things (IoT) revolution predicts billions of such devices in use by the close of the decade [Gartner 2015]. Despite offering sophisticated functionality, most IoT devices provide only rudimentary on-device controls. This is because 1) it is expensive to include e.g., large touchscreen displays on low-cost, mass-market hardware, and 2) it is challenging to provide a full-featured user experience in a small form factor. Instead, most IoT appliances rely on users to launch a special-purpose application on their smartphones or browse to a specific web page in the cloud or on their local area network. Quintessential examples include "smart" lightbulbs (e.g., Philips Hue), media devices (e.g., Chromecast), Wi-Fi cameras (e.g., Dropcam) and internet routers.

Clearly, this manual launching approach will not scale as the number of IoT devices grows. If we are to have scores of these devices in our future homes and offices—as many prognosticate—will we have to search through scores of applications to dim the lights in our living room or find something to watch on TV? What is needed is an instant and effortless way to automatically summon rich user interface controls, as well as expose appliance-specific functionality within existing smartphone applications in a contextually relevant manner.


We explored two approaches to mitigate this interaction bottleneck. The most straightforward option is to automatically launch manufacturers' applications instantly upon contact with the associated appliance. For example, touching a smartphone to a thermostat launches the thermostat's configuration app (Figure 3.2). In this case, the currently running app on the smartphone is swapped out for a new full-screen app. Alternatively, the phone can expose what we call contextual charms—small widgets that allow the running smartphone application to perform actions on the touched appliance. For example, when reading a PDF, touching the phone to a printer will reveal an on-screen print button (Figure 3.1).

Figure 3.2. Example full-screen applications.

Left to right: top row: refrigerator, television, thermostat, lightbulb. Bottom row: door lock, projector, and wireless router.

This general vision of rapid and seamless interaction with connected appliances has been explored many times in prior work (e.g., [Hodes 1997, Olsen 2008, Schmidt 2012]), and in this work we set out to practically achieve it. We created full-stack implementations for several of our example applications to show that the interactions are realizable today. In cases where appliances have proprietary APIs, we can automatically launch the manufacturer's smartphone app. For IoT devices with open APIs (fortunately the trend), contextual charms can expose appliance-specific functionality across the smartphone experience.

To recognize appliances on touch, we had to significantly extend the technical approach proposed by Laput et al. in EM-Sense [Laput 2015]—a smartwatch that detected electromagnetic emissions of grasped electrical and electromechanical objects. Critically, our technical approach requires no modification or instrumentation of appliances, and can therefore work "out of the box" with already-deployed devices.
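To give a flavor of how such EM-based recognition can work, the sketch below reduces a window of sampled EM signal to a magnitude-spectrum feature vector and feeds it to an off-the-shelf classifier trained on labeled touches of known appliances. This is an illustrative sketch under my own assumptions, not the Deus Ex Machina or EM-Sense pipeline; the feature choices, window handling and classifier are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # assumed off-the-shelf classifier

def em_features(signal, n_bins=256):
    """Reduce a window of raw EM samples to a log-magnitude spectrum feature vector."""
    spectrum = np.abs(np.fft.rfft(signal))[:n_bins]
    return np.log1p(spectrum)

def train_appliance_classifier(windows, labels):
    """windows: list of EM sample windows recorded while touching known appliances."""
    X = np.stack([em_features(w) for w in windows])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, labels)
    return clf

def recognize(clf, window):
    return clf.predict([em_features(window)])[0]   # e.g. "printer", "thermostat"
```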


In summary, in Deus Ex Machina, I explored a novel system that allows instant recognition of uninstrumented electronic appliances, which in turn allows us to expose contextual functionality via a simple tap-on-device interaction. We demonstrated substantially better accuracy than prior work (98.8% recognition of 17 unmodified appliances in a user study), while running entirely on an augmented smartphone. In addition to conventional full-screen control applications, we contributed "contextual charms", a new cross-device interaction technique. Finally, in contrast to most prior work, we created truly functional implementations for many of our example demos using existing IoT capabilities.

3.2 Facilitating Across-Device Interaction with Large Displays in the World: CapCam

Although small smartphones are powerful, especially when augmented with techniques like Deus Ex Machina, they still cannot do everything. For one, they cannot enable the kinds of expansive, free-form creativity and ideation offered by large canvases, such as whiteboards and TV-sized touchscreens. Indeed, one possible computational future is the integration of smart whiteboards and wall-sized touchscreens into everyday contexts.

Large touchscreen displays, such as public kiosks, digital whiteboards and interactive tabletops, have become increasingly popular as prices have fallen. Similarly, mobile devices, such as smartphones and tablets, have achieved ubiquity. While such devices are reasonably smart on their own, cross-device interactions hold much promise for making interactive experiences even more powerful [Hinckley 2004].

However, interacting across devices is rarely straightforward. Although many mobile devices support e.g., Bluetooth pairing, such pairing options are generally time-consuming and cumbersome (i.e., on the order of 5 seconds, see Table 1). Often, users must confirm or manually enter connection parameters (e.g., device, network identifier) and security parameters (e.g., PIN) [Rekimoto 1997]. Short-range NFC, an emerging technology, aims to mitigate many of these issues. However, it requires specific hardware on both devices, and more importantly, only indicates device presence, not position (unless receivers are tiled into a matrix [Seewoonauth 2009] or combined with another method, like optical fiducial tags [Bazo 2014]). Thus, it only allows for coarse device-to-device pairing, precluding metadata such as spatial position and rotation of devices, as well as rich multi-device experiences. Moreover, NFC is not commonly available on larger devices, such as laptops, tablets, and interactive surfaces – the class of surfaces we chiefly target.


Figure 3.3. CapCam pairing and interaction process.

A: CapCam is used to pair two devices, a “cap” device (background is a large touchscreen display) and a “cam” device (here, a smartphone). B: The phone is pressed to the display. C: The phone body creates a characteristic signal on the touchscreen’s capacitive sensor. D: CapCam extracts the shape, position and orientation of the phone from this capacitive image. E: CapCam encodes pairing data (e.g., IP, port and password) as a flashing color pattern, rendered beneath the phone body. F: The phone’s rear camera captures the pattern, and uses it to establish a conventional two-way wireless link (e.g., WiFi). With both devices paired and communicating, interactive applications can be launched, such as this virtual keyboard.

In response, we developed CapCam, a new technique that provides rapid, ad-hoc connections between two devices. CapCam pairs a "cap" device with a capacitive touchscreen to a "cam" device with a camera sensor (Figure 3.3A). For example, typical smartphones and tablets can be paired with each other, and these devices can be paired to even larger touchscreens, such as smart whiteboards and touchscreen monitors. CapCam uses the cap device's touchscreen to detect and track the cam device (Figure 3.3, C and D), and renders color-modulated pairing data that is captured by the cam device's rear camera (Figure 3.3E).

This pairing data contains configuration information necessary to establish a bidirectional link (e.g., IP address, port and password). In this way, CapCam provides a unidirectional communication mechanism from the touchscreen to the camera, which is then used to bootstrap a full bidirectional, high-speed link (Figure 3.3F). Because CapCam also provides precise, continuous spatial tracking, we can enable rich synergistic applications utilizing both (or many) devices at once.
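As a rough illustration of the kind of optical channel this implies (a sketch under my own assumptions, not CapCam's actual encoding, frame rate or error handling), pairing bytes can be serialized into a sequence of colored frames rendered beneath the phone, which the phone's camera decodes two bits at a time.

```python
# Two bits per displayed frame, mapped across four colors.
COLORS = {0b00: (255, 0, 0), 0b01: (0, 255, 0), 0b10: (0, 0, 255), 0b11: (255, 255, 255)}

def encode_pairing(ip: str, port: int, password: str) -> list[tuple[int, int, int]]:
    payload = f"{ip}:{port}:{password}".encode("ascii")
    frames = []
    for byte in payload:
        for shift in (6, 4, 2, 0):              # most-significant bit pair first
            frames.append(COLORS[(byte >> shift) & 0b11])
    return frames                                # one RGB color per displayed frame

def decode_pairing(frames: list[tuple[int, int, int]]) -> str:
    lookup = {v: k for k, v in COLORS.items()}
    bits = [lookup[f] for f in frames]
    data = bytes(
        (bits[i] << 6) | (bits[i + 1] << 4) | (bits[i + 2] << 2) | bits[i + 3]
        for i in range(0, len(bits), 4)
    )
    return data.decode("ascii")

assert decode_pairing(encode_pairing("192.168.1.5", 8080, "s3cret")) == "192.168.1.5:8080:s3cret"
```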

Overall, we believe CapCam exhibits six desirable properties: it enables zero-configuration pairing via automatically transmitted pairing codes; it is rapid, capable of establishing links in roughly one second; it is anonymous, in that it requires no identifying information to be exchanged; pairing is explicitly initiated by users through a purposeful pressing of a device to a host screen; it enables targeted interactions on said screen via position tracking; and it allows for multiple devices to be paired and used on the same cap device simultaneously.

Although many prior systems have independently addressed pairing or spatial interaction, few have combined these into a single system. CapCam provides both pairing and spatial interaction as phases of a single interactive transaction, enabling rapid, ad hoc interactions, e.g. walking up to a public display and initiating rich, spatial interactions nearly instantaneously.

We also developed several applications and interactions enabled by CapCam (such as a two-player air hockey game, seen in Figure 3.4), and performed an evaluation of the technical aspects of our approach, including pairing latency, pairing code bandwidth and bit error rate across three exemplary devices.


Figure 3.4. CapCam air hockey.

(A) Players are invited to join the game. (B) When players press their phones to the table, CapCam rapidly and anonymously pairs the phones to the display. When two phones are paired, the game begins. (C) Players use their physical phones to deflect the virtual puck. CapCam tracks the phones and their orientations on the screen. (D) Sounds, vibrations and player-specific information appear on the phone, directed by the game through the paired connection.

3.3 Facilitating Across-World, Large-Display Interaction: UbiCursor

If the entire environment consists of a constellation of small and large displays and devices, how can you interact across multiple such devices? Multi-display environments (MDEs) are systems in which several display surfaces create a single digital workspace, even though the physical displays themselves are not contiguous. There are many different types of MDE: dual-monitor computers are a simple (and now ubiquitous) example, but more complex environments are also now becoming feasible, such as control rooms with multiple monitors in multiple locations, meeting rooms with wall and table displays, or ad-hoc workspaces made from laptops and mobile devices.

One main problem in MDEs is that of moving the cursor from one display to another [Nacenta 2009]. This is essentially a targeting task, but one that differs from standard targeting in that the visual feedback is fragmented based on the locations and sizes of the physical displays. In some situations, displays may be far apart, or may be at different angles to one another or to the user. The composition of the MDE and the arrangement of physical displays can have large effects on people's ability to move between visible surfaces.

There are two common ways in which MDE workspaces can be organized: 'warping' and perspective-ether approaches. Warping means transporting the cursor directly from one display to another, without moving through the physical space between monitors. Several techniques for warping have been developed, such as stitching (which warps the cursor as it moves across specific edges of different displays) [Benko 2007], wormholes (which warp the cursor when it moves into a specific screen region), warp buttons (in which pressing a software or hardware button moves the cursor to each display) [Benko 2005], or named displays (in which the user selects the destination display from a list).

Warping techniques can be fast and effective for cross-display movement. However, they suffer from a number of problems. Warping requires that the user remember an additional mapping (edges, holes, buttons, or names), which might take time to learn; in some techniques (such as stitching), the mappings may become incorrect when the user moves to a new location in the environment. Warp techniques are also distinctly less natural than regular mouse movement: they introduce an extra step into standard targeting actions, and make it more difficult for the user to plan and predict the result of ballistic movements [Nacenta 2008]. Finally, the instantaneous jumps of warping techniques cause major tracking and interpretation problems for other people who are trying to follow the action in the MDE.

Figure 3.5. UbiCursor low-resolution full-coverage display (LRFC).

Left: schematic of the LRFC display. By reflecting onto the spherical mirror, the projector can project onto almost any surface. Right: the movements of the mouse cause a change in the orientation of perspective cursor’s defining ray.

Perspective-ether techniques for cross-display movement are a different approach that addresses these problems. In this approach, the entire environment is considered to be part of the workspace, including the space between the displays (i.e., 'mouse ether' [Baudisch 2004]). The visible parts of the workspace, corresponding to the physical displays, are then arranged based on what the user can see from their current location and perspective. Perspective-ether MDE views provide a workspace in which cursor movement behaves as the user expects, and in which the arrangement of displays corresponds exactly to what the user sees in front of them.

The natural mapping of a perspective-ether view, however, comes at the cost of having to include the 'ether' (i.e., the real-world space between monitors) in the digital workspace. This implies that in order to get from one display surface to another, users must move through a displayless region where there is no direct feedback about the location of a cursor. This is not a major problem with ray-casting solutions (e.g., 'laser pointing'), but does affect indirect pointing devices such as mice or trackpads. One current solution is to use the available display surfaces to provide indirect feedback about the location of the cursor – that is, each display provides feedback (such as an arrow or halo) to indicate the location of the cursor in displayless space.

Although indirect feedback for non-displayed targets can be effective (e.g., [Gustafson 2008]), it does require that the user perform (sometimes complex) estimation and inference to determine the cursor's actual location, making cross-display movement more difficult than movement within a display. To address the problems of indirect cursor feedback, we propose a simple solution: provide direct visual feedback about the location of the cursor in 'displayless' space.

We have built a novel display system, called a Low-Resolution Full-Coverage (LRFC) display, that can accomplish this solution in any multi-display environment. The LRFC system uses a data projector pointed at a hemispherical mirror to blanket the entire room with addressable (although low resolution) pixels. Using an LRFC display to provide feedback about the cursor in the empty space between monitors results in a technique called Ubiquitous Cursor (or UbiCursor). The projector only draws the cursor in the space between physical monitors, and uses simple room measurements to ensure that the cursor is shown in the correct location for the user. The result is fast, accurate, and direct feedback about the location of the cursor in 'displayless' space. Our goal is not to turn the entire room into a surface for showing data [Welch 2000] – only to provide information about objects that are between physical displays.
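
As a rough illustration of the perspective mapping involved, the sketch below intersects the cursor's defining ray (originating at the user's eye) with a set of planar room surfaces and returns the point the LRFC projector should illuminate. The plane representation, fixed eye position, and class names are assumptions made for illustration, not the actual UbiCursor implementation.

// Illustrative sketch of perspective-cursor placement: the cursor is defined by a ray
// from the user's eye; we intersect that ray with planar room surfaces (walls, desk,
// etc.) and light the nearest hit point with the low-resolution full-coverage projector.
public class PerspectiveCursor {
    static class Plane {
        double[] n; double d;                             // points p satisfying n.p = d
        Plane(double[] n, double d) { this.n = n; this.d = d; }
    }

    static double dot(double[] a, double[] b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }

    // Returns the nearest intersection of the eye ray with any surface, or null if none.
    static double[] cursorPoint(double[] eye, double[] dir, Plane[] surfaces) {
        double bestT = Double.POSITIVE_INFINITY;
        for (Plane s : surfaces) {
            double denom = dot(s.n, dir);
            if (Math.abs(denom) < 1e-9) continue;          // ray parallel to this surface
            double t = (s.d - dot(s.n, eye)) / denom;
            if (t > 0 && t < bestT) bestT = t;
        }
        if (Double.isInfinite(bestT)) return null;
        return new double[] { eye[0] + bestT*dir[0], eye[1] + bestT*dir[1], eye[2] + bestT*dir[2] };
    }

    public static void main(String[] args) {
        Plane wall = new Plane(new double[]{0, 0, 1}, 3.0); // wall 3 m in front of the user (z = 3)
        double[] p = cursorPoint(new double[]{0, 1.2, 0}, new double[]{0.1, 0, 1}, new Plane[]{wall});
        System.out.printf("Draw cursor at (%.2f, %.2f, %.2f)%n", p[0], p[1], p[2]);
    }
}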

To test the new technique, we ran a study in which participants carried out cross-display movement tasks with three types of MDE: stitched, perspective-ether with indirect feedback, and perspective-ether with direct feedback (i.e., Ubiquitous Cursor). Our study showed that movement times were significantly lower with UbiCursor than with either stitching or indirect feedback. This work is the first to demonstrate the feasibility of low-resolution full-coverage displays, and shows the value of providing direct cursor feedback in multi-display environments.

Multi-display environments present the problem of how to support movement of objects from one display to another. We developed the Ubiquitous Cursor system as a way to provide direct between-display feedback for perspective-based targeting. In a study that compared Ubiquitous Cursor with indirect-feedback Halos and cursor-warping Stitching, we showed that Ubiquitous Cursor was significantly faster than both other approaches. Our work shows the feasibility and the value of providing direct feedback for cross-display movement, and adds to our understanding of the principles underlying targeting performance in MDEs.

Our initial experiences with Ubiquitous Cursor suggest several directions for further research. First, we plan to test the UbiCursor technique with more realistic MDE tasks; in particular, we will explore the effects of having different C:D ratios in the projected display and the MDE displays. Second, we will further investigate the principles uncovered in our study (effects of angle differences between displays, performance thresholds for the different techniques, the effects of different display and target sizes, and the use of the technique with other input devices). Third, we will explore the other possibilities presented by the idea of a low-resolution full-coverage display, which can enable augmentation of and interaction with real-world objects inside the scope of the projected display.

3.4 Ad Hoc Touch Sensing on the World: Toffee

UbiCursor solves the problem of projecting onto the world, but naturally we want to ask if it is possible to reach out and touch the projections. Touch sensing on the world is an entirely separate problem, and so Toffee was born out of a desire to experiment with on-world touch sensing.


Figure 3.6. Toffee tap tracking process.

A: When a finger taps equidistant to two sensors, such that arrival time t1=t2, it is only possible to infer that the finger tapped somewhere along a line (dashed). B: If the finger lies closer to one sensor, this function becomes hyperbolic. C: In a four-sensor setup, six hyperbolas can be computed; intersections are solutions for the originating touch location. D: Real world data of the scenario in C plotted in matplotlib. E: A visualization of total squared error.

In order for mobile devices to fit into our pockets and bags, they are generally designed with small screens and physical controls. Simultaneously, human fingers are relatively large (and are unlikely to shrink anytime soon). This has led to the recurring problem of limited surface area for touch-based interactive tasks. Thus, there is a pressing need to develop novel sensing approaches and interaction techniques that aim to mitigate this fundamental constraint.

One option is to transiently appropriate surface area from the environment around us [Harrison 2010b]. This allows devices to remain small, but opportunistically provide large areas for accurate and comfortable input (and potentially graphical output if, e.g., projectors are used). Tables, in particular, have presented an attractive target for researchers. Foremost, mobile devices often reside on tables, enabling several ad hoc sensing approaches (e.g., vibro-acoustic [Harrison 2008, Kane 2009] and optical [Butler 2008, Kratz 2009]). Moreover, unlike e.g., painted walls, tables are accepted areas for work, have durable surfaces, and typically have surface area available for interactive use.

We present Toffee, a system that allows devices to appropriate tables and other hard, flat surfaces they are placed on for ad hoc radial tap input. This is achieved using a novel application of acoustic time differences of arrival (TDOA) analysis [Bancroft 1985, Caffery 1998, Carter 1981, Ishii 1999, Leo 2002, Paradiso 2005, Paradiso 2002] – bringing the technique, for the first time, to small devices in a way that is compatible with their inherent mobility. At a high level, Toffee allows users to define virtual, ad hoc buttons on a table's surface, which can then be used to trigger a variety of interactive functions, including application launching, desktop switching, music player control, and gaming. Leveraging the large surface area of the table, users can potentially trigger simple functionality eyes-free.
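
To illustrate the TDOA idea, the sketch below evaluates candidate tap locations on a grid and keeps the one whose predicted pairwise arrival-time differences best match the observed ones (the total squared error visualized in Figure 3.6E). The sensor layout, propagation speed, and grid bounds are assumptions for the sketch, not Toffee's actual parameters.

// Illustrative sketch of Toffee-style TDOA localization: four sensors at known positions
// observe a tap at slightly different times; a grid search minimizes the squared error
// between predicted and observed pairwise arrival-time differences.
public class ToffeeTdoa {
    static final double SPEED = 1000.0;                   // assumed in-surface propagation speed, mm/ms

    static double dist(double x, double y, double[] s) {
        return Math.hypot(x - s[0], y - s[1]);
    }

    // sensors: {x, y} in mm; arrivals: observed arrival times in ms (shared clock).
    static double[] locateTap(double[][] sensors, double[] arrivals) {
        double bestErr = Double.MAX_VALUE, bestX = 0, bestY = 0;
        for (double x = -500; x <= 500; x += 5) {          // 5 mm search grid around the device
            for (double y = -500; y <= 500; y += 5) {
                double err = 0;
                for (int i = 0; i < sensors.length; i++) {
                    for (int j = i + 1; j < sensors.length; j++) {
                        double predicted = (dist(x, y, sensors[i]) - dist(x, y, sensors[j])) / SPEED;
                        double observed = arrivals[i] - arrivals[j];
                        err += (predicted - observed) * (predicted - observed);
                    }
                }
                if (err < bestErr) { bestErr = err; bestX = x; bestY = y; }
            }
        }
        double angle = Math.toDegrees(Math.atan2(bestY, bestX)); // radial angle from device center
        return new double[] { bestX, bestY, angle };
    }

    public static void main(String[] args) {
        double[][] sensors = {{-150, -100}, {150, -100}, {150, 100}, {-150, 100}}; // laptop corners, mm
        double[] tap = {200, 120};
        double[] arrivals = new double[4];
        for (int i = 0; i < 4; i++) arrivals[i] = dist(tap[0], tap[1], sensors[i]) / SPEED;
        double[] est = locateTap(sensors, arrivals);
        System.out.printf("Estimated tap: (%.0f, %.0f) mm, angle %.1f deg%n", est[0], est[1], est[2]);
    }
}

With noisy real-world arrival times the 2D minimum becomes shallow along the range direction, which is consistent with the study finding below that the angle estimate is far more robust than the full position.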

Our study results suggest that resolving a true, 2D position (angle and distance) is not sufficiently robust to enable accurate interactive use. However, angle estimation is robust, with an average error of 4.3° on a laptop-sized setup. Thus, we suggest that interactions should be built around radial interaction, similar to bezel interactions [Ashbrook 2008] and peripheral free-space gesturing [Harrison 2009]. Our example applications are built using this interaction paradigm, and allow users to make use of the expanded envelope of interactive space surrounding laptops, tablets, and smartphones.

Figure 3.7. Toffee-enabled music player.

Music player controls can be bound to regions around the laptop, e.g., volume up and down.


CHAPTER 4. ON-WORLD PROJECTION AND TOUCH SENSING

4.1 Introduction

Creating interfaces in the world, where and when we need them, has been a persistent goal of research areas such as ubiquitous computing, augmented reality, and mobile computing. In this chapter I discuss the WorldKit system, which supports very rapid creation of touch-based interfaces on everyday surfaces. Further, it supports experimentation with other interaction techniques based on depth- and vision-based sensing, such as reacting to the placement of an object in a region, or sensing the ambient light in one area of a room.

Figure 4.1. Sample interaction in WorldKit.

Using a projector and depth camera, the WorldKit system allows interfaces to operate on everyday surfaces, such as a living room table and couch (A). Applications can be created rapidly and easily, simply by "painting" controls onto a desired location with one's hand - a home entertainment system in the example above (B, C, and D). Touch-driven interfaces then appear on the environment, which can be immediately accessed by the user (E).

WorldKit draws together ideas and approaches from many systems. For example, it draws on aspects of Everywhere Displays [Pinhanez 2001] conceptually, LightWidgets [Fails 2002] experientially, and LightSpace [Wilson 2010b] technically. Based on very simple specifications (in the default case, just a simple list of interactor types and action callbacks), interfaces can be created which allow users to quite literally paint interface components and/or whole applications wherever they are needed (Figure 4.1) and then immediately start using them. Interfaces are easy enough to establish that the user could, if desired, produce an interface on the fly each time they entered a space. This flexibility is important because, unlike an LCD screen, the world around us is ever-changing and configured in many different ways (e.g., our lab is different from your living room). Fortunately, we can bring technology to bear to overcome this issue and make best use of our environments.

Like LightSpace [Wilson 2010b], our system makes use of a projector and inexpensive depth camera to track the user, sense the environment, and provide visual feedback. However, our system does not require advance calibration of the spaces it operates in – it can simply be pointed at nearly any indoor space. Further, with a projector slightly smaller than the one used in our prototype, it could be deployed in a volume similar to a modern laptop, and with likely future hardware advances (e.g., improved pico-projectors and smaller depth cameras) it may be possible to implement it in a truly mobile form. In addition, our system provides an extensible set of abstractions which make it easy and convenient to program simple interfaces while still supporting exploration of new interaction techniques in this domain.

In the next section, we will consider how users might make use of these created interfaces. We then turn to implementation details, discussing the hardware used, sensing techniques and other low-level details. We will then consider how these basic capabilities can be brought together to provide convenient abstractions for paint-anywhere interactive objects, which makes programming them very similar to programming conventional GUI interfaces. We then consider aspects of the software abstractions that are unique to this domain and describe an initial library of interactor objects provided with our system. Several example applications we built atop this library are also described. We conclude with a review of related work, noting that while previous systems have considered many of the individual technical capabilities built into our system in one form or another, the WorldKit system breaks new ground in bringing these together in a highly accessible form. The system enables both easy and familiar programmatic access to advanced capabilities, as well as a new user experience with dynamic instantiation of interfaces when and where they are needed.

4.2 Interaction

A core objective of our system is to make it simple for users to define applications quickly and easily, such that they could feasibly customize an application each time they used it. The default interaction paradigm provided by our system allows users to "paint" interactive elements onto the environment with their hands (Figure 4.1 and Figure 4.2). Applications built upon this system are composed of one or more in-the-world interactors, which can be combined to create interactive applications.

By default, when an interface is to be deployed or redeployed, a list of interactor types and accompanying callback objects is provided – one for each element of the interface. The application instantiates each interactor using a specified instantiation method, defaulting to user-driven painted instantiations. For these elements, the system indicates to the user what interactor is to be "painted". The user then runs his or her hand over a desired surface (Figure 4.1A and Figure 4.2A, B). Live graphical feedback reflecting the current selection is projected directly on the environment. When satisfied, the user lifts their hand. The system automatically selects an orientation for the interactor, which can optionally be adjusted by dragging a hand around the periphery of the selected area, as shown in Figure 4.2D. This completes the setup for a single interactor. The user then paints the next interface element, and so on.


Figure 4.2. A user defines an interactor in WorldKit.

This sequence shows how a user can define the location, size and orientation of an interactor. First, the user starts painting an object's area (A, B), defining its location and size. Objects have a default orientation (C), which can be reoriented by dragging along the periphery (D). Finally, the interactor is instantiated and can be used (E).

Once all elements have been instantiated, the interface starts and can be used immediately. The entire creation process can occur very quickly. For example, the living room application sequence depicted in Figure 4.1 can be comfortably completed within 30 seconds. Importantly, this process need not occur every time – interactor placements can be saved by applications and reused the next time they are launched.

This approach offers an unprecedented level of personalization and responsiveness to different use contexts. For example, a typical living room has multiple seating locations. With our system, a user sitting down could instantiate a custom television interface using surfaces in their immediate vicinity. If certain controls are more likely to be used than others (e.g., channel switching), these can be placed closer to the user and/or made larger. Other functions could be omitted entirely. Moreover, users could lay out functionality to match their ergonomic state. For example, if lying on a sofa, the armrests, skirt or back cushions could be used because they are within reach.

4.2.1 Triggering Interfaces and Interface Design

Triggering the instantiation of an interface, including the design thereof, can be achieved in several ways. One option is for the system to be speech activated. For example, the user could say "activate DVR" to bring up the last designed interface or "design DVR" to customize a new one. Alternatively, a free-space gesture could be used, for example, a hand wave. A smartphone could also trigger interfaces and interface design, allowing fine-grained selection of functionality to happen on the touchscreen, and design to happen on the environment. Finally, a special (visible or invisible) environmental "button" could trigger functions.

4.3 System Implementation

4.3.1 Hardware and Software Basics

Our system consists of a computer connected to a Microsoft Kinect depth camera mounted on top of a Mitsubishi EX320U-ST short-throw projector (Figure 4.3). The Kinect provides a 320x240 pixel depth image and a 640x480 RGB image, both at 30 FPS. It can sense depth within a range of 50 cm to 500 cm with a relative error of approximately 0.5% [Khoshelham 2012]. Our short-throw projector has approximately the same field of view as the depth camera, allowing the two units to be placed in the same location without producing significant blind spots. As shown in KinectFusion [Izadi 2011], the depth scene can be refined over successive frames, yielding superior accuracy.

The software controlling the system is programmed in Java using the Processing library [Processing]. It runs on, e.g., a MacBook Pro laptop with a 2 GHz Intel Core i7 processor and 4 GB of RAM. The system runs at around 30 FPS, which is the frame rate of the depth camera.

Figure 4.3. A short throw projector with mounted Kinect.

4.3.2 One-Time Projector/Depth Camera Calibration

We calibrate the joined camera-projector pair using a calibration target consisting of three mutually perpendicular squares of foamcore, 50 cm on a side, joined at a common vertex. The seven non-coplanar corners of this target are more than sufficient to establish the necessary projective transform between the camera and projector, and the extra degrees of freedom they provide are used to improve accuracy via a simple least-squares regression fit.

As long as the depth camera remains rigidly fastened to the projector, the calibration above only needs to be performed once (i.e., at the factory). The setup can then be transported and installed anywhere – the depth sensor is used to automatically learn about new environments without requiring explicit steps to measure or (re-)calibrate in a new space. If the environment changes temporarily or permanently after interfaces have been defined by a user (e.g., a surface being projected on is moved), it may be necessary to re-define affected interfaces. However, our interactive approach to interface instantiation makes this process extremely lightweight for even novice users.
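
The sketch below illustrates the kind of least-squares fit involved: given the seven (or more) measured 3D corners and the projector pixels that illuminate them, a 3x4 projective transform is estimated with a standard DLT-style linear system. The class structure and the synthetic test values are illustrative; the production calibration may differ in detail.

// Illustrative sketch of fitting a camera-to-projector projective transform from 3D/2D
// correspondences by linear least squares (the bottom-right entry of P is fixed to 1).
public class ProjectorCalibration {
    // xyz: N x 3 camera-space points; uv: N x 2 projector pixels. Returns a row-major 3x4 matrix P.
    static double[] fitProjection(double[][] xyz, double[][] uv) {
        int n = xyz.length;
        double[][] A = new double[2 * n][];
        double[] b = new double[2 * n];
        for (int i = 0; i < n; i++) {
            double X = xyz[i][0], Y = xyz[i][1], Z = xyz[i][2], u = uv[i][0], v = uv[i][1];
            A[2 * i]     = new double[] {X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z};
            b[2 * i]     = u;
            A[2 * i + 1] = new double[] {0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z};
            b[2 * i + 1] = v;
        }
        double[] p = solveLeastSquares(A, b);             // 11 unknowns; P[2][3] is fixed to 1
        double[] P = new double[12];
        System.arraycopy(p, 0, P, 0, 11);
        P[11] = 1;
        return P;
    }

    // Least-squares solve of A x = b via the normal equations and Gauss-Jordan elimination.
    static double[] solveLeastSquares(double[][] A, double[] b) {
        int m = A[0].length;
        double[][] M = new double[m][m + 1];              // augmented [A^T A | A^T b]
        for (int r = 0; r < A.length; r++)
            for (int i = 0; i < m; i++) {
                for (int j = 0; j < m; j++) M[i][j] += A[r][i] * A[r][j];
                M[i][m] += A[r][i] * b[r];
            }
        for (int col = 0; col < m; col++) {
            int pivot = col;                              // partial pivoting for stability
            for (int r = col + 1; r < m; r++)
                if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
            double[] tmp = M[col]; M[col] = M[pivot]; M[pivot] = tmp;
            for (int r = 0; r < m; r++) {
                if (r == col) continue;
                double f = M[r][col] / M[col][col];
                for (int c = col; c <= m; c++) M[r][c] -= f * M[col][c];
            }
        }
        double[] x = new double[m];
        for (int i = 0; i < m; i++) x[i] = M[i][m] / M[i][i];
        return x;
    }

    static double[] project(double[] P, double[] p) {
        double w = P[8] * p[0] + P[9] * p[1] + P[10] * p[2] + P[11];
        return new double[] {
            (P[0] * p[0] + P[1] * p[1] + P[2] * p[2] + P[3]) / w,
            (P[4] * p[0] + P[5] * p[1] + P[6] * p[2] + P[7]) / w };
    }

    public static void main(String[] args) {
        // Synthetic check: project seven made-up points with a known matrix, then refit it.
        double[] truth = {800, 0, 512, 40,   0, 800, 384, 25,   0, 0, 1, 1};
        double[][] xyz = {{0.1, 0.2, 1.5}, {-0.3, 0.1, 2.0}, {0.4, -0.2, 1.2}, {0, 0, 1.0},
                          {0.2, 0.3, 2.5}, {-0.1, -0.3, 1.8}, {0.3, 0.2, 2.2}};
        double[][] uv = new double[xyz.length][2];
        for (int i = 0; i < xyz.length; i++) uv[i] = project(truth, xyz[i]);
        double[] P = fitProjection(xyz, uv);
        double maxErr = 0;
        for (int i = 0; i < xyz.length; i++) {
            double[] r = project(P, xyz[i]);
            maxErr = Math.max(maxErr, Math.max(Math.abs(r[0] - uv[i][0]), Math.abs(r[1] - uv[i][1])));
        }
        System.out.printf("max reprojection error: %.6f px%n", maxErr);
    }
}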

Figure 4.4. WorldKit touch event processing.

User touches an interactor placed on a surface (A - view from Kinect). The depth differences from the background are computed (B - green, no significant difference; dark green, differences over 50mm; blue, candidate touch pixels). The candidate contact pixels (red) are masked by the interactor's depth-image mask (white) (C). The contact pixels are transformed into the interactor's local coordinate system, providing an orthographic view (D). For output, interactor graphics are warped into the projector's image space (E), so that they appear correctly on a surface (F).

4.3.3 Basic Contact Sensing

Our system relies on surface contact sensing for two distinct purposes. First, when creating interfaces, touches are used to define interactor location, scale and orientation on the environment. This requires global touch sensing. Second, many interactor types (e.g., binary contact inputs) are driven by surface contact (i.e., touch or object contact or presence) data. To achieve this, we mask the global scene with each interactor's bounds; data from this region alone is then passed to the interactor for processing. In some cases (e.g., the counting interactor), additional computer vision operations are completed internally (e.g., connected components for blob detection).

To achieve the highest quality sensing possible, we employ several strategies to filter the depth image. First, when the system starts up, we capture 50 consecutive depth frames and average them to produce a background profile. Note that this implies that the scene must be stationary and in a "background" configuration when the system is initialized. It is also possible to automatically accumulate a background image over a longer period of time to provide some ability to handle dynamic reconfigurations, e.g., moved furniture, but our system does not currently do this. Within the background image, the standard deviation at each pixel location across the frames is used as a noise profile. Subsequently, each observed depth value is divided by the computed baseline deviation at that pixel, and values that are greater than 3 standard deviations from the mean are considered significant (Figure 4.4B). Significant pixels which differ from the background surface by at least 3 mm and at most 50 mm are considered candidate contact pixels. We then perform blob detection across these candidates using a connected-components algorithm to further eliminate erroneous pixels arising from noise in the depth sensor.

This process yields a number of contact blob images in the depth camera's coordinate space over each interactor (Figure 4.4C, red). Each image is projectively transformed into the local coordinate system of the corresponding interactor (Figure 4.4D, red). From there, the blobs are passed to the corresponding interactor for type-specific interpretation. For instance, the counting interactor from the system library simply uses the number of blobs intersecting that interactor, the multitouch interactor extracts the X-Y locations of each blob, and the area contact interactor determines the total number of pixels across all blobs within that interactor. Custom types extended from the library classes are free to perform additional processing for special purposes.
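
The per-pixel portion of this pipeline is simple enough to sketch directly. The fragment below assumes row-major depth images in millimeters and uses helper names of our own invention, but follows the thresholds described above (3 standard deviations, 3-50 mm above the background surface).

// Minimal sketch of the per-pixel contact test, given a background mean and a per-pixel
// noise (standard deviation) profile accumulated from 50 startup frames.
public class ContactPixels {
    // Averages startup frames into a background mean and per-pixel noise profile.
    static void buildBackground(float[][] frames, float[] meanOut, float[] stddevOut) {
        int n = frames.length, len = meanOut.length;
        for (float[] f : frames) for (int i = 0; i < len; i++) meanOut[i] += f[i] / n;
        for (float[] f : frames) for (int i = 0; i < len; i++) {
            float d = f[i] - meanOut[i];
            stddevOut[i] += d * d / n;
        }
        for (int i = 0; i < len; i++) stddevOut[i] = (float) Math.sqrt(stddevOut[i]);
    }

    // Flags candidate contact pixels: deviation significant (> 3 sigma) and 3-50 mm above background.
    static boolean[] candidateContactPixels(float[] depth, float[] mean, float[] stddev) {
        boolean[] candidate = new boolean[depth.length];
        for (int i = 0; i < depth.length; i++) {
            float height = mean[i] - depth[i];             // rise above the background surface, in mm
            boolean significant = Math.abs(depth[i] - mean[i]) > 3 * stddev[i];
            candidate[i] = significant && height >= 3 && height <= 50;
        }
        return candidate;   // connected-components blob detection then prunes isolated noise pixels
    }
}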

4.3.4 Software Structures

The WorldKit system provides a set of programming abstractions that aim both to make it very simple to create simple to moderately complex interfaces and to allow custom interaction techniques to be quickly created and explored in this new domain. Many aspects of the system structure are designed to be as close as possible to the abstractions now provided in nearly all conventional GUI interface toolkits.

For example, interfaces are constructed as trees of objects, which inherit from a base interactor class (which has been called a component or widget in various other systems). That class establishes the central abstraction for the system and defines an API (and default implementations) for a variety of interface tasks such as: hierarchy (parent/child) management, event-oriented input handling, damage tracking, layout, interface (re)drawing, etc. Since our goal is to stay as close to existing abstractions as we can, we expect that many aspects of the system will already be familiar to developers. See Figure 4.7 for a complete sample application.

To create a new interactor type (primitive), the developer extends the base interactor class (or an existing interactor class with similar functionality), adding new event sources, drawing commands and interaction logic. This is functionally similar to how developers would create new interactors in, e.g., Java Swing.

In the following sections we only consider the aspects of the system that are different from typical systems (e.g., interactor instantiation by end users) or require special treatment inside our system to make them appear ordinary (e.g., rectification between 2D drawing and input spaces and surfaces in the 3D world).

4.3.5 Instantiating Interactors

One major difference between WorldKit abstractions and typical GUI toolkits is in how interactors are instantiated. In conventional systems, the details of instantiation are typically determined simply by the parameters to the constructor for an interactor (which may come from a separate specification such as an XML document, and/or are originally determined with a visual layout editor).

In contrast, in WorldKit we provide three options for interactor instantiation: painted, linked, and remembered. By default, interactors use painted instantiation – allowing the user to establish their key properties by "painting" them on the world as described below. For the base interactor class, key properties include size, position, and orientation, but this may be defined differently in specialized subclasses. Alternately, the developer may ask for linked instantiation. In that case, a small bit of code is provided to derive the key properties for the interactor from another instantiated interactor. This allows, for example, one key interactor to be painted by the user, and then a related group of components to be automatically placed in relation to it. Finally, remembered instantiation can be performed using stored data. This data can come from the program (making it equivalent to conventional interactor instantiation) or from a data structure saved from a previous instantiation of the same interface. This allows, for example, an interface element to be placed "where the user last left it".

For painted instantiations, users define interactor size, location and initial orientation by using a hand-painting gesture over the surface where they wish it to appear (Figure 4.2A). During this process an area for the interactor is accumulated. At each step the largest contact blob over the entire depth image is considered. If this blob is larger than a preset threshold, the blob is added to a mask for the interactor. If no blob is larger than the threshold, we determine that the user must have lifted their hand from the surface, and the accumulated mask is saved as the user's painting selection. Note that this mask is defined over the depth image (i.e., in depth image coordinates). We then take the (x, y, depth) points in the depth image indexed by the mask and transform them into a world-space point cloud. Averaging the surface normals over this point cloud produces the Z-axis of the planar region to be associated with the interactor. The X- and Y-axes lie in this plane, and their direction is controlled by the interactor's orientation.

The initial orientation aligns the Y-axis with the Y-axis of the depth image, which roughly corresponds to the direction of gravity if the depth sensor is mounted horizontally. As mentioned previously, the user can adjust the orientation by touching the interactor (Figure 4.2D).
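
The sketch below shows one plausible way the painted mask could be converted into an interactor frame: per-pixel surface normals are averaged to obtain the Z-axis, and the depth image's Y direction is projected into the resulting plane to seed the default orientation. The data layout and helper names are assumptions, not WorldKit's exact code.

// Illustrative sketch: derive an interactor's local axes from the world-space points
// selected by a painted mask. Normals come from cross products of neighboring points.
public class PaintedFrame {
    // world: per-pixel 3D points (row-major, h x w x 3); mask: the painted selection.
    static double[][] interactorAxes(double[][][] world, boolean[][] mask) {
        int h = world.length, w = world[0].length;
        double[] z = new double[3];
        for (int y = 0; y < h - 1; y++) {
            for (int x = 0; x < w - 1; x++) {
                if (!mask[y][x]) continue;
                double[] n = cross(sub(world[y][x + 1], world[y][x]),
                                   sub(world[y + 1][x], world[y][x]));
                add(z, normalize(n));                      // accumulate per-pixel normals
            }
        }
        z = normalize(z);                                  // averaged normal = interactor Z-axis
        double[] imageY = {0, 1, 0};                       // depth image "down"; roughly gravity (assumption)
        double[] yAxis = normalize(sub(imageY, scale(z, dot(imageY, z)))); // project into the plane
        double[] xAxis = cross(yAxis, z);                  // completes the frame
        return new double[][] { xAxis, yAxis, z };
    }

    static double[] sub(double[] a, double[] b) { return new double[]{a[0]-b[0], a[1]-b[1], a[2]-b[2]}; }
    static double[] cross(double[] a, double[] b) {
        return new double[]{a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0]};
    }
    static double dot(double[] a, double[] b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
    static double[] scale(double[] a, double s) { return new double[]{a[0]*s, a[1]*s, a[2]*s}; }
    static void add(double[] acc, double[] v) { acc[0]+=v[0]; acc[1]+=v[1]; acc[2]+=v[2]; }
    static double[] normalize(double[] v) {
        double len = Math.sqrt(dot(v, v));
        return len < 1e-12 ? v : scale(v, 1.0 / len);
    }
}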

4.3.6 Geometry Rectification for Input and Output

To provide a convenient API for drawing and input interpretation, the geometry of each interactor in our system is established in terms of a planar region in 3D space, derived as indicated above. Based on the X-, Y- and Z-axes of the interactor in the depth camera/projector coordinate system, we derive a rectification matrix mapping depth image coordinates into a local coordinate system for the interactor. This local coordinate system allows the developer to think about interaction drawing and input in simple 2D or surface terms. Full 3D information is available for use by advanced interactor classes if desired.

For input processing, the underlying depth and RGB images are updated 30 times per second. For each update we perform contact blob extraction as outlined earlier. For each interactor, we then intersect both the contact blob point cloud and the full depth image with the interactor's depth-image mask (Figure 4.4C, white). This produces raw depth and RGB images as well as contact areas limited to the region over the interactor. The rectification matrix is then applied to individual interactor depth, RGB, and contact images to produce rectified images (Figure 4.4D), i.e., images represented in the interactor's local 2D coordinate system. In a rectified image, one pixel width corresponds to a known unit of distance in the real world. Finally, contact areas are further processed to produce simplified touch events. All of this information is then passed to the interactor(s) concerned.
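
For concreteness, the following sketch back-projects a single depth pixel into camera space and expresses it in an interactor's local frame; the pinhole intrinsics shown are placeholders rather than the Kinect's calibrated values, and the method names are our own.

// Minimal sketch of rectification: back-project a depth pixel, then express it in the
// interactor's frame (origin plus X/Y/Z axes derived when it was painted). One local
// unit equals one millimeter.
public class Rectifier {
    static final double FX = 570, FY = 570, CX = 160, CY = 120;   // placeholder intrinsics (320x240)

    // (u, v): depth pixel; depthMm: depth value; origin/x/y/z: interactor frame in camera space.
    static double[] toLocal(double u, double v, double depthMm,
                            double[] origin, double[] xAxis, double[] yAxis, double[] zAxis) {
        // Back-project the pixel to a 3D camera-space point (pinhole model).
        double[] p = { (u - CX) / FX * depthMm, (v - CY) / FY * depthMm, depthMm };
        double[] d = { p[0] - origin[0], p[1] - origin[1], p[2] - origin[2] };
        // Local coordinates: components of d along the interactor's axes.
        return new double[] {
            d[0]*xAxis[0] + d[1]*xAxis[1] + d[2]*xAxis[2],   // local X (mm)
            d[0]*yAxis[0] + d[1]*yAxis[1] + d[2]*yAxis[2],   // local Y (mm)
            d[0]*zAxis[0] + d[1]*zAxis[1] + d[2]*zAxis[2]    // height above the surface (mm)
        };
    }
}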

Figure 4.5. Sample interactor classes supported by WorldKit.

Our system provides a library of interactor classes which can be extended to perform many tasks. These include a binary contact interactor (A), percentage contact interactor (B), multitouch surface (C), object counting interactor (D), a linear axis interactor (E), as well as a simple output-only interactor (F). See also Table 4.1.

At this point, the interactor may perform additional specialized processing depending on its type. For instance, a brightness interactor from our library will calculate its sensed brightness value based on the rectified RGB image, a contact interactor will update its touch state and fire pressed/released events if applicable, and a multitouch interactor will act based on the contact blobs visible in its rectified image.

Each interactor may also produce output in order to indicate its current state and provide feedback during interaction. To facilitate easy drawing, the system provides a conventional two-dimensional drawing context (a PGraphics object within the Processing system) which is passed to an interactor's draw() method as needed. This drawing context object is transformed so that interactor drawing is specified in real-world units (e.g., millimeters) and oriented to correspond to the interactor's real-world orientation (e.g., aligned with its derived planar coordinate system as described above). The system takes draw commands on these graphics surfaces and automatically transforms them into the projector's image space for display (Figure 4.4E). Thus, when projected, interfaces render correctly on surfaces regardless of projector perspective, honoring interface layout and dimensions (Figure 4.4F). Finally, because we are projecting imagery onto real-world objects, head tracking is not required.

Table 4.1. WorldKit input-oriented interactor types.

Interactor Type       Type of Associated Value
Binary contact        True or False
Area contact          Percentage of coverage
Presence              True or False
Contact counting      Number of items (contact blobs)
Linear axis touch     Centroid of touch (1D along axis)
Two axis touch        X/Y centroid of touch
Radial input touch    Angle to centroid of touch
Multitouch input      X/Y centroid of multiple touches
Brightness            Average brightness of surface
Color                 Average color of surface

4.3.7 Interactor Library

As a part of our system, we created an initial interactor library to support various capabilities of the platform (Figure 4.5). Part of this library is a set of reusable input-oriented base classes (listed in Table 4.1). From these base classes, we derived a set of traditional UI elements featuring both input and output, such as buttons and sliders.

As examples: the binary contact interactor detects events on a surface by examining the set of contact blobs reported to it. If the total pixel count for these exceeds a small threshold (to filter out noise), the interactor detects a contact/touch. The area contact interactor additionally provides the proportion of depth values that are considered to be in contact range, allowing it to measure the contacted area. The presence interactor detects whether a background object is still present in its original configuration. For example, this can be used to sense if a door has been opened or a screen has been retracted.

A counting interactor counts and reports the number of distinct contact blobs on its surface. Linear axis interactors detect the position of a touch along one axis, which can be used to implement a variety of sliding controls, and multitouch interactors report the X and Y positions of each individual blob. Brightness interactors use the RGB camera feed to detect changes in the brightness of the sensed area. Similarly, color-sensing interactors measure the average color.

As in most systems, interactor types in our system are organized into a class hierarchy and may be extended from classes in the library by overriding methods associated with various interactive tasks. For example, advanced users might override an appropriate class to perform new or more advanced input processing, such as hand contour analysis for user identification [Schmidt 2010] or recognition of shoes [Augsten 2010].
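
As a purely hypothetical example of this extension pattern, the sketch below defines a radial "dial" interactor that reports the angle to the touch centroid. The base class, hook names, and touch representation are invented stand-ins; only the application-level API shown in Figure 4.7 reflects the actual system.

import java.util.List;

// Hypothetical stand-in for the library's base interactor class, for illustration only.
abstract class Interactor {
    abstract void onTouches(List<double[]> touchesMm);    // rectified touch centroids, in mm
    abstract void draw();                                 // drawn in real-world units
}

// A custom interactor type: reports the angle from its center to the touch centroid,
// in the spirit of the radial input type in Table 4.1.
class DialInteractor extends Interactor {
    double widthMm, heightMm, angleDeg;

    DialInteractor(double widthMm, double heightMm) { this.widthMm = widthMm; this.heightMm = heightMm; }

    @Override
    void onTouches(List<double[]> touchesMm) {
        if (touchesMm.isEmpty()) return;
        double cx = 0, cy = 0;
        for (double[] t : touchesMm) { cx += t[0]; cy += t[1]; }
        cx /= touchesMm.size(); cy /= touchesMm.size();
        angleDeg = Math.toDegrees(Math.atan2(cy - heightMm / 2, cx - widthMm / 2));
    }

    @Override
    void draw() {
        // A real subclass would render a needle at angleDeg via the provided drawing context.
        System.out.printf("dial at %.1f deg%n", angleDeg);
    }
}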

4.4 Example Applications

To illustrate the utility and capability of our system, we describe several example applications built using the accompanying library. Please see the accompanying Video Figure for a demonstration of each.

4.4.1 Living Room

The television remote control is frequently misplaced or lost in a typical home, leading to much consternation. With WorldKit, any surface can host the controls. This application instantiates a linear interactor to adjust the room's brightness, a radial interactor to adjust the TV volume and a Digital Video Recorder (DVR) interface to select a show of interest (Figure 4.1). Additionally, by adding a presence interactor to the sofa, we can even determine if the user is sitting and show or hide the interface as needed.

4.4.2 Office Door

A closed office door often gives no hints about the interruptible state of the occupant. A simple application of the WorldKit system allows the occupant to convey their status quickly when the door is closed. On the inside of the office, a large presence interactor is drawn on the closed door, and a number of smaller status buttons are drawn to the side (Figure 4.6). When the door is open, the status buttons are hidden, and the exterior indicator shows nothing. With the door closed, the status buttons appear. The exterior indicator reflects the chosen status button; thus, for instance, selecting the "In Meeting" status might cause "I'm in a meeting; please do not disturb" to appear on the outside.


4.4.3 Office Desk

Figure 4.6. A user sets up a simple office status application on his door.

This sequence illustrates a user setting up a simple notification message application on an instrumented office (A). First, a user “paints” a presence interactor on an office door (B), which can detect if the door is open or closed. The user then paints three contact interactors, labeled “in meeting”, “working” and “just knock”, onto the wall adjacent to the door (C, D and E). When the door is open, the application is invisible (F). When the door is closed, three buttons appear, which are user selectable (G).

This application uses a triggering contact interactor (in this case positioned over a keyboard) to activate a calendar display (which itself uses a linear interactor for scrolling) and a 2D position interactor for a simple whiteboard (Figure 4.8). When the user places his hands on the keyboard, the calendar appears. The user may then scroll the display through the calendar day by simply dragging up and down. Removing the hands from the keyboard causes the calendar to disappear. Users can also draw on the whiteboard, which is always visible regardless of the trigger state.

4.4.4 Kitchen

In the kitchen, it often becomes a chore to keep track of all the ingredients needed for a complex recipe. To address this, we created a simple recipe helper interface. The application prompts users to select a suitably-sized flat surface (e.g., kitchen counter) to prepare their ingredients. The user selects the desired recipe, and the application automatically lays out a set of interactors within that flat surface to hold each ingredient (Figure 4.9).


Interactors are customized for each ingredient to measure the amount or presence of the requested ingredient. For instance, if the recipe calls for a small number of countable items (e.g., eggs, whole onions), a counting interactor can be used. Ingredients that cannot be measured easily in the framework can be replaced by contact interactors, which simply record the presence or absence of the ingredient.

The flexibility of our system enables the interface to be quickly reconfigured to suit different sets of ingredients and quantities, without any cumbersome calibration or instrumentation.

Figure 4.7. Example code for a single button WorldKit application.

The full application depicted in Figure 4.6 consists of three such buttons.

import worldkit.Application;
import worldkit.interactors.Button;
import worldkit.interactors.ContactInput.ContactEventArgs;
import worldkit.util.EventListener;

public class OneButtonApp extends Application {
    Button button;

    public void init() {
        button = new Button(this);
        button.contactDownEvent.add(new EventListener<ContactEventArgs>() {
            @Override
            public void handleEvent(Object sender, ContactEventArgs args) {
                System.err.println("Got a button event!");
            }
        });
        button.paintedInstantiation("OneButton");
    }

    /* Boilerplate */
    public static void main(String[] args) {
        new OneButtonApp().run();
    }
}


Figure 4.8. Simple WorldKit office application.

The whiteboard behind the user is an interactor. The calendar to the right of the user is visible only when the user’s hands are on the keyboard.


Figure 4.9. WorldKit Kitchen application.

Various interactor types are composed into an ingredient management interface.

4.5 Limitations

As it is currently implemented, WorldKit has two notable drawbacks: the resolution of sensing and graphics can be lower than optimal, and the user may occlude the depth sensor and/or projector in certain configurations.

In the current system, the projector displays imagery at a resolution of 1024x768 over a potentially wide area. In cases where this area is large, this results in a loss of visual detail. We feel that this is not an inherent flaw in the approach, but rather a technological limitation that will improve with time as projectors develop increased resolution. Similarly, the Kinect depth camera is also limited in spatial, temporal and depth resolution. However, future depth cameras promise to overcome these limitations. Finally, we note that for the interactions presented in this work, lack of resolution did not significantly impede the usability of the resulting applications.

Users may occlude the projector and/or depth camera during normal operation; this is fundamentally a limitation of using a single projector and camera setup. In our system, we avoid user confusion by positioning the Kinect on top of the projector (i.e., very closely aligning effective view and display frustums). This ensures that users receive feedback in the form of their own shadow if they occlude the view of the camera.

4.6 Discussion

As indicated above, many of the underlying technical components we bring together in our system have been considered in prior work, and in some cases more than one technical approach has been offered over time. Specifically, we were initially inspired by the visions put forward in the Intelligent Room, Luminous Room, and Everywhere Displays projects. The benefits and goals of having interactivity everywhere were clearly articulated in these early works. However, the instantiation and modification of interactive features was absent or limited. In general, example applications were custom built by the authors, carefully configured and calibrated, and largely inflexible.

Eyepatch, Slit-Tear Visualizations and LightWidgets put forward elegant approaches to allow end-users to quickly define simple interfaces on a video stream of the environment. We extend this idea to defining interfaces in situ – directly on the environment, without the need for a conventional computer. Furthermore, we expand the suite of controls, allowing for richer interactions, including multitouch input and non-human triggers like doors closing.

Moreover, our system projects coordinated graphical feedback onto user-defined areas. This builds on, and provides reusable abstractions for, the technical approaches presented in Office of the Future, LightSpace, and OmniTouch. Our system takes into account the geometry of user-defined surfaces, providing not only rectified projected output (so graphics appear correctly for all users), but also orthonormal processing of input data (providing higher accuracy and regular coordinate systems).

Only with all of these features in place (and with the functionality and low cost of the most recent hardware advances) could we begin to think about mechanisms that are suitable for end users to define interactive functions on everyday surfaces in a highly dynamic fashion. This has allowed us to produce a robust, extensible, and usable system that makes interfaces accessible wherever and whenever they are needed by an end-user.

4.7 Future Work

Although our system provides a very general set of interactive capabilities, there are several areas for future work. First, we have yet to fully explore the design space of interactors residing on real-world surfaces. Based on a wider exploration of this space enabled by our initial extensible tool, we anticipate that a more substantial and complete library of interactors (built with the existing extension mechanism) will increase the variety of applications that could be supported.

Another area of potential future work involves the expansion from surface interaction into free-space interaction. In this area there are interesting challenges in determining how input spaces might be delimited and how end-users might quickly and easily instantiate interactors. In addition, there are significant interaction technique design challenges for this type of interaction. For example, it is as yet unclear what a base set of interaction types for this space might include. Furthermore, there are basic challenges such as how to provide feedback without an obvious surface to project coordinated graphical feedback onto.

As the underlying hardware for both projection and depth sensing improves, additional challenges and opportunities may arise in improved filtering, detection, recognition, and display. For example, with higher resolution depth cameras, it may be possible to include detailed finger gestures as a part of interaction, but recognition of these will be challenging. In addition, recognizing classes of everyday objects could introduce substantial new capabilities into this type of system.

Finally, an additional area for future research lies in the expansion of these techniques to other modalities. For example, the use of audio input and output in conjunction with touch offers potential benefits.

4.8 Conclusion

I described my WorldKit system, which allows interactive applications to flourish on the real world. This system provides a convenient and familiar set of programming abstractions that make this new domain accessible for development and experimentation. Further, it supports applications that can be readily instantiated and customized by users with unprecedented ease. Additionally, this approach overcomes many challenges inherent in sensing on the environment and with low-resolution depth-sensing. As discussed in our future work, forthcoming improvements in depth cameras will one day enable touchscreen-quality interaction on the world around us. The system also projects coordinated graphical feedback, allowing collections of interactors to largely operate as if they were sophisticated but conventional GUI applications.


CHAPTER 5. ENABLING RESPONSIVE ON-WORLD INTERFACES

5.1 Introduction

Digitally augmented desks are a particularly interesting implementation of world-scale interfaces, and have been explored often in the literature. Seminal work emerged in the early 90's, most notably Xerox PARC's DigitalDesk [Newman 1992, Wellner 1993, Wellner 1991]. Since then, dozens of systems have been proposed and built, demonstrating superposition of content onto physical artifacts [Robertson 1999, Wellner 1993], the use of physical objects for tangible interaction [Fails 2002, Underkoffler 1998], in situ remote collaboration [Junuzovic 2012, Underkoffler 1999, Wellner 1993], and, more generally, interactive applications on the desk surface [Kane 2009, Kim 2014, Pinhanez 2001, Xiao 2013].

However, a notable commonality of these futuristic systems is the minimalist nature of the desk surfaces used – often lacking keyboards, mice, mugs, papers, knickknacks, and other contemporary and commonplace items (Figure 5.1). Today's desk surfaces play host to a wide variety of items of varying shapes and sizes. Moreover, these objects rarely conform to a grid or even a common orientation. Desks are also constantly in flux, with items moving, stacking, appearing and disappearing. Example events include sliding a laptop out of the way to make room for new work, or resting a fresh cup of coffee on the work surface.

If digital desks do not account for these basic physical actions, applications can become brittle (e.g., does a mug placed on top of a virtual keyboard inject spurious touch input?) and inaccessible (if a book is placed over an interface, how does one retrieve it?). Further, because physical objects cannot move or change size on their own, the burden of responsiveness falls to the digital elements. Thus, digital applications must employ a variety of strategies to successfully cohabit a work surface with physical artifacts.

To help close this gap, we conducted an elicitation study with ten participants at their personal desks to understand how applications could respond to different events. We then derived a list of ten fundamental interactive behaviors that desk-bound virtual applications should exhibit to be responsive. To demonstrate that these behaviors can be achieved practically and in real time, we built a proof-of-concept system with the necessary technical advances to support each behavior. This system had to move beyond prior work in several key ways; for example, our system requires no calibration to the world, allowing the desk scene to be in flux (i.e., there is no notion of a "background"). Further, our touch tracking approach distinguishes human "objects" (arms, hands, fingers) from other objects. This ability is critical for responsive interfaces, which must respond to user movement and input differently from changes to the physical environment (e.g., interfaces should evade your coffee mug, but not your hands).


Figure 5.1. Various digitally augmented desks from the academic literature.

Clockwise from top left: Digital Desk [Wellner 1993], Hardy [Hardy 2012], LuminAR [Linder 2010], IllumiShare [Junuzovic 2012], AR Lamp [Kim 2014], Everywhere Displays [Pinhanez 2001], Enhanced Desk [Koike 2001], Bonfire [Kane 2009], I/O Bulb [Underkoffler 1999]. Note the general lack of items in the interactive area. The few physical objects that do have digital interactivity are either tagged (e.g., fiducial markers), are special tangibles, or require a custom interactive table (e.g., FTIR).

5.2 Elicitation Study

To help identify useful interactive strategies, we recruited ten participants (three female, mean age 31) for a one-hour study. This study was conducted at participants' desks (see Figure 5.2 for some examples) for ecological validity. Common desk-bound items included computers, monitors and related accessories; desk phones; stacks and files of papers; carried items such as wallets, phones and keys; personal items such as memorabilia, photographs and gifts; coffee mugs; and books.


Figure 5.2. Example real-world desks of our participants.

We started with a general interview. Participants repeatedly articulated that frequently-accessed items gravitated to the front and center, while other items moved to the periphery. In general, however, the entire desk surface was in flux, with perhaps only a computer monitor and a few peripheral items (e.g., picture frames) being stable on the timescale of months. Although our sample size was small, it reinforced our assumption (and findings from prior studies) that desks tended to be cluttered and dynamic.


Figure 5.3. Sample arrangement of the paper prototypes in the elicitation study.

a: The map is placed to the side, for reference but not frequent use. b: The calendar is placed nearby so it can be glanced at. c: The music player is snapped to the keyboard for quick access. d: The number keypad extends the keyboard.

Next – and the primary purpose of the study – was to elicit interactive behaviors that digital applications should support. We structured this elicitation study [Nielsen 2004] around a think-aloud exercise with paper prototypes. Specifically, participants were given paper cutouts of four common applications – calendar, map, music player, and number keypad – and asked to imagine them as though they were fully interactive (Figure 5.3).

Participants then placed the (paper) applications on their work surface as they saw fit, thinking aloud about their reasoning for choosing particular locations. We also prompted them with a series of hypothetical situations, such as "what should happen to this application if you put your phone down here?", "…push the cup to the left?", "…pack your laptop in your bag?". Participants could move, animate, fold or otherwise manipulate the paper interfaces, or explain verbally what should occur in response. The facilitator recorded comments for later analysis and affinity diagramming.

5.3 Distilling Interactive Behaviors

Following the study, we distilled our written notes and participant quotes into three broad functional categories:

5.3.1 Application Lifecycle

Summoning: Participants articulated a variety of potential strategies for summoning applications, including special gestures (e.g., "double tapping the desk"), persistent buttons ("start button", "dashboard", "dock"), spoken commands, and using a conventional computing device ("dragging an interface off computer desktop or mobile phone", "keyboard shortcut").

Closing: Several methods for dismissing interfaces were proposed by participants: occluding an interface with an object (to make it "go away"), shrinking an interface substantially, moving an interface to a dedicated "trash can" area, invoking a special gesture (e.g., "scrunching up"), throwing an object ("flinging" it), or, whimsically, miming a fireball-throwing gesture at it ("hadouken").

Figure 5.4. Summoning an application.

a-c: The user taps twice with four fingers to bring up a launcher. d: The user moves to select the desired application. e: After lifting the fingers, the application is created.

5.3.2 Layout Control

Repositioning: Participants uniformly expected to be able to reposition interfaces by holding their fingers or hand on the interface and dragging the interface around.

Reorienting: Objects on desks tended to not conform to rectilinear grids; rather, objects were typically oriented according to a radial pattern centered on the user. We observed that users generally rotated interfaces to face themselves, and two users further wanted the ability to rotate interfaces to face visitors (e.g., to show a map to someone).

Resizing: Most participants expected to be able to "pinch" to resize interfaces, though four participants noted that the pinch would be ambiguous (pinching content vs. resizing the window). These participants suggested resizing by using the corners, akin to desktop applications. Finally, some participants noted that they would prefer unused applications to be shrunk and placed out of the way, to be retrieved and expanded on demand.

5.3.3 Cohabitation Behaviors

Snapping: Participants typically placed the number keypad or music player controls near the computer keyboard or mouse. Five participants also described wanting to "snap" or "link" these interfaces to the keyboard, letting them behave as though they were an extension of the keyboard.


Following: Further, four participants expected that these snapped interfaces would automatically "follow" the movements of their associated physical object, unless the interface was manually repositioned.

Detaching: Participants noted that interfaces should "unsnap" if they were covered up or manually "pulled apart" or "torn off" from the object they were snapped to.

Evading: When asked to describe what interfaces should do when occluded, participants were divided. Six participants expected interfaces to "pop elsewhere" or "run away" when occluded, with two further suggesting that interfaces could shrink down to fit the remaining space, and four participants further expecting the interface to "nudge away" or "adjust" in response to partial occlusions. Conversely, four participants noted that they would simply expect the interfaces to misbehave or ignore input if occluded.

Collapsing: If the available desk space shrank until interfaces could not find sufficient surface area, participants expected them to shrink substantially or disappear entirely.

These ten verbs form the basis of the interaction techniques we believe are necessary to support responsive interfaces in mixed physical-virtual desk contexts. Notably, while some of our verbs are taken from computer desktop windowing operations (resizing, repositioning, closing), several are designed specifically for cohabitation with physical objects (snapping, following, evading). Next, we describe our proof-of-concept system that provides all ten behaviors in real time, with no fiducial tags (or equivalent), on conventional desks, using commodity hardware.

5.4 Interactive Behavior Implementations

We built a desktop projection and sensing system that offers example embodiments of our ten interactive behaviors, which we now describe. The specific technical implementations of these features are described in detail in the next section.

Figure 5.5. Resizing and deleting interactions.

a: The user grabs the resize handle in the corner. b: Some interfaces can responsively alter their layout depending on their size; here, the calendar switches from week view to day view. c: Making the interface very small causes it to iconify. d: Shrinking the interface further will close it.


5.4.1 Conventional Interactions

To support conventional multitouch interaction, we reserve one- and two-finger interaction for interacting with the application content itself. Users can therefore use familiar touch gestures with the application content, such as one-finger swiping and scrolling, and two-finger pinching and rotating. Thus, all the techniques for application-level manipulation use three or more fingers to avoid ambiguity. Importantly, the interactive behaviors we describe are not specific to the particular manipulation scheme employed. For example, one alternative input method could be to treat digital items more like physical objects, and use physical metaphors for interface manipulation, such as pushing, throwing and stacking (as seen in, e.g., BumpTop [Agarawala 2006] and ShapeTouch [Cao 2008]).

Figure 5.6. Moving and snapping interactions.

a-b: Three fingers on the interface activates dragging. c: Dragging the interface near an edge highlights the edge in orange. d: Upon releasing the interface, it snaps to the edge, which is highlighted in green.

5.4.2 Summoning

A central feature of all interactive systems is the general ability to trigger actions, a subset of which is summoning applications. Conventional GUIs typically feature static application bars, docks, menus, launchers, or other schemes, which serve as a reliable and readily accessible access point. However, a physical desktop may not have sufficient space, let alone a universally available summoning point. Thus, we had to consider several mechanisms for instantiating applications.

Several mechanisms were suggested during our study: special-purpose launcher buttons or taskbars (akin to the "Start" button on Windows computers); special-purpose hand gestures (like a double-tap on the surface, or a spreading of the fingers); or instantiation via transplantation from the computer itself (dragging an interface off of the computer screen and onto the desk). Ultimately, we decided to implement a special-purpose hand gesture for summoning, as it enabled us to immediately specify the location for an interface without requiring a computer be present, and avoided permanently devoting precious desk space to a special-purpose button. We chose to use a double, four-finger tap as the triggering gesture, as this gesture is rarely casually or accidentally invoked.

Executing the triggering gesture (tapping twice with four fingers on an unoccupied space on the desk) causes a radial menu to appear at the approximate centroid of the fingers (Figure 5.4a-c).


The user can then move the fingers towards the desired application and lift their fingers to confirm the selection (Figure 5.4d). Alternately, the user can simply lift the fingers without moving to cancel the menu. In a complete implementation, one of the menu options would permit voice or text input to search for less frequently used applications not listed on the radial menu. Confirming the selection of an application causes the application to appear at the center of the menu (Figure 5.4e). The user can then reposition and resize the application.

5.4.3 Resizing

During our study, every participant suggested using a pinch gesture to resize the interface. However, some noted that this would be ambiguous with respect to the content (which is not usually an issue since the pinch gesture is typically used on mobile devices that only show one application at a time). Further, a pinch gesture does not naturally permit the aspect ratio to be adjusted.

We therefore borrow a familiar mechanism for manual resizing – a draggable 1.5 cm grey box in the lower-right corner of the application (Figure 5.5a). Because applications can (and must) exist at a variety of sizes on desktops, applications ideally implement responsive layout schemes [Zeidler 2013] (Figure 5.5a, b). If applications are resized below a reasonable limit (e.g., less than 4 cm in either direction), applications "iconify", displaying only an icon of the application and the resize box (Figure 5.5c). This permits applications to be stored on the desk without taking up substantial space.

5.4.4 Deleting

We considered several possible deletion schemes, some suggested by our study. These included special-purpose gestures (e.g., "scrunching up" the interface or "flicking" the interface away), special drag targets (e.g., a "trash can"), or explicit close buttons (like those found on computer desktop windows). Ultimately, we chose to simply extend resizing to provide deletion capabilities: applications can be deleted simply by resizing them below 1.5 cm in either direction (Figure 5.5d). This approach is simple and requires no additional buttons or gestures.

5.4.5 Repositioning and Reorienting

In order to reposition applications manually, we implemented a "drag" strategy. Users can press three fingers to the application and drag as desired, or rotate the hand to reorient the application (Figure 5.6a, b). For iconified applications, which are small, a single finger is used for dragging (as the content is hidden, there is no interaction ambiguity).

5.4.6 Snapping

Users can snap applications to the topological discontinuities (i.e., "edges") of physical objects on the desk, such as the sides of laptops, books, keyboards, stacks of paper, edges of work surfaces and so on. These snap candidates are detected with the edge-finding algorithm described later.

To snap an interface to an edge, users drag the interface near to the target (Figure 5.6b). The nearest object edge to the interface (up to 50 mm away) will be highlighted in yellow (Figure 5.6c). When an object edge is highlighted, releasing the drag will cause the interface to snap to the highlighted edge, aligning the interface edge to the nearest object edge by repositioning and reorienting the interface as necessary. After successfully snapping, the snapped interface edge will be highlighted to denote the snapped state (Figure 5.6d).
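
A minimal sketch of the snap test might look like the following, assuming the edge finder reports short segments with a midpoint and an orientation; the data holders are illustrative, not the system's actual structures.

import java.util.List;

// Illustrative sketch of the snapping test: when a drag ends, the object edge nearest
// the interface's snap edge (within 50 mm) is chosen, and the interface is rotated to
// the edge's angle and translated so its snap edge meets the object edge.
public class Snapping {
    static class Edge { double x, y, angleDeg; Edge(double x, double y, double a) { this.x = x; this.y = y; angleDeg = a; } }
    static class Iface { double x, y, angleDeg; }          // x, y = midpoint of the interface's snap edge

    static final double SNAP_RANGE_MM = 50;

    // Returns true if the interface snapped to some edge.
    static boolean trySnap(Iface iface, List<Edge> edges) {
        Edge best = null;
        double bestDist = SNAP_RANGE_MM;
        for (Edge e : edges) {
            double d = Math.hypot(e.x - iface.x, e.y - iface.y);
            if (d < bestDist) { bestDist = d; best = e; }
        }
        if (best == null) return false;
        iface.angleDeg = best.angleDeg;                     // reorient to align with the object edge
        iface.x = best.x; iface.y = best.y;                 // reposition the snap edge onto the object edge
        return true;
    }
}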

5.4.7 Following

As noted during our study, users expect snapped interfaces to follow the objects they are snapped to. In our system, this "following" behavior is implemented by updating the object edges ten times per second (alongside the desk topography), and repositioning snapped interfaces to the new edge nearest to the previous edge (in both position and rotation angle). This causes interfaces to simply follow the objects naturally (Figure 5.7a-c). To avoid jitter due to noise in the edge measurements, interfaces only follow if the edge moves by more than 4 mm.

5.4.8 Detaching

Users may also want to disable snapping or following behaviors, in which case they can detach the interface. This can be achieved by either dragging the interface off the object (Figure 5.7d), or by moving the object rapidly away from the interface (i.e., "tearing" it off). Interfaces will detach if the edge moves more than 40 mm away in a single frame update, which corresponds to a "detach velocity" of 400 mm/s.
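
The follow-and-detach logic can be sketched as a small per-update routine, again with illustrative data holders. The 4 mm jitter threshold and the 40 mm single-update detach distance follow the text; the matching step is simplified to position only, whereas the real system also considers rotation angle.

import java.util.List;

// Illustrative sketch of the follow/detach update, run with each topography refresh
// (about ten times per second).
public class Following {
    static final double JITTER_MM = 4, DETACH_MM = 40;     // follow threshold; single-update detach distance

    static class Edge { double x, y, angleDeg; }
    static class SnappedIface { Edge snappedEdge; double x, y, angleDeg; }

    static void update(SnappedIface iface, List<Edge> currentEdges) {
        if (iface.snappedEdge == null) return;              // already detached
        Edge nearest = null;
        double nearestDist = Double.MAX_VALUE;
        for (Edge e : currentEdges) {
            double d = Math.hypot(e.x - iface.snappedEdge.x, e.y - iface.snappedEdge.y);
            if (d < nearestDist) { nearestDist = d; nearest = e; }
        }
        if (nearest == null || nearestDist > DETACH_MM) {
            iface.snappedEdge = null;                        // object moved away too fast: detach
        } else if (nearestDist > JITTER_MM) {
            iface.x += nearest.x - iface.snappedEdge.x;      // follow the object's motion
            iface.y += nearest.y - iface.snappedEdge.y;
            iface.angleDeg += nearest.angleDeg - iface.snappedEdge.angleDeg;
            iface.snappedEdge = nearest;
        }
    }
}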

Figure 5.7. Following and detaching interactions.

a-c: Once a virtual interface is snapped to an object, it should follow if the object is moved. d: The snapped interface can be detached by dragging it off.

5.4.9 Evading

Interfaces that are suddenly occluded will move out of the way and seek another empty space to appear on (Figure 5.8b-c). This could happen if, e.g., users rearrange their desk, or shuffle items around. When this occurs, the next topography update will detect a sudden increase in the interface's "roughness", and will include the interface in the subsequent optimization pass. This relocates the interface to a nearby open space on the desk (any relatively flat area with low roughness), while avoiding other interfaces. Multiple interfaces may participate in optimization simultaneously.


5.4.10 Collapsing

Our optimization-based layout algorithm (discussed in the next section) will also automatically shrink an interface if the available space is insufficient to place a displaced interface. This permits an interface to "squeeze" into a space where it can be displayed (Figure 5.8c). If the available space is very limited, the interface may simply iconify; the user can then choose to clear off space and re-expand the interface as desired.
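
A rough sketch of such a relocation pass appears below: candidate positions near the interface are scored by surface roughness and rejected if they overlap other interfaces, and the interface shrinks when nothing acceptable fits. The search step, radius, and thresholds are invented for illustration, and the roughness and overlap tests are assumed to be supplied by the topography and layout layers.

// Illustrative sketch of the evade/collapse relocation pass.
public class Relocation {
    interface Desk {
        double roughness(double x, double y, double w, double h);   // surface roughness over a region
        boolean overlaps(double x, double y, double w, double h);   // collides with another interface?
    }

    static final double STEP_MM = 20, SEARCH_MM = 300, MAX_ROUGHNESS = 5.0, MIN_SIZE_MM = 40;

    // Returns {x, y, w, h} for the relocated (possibly shrunken) interface, or null to iconify.
    static double[] relocate(Desk desk, double x, double y, double w, double h) {
        while (w >= MIN_SIZE_MM && h >= MIN_SIZE_MM) {
            double bestScore = MAX_ROUGHNESS;
            double[] best = null;
            for (double dx = -SEARCH_MM; dx <= SEARCH_MM; dx += STEP_MM) {
                for (double dy = -SEARCH_MM; dy <= SEARCH_MM; dy += STEP_MM) {
                    double cx = x + dx, cy = y + dy;
                    if (desk.overlaps(cx, cy, w, h)) continue;       // avoid other interfaces
                    double score = desk.roughness(cx, cy, w, h);
                    if (score < bestScore) { bestScore = score; best = new double[]{cx, cy, w, h}; }
                }
            }
            if (best != null) return best;                           // found a flat, unoccupied region
            w *= 0.8; h *= 0.8;                                      // collapse: shrink and try again
        }
        return null;                                                 // no space left: iconify the interface
    }
}

Running several displaced interfaces through this pass in sequence (largest first, for example) is one simple way to let them participate in the optimization simultaneously without overlapping one another.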

Figure 5.8. Evading and collapsing interactions.

a: An interface is positioned somewhere. b: The interface is occluded by an object. c: When the user moves his hands away, the interface evades the occlusion, and also collapses to fit available space.

5.5 Technical Implementation
Achieving our ten interactive behaviors required combining features described in many disparate research efforts into a single novel system, as no prior system has the necessary feature set. More specifically, our system provides rectified projection onto irregular surfaces. To support our cohabitation behaviors, it was necessary to perform object tracking and edge finding in real time. We also employ an optimization-based layout scheme to automatically find location candidates for evading interfaces. For finger touch tracking, we extend the approach described in OmniTouch [Harrison 2011], which allows fingers to be disambiguated from other objects. We now describe these processes in greater detail.


Figure 5.9. Our proof of concept projector-camera system fitted into a lampshade.

5.5.1 Hardware
Our hardware prototype consists of a paired depth camera (Asus Xtion) and pocket projector (700-lumen Optoma ML750), a configuration similar to that found in [Harrison 2011, Xiao 2013] (Figure 5.9, right). The hardware was enclosed in a ceiling-hung lampshade fixture (Figure 5.9, left) suspended 90 cm above a desk surface. At this distance, the projected image is 62x39 cm in size. The camera and projector are rigidly affixed together and calibrated with respect to each other (once, e.g., "at the factory") using a seven-point calibration routine [Xiao 2013]. The hardware is designed to be easily portable and functional on any desk, with no calibration of the desk surface required.

The depth camera produces 320x240-pixel depth images at 30 frames per second. Natural variation due to noise is 4-7 mm at a distance of 90 cm [Khoshelham 2012]. Our software was developed using the openFrameworks C++ library, and runs on a 2011 MacBook Pro. The complete implementation requires less than 15% of the CPU in full operation. It runs touch tracking at camera frame rate (30 fps), geometry updates at 10 fps, and interface draw/update logic at projector frame rate (60 fps).

5.5.2 Disambiguating Object and Human Movement
In order to enable interaction on the ever-changing desk surface, our system needs to readily and accurately distinguish fingers and hands from other objects. For example, to implement the evade behavior, we need to have interfaces move away from objects that are placed on top, but not move away from hands or fingers that appear over the interface. Additionally, interfaces must be able to respond to hand and finger input without being affected by other objects that may be moving in the scene.

We therefore detect human arms (which are characteristically long and cylindrical in shape) in addition to fingers during our touch tracking procedure, and use the arm and finger data to label fingers, hands and arms. These human "objects" are then excluded from most topographical considerations. Since the field of view of the depth sensor is approximately twice as large as the projection area in our system, we can guarantee that fingers appearing within the projection area will always be accompanied by visible arms in the larger depth image.

5.5.3 Touch Tracking
We did not want to augment our desks with additional sensors (since the system was intended to operate over any unmodified desk surface), and therefore opted to use a depth camera to track touches. Further, while background subtraction and in-air touch tracking enable reliable hand tracking (by identifying and removing the background), we determined that such an approach would be infeasible in a desk environment where objects are liable to move.

Therefore, we decided to extend the "cylinder-finding" finger tracking algorithm proposed in OmniTouch [Harrison 2011]. Specifically, we extract and combine both horizontal and vertical cylinder slices (Figure 5.10b, red and blue lines), enabling the detection of fingers (Figure 5.10b, white) oriented in any direction (Figure 5.11). We additionally reject finger-like objects (e.g., pens or markers) by detecting cylindrical arm slices and requiring valid fingers to be connected to arms (see previous section). Importantly, this algorithm requires no calibration, no background subtraction, and makes no decisions based on (e.g., potentially lighting-dependent) color information. Touch tracking latency is roughly 40 ms (limited by the refresh rate of the depth camera).

Figure 5.10. Touch tracking steps in Desktopography.

a: Depth gradients (red=dx, blue=dy). b: Unfiltered cylinder slices (red, blue), slices connected into cylinders (white lines) and touch-detection flood-fill (green). c: Detected touches overlaid on depth image. White circles represent the base of the finger.


Figure 5.11. Desktopography tracking fingertips on the desk.

The touch tracking algorithm can find finger tips even when the hand is flat against the surface.

5.5.4 Handling Irregular Surfaces
The "desktopography" is sensed by converting each depth pixel in the projection area into a height value, which is used to construct a mesh. The topography mesh is then re-rendered from an orthographic view, and the depth component of this rendering is extracted into a height map, which converts desktop coordinates into height values above the desk surface. This height map is used to identify flat areas suitable for placing interfaces.

The mesh is also used for projection mapping. The desktop imagery is warped (by UV-mapping the desktop graphics onto the mesh) such that the resulting projected output appears "orthographically correct" no matter how tall or irregularly shaped the objects atop the desk are. This ensures that projections on flat objects maintain a consistent scale (so that, e.g., one virtual desktop millimeter is always one millimeter on a flat surface, regardless of surface height).

The system will attempt to update the desktopography ten times per second, but will skip the update if the user is interacting with applications or fingers are detected over the top of an interface. In this way, the system avoids occlusion and interference effects from the user's hands while maintaining an acceptable update frequency. After completing a topography update, the system will search for edges in the new topography, and then initiate an interface optimization pass to support evading and following behaviors.

5.5.5 Edge Finding
To support our snapping and following behaviors, our system required the ability to detect edges of physical objects. The process starts by filtering the desk height map with a Canny edge filter [Canny 1986] to extract depth discontinuities. The parameters were tuned to extract discontinuities of at least 7 mm without excessive noise (the smallest possible difference given the noise level of the depth sensor). The binary edge image is then fed to OpenCV's contour-finding routine, which connects adjacent edge pixels into contiguous contours.

Contours are then filtered to remove artifacts (e.g., duplicated and doubled-up contour paths) and short noise-induced contours. We then extract segments from the contours by walking over the pixels in each contour and outputting a segment whenever the contour abruptly changes direction (by more than 45º in an 8-pixel window). Short segments (less than 20 mm long) are deleted, and the remaining segments are converted to straight lines using linear interpolation.
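As a rough illustration of this edge-finding pipeline (not the exact implementation; the Canny thresholds and the helper name extractSegments are placeholders), the sketch below runs Canny over an 8-bit height map, extracts contours with OpenCV, and splits each contour wherever its direction changes by more than 45º within an 8-pixel window:

#include <opencv2/imgproc.hpp>
#include <cmath>
#include <vector>

// Split one contour into roughly straight segments at sharp direction changes.
// Thresholds mirror the text: 45-degree turns measured over an 8-pixel window.
static std::vector<std::vector<cv::Point>> extractSegments(const std::vector<cv::Point>& contour) {
    std::vector<std::vector<cv::Point>> segments;
    std::vector<cv::Point> current;
    const int window = 8;
    for (size_t i = 0; i < contour.size(); i++) {
        current.push_back(contour[i]);
        if (i >= (size_t)window) {
            cv::Point a = contour[i - window] - contour[i - window / 2];   // first half direction
            cv::Point b = contour[i - window / 2] - contour[i];            // second half direction
            double angA = std::atan2((double)a.y, (double)a.x);
            double angB = std::atan2((double)b.y, (double)b.x);
            double turn = std::fabs(std::remainder(angA - angB, 2 * CV_PI)) * 180.0 / CV_PI;
            if (turn > 45.0) {                   // abrupt direction change: start a new segment
                segments.push_back(current);
                current.clear();
            }
        }
    }
    if (!current.empty()) segments.push_back(current);
    return segments;
}

std::vector<std::vector<cv::Point>> findEdgeSegments(const cv::Mat& heightMap8u) {
    cv::Mat edges;
    cv::Canny(heightMap8u, edges, 30, 90);       // placeholder thresholds tuned for ~7 mm steps
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(edges, contours, cv::RETR_LIST, cv::CHAIN_APPROX_NONE);
    std::vector<std::vector<cv::Point>> segments;
    for (const auto& c : contours)
        for (auto& s : extractSegments(c)) segments.push_back(std::move(s));
    // Segments shorter than 20 mm in desk coordinates would be discarded here,
    // and the survivors fit with straight lines.
    return segments;
}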

5.5.6 Optimization-Based Layout
To support evading and collapsing behaviors, the system must be able to automatically identify open spaces on the surface, and to decide whether the available space is sufficient. However, the irregular and complex topology of desks yields innumerable corner cases that make building a rule-based or heuristically driven layout approach unwieldy and brittle. Thus, we use an optimization-based layout engine to support evading and collapsing interfaces.

Optimization has long been used for spatial problems such as graph visualization [Di Battista 1998] and VLSI layout [Cong 1996], and more recently for problems in perceptually optimized display generation [Agrawala 2001, Lee 2005]. More relevantly, optimization-based techniques have also been employed for automatic generation of user interface layouts (see, e.g., [Bodart 1994, Fogarty 2003, Gajos 2008, Sears 1993]). As far as we are aware, our work is the first to use optimization-based layout for on-world interactive purposes.

Our approach defines a certain "cost" for every interface configuration, depending on the desk topography and interface positions. It then aims to minimize the total cost of the configuration, subject to a number of "penalties" which guide the process away from undesirable behavior (e.g., to avoid large jumps in position). Furthermore, the optimization-based layout engine can be used, for example, to automatically position new interface elements (if they are summoned, e.g., using a voice command alone) or to support recalling a set of interfaces onto a changed desk layout.

The layout engine uses the topographical height map to determine surface roughness as follows. The roughness at each desk point is calculated as the standard deviation of the height map in a 41x41-millimeter square window centered on the desk point. This is efficiently computed using sliding windows over summed-area tables [Crow 1984]. The "roughness" of a particular interface area is the average roughness across each point in the area; this is efficiently calculated using a polygon scanline approach coupled with a summed-area table of the roughness map.
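The following is a minimal sketch of the windowed standard-deviation computation using summed-area tables, assuming the height map has already been resampled into desk (millimeter) coordinates; the structure and function names (Sat, roughnessMap) are ours, not the system's:

#include <algorithm>
#include <cmath>
#include <vector>

// Summed-area table (integral image) over a row-major float image.
struct Sat {
    int w, h;
    std::vector<double> s;   // (w+1) x (h+1), zero first row and column
    Sat(const std::vector<float>& img, int w_, int h_) : w(w_), h(h_), s((w_ + 1) * (h_ + 1), 0.0) {
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                s[(y + 1) * (w + 1) + (x + 1)] = img[y * w + x]
                    + s[y * (w + 1) + (x + 1)] + s[(y + 1) * (w + 1) + x] - s[y * (w + 1) + x];
    }
    // Sum over the inclusive rectangle [x0,x1] x [y0,y1].
    double sum(int x0, int y0, int x1, int y1) const {
        return s[(y1 + 1) * (w + 1) + (x1 + 1)] - s[y0 * (w + 1) + (x1 + 1)]
             - s[(y1 + 1) * (w + 1) + x0] + s[y0 * (w + 1) + x0];
    }
};

// Roughness = standard deviation of heights in a (2r+1)x(2r+1) window (r = 20 gives 41x41 mm).
std::vector<float> roughnessMap(const std::vector<float>& height, int w, int h, int r = 20) {
    std::vector<float> sq(height.size());
    for (size_t i = 0; i < height.size(); i++) sq[i] = height[i] * height[i];
    Sat sum1(height, w, h), sum2(sq, w, h);
    std::vector<float> rough(height.size(), 0.0f);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int x0 = std::max(0, x - r), x1 = std::min(w - 1, x + r);
            int y0 = std::max(0, y - r), y1 = std::min(h - 1, y + r);
            double n = double(x1 - x0 + 1) * (y1 - y0 + 1);
            double mean = sum1.sum(x0, y0, x1, y1) / n;
            double var = sum2.sum(x0, y0, x1, y1) / n - mean * mean;   // E[x^2] - E[x]^2
            rough[y * w + x] = (float)std::sqrt(std::max(0.0, var));
        }
    }
    return rough;
}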

Every time the topography is updated (up to ten times a second), the roughness map is updated, the interface positions are adjusted to follow any snapped edge boundaries (to support following behaviors), and the optimization algorithm is executed. The optimizer generates and evaluates a number of candidate interface configurations, starting with the initial configuration. Interfaces which were recently positioned by hand, or whose roughness has not significantly increased since the most recent optimization run, are excluded from optimization.

The cost of a particular configuration is computed from the total sum of the interface-area roughness values and an assortment of penalties (cost increases) to discourage particular behaviors. A penalty is imposed for moving an object at all, which biases the algorithm towards keeping objects still. Another penalty is applied based on the distance moved and any applicable size change, which biases the algorithm towards small movements (preserving spatial locality). Finally, a large penalty is imposed for any overlap between interface object areas, to prevent the algorithm from optimizing two interfaces onto each other.

The optimization itself is performed using a simulated annealing algorithm. The simulated annealing algorithm maintains a "temperature" parameter that decreases at each iteration of the algorithm. In each iteration, each interface object in the current best configuration is "mutated" to generate a list of new candidate placements; these mutations can consist of moving, rotating or resizing objects. The expected magnitude of each mutation is controlled by the "temperature" parameter, with higher temperatures resulting in more extreme mutations (e.g., longer-distance movements). Candidates are combined into new configurations and tested for acceptance according to the current timestep's "temperature" parameter (which controls the simulated annealing algorithm's willingness to accept new "best" configurations). The configurations are generated exhaustively, starting with the candidates having the lowest roughness values. If this process yields too many combinations, the algorithm switches to generating configurations randomly to avoid combinatorial explosion.

The optimization algorithm terminates after reaching a predefined number of iterations, or if it runs for more than 60 ms. This bounds the optimization algorithm and prevents it from running endlessly. The best configuration found so far is then returned (which may be the same as the initial configuration, if no alternative was superior).
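The skeleton below sketches a generic simulated-annealing loop of this shape; the cost function, mutation operator and 60 ms budget follow the description above, while the acceptance rule, cooling rate and iteration cap are placeholder choices rather than the system's actual parameters:

#include <chrono>
#include <cmath>
#include <functional>
#include <random>

// Generic annealing loop: Config is any copyable layout representation.
template <typename Config>
Config anneal(Config current,
              std::function<double(const Config&)> cost,
              std::function<Config(const Config&, double /*temperature*/)> mutate,
              int maxIterations = 500, double budgetMs = 60.0) {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    std::mt19937 rng(1234);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    Config best = current;
    double bestCost = cost(best), curCost = bestCost;
    double temperature = 1.0;
    for (int i = 0; i < maxIterations; i++) {
        double elapsed = std::chrono::duration<double, std::milli>(clock::now() - start).count();
        if (elapsed > budgetMs) break;                    // hard 60 ms budget per topography update
        Config candidate = mutate(current, temperature);  // move/rotate/resize, scaled by temperature
        double c = cost(candidate);
        // Metropolis-style acceptance: always take improvements, sometimes take worse configs.
        if (c < curCost || uni(rng) < std::exp((curCost - c) / temperature)) {
            current = candidate;
            curCost = c;
            if (c < bestCost) { best = current; bestCost = c; }
        }
        temperature *= 0.97;                              // cooling schedule (placeholder rate)
    }
    return best;                                          // may equal the initial configuration
}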

5.6 Conclusion
I have presented a set of ten interaction behaviors for desk interfaces designed to facilitate digital interfaces co-existing with, and responding to, physical objects. These behaviors were derived from an elicitation study using paper prototypes. I also described the implementation of a proof-of-concept projector-camera system, designed with a unique set of technical features necessary to implement our behaviors. In general, the interaction behaviors presented in this chapter serve as a basis for a more complete, modern digital desk system, in which the work surface and its contents are not replaced with digital equivalents, but augmented to add new capabilities. While similar ideas have been explored in prior work, we draw them together in a holistic and grounded manner, and additionally demonstrate technical feasibility.


CHAPTER 6. REFINING ON-WORLD TOUCH INPUT

6.1 Summary
In this chapter, I describe DIRECT (Depth and IR Enhanced Contact Tracking), a new touch-tracking approach that merges depth and infrared image data (from a single sensor) to provide significantly enhanced finger tracking over prior methods (Figure 6.1 and Figure 6.2). Infrared imagery provides precise finger boundaries, while depth imagery provides precise contact detection. Additionally, the use of infrared data allows the system to more robustly reject tracking errors arising from noisy depth data. This approach allows DIRECT to provide touch tracking precision to within a single pixel in the depth image and, overall, finger tracking accuracy approaching that of conventional touchscreens. Additionally, our approach makes no assumptions about user position or orientation (in contrast to all prior systems, which require a priori knowledge of finger orientation to correct for undetected fingertips), nor does it require prior calibration or background capture.


Figure 6.1. Comparison of depth-camera-based touch tracking methods.

Our method (DIRECT) is shown in green and labeled 0. Comparison methods are single-frame model (1/red), maximum distance model [Wilson 2010a] (2/orange), statistical model [Izadi 2011, Xiao 2013] (3/yellow) and slice finding [Harrison 2011] (4/cyan).


Figure 6.2. DIRECT system setup.

Left: the Kinect depth camera is mounted 1.6 m from the table surface. Top right: the projector and Kinect are rigidly mounted and calibrated to each other (but not the surface). Bottom right: The table surface functions as a touchscreen.

We describe in detail the technical implementation of DIRECT and discuss its capabilities and unique characteristics relative to other touch tracking approaches. We further contribute a multi-technique comparison study – the first evaluation of its type – where we compare four previously published methods against one another, as well as against DIRECT. DIRECT readily outperforms these prior techniques with respect to viable distance, touch precision, touch stability, multi-finger segmentation, and touch detection false positives and false negatives.

6.2 Implementation
We implemented our touch tracking algorithm and comparison techniques in C++ on a 2.66 GHz 3-core Windows PC, with a Kinect for Windows 2 providing the depth and infrared imagery. The Kinect 2 is a time-of-flight depth camera, which uses active infrared illumination to determine the distances to objects in the scene. It provides 512x424-pixel depth and infrared images at 30 frames per second. A BenQ W1070 projector with a resolution of 1920x1080 is also mounted above our test surface (a wooden table) to provide visual feedback.


The Kinect 2 is mounted 1.60 meters above a large table surface, and the projector is 2.35 meters above the surface (Figure 6.2). At the horizontal edges of the Kinect's field of view, the table surface is 2.0 meters from the Kinect. Both the projector and Kinect are securely mounted to the ceiling, and were calibrated to each other using multiple views of a planar calibration target.

The present configuration allows the projector to project a 1.0x2.0 meter image onto the table surface, with the Kinect capable of sensing objects across the entire projected area. At this distance, each projected pixel is 1.0 mm square, and each Kinect depth pixel is 4.4 mm square at the table surface. Thus, even with this second-generation sensor, a typical fingertip resting on the table is less than 3 depth image pixels wide, underscoring the sensing challenge.

Our approach combines background modeling and anthropometric modeling approaches. More specifically, DIRECT models both the background and the user's arms, hands, and fingers using separate processes. Our processing pipeline is carefully optimized so that it runs at camera frame rate (30 FPS) using a single core of the PC.

The touch-tracking pipeline is illustrated in Figure 6.3 and Figure 6.4. Figure 6.3 shows a hand laid flat on the table, which is challenging for depth-based touch tracking approaches because the fingertips generally fuse with the background due to sensor imprecision and noise (Figure 6.3a). In Figure 6.3d, we show that DIRECT can still segment the fingers (labeled in different colors) right down to the fingertip. Figure 6.4 shows another challenging case: a single extended finger raised 60º. This is problematic because there are very few depth pixels available for the fingertip itself. As before, DIRECT is able to segment the crucial fingertip.

Figure 6.3. Touch tracking process for five fingers laid flat on the table.

(a) depth image, (b) infrared image, (c) infrared edges overlaid on z-score map, (d) segmentation result. In (d), arm pixels are cyan, hand pixels are blue-green, and finger pixels are combined with fingertip pixels and shown in various shades of green.

6.2.1 Background Modeling
Our system uses a statistical model of the background, inspired in part by the implementation in WorldKit [Xiao 2013]. At every pixel, we maintain a rolling window of five seconds of depth data and compute the mean and standard deviation. This model allows us to establish both a highly accurate mean depth background, as well as a noise profile at every pixel in the scene.

With these rolling windows, we support dynamic updating of the background model. If the standard deviation exceeds a certain depth-dependent threshold (accounting for higher average noise further from the sensor), the pixel is temporarily "stunned" and its background model mean and standard deviation will be held constant until the moving average drops below the threshold. This approach accurately tracks long-term changes in the environment (e.g., objects moved around), while ignoring short-term changes (e.g., actively interacting hands and fingers). In our present implementation, the background is updated in a separate thread, running at 15 fps to avoid excessively frequent updates. This type of dynamic background updating is crucial for long-running systems to deal with shifts in the environment (e.g., movement of objects residing on the surface), yet most existing systems (e.g., [Wilson 2010a, Xiao 2013]) use only a single static background model captured during initial setup. We note that highly stationary hands and fingers could be "integrated into the background" with this approach, though we observed that, in practice, users rarely stay stationary for several seconds over top of active touch interfaces.
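A single-pixel sketch of this rolling statistical model is shown below (simplified and with invented names such as PixelModel; the real pipeline maintains such state for every depth pixel and runs the updates in a separate thread):

#include <cmath>
#include <cstddef>
#include <deque>

// Rolling-window depth statistics for a single pixel (~5 s of samples).
struct PixelModel {
    std::deque<float> window;     // recent depth samples, millimeters
    std::size_t capacity = 75;    // e.g. 5 seconds at the 15 fps background thread rate
    float bgMean = 0.0f;          // background estimate actually used by the tracker
    float bgStd = 0.0f;
    bool stunned = false;

    void update(float depthMm, float stunThresholdMm) {
        window.push_back(depthMm);
        if (window.size() > capacity) window.pop_front();
        double m = 0; for (float v : window) m += v;
        m /= window.size();
        double var = 0; for (float v : window) var += (v - m) * (v - m);
        float sd = float(std::sqrt(var / window.size()));
        // If the window is too noisy, "stun" the pixel: hold bgMean/bgStd constant
        // until the rolling statistics settle back below the threshold.
        stunned = sd > stunThresholdMm;
        if (!stunned) { bgMean = float(m); bgStd = sd; }
    }
};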

Figure 6.4. Touch tracking process for a finger angled at 60º vertically.

(a) depth, (b) infrared, (c) edges and depth map z-scores, and (d) filled blobs. Refer to Figure 6.3 for a full color key.

6.2.2 Infrared Edge Detection
We use the infrared image primarily to detect the boundary between the fingertip and the surrounding surface. As such, our first step is to detect edges in the infrared image. The Kinect 2's infrared image comes from the same sensor as the depth data, and thus it is precisely registered to the depth image. We use a Canny edge filter [Canny 1986] to locate candidate edge pixels in the image (7x7 Sobel filter, with hysteresis thresholds of 4000 and 8000), with results shown in Figure 6.5. Note that the parameters are not specific to the operating depth or objects in the scene.


Figure 6.5. Canny edge detection on the IR image.

Left: Kinect IR image. Right: Edge map.

After running the Canny edge filter, some edges may have gaps. These can occur due to, e.g., multiple edges meeting. A common gap-filling technique, image dilation followed by erosion, is inappropriate for our case due to the small size of the fingertip. Instead, we employ an edge-linking algorithm [Maeda 1998] across the edge map, which walks along Canny edge boundaries and bridges one-pixel breaks between neighboring edges. After applying this algorithm, fingertips and hands are usually fully enclosed (Figure 6.3c and Figure 6.4c, pale yellow lines).
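The OpenCV call below reproduces the Canny step with the parameters quoted above (assuming the Kinect IR image has first been scaled to 8 bits, which is our assumption); the gap-bridging pass is a much-simplified stand-in for the edge-linking algorithm of [Maeda 1998], bridging only single-pixel breaks, and is not the actual implementation:

#include <opencv2/imgproc.hpp>

// IR edge map: 7x7 Sobel aperture, hysteresis thresholds 4000/8000.
cv::Mat irEdges(const cv::Mat& ir8u) {
    cv::Mat edges;
    cv::Canny(ir8u, edges, 4000, 8000, 7 /* Sobel aperture */, true /* L2 gradient */);

    // Simplified one-pixel gap bridging: mark a non-edge pixel as an edge if it has
    // edge neighbors on opposite sides (so two edge runs nearly touch across it).
    cv::Mat linked = edges.clone();
    for (int y = 1; y < edges.rows - 1; y++) {
        for (int x = 1; x < edges.cols - 1; x++) {
            if (edges.at<uchar>(y, x)) continue;
            bool horiz = edges.at<uchar>(y, x - 1) && edges.at<uchar>(y, x + 1);
            bool vert  = edges.at<uchar>(y - 1, x) && edges.at<uchar>(y + 1, x);
            if (horiz || vert) linked.at<uchar>(y, x) = 255;
        }
    }
    return linked;
}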

Surfaces with an infrared albedo very similar to skin could cause issues for edge finding. However, anecdotally, we have found that the shadows cast by the arm and hand (illuminated by the active IR emitter found in many depth cameras), even near the fingertip, help to increase contrast (as seen in Figure 6.3b and Figure 6.4b).

6.2.3 Iterative Flood-Fill Segmentation
Our touch tracking pipeline consists of a sequence of iterative flood fills, each responsible for a different pixel type: arm filling, hand filling, finger filling, and fingertip filling. When each flood fill completes, it triggers the next fill in the sequence, starting from pixels on its boundary. Critically, each flooded area is linked to the parent area from which it was seeded (e.g., the finger fill starts from the hand that it is attached to), forming a hierarchy of filled objects that match their respective anthropometric requirements derived from [Eastman Kodak 2003, Haley 1988, NASA 1995, White 1980]. For a finger to be successfully segmented (and passed as input to an interface), a complete hierarchy must exist – a process that robustly rejects finger-like objects that are not connected to hands, arms and so on (e.g., a whiteboard marker lying on a tabletop).


Arm Stage

The first stage labels arm pixels and merges them into connected arm blobs (Figure 6.3d and Figure 6.4d, light blue). Arm pixels are defined as pixels that are at least 5 cm closer to the sensor than the background mean, i.e., pixels that are at least 5 cm above the surface. This high threshold is chosen both to unambiguously distinguish human activity from background noise (standard deviations at the edge of the depth map can reach ~1.5 cm, so 5 cm is more than 3 standard deviations away), and to detect human forearms even when laid totally flat on the table (6.3 cm is the 2.5th-percentile diameter of a human forearm).

Hand Stage

Then, for all arm blobs, our algorithm flood fills downwards towards the hand pixels, defined as pixels that are at least 12 mm from the surface. This threshold was chosen to segment individual fingers apart from the hand (12 mm is the 2.5th-percentile thickness of a human finger, allowing us to detect even small fingers lying flat on a table). During this step, the fill is constrained to avoid pixels with high depth variance in their local neighborhood. This constraint ensures that the flood fill does not simply fill into noisy pixels surrounding the arm. The result is a hand blob attached to the parent arm blob (Figure 6.3d and Figure 6.4d, dark blue blob). If multiple hand blobs are found, only the largest is taken (to avoid noise-induced "phantom hands").

Finger Stage

In the third stage, the algorithm fills from the hand blob into finger pixels, which are defined as pixels that are at least one standard deviation above the surface mean. These pixels are above noise, but are otherwise very close to the surface. Because of this, we constrain the fill to stay within the boundaries derived from the infrared edge map. In the event of a hole in the edge map, the fill will stop at below-noise pixels and thus will not fill too far. This process produces a number of finger blobs attached to the hand blob, and the point at which the finger attaches to the hand is called the finger base.

Fingertip Stage

Finally, for each finger blob, the algorithm fills further into the below-noise pixels (fingertip pixels). At this point, the depth map is not used (as fingers generally merge with noise), and only the infrared edge constrains the fill. The vast majority of frames fill successfully; however, occasionally, a gap in the edge map will cause the flood to escape outside the finger. To mitigate this, our flood fill stops and flags an overfill error if the fill extends more than 15 cm from the finger base. This value accommodates the longest human fingers (mean length 8.6 cm, SD 0.5) and is 2 standard deviations above the mean palm-center-to-fingertip length (mean 12.9 cm, SD 0.8), which is the worst-case scenario if the hand fill stage was only partially successful in flooding into the hand. Additionally, this permits the use of grasped pens, markers or styluses as pointing devices, while still rejecting out-of-control flood fills that spill out onto the background.

If an overfill condition is detected, DIRECT deletes the flooded fingertip pixels and returns a failure indication. This allows the system to gracefully fall back to depth-only touch tracking in the event that the IR image is unusable for any reason (e.g., because there are holes in the edge image). Crucially, this enables the algorithm to work in cluttered or complex environments where the edge image may be damaged or unusable, albeit with reduced performance. Otherwise, if no overfill occurs, the fingertip pixels are added to the parent finger blob. The resulting finger blobs are seen in Figure 6.3d and Figure 6.4d (varying shades of green).
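To make the staged structure concrete, here is a bare-bones flood-fill helper parameterized by a per-pixel acceptance test, in the spirit of the arm/hand/finger/fingertip sequence described above; all names and the grid representation are illustrative, and the real implementation additionally tracks blob hierarchy, IR-edge constraints and overfill distances:

#include <functional>
#include <queue>
#include <vector>

// Flood fill over a w x h grid from a set of seed pixel indices, accepting pixels for which
// accept(index) is true; returns the indices of all filled pixels. Each stage (arm, hand,
// finger, fingertip) runs a fill like this, seeded from the previous stage's boundary and
// with its own acceptance rule (e.g. ">= 5 cm above background" for arms, ">= 12 mm" for
// hands, ">= 1 sigma" plus IR-edge constraints for fingers).
std::vector<int> floodFill(int w, int h,
                           const std::vector<int>& seeds,
                           const std::function<bool(int)>& accept) {
    std::vector<char> visited(static_cast<size_t>(w) * h, 0);
    std::vector<int> filled;
    std::queue<int> frontier;
    for (int s : seeds)
        if (!visited[s] && accept(s)) { visited[s] = 1; frontier.push(s); }
    while (!frontier.empty()) {
        int p = frontier.front(); frontier.pop();
        filled.push_back(p);
        int x = p % w, y = p / w;
        const int nx[4] = {x - 1, x + 1, x, x};
        const int ny[4] = {y, y, y - 1, y + 1};
        for (int i = 0; i < 4; i++) {
            if (nx[i] < 0 || nx[i] >= w || ny[i] < 0 || ny[i] >= h) continue;
            int q = ny[i] * w + nx[i];
            if (!visited[q] && accept(q)) { visited[q] = 1; frontier.push(q); }
        }
    }
    return filled;   // boundary pixels of this blob seed the next stage's fill
}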

6.2.4 Touch Point Extraction
During both the finger fill and tip fill, we record the distance of each finger pixel to the finger base. For each detected finger, we simply place the fingertip at the pixel with the highest such distance. We found that this pixel correlates extremely well with the fingertip's actual location, and furthermore that this point is stable enough that touch position smoothing is unnecessary.

If the tip filling failed due to an overfill, the fingertip's position can be estimated using forward projection, by using the arm and hand positions to determine the orientation of the finger. However, the resulting estimate will be substantially noisier, a fact which can be conveyed to a higher-level filtering or smoothing algorithm.

6.2.5 Touch Contact Detection
To detect whether the fingertip is in contact with the surface behind it (i.e., to distinguish hover from contact), we examine the 5x5 neighborhood around the tip pixel. If any pixel is more than 1 cm from the background, we mark the finger as hovering; otherwise the finger is marked as touching the surface. We then apply hysteresis to avoid rapid changes in touch state. Although this touch detection approach is simplistic, it is surprisingly robust; we attribute this to the high precision of our fingertip tracking.
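A direct transcription of this contact rule might look like the following; the types and names are ours, bgMeanMm stands in for the per-pixel background mean from the rolling model, and the two-threshold latch is an assumption about how the hysteresis is applied:

#include <cmath>
#include <vector>

struct DepthFrame {
    int w, h;
    std::vector<float> depthMm;   // sensed depth, millimeters
    std::vector<float> bgMeanMm;  // per-pixel background mean from the rolling model
};

// True if every pixel in the 5x5 window around (tx, ty) is within maxMm of the background.
bool fingertipOnSurface(const DepthFrame& f, int tx, int ty, float maxMm = 10.0f) {
    for (int dy = -2; dy <= 2; dy++) {
        for (int dx = -2; dx <= 2; dx++) {
            int x = tx + dx, y = ty + dy;
            if (x < 0 || x >= f.w || y < 0 || y >= f.h) continue;
            int i = y * f.w + x;
            if (std::fabs(f.bgMeanMm[i] - f.depthMm[i]) > maxMm) return false;  // hovering
        }
    }
    return true;  // all pixels close to the surface: contact
}

// Simple hysteresis latch: slightly different thresholds to enter and leave contact,
// so small depth fluctuations do not toggle the touch state every frame.
struct TouchLatch {
    bool down = false;
    bool update(const DepthFrame& f, int tx, int ty) {
        down = down ? fingertipOnSurface(f, tx, ty, 12.0f)   // stay down until clearly lifted
                    : fingertipOnSurface(f, tx, ty, 10.0f);  // stricter test to go down
        return down;
    }
};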

6.2.6 Touch Tracking Properties
The DIRECT approach merges aspects of optical tracking, background modeling and anthropometric finger modeling approaches, and therefore exhibits some unique properties relative to other methods. Compared to depth-only methods, the touch points detected by DIRECT are more stable, as the infrared estimation provides pixel-accurate detection of the fingertip. Consequently, DIRECT requires no temporal touch smoothing to work well, allowing it to have very low latency (15 ms average latency from input depth data to output touch point) without sacrificing accuracy.

While DIRECT uses ideas from finger modeling, it does not directly model the shape of fingers, nor does it make assumptions about the shape of the touch contact area. Therefore, DIRECT is capable of tracking touches when the fingers assume unusual configurations, such as holding all fingers together, performing gestural touches, or holding objects such as pens and styli.

Lastly, DIRECT provides some additional information about the hand and fingers beyond just the touch point. Specifically, it also provides the orientation of the finger (the vector from the finger base to fingertip), the pose of the hand, and the arm associated with each touch. This metadata could be used to implement interaction techniques beyond simple multitouch, e.g., using the finger angles for rotational input [Wang 2009] or to manipulate 3D objects [Xiao 2015], and using the arm data to enable bimanual interactions.


6.3 Comparative Techniques
To compare the performance of our technique, we implemented four representative depth-camera-based touch tracking methods from the literature (see also Related Work). An example frame of tracking output from these implementations can be seen in Figure 6.1.

Of note, many of our comparison techniques were originally developed using the predecessor of the Kinect 2 we use (the "Kinect for Windows", henceforth called Kinect 1 for simplicity). The Kinect 1 sensor uses structured light, projecting an infrared speckle pattern, as opposed to the time-of-flight method used in the Kinect 2. The speckle pattern renders the infrared image virtually unusable for tracking, precluding DIRECT-style sensing. The Kinect 1 features both lower depth image pixel resolution and lower depth resolution (~4 mm per depth unit at a distance of 2 meters) than the Kinect 2. We have thus adjusted and tuned the comparison techniques to work with the Kinect 2 sensor as best as possible.

We also did not use any calibration with these techniques. Touch tracking systems often employ a calibration stage where a user touches several points to establish a mapping between the sensor and known physical points. However, if the user calibrates without significantly changing their orientation, this calibration can mask certain orientation-dependent biases (as we show in our results) and make the system dependent on the user and finger orientation. Hence, in our study, we apply only a global calibration between the depth camera and the projector, and do not calibrate using detected finger positions.

All of our comparison techniques have a certain number of "magic numbers" that must be tuned for correct operation of the system. Furthermore, precise adjustments of these numbers can be used to trade off between, e.g., touch accuracy and false touch detection. For the study, we tuned these numbers to keep false positives acceptably low, as these are especially damaging to the user experience. Specifically, we aimed to reduce false positives to less than one false positive per five seconds within our target area (2 m²) when no fingers are present. Although this is still a high rate of false positives, we found that attempting to further reduce this rate led to unacceptable insensitivity in two of our comparison techniques. We tuned each comparison technique independently, communicating with the systems' authors when necessary to best reproduce each technique.

Tests on all techniques were done without smoothing of the touch data. Smoothing could be applied to the output of any of these approaches and might increase accuracy. However, this would come at the cost of increased latency and would tend to hide the merits of the technique itself.

6.3.1 Single-Frame Background Model
Our first comparison technique is a common, naïve technique often used for simple touch tracking. It uses a background model consisting of a single captured depth frame. Candidate touch pixels are simply those that lie between a minimum and maximum distance threshold from the background (in our implementation, between 7 and 15 mm from the surface).


For all background model implementations, we apply a low-pass boxcar filter followed by thresholding. A connected-components pass then extracts touch blobs, following the implementation in [Wilson 2010a]. Although most naïve implementations do not perform this filtering, we found it necessary to avoid excessive noise in the contact tracking.

6.3.2 Maximum Distance Background Model
The second comparison technique models the background using the maximum depth value seen over a window of time. This effectively implements a conservative noise model of the background. Like the single-frame approach, the captured depth is then processed through a low-pass filter and thresholded, followed by a connected-components pass to segment finger touches.

Wilson [Wilson 2010a] implements a variation on this technique (pers. comm.), using a histogram to choose, e.g., the 90th-percentile depth value at each pixel to eliminate outliers in the original Kinect data. With the Kinect 2, we did not observe such outliers, and so our maximum distance model is effectively the same as the histogram method. Wilson tested their system using a Kinect at a maximum distance of 1.5 metres from a table. At that distance, Wilson reported anecdotally that the positional tracking error was about 1.5 cm.

6.3.3 Statistical Background Model
The final background modeling method uses a statistical approach. This implementation uses the same mean and standard deviation calculation as the DIRECT method. New depth frames are converted into Z-scores based on their differences from the background (specifically, the value of each pixel, minus the mean, divided by the standard deviation), and the Z-scores are then filtered, thresholded and connected as in the other methods.

This method closely resembles the background differencing approach used in WorldKit [Xiao 2013]. It also aims to capture the essence of the KinectFusion SLAM touch tracking approach [Izadi 2011], in that it integrates the background profile gradually over time, building a statistical model of the environment that is much more accurate than any single frame. However, our replicated approach lacks the spatial averaging of KinectFusion.

6.3.4 Slice Finding and Merging
For our final comparison technique, we chose to implement the slice-finding approach used in OmniTouch [Harrison 2011]. This approach locates cylindrical slices in the depth image using an elastic template, and then links together adjacent slices to form fingers. Of note, unlike the other approaches, OmniTouch requires that the fingers be clearly separated in the depth image. Furthermore, as it does not use a background model, it may detect erroneous touches on the surface due to objects already in the scene; therefore, for the study, we cleared the table surface of all objects.

The original OmniTouch implementation only supported fingers oriented horizontally. We therefore extended OmniTouch by implementing biaxial template matching – locating candidate finger slices in both the X- and Y-axes – and then merging the centroids of these slices into fingers. Our approach demonstrably locates fingers oriented in any direction. To best replicate OmniTouch's true performance, we also implemented the system's finger forward-projection method. As noted previously, because the fingers fuse with the depth background upon touch, it is difficult to estimate the true tip position. In response, OmniTouch uses the detected finger's orientation to extend the estimated touch position 15 mm towards the un-sensed tip [pers. comm.]. This improvement is not directly applicable to the previously discussed background-subtraction methods, as they do not model the finger orientation or shape. Of note, the original OmniTouch system was designed to operate at a distance of just 40 cm.

6.4 Evaluation
To assess the accuracy of DIRECT, we ran an evaluation with 12 participants (3 female; average age 25). All users were familiar with touch interfaces. Each study session lasted roughly 30 minutes, and participants were compensated $10 USD for their time.

Participants were simply told that the table surface was a touchscreen from the outset, and to touch the surface as they would any ordinary touchscreen. Users were permitted to use either hand (including interchangeably) and to use any finger pose they found comfortable. Users were not required to remove jewelry or roll up sleeves – several users conducted the study while wearing long-sleeved shirts, bracelets, watches and rings. Our experiment system ran all five touch-tracking methods (DIRECT plus the four comparison techniques) simultaneously at a consistent 30 fps (the frame rate of the depth sensor).

Figure 6.6. Tasks performed by users in the DIRECT study.

(a) crosshair task, (b) multitouch box task, (c) line shape tracing task, (d) circle shape tracing task.

6.4.1 Tasks
Participants completed a series of small tasks, organized into three categories. Task order was randomized per participant to mitigate order effects. For each task, users were instructed to stand along one of the two long edges of our test table (thus changing the orientation of their touches).

Crosshair: Participants placed their fingertip on a projected crosshair (Figure 6.6a), after which the experimenter manually advanced the trial and the touches detected by each tracker were recorded. This task measured the positional accuracy of each finger tracking method and touch segmentation accuracy. Crosshairs were arranged in a 4x8 grid spanning the table surface, but were shown one at a time and in random order. This task was performed twice for each edge of the table.

Figure 6.7. Touch error and detection rate for DIRECT and competing methods.

Average touch positional error (left) and touch detection rate (right) for each of the five touch tracking methods. Error bars are standard error.

Multitouch Segmentation: Participants were instructed to place a specific number of fingers within a projected 20 cm square on the table (Figure 6.6b). The experimenter manually advanced the trial, and the number of touches reported within the box for each tracker was recorded. This task was intended to measure the multitouch contact reporting accuracy of each technique (i.e., false positive and false negative touches). Six boxes were specified across the length of the table, and the number of fingers varied from 1 to 5, for a total of 30 trials, randomly ordered. This task was also performed twice for each edge of the table.

Shape Tracing: Participants were instructed to trace a particular projected shape, beginning at the start position indicated with a green triangle (Figure 6.6c,d), and tracing to the end of the path. For each frame between the start and end of the trial, we recorded the touch coordinates for each method. This task was intended to replicate the tracing task used in OmniTouch [Harrison 2011]. There were three instances of this task per table edge, one for each shape: horizontal line, vertical line, and circle.


Figure 6.8. Touch error after post hoc offset correction.

Average positional error after removing the average offset vector and assuming a priori knowledge of the user's orientation. Error bars are standard error.

6.5 Results and Discussion
In our results, we denote the two edges of the table as "back" and "front". The top of the Kinect's image corresponded to the front edge of the table.

Figure 6.9. 95% confidence ellipses for crosshair task.

Left: from the back of the table; right: from the front of the table. X and Y units are in millimetres; colours are as in Figure 6.8.


6.5.1 Crosshair
The crosshair task allowed us to test both touch positional accuracy and touch detection rate. Due to potential spuriously detected touches, we measured accuracy as the mean distance from the finger to the nearest detected touch point for each tracker. Touches further than 200 mm from the finger were not counted, since those touches would clearly be erroneous. If no touches were detected in range for a particular tracker, the tracker was considered to have failed to detect the touch.

In total, we collected 768 trials for each side of the table. The touch positional accuracy results are summarized in Figure 6.7. The accuracy results show a slight but consistent increase in accuracy across all trackers when users stood at the front edge of the table.

DIRECT achieved an average Euclidean-distance positional error of 4.8 mm across all trials, with a 99.3% touch detection rate. The next best technique, slice finding, had an average positional error of 11.1 mm and a touch detection rate of 83.9%. The background modeling methods all performed poorly, with average positional errors of over 40 mm and touch detection rates ranging from 52.1% to 84.8%. Put simply, these methods do not have the necessary sophistication to segment small finger contacts at the noise level present when sensing at 1.6 meters.

During development, we noticed that the slice finding method without forward projection performed very poorly (~20 mm average error), so it was clear that finger forward projection was crucial to obtaining good accuracy. This is because these methods cannot accurately locate the fingertip in the noise, and so they instead locate a point somewhere along the finger.

Therefore, we also analyzed the accuracy of the four competing approaches by applying a mean offset vector (i.e., a post hoc global offset). This vector depends on knowing the precise finger orientation, and thus the offset correction corresponds to "calibrating" the touch algorithm from a fixed user position and assuming the finger is extended perpendicular to the table. Consequently, we computed offsets separately for the front and back user positions. Because the prior systems recognize neither the user position nor finger orientation, this result is purely hypothetical, but it serves as a useful benchmark.

The resulting average offset-corrected errors (Figure 6.8) were 4.46 mm for DIRECT (a negligible 0.3 mm improvement), 9.9 mm for the slice finding method (a modest 1.2 mm improvement), and 12.3-12.7 mm for the background modeling approaches (a significant 20-30 mm improvement). To visualize these errors, we also computed the 95% confidence ellipses (Figure 6.9) for each tracker (note that the offset correction corresponds to recentering the ellipses). These errors are consistent with prior results from Wilson [Wilson 2010a] and OmniTouch [Harrison 2011], suggesting that our offset-corrected comparison implementations are reasonably close to the original implementations.

6.5.2 Multitouch Segmentation
With the multitouch segmentation task, we aimed to measure the false positive and false negative rates for touch detection with multiple fingers present in a small region. Under these conditions, touch trackers might merge adjacent fingers or detect spurious touches between fingers.


The differences between the back and front sides were not significant in this task, so the results have been combined. In total, 1440 trials were collected.

Detecting a single extended finger is the easiest task. In single-finger trials, DIRECT detected the correct number 95.8% of the time. Single-frame background, maximum frame background and statistical model background achieved 52.8%, 66.3% and 35.1% respectively. Slice finding was 75.0% accurate.

Detecting several fingers in close proximity is much more challenging when sensing at 1.6 meters. With all trials combined, DIRECT detected the correct number of fingers in 75.5% of trials, more fingers than were present in 2.4% of trials, and fewer fingers than were present in 22.1% of trials. The three background modeling approaches – single-frame, maximum frame and statistical model – detected the correct number of fingers 22.2%, 29.2% and 17.3% of the time. Very few trials reported more fingers than were present: 9, 7 and 4 trials respectively (<0.1% of all trials). Instead, these methods tended to miss fingers (77.1%, 70.2%, and 82.4% of trials respectively). Finally, the slice-finding approach detected more fingers in just 9 trials (<0.1% of trials), fewer fingers in 75.4% of trials, and the true number of fingers in 24.0% of trials.

We tuned the comparison technique implementations to minimize spurious touches while nothing was touching the table. However, our multitouch segmentation results suggest that optimizing for this criterion could have rejected too many legitimate touches, reducing touch detection rates. On the other hand, increasing touch sensitivity significantly increases noise and errant touches. For example, decreasing the "low boundary" depth threshold in the maximum-distance background model tracker by a single millimeter results in hundreds of errant touches detected on the surface every second, which is clearly not acceptable.

6.5.3 Shape Tracing
Tracking moving fingers is especially challenging, as the exposure time of the depth camera produces motion blur across the moving parts of the frame, reducing accuracy. Further, as already mentioned above, our comparative methods are prone to missing fingers. In the case of finger movement, this manifests as a loss of finger tracking for several frames. During these periods, spurious input would often cause the traced path to zigzag. For this reason, it was simply not possible to complete a realistic and useful analysis with our study data.

However, we can use results reported in OmniTouch [Harrison 2011] as one point of comparison. Specifically, OmniTouch reports a mean error of 6.3 mm (SD = 3.9 mm) at a sensing distance of 40 cm on a flat notepad. For comparison, DIRECT achieves a mean error of 2.9 mm (mean SD = 2.7 mm) at a sensing distance of 160 cm (on a flat table).

6.6 Conclusion
We have presented DIRECT, a touch tracking system which strategically merges depth and infrared data from an off-the-shelf infrared depth camera to enable highly accurate touch tracking, even at significant distances from the sensor. Overall, DIRECT demonstrates greatly improved touch tracking accuracy (mean error of 4.9 mm) and detection rate (>99%) – roughly twice as good as the next best method in the literature, and nearly ten times better than classic approaches.

There are also immediate ways to further improve our system. For example, in our present implementation, DIRECT outputs integer coordinates on the depth map, which quantizes the X/Y position to 4.4 mm (mean error due to quantization: 3.1 mm). Averaging pixels at the tip could provide a sub-pixel estimate, further boosting accuracy. Additionally, we could apply temporal smoothing to improve touch position stability at the expense of latency.

To conclude, I hope that DIRECT's significantly improved finger tracking accuracy can open new opportunities in ad hoc touch interfaces and better allow this emerging computing modality to be explored. As we will see in later chapters, this foundational technology can help move depth-driven systems from proof of concept to more practical and widespread use.


CHAPTER 7. PROPOSED WORK
To complete my thesis on on-world interaction, I propose to create two embodiments of the on-world interaction concept, extending and improving upon my earlier systems to create platforms upon which I can perform further research. Using these embodiments, I will probe the central questions of input sensing and supporting interaction techniques, extend my input strategy to new domains, and expand the earlier work on interaction techniques into a full-fledged, on-world application environment.

7.1 Exploring Embodiments

7.1.1 InfoBulb
Light bulbs in existing environments already provide illumination across surfaces and spaces, and so these venerable fixtures have long been considered promising candidates for introducing ubiquitous interactivity. Underkoffler et al. [Underkoffler 1999] articulated an early vision of the light bulb as a computational device, but without a practical bulb-sized implementation. In Desktopography, I built a prototype depth camera/projector pair which fits into a standard lampshade, but lacks on-board computational capabilities. I believe that, given the technical advances of the last few years, it is time to build a true "information lightbulb", or "InfoBulb", and use this bulb as a platform to explore on-world interaction on a wider scale.

Figure 7.1. An early prototype of the InfoBulb concept.

At left: standard light bulb socket for power. At right, picoprojector and small depth camera, connected to offscreen laptop. Proposed InfoBulb devices will incorporate a computer directly in the design, requiring no tethers to external components or power.

The InfoBulb would serve as a drop-in replacement for existing light bulbs, comprising a power supply, computer, camera package, and projector. Once installed into a light fixture, the InfoBulb will project out a computer interface, rather than a single colour of light, transforming the underlying surface into an expansive multi-touch surface. Crucially, these devices leverage existing lighting infrastructure to bring computation to the environment, rather than requiring the construction of new infrastructure (wiring, new surfaces, hardware installation, etc.).

I propose to build the InfoBulb from the commodity parts that are now available – a small-form-factor computer, a picoprojector and a low-power depth camera. Using commodity hardware will allow me to build multiple identical devices easily and reliably, expanding the scope of potential explorations. The InfoBulb software will be the culmination of existing work, as well as explorations into further work – advanced touch tracking, environment sensing, and an application environment.

7.1.2 WornAR
Projecting virtual interfaces onto physical environments is, by definition, a form of mixed reality – a way of mixing the virtual and physical domains. Recently, the subject of heads-up mixed reality, in which users wear a display that overlays virtual content onto their perceived environment, has gained significant prominence, with commercial devices such as the Microsoft HoloLens and Magic Leap gaining traction. I believe that the head-mounted display form factor also offers a promising avenue to achieving on-world interfaces, and merits exploration as an alternative embodiment.

Mixed-reality devices allow users to superimpose realistic, 3D content directly over the environment, enabling a wealth of interactive possibilities. However, current-generation mixed-reality devices typically provide only in-air gestures and voice commands for input. Neither modality provides the precision or tactile sensation present with common computer input systems, including touchscreens and mice. Further, both modalities are fatiguing to use for long periods of time, due to the gorilla-arm effect for in-air gestures and the repetitive, unnatural nature of speech input as a control interface. These limitations present a significant obstacle to the use of augmented reality in domains such as 3D object design, architecture, engineering, and art. It is difficult to imagine, for example, working on a building design for an entire day using only voice commands and gestural input, even given the advantages of augmenting reality with 3D visuals.

I propose to add touch tracking and on-world interaction to the AR domain, as opposed to the existing paradigm of in-world interaction (interacting with floating content from a distance). With touch sensing, augmented reality devices become significantly more useful and practical for everyday operations, and with surface interactions, we unlock the natural haptic feedback and tactile sensation that is often missing from in-air interaction modalities.

The result will be an on-world interaction experience which is virtually presented to a user in a head-mounted display, rather than being projected onto the environment. This new form factor presents a different set of advantages than the InfoBulb, and so forms a perfectly complementary embodiment.


7.2 Robust Input Sensing
While DIRECT demonstrates a fairly robust, accurate touch tracking method, there is still much work to be done. Touch tracking is a critical part of the experience – without accurate touch tracking, the user experience is severely impacted. Consequently, a major component of my proposed work is in further improving the touch experience.

7.2.1 Host-Surface-Based Touch Tracking
In DIRECT, I captured a background model to determine when hands appeared over the surface, and to classify depth pixels into the various parts of the hand. While this approach provided good accuracy and simplified the implementation, maintaining a stable background model over time is not trivial. DIRECT had to gradually update the background model over time to cope with changes in the environment. Furthermore, with the head-mounted WornAR implementation, a background model is no longer feasible at all, because the user's constant head motion precludes stable background capture.

To solve these issues, I propose moving to a plane-based touch tracking approach. This replaces the semi-static background model with a fitted 2D plane, over which depth calculations can be made. Instead of capturing a background model at startup, this approach will detect suitable surfaces in the environment by scanning for pixels which can be connected into large, flat areas. It then fits a plane to each such surface, and uses the plane to detect hands and touch contacts. Plane fitting can be performed on every input frame if needed, to support the dynamic camera motions of the WornAR embodiment.
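As a sketch of the plane-fitting step, the code below fits z = ax + by + c to candidate surface points by least squares and then measures a pixel's height above that plane; the eventual system may well use RANSAC or another robust variant, and all names here are placeholders:

#include <array>
#include <vector>

struct Point3 { double x, y, z; };           // depth pixel back-projected to camera space, mm
struct Plane  { double a, b, c; };           // surface model z = a*x + b*y + c

// Least-squares fit of z = a*x + b*y + c to candidate surface points (normal equations).
Plane fitPlane(const std::vector<Point3>& pts) {
    double sxx = 0, sxy = 0, sx = 0, syy = 0, sy = 0, n = (double)pts.size();
    double sxz = 0, syz = 0, sz = 0;
    for (const auto& p : pts) {
        sxx += p.x * p.x; sxy += p.x * p.y; syy += p.y * p.y;
        sx += p.x; sy += p.y; sz += p.z;
        sxz += p.x * p.z; syz += p.y * p.z;
    }
    // Solve the 3x3 normal equations [sxx sxy sx; sxy syy sy; sx sy n] * [a b c]^T = [sxz syz sz]^T
    // via Cramer's rule (adequate for a well-conditioned 3x3 system).
    auto det3 = [](std::array<double, 9> m) {
        return m[0]*(m[4]*m[8]-m[5]*m[7]) - m[1]*(m[3]*m[8]-m[5]*m[6]) + m[2]*(m[3]*m[7]-m[4]*m[6]);
    };
    std::array<double, 9> A = {sxx, sxy, sx, sxy, syy, sy, sx, sy, n};
    double d = det3(A);
    std::array<double, 9> Aa = A, Ab = A, Ac = A;
    Aa[0] = sxz; Aa[3] = syz; Aa[6] = sz;    // replace first column with right-hand side
    Ab[1] = sxz; Ab[4] = syz; Ab[7] = sz;    // second column
    Ac[2] = sxz; Ac[5] = syz; Ac[8] = sz;    // third column
    return {det3(Aa) / d, det3(Ab) / d, det3(Ac) / d};
}

// Signed height of a point above the fitted plane, measured along the camera z axis.
double heightAbovePlane(const Plane& pl, const Point3& p) {
    return p.z - (pl.a * p.x + pl.b * p.y + pl.c);
}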

7.2.2 Improved Hover Disambiguation
While DIRECT has high spatial accuracy, it is not as accurate when distinguishing an actual touch from a user's finger hovering over the surface. Although its touch detection performance is much improved over a typical infrared- or color-camera touch tracking approach, it still detects a touch when the user's finger comes within 5 mm of the surface – in other words, a finger hovering this far over the surface cannot be reliably distinguished from a finger touching the surface. The reason for this ambiguity is two-fold: first, a depth camera sees only the top of a finger, not the bottom, meaning that it is impossible in a completely general sense to determine whether a user is touching the surface or whether they simply have a thinner finger. The second issue is that depth cameras have significant noise, which is often exacerbated by placing the finger very close to the surface. Consequently, it can be very difficult to determine the actual distance to the finger.

In proposed work, I will investigate various ways in which this hover disambiguation can be performed more accurately. Machine learning, which I have used in a few prior projects for human sensing, presents a potential solution. I plan to learn the difference between a touch and a non-touch using a small window of depth data surrounding the fingertip, to see whether a machine learning algorithm can automatically distinguish these states.

In parallel, I also plan to explore stronger heuristic methods to detect a touch, taking into account the finger's width (to infer finger thickness), the distance to the sensor, and the fitted plane model. Improving hover disambiguation directly improves the user experience, as users can more naturally lift their fingers between touches without worrying about the touch contact staying active.

7.2.3 Reducing Latency
Touch latency is an important factor in the user experience. High touch latency delays the user's input feedback cycle, which slows down all interactions and results in lower interaction bandwidth. On-world interactions have stricter latency requirements than, e.g., mobile phone screens, because the larger surface area encourages faster motions. For a fixed touch latency value, the faster motions cause the user's position to diverge more from the sensed position when compared with an equivalent interaction on a mobile phone. Furthermore, the low update rate of typical depth cameras (usually 30 fps), combined with their sophisticated, computationally expensive depth processing stages, produces high sensor latency, even before any touch processing takes place.

To alleviate these latency issues, I propose exploring forward prediction – the use of historical touch data to predict where the user's touch location will be in the future. Although this is not perfectly accurate, forward-predicted touch positions may help to reduce perceived latency, and therefore achieve the goal of shortening the input feedback cycle. Secondly, I will investigate the use of pre-touch techniques to predict when a user's finger will contact the surface, allowing the system to respond pre-emptively to touch events. Both techniques have been previously explored in the touchscreen literature, but have not been explored for on-world interactions, where they could have greater impact.
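As a simple illustration of forward prediction, the snippet below extrapolates the touch position assuming constant velocity over the last two samples; the proposed work would likely use a longer history or a proper filter, and the names and look-ahead value are placeholders:

struct TouchSample { double t; double x, y; };   // timestamp in seconds, position in mm

// Extrapolate the touch position lookaheadS seconds past the newest sample,
// assuming constant velocity between the last two samples.
TouchSample predictForward(const TouchSample& prev, const TouchSample& curr, double lookaheadS) {
    double dt = curr.t - prev.t;
    if (dt <= 0) return curr;                    // degenerate history: just return the latest sample
    double vx = (curr.x - prev.x) / dt;
    double vy = (curr.y - prev.y) / dt;
    return { curr.t + lookaheadS,
             curr.x + vx * lookaheadS,           // e.g. lookaheadS ~= 0.05 to offset ~50 ms of latency
             curr.y + vy * lookaheadS };
}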

7.2.4 Hand Gestures
With plane-based touch tracking, it becomes possible to detect hands above the surface simply by computing the surface-relative height of incoming depth pixels. This in turn enables the system to mix in-air gestures and touch input. In-air gestures provide high levels of expressivity and natural 3D-space interaction, while touch provides haptic feedback, precise control and non-fatiguing operation. In-air gestures also form a natural "out-of-band" input modality with respect to touch input, allowing them to, e.g., summon or activate interfaces which can then be touched. Therefore, these two modalities naturally complement each other, making it possible to have an input system which offers both high expressivity and high precision.

A variety of hand gestures are possible, and will be explored in proposed work. Hand gestures could be used to summon interfaces (e.g., a "launching" or "summoning" gesture to pull up an application list), to destroy interfaces (e.g., a "crunching up" or "throw away" gesture), to resize and reposition interfaces, and for various other interface management tasks. Within an application, gestures could be used to manipulate content (e.g., select, place or resize an object), with touch interactions used to refine the configuration. Or, an application might use hand gestures to implement simple discrete actions, such as controlling media playback or navigating back/forward in a web browser, with touch interactions used for more complex or precise manipulations like scrubbing, scrolling, text selection, and so on.


In proposed work, I will investigate merging hand gestures with touch, drawing on past research on in-air gestures for mobile devices (e.g., Air+Touch [Chen 2014]) and my own explorations of the topic (e.g., Gaze+Gesture [Chatterjee 2015]).

7.3 Developing On-World Applications
In parallel to improving touch interaction, I also want to expand the range of application software that can be built for on-world interaction. My existing research has focused on lower-level interactions and building blocks, with toy demo applications simply for show. To expand on these, I will investigate the higher-level problems of on-world application development, distribution and use.

7.3.1 APIs & SDKs
One of the classic ways to obtain a large application library is to provide a mechanism for running existing applications. Our existing systems, such as WorldKit, DIRECT and Desktopography, all have the capability to display rectified graphics and receive multitouch input, making it possible in principle to run arbitrary touch-sensitive applications. Consequently, in proposed work, I will develop a compatibility layer to run arbitrary mobile-optimized web pages on the world, effectively treating surfaces like giant tablets. This enables the execution of existing mobile web apps directly on the world, providing a breadth of existing functionality, as well as enabling the development of new on-world applications using well-known web APIs.

In mobile development, there is frequently a distinction between "web apps" and "native apps". Web apps have access to a subset of the device's functionality through the web browser, whereas native apps have access to the full breadth of functionality offered by the device manufacturer. Examples of functionality available to native apps, but not web apps, include access to certain sensors (e.g., video camera, high-speed accelerometer data), access to more private data (such as contact lists, calendar appointments, etc.) and the ability to run in the background. Analogously, web apps in our on-world operating environment will have access to touch input and graphical output, but will not have access to data such as the physical environment geometry, objects present in the world, or execution context (e.g., which room, surface or space the application is running in).

WorldKit touched briefly on some of these issues, with the notion of "synthetic sensors" which can sense various aspects of the environment, and with a programming interface that provides this type of data. In proposed work, I will expand upon these early APIs to provide access to data like surface-rectified depth and colour data, recognized objects (e.g., recognized edges, per Desktopography), environment geometry (e.g., the surface orientation and height, to determine where the interface is relative to the user) and advanced user input data (e.g., hand gestures, finger orientation). This "world API" will enable the development of more sophisticated "native" on-world applications.


7.3.2 App Management
Mobile devices now have many strategies for managing large collections of applications, and for obtaining new applications (e.g., through a store). However, these issues are largely unexplored in the space of on-world applications. I propose exploring ways of managing existing applications – switching between applications, summoning and launching new applications, and closing down unused applications. While Desktopography touched on some of these behaviors briefly, there is much more work to be done in this space. For example, one of the major challenges of an on-world interface is moving one's digital context from one space to another – for example, when moving between the study and the kitchen, the system should select a set of applications that will be immediately displayed in the new space. This needs to consider both context and need – perhaps the user wants to keep the document they were reading and add a recipe application, but does not want to keep their appointment calendar.

As for obtaining new applications, I propose exploring what an "app store" for on-world applications might look like. Applications might need to declare what kinds of physical requirements they have – the space required, or whether they cooperate with specific objects in the environment – and also consider the privacy implications of allowing applications access to the camera sensors in the on-world system. Users will want to run different kinds of applications on their environment, and so it is interesting to consider how to convey the requirements of a sophisticated on-world application to the end user.

7.3.3 Lab Deployment
Finally, to truly explore on-world interfaces in a realistic environment, I propose to deploy the system in a practical lab setting, for everyday use over a period of three months. I plan to place multiple prototypes throughout the environment, and gather feedback on how they are used. Additionally, interviews with lab members will be used to establish what kinds of interactions users want to have – for example, what sort of applications users want to run on the world, or how they expect the system to behave in corner cases. Through this lab study, I hope to reveal the challenges and opportunities associated with wide deployment of on-world interfaces.

A key advantage of on-world interfaces is that they should cooperate with the existing environment, instead of replacing it outright, and a lab deployment provides the perfect opportunity to test this hypothesis. In addition, by allowing a wide range of users to interact with the system in a natural setting, I hope to learn more about usage patterns and behaviours, which will help to inform future designers of on-world appliances, devices and applications.


CHAPTER 8. REFERENCES

[Agarawala 2006] Agarawala, A. and Balakrishnan, R. Keepin’ It Real: Pushing the Desktop Metaphor with Physics, Piles and the Pen. In Proc. CHI ‘06, 1283-1292.

[Agrawala 2001] Agrawala, M. and Stolte, C. Rendering effective route maps: improving usability through generalization. In Proc. SIGGRAPH ‘01, 241-249.

[Akaoka 2010] Akaoka, E., Ginn, T., and Vertegaal, R. DisplayObjects: prototyping functional physical interfaces on 3d styrofoam, paper or cardboard models. In Proc. TEI ‘10. 49-56.

[Arai 1995] Arai, T., Machii, K., Kuzunuki, S. and Shojima, H. InteractiveDESK: a computer-augmented desk which responds to operations on real objects. In CHI Companion ‘95, 141-142.

[Arduino] Arduino. http://www.arduino.cc

[Ashbrook 2008] Ashbrook, D., Lyons, K., and Starner, T. An investigation into round touchscreen wristwatch interaction. In Proc. MobileHCI ‘08. 311-314.

[Augsten 2010] Augsten, T., Kaefer, K., Meusel, R., Fetzer, C., Kanitz, D., Stoff, T., Becker, T., Holz, C., and Baudisch, P. Multitoe: high-precision interaction with back-projected floors based on high-resolution multi-touch input. In Proc. UIST ‘10. 209-218.

[Avrahami 2002] Avrahami, D. and Hudson, S.E., Forming interactivity: a tool for rapid prototyping of physical interactive products. In Proc. DIS ‘02, 141-146.

[Bancroft 1985] Bancroft, S. An algebraic solution of the GPS equations. IEEE Trans. Aerospace and Electronic Systems, vol. AES-21, pp. 56-59, Jan. 1985.

[Baudisch 2004] Baudisch, P., Cutrell, E., Hinckley, K. and Gruen, R., Mouse Ether: Accelerating the Acquisition of Targets Across Multi-Monitor Displays. In Proc. CHI 2004, 1379-1382.

[Bazo 2014] Bazo, A. and Echtler, F. Phone proxies: effortless content sharing between smartphones and interactive surfaces. In Proc. EICS ‘14. 229-234.

[Benko 2005] Benko, H. & Feiner, S., Multi-Monitor Mouse. In Proc. CHI 2005, 1208-1211.

[Benko 2007] Benko, H. & Feiner, S., Pointer Warping in Heterogeneous Multi-Monitor Environments. In Proc. Graphics Interface, 2007, 111-117.

[Benko 2012] Benko, H., Jota, R. and Wilson, A. MirageTable: freehand interaction on a projected augmented reality tabletop. In Proc. CHI ‘12. 199–208.

[Bi 2011] Bi, X., Grossman, T., Matejka, J., Fitzmaurice, G. Magic desk: bringing multi-touch surfaces into desktop work. In Proc. CHI ‘11.

[Bimber 2005] Bimber, O. and Raskar, R. 2005. Spatial Augmented Reality: Merging Real and Virtual Worlds. A. K. Peters, Ltd., Natick, MA, USA.

[Bodart 1994] Bodart, F., Hennebert, A., Leheureux, J. and Vanderdonckt, J. Towards a dynamic strategy for computer-aided visual placement. In Proc. AVI ‘94, 78-87.

[Bondarenko 2005] Bondarenko, O. and Janssen, R. Documents at Hand: Learning from Paper to Improve Digital Technologies. In Proc. CHI ‘05, 121-130.

[Brooks 1997] Brooks, R. A. The Intelligent Room Project. In Proc. International Conference on Cognitive Technology ‘97, 271.

[Butler 2008] Butler, A., Izadi, S., and Hodges, S. SideSight: multi-"touch" interaction around small devices. In Proc. UIST ‘08. 201-204.

[Caffery 1998] Caffery, J. and Stuber, G. L. Overview of radiolocation in CDMA cellular systems. IEEE Commun. Mag., 36(4), pp. 38-45, Apr. 1998.

[Canny 1986] Canny, J. A Computational Approach to Edge Detection. IEEE Trans. Pattern Analysis and Machine Intelligence, 8(6), 1986, 679-698.

[Cao 2008] Cao, X., Wilson, A.D., Balakrishnan, R., Hinckley, K., Hudson, S.E. ShapeTouch: Leveraging Contact Shape on Interactive Surfaces. In Proc. Tabletop ‘08, 129-136.

[Carter 1981] Carter, G. C. Time delay estimation for passive sonar signal processing. IEEE Trans. on Acoustics, Speech and Signal Processing, vol. ASSP-29, pp. 463-470, June 1981.

[Chatterjee 2015] Chatterjee, I., Xiao, R. and Harrison, C. 2015. Gaze+Gesture: Expressive, Precise and Targeted Free-Space Interactions. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (ICMI ‘15). ACM, New York, NY, USA, 131-138. DOI: http://dx.doi.org/10.1145/2818346.2820752

[Chen 2014] Chen, X.A., Schwarz, J., Harrison, C., Mankoff, J. and Hudson, S.E. 2014. Air+touch: interweaving touch & in-air gestures. In Proceedings of the 27th annual ACM symposium on User interface software and technology (UIST ‘14). ACM, New York, NY, USA, 519-525. DOI: http://doi.acm.org/10.1145/2642918.2647392

[Chen 2015] Chen, H., Lee, A. S., Swift, M. and Tang, J. C. 3D Collaboration Method over HoloLens™ and Skype™ End Points. In Proceedings of the 3rd International Workshop on Immersive Media Experiences (ImmersiveME '15). ACM, New York, NY, USA, 27-30. DOI=http://dx.doi.org/10.1145/2814347.2814350

[Cohn 2011] Cohn, G., Morris, D., Patel, S.N., and Tan, D.S. Your noise is my command: sensing gestures using the body as an antenna. In Proc. CHI ‘11. 791-800.

[Cong 1996] Cong, J., He, L., Koh, C. and Madden, P.H. Performance optimization of VLSI interconnect layout. Integr. VLSI J. 21(1-2), 1996, pp. 1-94.

[Crow 1984] Crow, F. Summed-area tables for texture mapping. In Proc. SIGGRAPH ‘84. 207–212.

[Di Battista 1998] Di Battista, G., Eades, P., Tamassia, R. and Tollis, I.G. 1998. Graph Drawing: Algorithms for the Visualization of Graphs (1st ed.). Prentice Hall PTR, Upper Saddle River, NJ, USA.

[Eastman Kodak 2003] Eastman Kodak Company. Kodak's Ergonomic Design for People at Work, 2nd Edition. 2003, pp. 48-49.

[Fails 2002] Fails, J.A. and Olsen, D. Light Widgets: Interacting in Everyday Spaces. In Proc. IUI ‘02, 63-69.

[Fogarty 2003] Fogarty, J. and Hudson, S.E. GADGET: A Toolkit for Optimization-Based Approaches to Interface and Display Generation. In Proc. UIST ‘03, 125-134.

[Gajos 2008] Gajos, K.Z., Weld, D.S. and Wobbrock, J.O. Decision-theoretic user interface generation. In Proc. AAAI ‘08, 1532-1536.

[Gajos 2010] Gajos, K.Z., Weld, D.S., and Wobbrock, J.O. Automatically generating personalized user interfaces with Supple. Artificial Intelligence, vol. 174, 12-13 (August 2010), 910-950.

[Gartner 2015] Gartner, Inc. 2015. “Gartner Says 6.4 Billion Connected "Things" Will Be in Use in 2016.” November 10, 2015. http://www.gartner.com/newsroom/id/3165317

[Gebhardt 2014] Gebhardt, C., Rädle, R. and Reiterer, H. Integrative workplace: studying the effect of digital desks on users' working practices. In CHI EA ‘14, 2155-2160.

[Good 1984] Good, M.D., Whiteside, J.A., Wixon, D.R., and Jones, S.J. Building a user-derived interface. Commun. ACM 27, 10 (October 1984), 1032-1043.

[Greenberg 2001] Greenberg, S. and Fitchett, C. Phidgets: easy development of physical interfaces through physical widgets. In Proc. UIST ‘01. 209-218.

[Gustafson 2008] Gustafson, S., Baudisch, P., Gutwin, C., and Irani, P., Wedge: Clutter-Free Visualization of Off-Screen Locations, In Proc. CHI 2008, 787-796.

[Haley 1988] Haley, J. Anthropometry and mass distribution for human analogues. Volume 1, 1988. Aerosp. Med. Res. Lab Wright-Patterson, Ohio.

[Han 2005] Han, J. Y. Low-cost multi-touch sensing through frustrated total internal reflection. In Proc. UIST ‘05. 115-118.

[Hardy 2012] Hardy, J. Experiences: a year in the life of an interactive desk. In Proc. DIS ‘12, 679-688.

[Harrison 2008] Harrison, C. and Hudson, S.E. Scratch input: creating large, inexpensive, unpowered and mobile finger input surfaces. In Proc. UIST ‘08. 205-208.

[Harrison 2009] Harrison, C. and Hudson, S. Abracadabra: wireless, high-precision, and unpowered finger input for very small mobile devices. In Proc. UIST ‘09. 121-124.

[Harrison 2010a] Harrison, C., Wiese, J., and Dey, A. K. “Achieving Ubiquity: The New Third Wave.” IEEE Multimedia, 17, 3 (July-September 2010), 8-12.

[Harrison 2010b] Harrison, C. Appropriated Interaction Surfaces. IEEE Computer Magazine, June 2010, 43(6). 86-89.

[Harrison 2011] Harrison, C., Benko, H., and Wilson, A.D. OmniTouch: wearable multitouch interaction everywhere. In Proc. UIST ‘11. 441-450.

[Hartmann 2006] Hartmann, B., Klemmer, S.R., Bernstein, M., Abdulla, L., Burr, B., Robinson-Mosher, A., and Gee, J., Reflective physical prototyping through integrated design, test, and analysis. In Proc. UIST ‘06, 299-308.

[Hinckley 2004] Hinckley, K., Ramos, G., Guimbretiere, F., Baudisch, P. and Smith, M. Stitching: pen gestures that span multiple displays. In Proc. AVI ‘04. 23-31.

[Hodes 1997] Hodes, T.D., Katz, R.H., Servan-Schreiber, E. and Rowe, L. 1997. Composable ad-hoc mobile services for universal interaction. In Proceedings of the 3rd annual ACM/IEEE international conference on Mobile computing and networking (MobiCom ‘97). ACM, New York, NY, USA, 1-12. http://dx.doi.org/10.1145/262116.262121

[Hudson 2006] Hudson, S.E. and Mankoff, J. Rapid construction of functioning physical interfaces from cardboard, thumbtacks, tin foil and masking tape. In Proc. UIST ‘06. 289-298.

[Intel 2012] Intel Corporation. Object Aware Situated Interactive System (OASIS). Retrieved April 7, 2012: http://techresearch.intel.com/ProjectDetails.aspx?Id=84

[Ishii 1990] Ishii, H. TeamWorkStation: Towards a Seamless Shared Workspace. In Proc. CSCW ‘90, 13-26.

[Ishii 1999] Ishii, H., Wisneski, C., Orbanes, J., Chun, C., and Paradiso, J. PingPongPlus: design of an athletic-tangible interface for computer-supported cooperative play. In Proc. CHI ‘99. 394-401.

[Izadi 2011] Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A. and Fitzgibbon, A. KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera. In Proc. UIST ‘11. 559-568.

[Jones 2010] Jones, B., Sodhi, R., Campbell, R., Garnett, G., and Bailey, B. Build your world and play in it: Interacting with surface particles on complex objects. In Proc. ISMAR ‘10. 165-174.

[Junuzovic 2012] Junuzovic, S., Quinn, K.I., Blank, T. and Gupta, A. IllumiShare: Sharing Any Surface. In Proc. CHI ‘12, 1919-1928.

[Kane 2009] Kane, S.K., Avrahami, D., Wobbrock, J.O., Harrison, B., Rea, A.D., Philipose, M. and LaMarca, A. Bonfire: a nomadic system for hybrid laptop-tabletop interaction. In Proc. UIST ‘09, 129-138.

[Khalilbeigi 2013] Khalilbeigi, M., Steimle, J., Riemann, J., Dezfuli, N., Mühlhäuser, M. and Hollan, J. D. ObjecTop: Occlusion Awareness of Physical Objects on Interactive Tabletops. In Proc. ITS ‘13, 255-264.

[Khoshelham 2012] Khoshelham, K. and Elberink, S.O. Accuracy and resolution of Kinect depth data for indoor mapping applications. Sensors, 12(2), 2012, 437-454.

[Kim 2014] Kim, J., Seo, J. and Han, T-D. AR Lamp: interactions on projection-based augmented reality for interactive learning. In Proc. IUI ‘14, 353-358.

[Knowlton 1977] Knowlton, K. C. Computer Displays Optically Superimposed on Input Devices. Bell Systems Technical Journal, 53(3), 1977, 367-383.

[Koike 2001] Koike, H., Sato, Y. and Kobayashi, Y. Integrating paper and digital information on EnhancedDesk: a method for realtime finger tracking on an augmented desk system. ACM Trans. on Computer-Human Interaction, 8 (4), 307-322.

[Kratz 2009] Kratz, S. and Rohs, M. HoverFlow: expanding the design space of around-device interaction. In Proc. MobileHCI ‘09. Article 4, 8 pages.

[Krueger 1985] Krueger, M. W., Gionfriddo, T. and Hinrichsen, K. VIDEOPLACE — an artificial reality. In Proc. CHI ‘85. 35-40.

[Landay 2001] Landay, J.A. and Myers, B.A. Sketching Interfaces: Toward More Human Interface Design. Computer 34, 3 (March 2001), 56-64.

[Laput 2015] Laput, G., Yang, C., Xiao, R., Sample, A. and Harrison, C. 2015. EM-Sense: Touch Recognition of Uninstrumented, Electrical and Electromechanical Objects. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology (UIST ‘15). ACM, New York, NY, USA, 157-166. http://dx.doi.org/10.1145/2807442.2807481

[Lee 1985] Lee, S. K., Buxton, W. and Smith, K. C. A multi-touch three dimensional touch-sensitive tablet. In Proc. CHI ‘85. 21-25.

[Lee 2004] Lee, J.C., Avrahami, D., Hudson, S.E., Forlizzi, J., Dietz, P., and Leigh, D. The calder toolkit: wired and wireless components for rapidly prototyping interactive devices. In Proc. DIS ‘04, 167-175.

[Lee 2005] Lee, J., Forlizzi, J. and Hudson, S.E. Studying the effectiveness of MOVE: a contextually optimized in-vehicle navigation system. In Proc. CHI ‘05, 571-580.

[Leo 2002] Leo, C. K. Contact and Free-Gesture Tracking for Large Interactive Surfaces. MEng Thesis, MIT Dept. of EECS and MIT Media Lab, May 2002.

[Linder 2010] Linder, N. and Maes, P. LuminAR: portable robotic augmented reality interface design and prototype. In Adj. Proc. UIST ‘10, 395-396.

[Maeda 1998] Maeda, J., Iizawa, T., Ishizaka, T., Ishikawa, C. and Suzuki, Y. Segmentation of Natural Images Using Anisotropic Diffusion and Linking of Boundary Edges. Pattern Recognition, 31(12), 1998.

[Malone 1983] Malone, T. W. How do people organize their desks?: Implications for the design of office information systems. ACM Trans. Inf. Syst. 1(1), 1983, 99-112.

[Matsushita 1997] Matsushita, N. and Rekimoto, J. HoloWall: designing a finger, hand, body, and object sensitive wall. In Proc. UIST ‘97. 209-210.

[Maynes-Aminzade 2007] Maynes-Aminzade, D., Winograd, T., and Igarashi, T. Eyepatch: prototyping camera-based interaction through examples. In Proc. UIST ‘07. 33-42.

[Mulder 2003] Mulder, J., Jansen, J. and Rhijn, V. An affordable optical head tracking system for desktop VR/AR systems. In Proc. EGVE ‘03, 215-223.

[Mulloni 2011] Mulloni, A., Seichter, H. and Schmalstieg, D. Handheld augmented reality indoor navigation with activity-based instructions. In Proc. MobileHCI ‘11. 211-220.

[Nacenta 2008] Nacenta, M., Mandryk, R., and Gutwin, C., Targeting across displayless space. In Proc. CHI 2008, 777-786.

[Nacenta 2009] Nacenta, M., Gutwin, C., Aliakseyeu, D., and Subramanian, S., There and Back again: Cross-Display Object Movement in Multi-Display Environments, JHCI, 24, 1, 2009, 170-229.

[NASA 1995] NASA. Anthropometry and Biomechanics. NASA-STD-3000: Man-Systems Integration Standards, Volume 1, Section 3. Revision B, July 1995.

[Newman 1992] Newman, W. and Wellner, P. A desk supporting computer-based interaction with paper documents. In Proc. CHI ‘92, 587-592.

[Nielsen 2004] Nielsen, M., Störring, M., Moeslund, T.B. and Granum, E. (2004) A procedure for developing intuitive and ergonomic gesture interfaces for HCI. Int'l Gesture Workshop 2003, LNCS vol. 2915. Heidelberg: Springer-Verlag, 409-420.

[Olsen 2008] Olsen, D. 2008. Interactive viscosity. In Proceedings of the 21st annual ACM symposium on User interface software and technology (UIST ‘08). ACM, New York, NY, USA, 1-2. http://dx.doi.org/10.1145/1449715.1449717

[Paradiso 2000] Paradiso, J., Hsiao, K., Strickon, J., Lifton, J. and Adler, A. Sensor Systems for Interactive Surfaces. IBM Systems Journal, Volume 39, Nos. 3 & 4, October 2000, pp. 892-914.

[Paradiso 2002] Paradiso, J., Leo, C., Checka, N. and Hsiao, K. Passive acoustic sensing for tracking knocks atop large interactive displays. In Proc. IEEE Sensors ‘02. 521-527.

[Paradiso 2005] Paradiso, J. and Leo, C. Tracking and Characterizing Knocks Atop Large Interactive Displays. Sensor Review, vol. 25, no. 2, pp. 134-143, 2005.

[Pinhanez 2001] Pinhanez, C.S. The Everywhere Displays Projector: A Device to Create Ubiquitous Graphical Interfaces. In Proc. UbiComp ‘01, 315-331.

[Processing] Processing. http://www.processing.org

[Raskar 1998] Raskar, R., Welch, G., Cutts, M., Lake, A., Stesin, L. and Fuchs, H. The office of the future: a unified approach to image-based modeling and spatially immersive displays. In Proc. SIGGRAPH ‘98, 179-188.

[Raskar 2003] Raskar, R., Baar, J., Beardsley, P., Willwacher, T., Rao, S. and Forlines, C. iLamps: Geometrically Aware and Self-Configuring Projectors. In Proc. SIGGRAPH ‘03, 809-818.

[Rekimoto 1997] Rekimoto, J. Pick-and-drop: a direct manipulation technique for multiple computer environments. In Proc. UIST ‘97. 31-39.

[Robertson 1999] Robertson, C. and Robinson, J. Live paper: video augmentation to simulate interactive paper. In Proc. MULTIMEDIA ‘99, 167-170.

[Saba 2012] Saba, E.N., Larson, E.C. and Patel, S.N. Dante vision: In-air and touch gesture sensing for natural surface interaction with combined depth and thermal cameras. In Proc. IEEE ESPA ‘12. 167-170.

[Schmidt 2010] Schmidt, D., Chong, M. K. and Gellersen, H. HandsDown: hand-contour-based user identification for interactive surfaces. In Proc. NordiCHI ‘10. 432-441.

[Schmidt 2012] Schmidt, D., Molyneaux, D. and Cao, X. 2012. PICOntrol: using a handheld projector for direct control of physical devices through visible light. In Proceedings of the 25th annual ACM symposium on User interface software and technology (UIST ‘12). ACM, New York, NY, USA, 379-388. http://dx.doi.org/10.1145/2380116.2380166

[Sears 1993] Sears, A. Layout Appropriateness: A Metric for Evaluating User Interface Widget Layout. IEEE Trans. Softw. Eng. 19(7), 1993, pp. 707-719.

[Seewoonauth 2009] Seewoonauth, K., Rukzio, E., Hardy, R. and Holleis, P. Touch & connect and touch & select: interacting with a computer by touching it with a mobile phone. In Proc. MobileHCI ‘09. Article 36, 9 pages.

[Sellen 2003] Sellen, A. and Harper, R. The myth of the paperless office. The MIT Press, Cambridge, London, 2003.

[Steimle 2010] Steimle, J., Khalilbeigi, M., Mühlhäuser, M. and Hollan, J.D. Physical and digital media usage patterns on interactive tabletop surfaces. In Proc. ITS ‘10, 167-176.

[Steimle 2013] Steimle, J., Jordt, A. and Maes, P. Flexpad: highly flexible bending interactions for projected handheld displays. In Proc. CHI ‘13. 237-246.

[Sugita 2008] Sugita, N., Iwai, D. and Sato, K. Touch Sensing by Image Analysis of Fingernail. In Proc. SICE Annual Conference ‘08. 1520-1525.

[Tang 2008] Tang, A., Greenberg, S., and Fels, S. Exploring video streams using slit-tear visualizations. In Proc. AVI ‘08. 191-198.

[Underkoffler 1998] Underkoffler, J., Ishii, H. Illuminating light: an optical design tool with a luminous-tangible interface. In Proc. CHI ‘98, 542-549.

[Underkoffler 1999] Underkoffler, J., Ullmer, B., Ishii, H. Emancipated pixels: real-world graphics in the luminous room. In Proc. SIGGRAPH ‘99, 385-392.

[Vyas 2012] Vyas, D. and Nijholt, A. Artful surfaces: an ethnographic study exploring the use of space in design studios. Digital Creativity, 23(1), 2012, 1-20.

[Wagner 2003] Wagner, D. and Schmalstieg, D. 2003. First Steps Towards Handheld Augmented Reality. In Proc. ISWC ’03.

[Wagner 2005] Wagner D., Pintaric T., Ledermann F. and Schmalstieg D. Towards Massively Multi-user Augmented Reality on Handheld Devices. In Pervasive 2005.

[Wang 2009] Wang, F. and Ren, X. Empirical evaluation for finger input properties in multi-touch interaction. In Proc. CHI ‘09. 1063-1072.

[Weiser 1999] Weiser, M. The Computer for the 21st Century. SIGMOBILE Mob. Comput. Commun. Rev. 3, 3 (July 1999), 3-11.

[Welch 2000] Welch, G., Fuchs, H., Raskar, R., Towles, H., and Brown, M., Projected Imagery in Your Office in the Future. IEEE Computer Graphics and Applications, Jul-Aug 2000, 20, 4, 62-67.

[Wellner 1991] Wellner. P. The DigitalDesk calculator: tangible manipulation on a desk top display. In Proc. UIST ‘91, 27-33.

[Wellner 1993] Wellner, P. Interacting with paper on the DigitalDesk. Communications of the ACM, 36 (7), 87-96.

[White 1980] White, R. M. Comparative anthropometry of the hand. No. NATICK/CEMEL-229. Army Natick Research and Development Labs, MA. Clothing Equipment and Materials Engineering Lab, 1980.

[Wilson 2004] Wilson, A. TouchLight: An Imaging Touch Screen and Display for Gesture-Based Interaction. In Proc. ICMI ‘04. 69-76.

[Wilson 2005] Wilson, A. PlayAnywhere: A Compact Interactive Tabletop Projection-Vision System. In Proc. UIST ‘05. 83-92.

[Wilson 2007] Wilson, A.D., Depth-Sensing Video Cameras for 3D Tangible Tabletop Interaction. In Proc. Tabletop ‘07, 201-204.

[Wilson 2010a] Wilson, A.D. Using a depth camera as a touch sensor. In Proc. ITS ‘10. 69-72.

[Wilson 2010b] Wilson, A.D. and Benko, H. Combining multiple depth cameras and projectors for interactions on, above and between surfaces. In Proc. UIST ‘10. 273-282.

[Wimmer 2010] Wimmer, R., Hennecke, F., Schulz, F., Boring, S., Butz, A. and Hußmann, H. Curve: revisiting the digital desk. In Proc. NordiCHI ‘10, 561-570.

[Wobbrock 2005] Wobbrock, J.O., Aung, H.H., Rothrock, B., and Myers, B.A. 2005. Maximizing the guessability of symbolic input. In CHI ‘05 EA. 1869-1872.

[Wobbrock 2009] Wobbrock, J.O., Morris, M.R., and Wilson, A.D. User-defined gestures for surface computing. In Proc. CHI ‘09. 1083-1092.

[Xiao 2013] Xiao, R., Harrison, C., and Hudson, S.E. WorldKit: Rapid and Easy Creation of Ad-hoc Interactive Applications on Everyday Surfaces. In Proc. CHI ‘13, 879-888.

[Xiao 2014] Xiao, R., Lew, G., Marsanico, J., Hariharan, D., Hudson, S.E. and Harrison, C. Toffee: enabling ad hoc, around-device interaction with acoustic time-of-arrival correlation. In Proc. MobileHCI ‘14. 67-76.

[Xiao 2015] Xiao, R., Schwarz, J. and Harrison, C. Estimating 3D Finger Angle on Commodity Touchscreens. In Proc. ITS ‘15. 47-50.

[Zeidler 2013] Zeidler, C., Lutteroth, C., Sturzlinger, W. and Weber, G. The Auckland Layout Editor: An Improved GUI Layout Specification Process. In Proc. UIST ‘13, 343-352.