
University of Augsburg
Faculty of Applied Computer Science
Department of Computer Science
Bachelor's Program in Computer Science

Bachelor’s Thesis

Full Body Immersion in Augmented Reality using a Motion Capture Suit and Video See-Through Technologies

submitted by

Ali Said on 31.07.2014

Supervisor: Prof. Dr. Elisabeth Andre

Adviser: Ionut Damian

Reviewers: Prof. Dr. Elisabeth Andre


Dedicated to the glory of mankind!


Abstract

The purpose of this study is to immerse the user in augmented reality. We aim at achieving realistic interactions between the user and the augmented reality scene; to that end, we provide an augmented environment fitted with several virtual characters. The user controls the characters within the scene using body movements and gestures to experience concepts such as natural interaction and action mirroring. This is accomplished using the Xsens full body motion capture suit and the Oculus Rift head mounted display, along with two web cameras to capture the real environment. The virtual environment is generated using the Unity game engine with C#. We describe an evaluation study of the showcase application featuring an interactive scenario with a virtual agent, in which several subjects were fitted with the Xsens suit and the Oculus Rift display to evaluate the system and give feedback via survey questionnaires. We conclude the study with the results and findings of the evaluation survey and give recommendations for future work.


Acknowledgments

I would like to express my gratitude to my advisor Ionut Damian for the continuous support of my research, and for his patience, motivation and knowledge. I would also like to thank Prof. Dr. Elisabeth Andre for her encouragement and inspiration.

I would like to thank my family and my friends for the stimulating discussions and their constant help.


Statement and Declaration of Consent

Statement

Hereby I confirm that this thesis is my own work and that I have documented all sources used.

Ali Said

Augsburg, 31.07.2014

Declaration of Consent

Herewith I agree that my thesis will be made available through the library of the Computer Science Department.

Ali Said

Augsburg, 31.07.2014


Contents

1 Introduction
  1.1 Motivation
  1.2 Material and Methods
  1.3 Objectives
  1.4 Outline

2 Theoretical Background
  2.1 Augmented Reality
    2.1.1 Augmentation Methods
    2.1.2 Registration
  2.2 Oculus Rift
  2.3 Xsens

3 Implementation
  3.1 Software Integration
    3.1.1 Engine Selection
    3.1.2 Character Design
  3.2 Hardware Modifications
    3.2.1 Camera Setup
    3.2.2 Hardware Adjustments
    3.2.3 Software Implementation

4 Augmented Environment
  4.1 The Scene
  4.2 Interactions
    4.2.1 Actions
    4.2.2 The Task

5 Evaluation
  5.1 The Survey
  5.2 The Results

6 Summary
  6.1 Future Work

Bibliography

List of Figures

List of Algorithms


Chapter 1

Introduction

The field of augmented reality has existed for over a decade and has been growing steadily ever since. The basic goal of most augmented reality applications is to enhance the user's perception of the real world and to introduce new interactions using virtual objects that co-exist in the same space as the user. While the definition of augmented reality broadens beyond visual augmentation, this paper is mainly concerned with optical augmented reality. [1]

Virtual reality environments supply the user with a surrounding virtual scene in which the user's real position, orientation and motion are occluded and thus not perceived. Augmented reality is the middle ground between the real world and the virtual environment: the real world is still the premise of the application, but it is populated by virtual objects. The user perceives the real world either via projection displays, where the virtual information is projected directly onto the physical objects to be augmented, or via see-through displays, where the user wears a head mounted display that renders virtual data on top of a live feed of the real world. This study uses video see-through via a head-worn display.

To integrate 3D virtual objects into the real environment in real time, either optical see-through or video see-through displays can be used. Virtual objects in conventional optical see-through displays do not occlude real objects in the background but rather appear as low-opacity images on top of them, and control over the field of view is limited. Video see-through, however, can realize the designed scene completely, since the user's whole field of view is computer generated rather than projected. To supply the video see-through system with live data at low latency, the stream of video input has to be fed to the head mounted display with minimal processing to avoid stuttering or delays. To simulate realistic and intuitive perception of the world, this study uses two web cameras placed over the display to act as the eyes. The system has to ensure that the set-up is parallax free, meaning that the user's eyes and the cameras share the same path in terms of alignment, field of view and angle. [2]

The user needs to be placed as a first-person viewer of the augmented world, which raises the need to track the user's body, orientation and location. User tracking is crucial for augmented reality registration. Typically, AR systems can use several tracking techniques, such as video tracking, magnetic tracking or markers. Video tracking involves processing the scene with computer vision algorithms, which can introduce a significant delay. Magnetic tracking requires pre-building the virtual environment on top of a known physical location (typically indoors), which allows all the objects of interest within the given area to be designed for augmented interactions. Using markers demands modifying the real environment with colored fiducial markers and sensors at known locations, then monitoring the user's interactions with mounted video cameras that track the markers. This study introduced another technique: full body tracking via a motion capture suit, which allows head orientation and body movement tracking as well as limb location and gesture recognition. As a result, the locations of most virtual objects were designed to be bound to the user, making a controlled environment unnecessary. [3]

1.1 Motivation

Augmented reality enhances the user's perception of and interaction with the real world, allowing users to perform tasks that would not be available to their senses otherwise. This paper combines several recent augmented reality techniques, namely full body tracking and video see-through augmentation, and showcases the result in an interactive application. Positive results can be a step forward towards various applications in simulation and entertainment. Simulation applications can vary from medical experiment simulations to full body applications such as dancing simulations; these can be widely used as a teaching methodology, with the teacher manifested as a virtual character within the same environment as the user. Entertainment applications may include immersion in gaming, where the user can exist within the game environment with his real body occluded by armor and certain gestures used to fire weapons or execute commands. With full body tracking technology deployed, the possibilities are vast.

1.2 Material and Methods

To meet the specifications of the application, a certain standard of hardware was required for capturing and previewing results. For video see-through augmentation, the selected hardware was the Oculus Rift, a head mounted display that shows a computer generated video stream. While the Rift does provide head tracking data, it was not used within the project, for better synchronization. The motion capture hardware chosen for user tracking was the Xsens full body motion tracking suit; the Xsens MVN suit is fitted with sensors and outputs a skeleton model of the user's body. To capture the real world and stream it to the Oculus Rift, two Logitech C920 web cameras were used, chosen for their field of view and low-latency capture and to match the Rift's display, since the cameras do not need to match human eye standards, only the Rift's field of view and output quality. The supporting software was the Unity game engine, used as a medium to integrate all the hardware equipment over their respective software.

1.3 Objectives

The aim of the project is to create an augmented reality system in which the user's whole physical body is immersed in the augmented environment: the application is situated in the real world, the point of view is completely determined by the user's body location and orientation, the virtual information is relayed by a head-worn video see-through display, the user is mobile and able to move freely, and no extra equipment or devices are needed for the virtual interface.

1.4 Outline

The thesis comprises five further chapters: Chapter 2 explains the needed theoretical background; Chapter 3 covers both the software and hardware integration; Chapter 4 is concerned with the created augmented environment; Chapter 5 discusses the evaluation process and the findings; and Chapter 6 gives the summary, conclusion and future work recommendations. The flow of the thesis follows the actual course of work taken during the study.


Chapter 2

Theoretical Background

2.1 Augmented Reality

By definition, augmented reality is a variation of virtual reality in which the user is not fully immersed in the virtual world; rather, virtual objects are superimposed on the real world. The ultimate goal of augmented reality applications is to surround the user's real world with virtual objects. Thus all augmented reality systems need real-time rendering methods for generating the virtual scene elements so that they can overlay the real objects. Augmented reality systems typically face several problems and challenges, ranging from selecting the blending approach to handling registration and sensing errors.

2.1.1 Augmentation Methods

The main question when designing an augmented reality system is the method of augmentation: how to merge the virtual and real worlds. The basic choices are optical and video technologies. Optical see-through head mounted displays work by placing partially transmissive optical combiners in front of the user's eyes, so that vision of the real world is not occluded; the combiners are also partially reflective, so that the user can see virtual images bounced off them. Video see-through displays, on the other hand, are closed-view displays mounted with cameras. The cameras provide the user with vision of the real world, and the video is combined with the virtual images generated by the software to achieve the blending. The selected method was video see-through, since it allows better control over the user's view: both the virtual and real images are available for editing in digital format, whereas the optical method gains control over only the virtual side of the scene, and problems such as synchronization and obscuring issues can arise.

Figure 2.1: User wearing the proposed AR set-up, consisting of the motion capture suit and video see-through head display

2.1.2 Registration

For the illusion that the virtual and real worlds are merged to hold, the objects in both worlds have to be properly aligned. Registration addresses this alignment problem; registration errors can cause anything from failure to achieve the task at hand to motion sickness. Registration errors are difficult to solve due to the numerous sources of error. Static errors, such as optical distortion and tracking errors, arise even when the user and the objects in the environment are still. Dynamic errors arise only when objects or the user start moving and usually concern the system's end-to-end delay. Accurate positioning and registration require exact tracking of the user. Environment sensing is a tracking method where the positions of all objects of interest in the environment are tracked using cameras and fiducial markers; computer vision techniques can be used to identify the markers accurately. Another method is user body tracking, where either a part of the user, such as the head and hands, or the full body is tracked using a motion capture suit; the latter method is used in this study. [7]

2.2 Oculus Rift

One of the most conventional virtual reality device set-ups, and thus augmented reality set-ups, is the head mounted display: a device worn over the head, covering the eyes. Several commercially produced displays support stereoscopic display and tracking systems. There is one display screen per eye, and stereoscopic vision is achieved by creating two virtual cameras in the software, one for each display.

The Oculus Rift is a headset built for gaming. It provides a large field of view of 110 degrees, stereoscopic vision and head tracking. Although the Rift's head tracking was not used in this study, due to the reliance on the Xsens MVN body tracking, the Rift's light weight of 379 g ensured comfort and mobility for the user. Furthermore, the front-most surface of the Rift sits only 5 cm in front of the user's eyes, meaning that mounting cameras on top yields only a small offset. The Rift creates two virtual cameras that stream the virtual environment to the display screens on scene render; the software applies shaders and filters both pre-rendering and post-rendering to correct for the lenses in color and curvature, allowing for customization of the execution pipeline. [4]
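To make the two-camera arrangement concrete, the following is a minimal sketch of how such a stereo rig could be assembled in Unity. It is not the actual Oculus plug-in API: the class name, the eye separation value and the viewport split are illustrative assumptions.

using UnityEngine;

// Illustrative sketch only: one Unity camera per eye, each rendering to
// half of the head mounted display, as described above.
public class StereoRigSketch : MonoBehaviour
{
    public float eyeSeparation = 0.064f; // assumed interpupillary distance in meters

    void Start()
    {
        CreateEyeCamera("LeftEye", -eyeSeparation / 2f, new Rect(0f, 0f, 0.5f, 1f));
        CreateEyeCamera("RightEye", eyeSeparation / 2f, new Rect(0.5f, 0f, 0.5f, 1f));
    }

    void CreateEyeCamera(string name, float xOffset, Rect viewport)
    {
        GameObject eye = new GameObject(name);
        eye.transform.parent = transform; // the rig follows the head object
        eye.transform.localPosition = new Vector3(xOffset, 0f, 0f);

        Camera cam = eye.AddComponent<Camera>();
        cam.fieldOfView = 110f; // the Rift's nominal field of view
        cam.rect = viewport;    // side-by-side stereo: each eye renders to half the screen
    }
}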

2.3 Xsens

The usual approach to data input in augmented reality systems focuses on motion recognition. A device that can read and process natural movements is a very intuitive method of human-computer interaction. One of the biggest challenges when developing an AR system is body registration: the geometric alignment of the virtual and real bodies' coordinate systems.

The selected solution was the Xsens MVN motion capture suit for full-body human motion capture. It is based on miniature inertial sensors, biomechanical models and sensor fusion algorithms. The use of the MVN suit eliminates the need for markers and external cameras, works both outdoors and indoors, and can capture any type of body movement, such as running or jumping. The user wears the suit fitted with mechanical trackers utilizing fixed or flexible goniometers, angle measuring devices that provide joint angle data to kinematic algorithms. Accurate orientation estimation requires the use of signals from gyroscopes, accelerometers and magnetometers: accelerometers provide the vertical direction by sensing acceleration due to gravity, while magnetometers provide horizontal stability by sensing the direction of the earth's magnetic field.

Figure 2.2: The skeleton tracked by the motion capture suit, used within Unity over a virtual model

The suit consists of 17 MTx inertial and magnetic 3D sensors connected to 2 Xbus masters, which synchronize and power the sensors and handle wireless streaming to the computer, where the stream is played by the MVN Studio software. MVN Studio is a graphical application that allows the user to observe a real-time stream of the subject or play back previously recorded tracking sessions. MVN Studio also offers an output stream that can be directed over the network to a given port, allowing the tracked skeleton to be used on another computer in a real-time application. [6]
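As an illustration of how such a network stream could be consumed on the application side, the sketch below listens for MVN datagrams over UDP. The real MVN packet layout is richer than shown here; the port number and the assumed frame format (23 segments, each a position and an orientation quaternion as raw floats) are simplifying assumptions, not the documented protocol.

using System.Net;
using System.Net.Sockets;

// Simplified sketch of a receiver for the skeleton stream sent by MVN Studio.
public class MvnStreamReader
{
    const int SegmentCount = 23; // segments of the MVN full body skeleton
    readonly UdpClient client;

    public MvnStreamReader(int port) // e.g. the port MVN Studio streams to
    {
        client = new UdpClient(port);
    }

    // Blocks until one datagram arrives and decodes it into raw floats
    // (assumed layout: 3 position + 4 quaternion floats per segment).
    public float[] ReadFrame()
    {
        IPEndPoint sender = new IPEndPoint(IPAddress.Any, 0);
        byte[] packet = client.Receive(ref sender);

        float[] values = new float[SegmentCount * 7];
        for (int i = 0; i < values.Length && (i + 1) * 4 <= packet.Length; i++)
            values[i] = System.BitConverter.ToSingle(packet, i * 4);
        return values;
    }
}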


Chapter 3

Implementation

3.1 Software Integration

3.1.1 Engine Selection

Several engine options besides Unity were explored initially, but each had its own problems in the form of compatibility and implementation challenges. Unity was found to be the most suitable option for creating the virtual environment. Unity is a portable program available on most platforms, allowing a range of choices for future applications. It has extensive physics engine capabilities and a very flexible execution pipeline that allows control over the rendering process at various stages. Most importantly, Unity supports both the Xsens full body motion tracking suit and the Oculus Rift head display via plug-ins, along with several available plug-ins for virtual models and animations that were used within the showcase to implement most of the interactive tasks.

3.1.2 Character Design

Xsens provides its own Unity plug-in that links to the Xsens motion tracking software, MVN Studio; the plug-in grants access to the MVN motion capture data stream, allowing real-time viewing and manipulation of the tracking skeleton. The Oculus Rift also provides a Unity plug-in, which supplies a virtual set of OVR cameras that capture the surrounding virtual environment and stream the captured scene to the Rift's head display screens, populating the user's field of view with the captured virtual content.

The two plug-ins were combined in a humanoid model: the MVN skeleton was bound to our virtual model's body so that they share the same movements and gestures, and the virtual OVR cameras were installed on the head of the virtual character to represent the eyes. While the Oculus Rift plug-in includes head tracking capabilities, we used the Xsens skeleton for all required motion tracking, for clearer and more synchronized data. This set-up yields a humanoid virtual model that mirrors the user completely, creating a copy of the user in the virtual world. Since the real Oculus Rift is mounted on the user's head and the virtual one on the model's head, and the model mimics the user's movements in the real world, the model effectively occludes the user's body within the application, meaning that the user sees the virtual model's hand over his real hand when looking down. This allows gesture recognition to trigger on-body effects, such as a control pad mounted over the user's hands used to control the environment.
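A minimal sketch of this set-up is given below: the camera rig is parented to the head bone of the tracked humanoid model so that the view follows the user's real head. The bone lookup assumes a Mecanim humanoid rig, and the local eye offset is an illustrative value, not a measured one.

using UnityEngine;

// Sketch: mount the virtual camera rig on the head of the motion-captured model.
public class FirstPersonBinding : MonoBehaviour
{
    public Transform cameraRig; // parent object of the two eye cameras

    void Start()
    {
        // Find the head bone of the humanoid model driven by the MVN skeleton.
        Animator animator = GetComponent<Animator>();
        Transform head = animator.GetBoneTransform(HumanBodyBones.Head);

        // Attach the cameras at an assumed eye position relative to the head bone.
        cameraRig.parent = head;
        cameraRig.localPosition = new Vector3(0f, 0.08f, 0.09f);
        cameraRig.localRotation = Quaternion.identity;
    }
}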

3.2 Hardware Modifications

3.2.1 Camera Setup

Figure 3.1: The camera set-up installed over the Oculus Rift display


Augmented reality requires blending the real and augmented worlds: cameras have to capture the world from the user's perspective so that the real-time video can be processed, augmented with the graphical content and viewed by the user. The cameras' requirements should match not human vision but the head display screen.

There are several key requirements to match between the Oculus Riftand the cameras:

1. An 800x600 pixel resolution per camera, to match the Rift's per-eye resolution.

2. A 1.33:1 sensor aspect ratio: the Rift uses side-by-side stereo with a 1.33:1 vertical-to-horizontal ratio per eye, so the cameras have to be mounted in portrait mode.

3. A 120° field of view lens, to match the Rift's.

4. A 60 Hz capture rate, to match the Rift's refresh rate.

The selected cameras were Logitech C920 web cameras, slightly modified to match the stated requirements. Wrongly mounted cameras can lead to several problems, such as convergence issues and motion sickness. The cameras had to be mounted in a way that ensures comfort and smooth precision but, more importantly, is intuitive: eye convergence for focus and eye rotation in the virtual plane should match the real process. [5]

3.2.2 Hardware Adjustments

The main problem was achieving the required degree of precision. The first components were 3D-printed arms that slide onto the front of the Rift, making the cameras horizontally adjustable. The cameras then needed to be mounted on top of these arms. There are two main ways to mount a stereo camera rig: parallel or toed-in. Designing a parallel camera set-up requires physically modifying the set-up by horizontally shifting the lenses. Toed-in cameras are rotated inwards so that their optical axes intersect at a given point midway into the scene rather than at infinity. While each set-up has its own problems, the selected method was toeing in the cameras, for simplicity. To achieve this, 3D-printed pads were added: the cameras were mounted parallel on top of the pads, and the pads were installed on the 3D-printed arms such that they form a horizontally adjustable angle with the arms. With both opposite pads rotated towards the center, they create a set-up with a focus point 1 meter in front of the user's eyes.
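The toe-in geometry follows directly from this description: with the cameras separated horizontally and their optical axes meeting 1 meter ahead, each camera is rotated inwards by atan((separation / 2) / distance). The helper below works this out; the 65 mm separation is an assumed interpupillary distance, not a measured value from the rig.

using UnityEngine;

public static class ToeInGeometry
{
    // Inward rotation of each camera so that the optical axes cross at the
    // convergence distance (both arguments in meters).
    public static float ToeInAngleDegrees(float separation, float convergenceDistance)
    {
        return Mathf.Atan((separation / 2f) / convergenceDistance) * Mathf.Rad2Deg;
    }
}

// Example: ToeInAngleDegrees(0.065f, 1.0f) ≈ 1.86, i.e. each pad rotates its
// camera roughly 1.9 degrees towards the center line.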


3.2.3 Software Implementation

Unity's method of dealing with live input from web cameras is a texturing technique called WebCamTexture: Unity can list all connected camera devices and play a selected stream as output to a texture, a rendered image that can be applied to a given surface. The initial approach was applying the WebCamTexture to a plane lying directly in front of the Oculus Rift's virtual cameras (OVRCamera), thus populating and restricting their field of view to the texture. A problem arose because this technique meant that the surface holding the texture had to be bound to the Oculus's virtual camera, making the distance at which the surface is placed, and its movements, dynamic variables that need adjustment per scene. The initial approach was intended to leave the execution pipeline of the Oculus Unity plug-in intact; since it introduced these other problems, another solution had to be found.
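For illustration, a minimal sketch of this first approach is shown below: the webcam stream is played onto a plane parented to one eye camera. The fixed plane distance is exactly the per-scene variable criticized above; the distance and rotation values are assumptions for the sketch.

using UnityEngine;

// Sketch of the abandoned first approach: a WebCamTexture on a plane
// bound directly in front of a virtual eye camera.
public class WebcamPlaneSketch : MonoBehaviour
{
    public Camera eyeCamera; // one of the virtual eye cameras

    void Start()
    {
        WebCamTexture feed = new WebCamTexture(WebCamTexture.devices[0].name);

        GameObject plane = GameObject.CreatePrimitive(PrimitiveType.Plane);
        plane.transform.parent = eyeCamera.transform;
        plane.transform.localPosition = new Vector3(0f, 0f, 1f); // assumed distance
        plane.transform.localRotation = Quaternion.Euler(-90f, 0f, 0f); // face the camera

        plane.GetComponent<Renderer>().material.mainTexture = feed;
        feed.Play(); // start streaming the real-world video onto the plane
    }
}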

Unity also offers a view stream within the execution pipeline: a stack that can hold different images per frame and composite them together to create the actual scene frame. The second approach involved manipulating the OVRCamera rendering pipeline such that the WebCamTexture is applied pre-rendering into the OVRCamera view stream; on rendering, the OVRCamera simply views the virtual scene created by Unity in front of it and adds it to the view stream. This effectively makes the WebCamTexture the background of the scene and the virtual objects the foreground of that same scene, which simulates augmentation. Several transformations had to be applied to the WebCamTexture, via custom shaders, so that it fits appropriately into the stream and is dynamically adjustable.


Algorithm 3.1 Integration of the real Logitech camera into the virtual OVRCamera's Unity execution pipeline.

/* Code in the parent class OVRCameraController */

// getting all connected webcam devices
WebCamDevice[] devices = WebCamTexture.devices;

// initiating the devices with dimensions and frame rate
WebCamTexture left  = new WebCamTexture(devices[0].name, 1280, 720, 30);
WebCamTexture right = new WebCamTexture(devices[1].name, 1280, 720, 30);

// passing the WebCamTextures to the child cameras
CameraRight.GetComponent<OVRCamera>().SetWebcam(right);
CameraLeft.GetComponent<OVRCamera>().SetWebcam(left);

/* Code in the child class OVRCamera */

// clearing the execution pipeline
GL.Clear(true, true, camera.backgroundColor);

// start recording from the real camera
logitechCameraTexture.Play();

// applying the appropriate lens-correction shader to the texture material
rotation = GetComponent<OVRLensCorrection>().GetRotMaterial(yAdd, xAdd, Zoom);

// merging the WebCamTexture with the virtual CameraTexture that views the virtual scene
Graphics.Blit(logitechCameraTexture, CameraTexture, rotation);


Chapter 4

Augmented Environment

4.1 The Scene

Figure 4.1: An overview of the virtual scene populated by the virtual characters and models within Unity

The augmented environment is populated with several virtual characters and virtual interactions. The main control panel is mounted on the user's body, so a virtual model of the user's physical body had to be designed. The user's model is a one-to-one replica of the actual user, taking its skeleton animation as a stream from the Xsens full body motion tracking suit. Two extra copies of the same model are placed, each rotated 90 degrees to look sideways from the main model's perspective, to simulate real-time one-to-one mirroring of the user. Directly in front of the user is a virtual agent acting as a guide within the augmented premise, leading the user through the available options and tasks. Also present are two virtual teddy bears designed to simulate automated behavior: the teddy bears are programmed to follow the user around the augmented environment as he moves. Finally, there are two controllable virtual characters: a virtual character that can either jump or wave based on user input, and a virtual auto-bot that moves around in a square per user command. All the virtual characters are controlled via the main control panel, projected over the user's left wrist and accessible via the user's right hand.
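The teddy bears' follow behavior can be expressed as a short script of the kind sketched below, assuming each bear receives the user's tracked model as a target; the distance and speed values are illustrative.

using UnityEngine;

// Sketch: make a character walk towards the user whenever it falls behind.
public class FollowUser : MonoBehaviour
{
    public Transform target;            // the user's tracked virtual body
    public float followDistance = 1.5f; // assumed stopping distance in meters
    public float speed = 1.0f;          // assumed walking speed in m/s

    void Update()
    {
        Vector3 toTarget = target.position - transform.position;
        toTarget.y = 0f; // move on the ground plane only

        if (toTarget.magnitude > followDistance)
        {
            transform.rotation = Quaternion.LookRotation(toTarget);
            transform.position += toTarget.normalized * speed * Time.deltaTime;
        }
    }
}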

4.2 Interactions

4.2.1 Actions

Figure 4.2: The home pad of the main control panel, used to manipulate the virtual objects

The main control panel is the basic method of manipulating all the events and characters in the augmented scene. To enable the panel, the user touches his left wrist with his right hand; this pops up the control panel, which provides several options as multi-level displays. The base display simply offers navigation to the three other displays or closing the panel. The first display provides control over the first virtual character, which the user can make jump or wave. The second display grants control over the virtual auto-bot, either toggling its visibility in the virtual scene or triggering its square-based movement. The third display offers the user the option to change his virtual model's outfit, effectively changing what the user appears to be wearing. It also allows the virtual teddy bears to be set to either follow the user or stand still.
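The wrist-touch gesture that opens the panel reduces to a proximity test between two tracked transforms. The sketch below assumes the skeleton exposes right-hand and left-wrist transforms; the touch threshold is an assumed value, not a measured one.

using UnityEngine;

// Sketch: toggle the control panel when the right hand touches the left wrist.
public class WristPanelToggle : MonoBehaviour
{
    public Transform rightHand;  // from the tracked skeleton
    public Transform leftWrist;
    public GameObject controlPanel;      // panel anchored to the left wrist
    public float touchThreshold = 0.08f; // assumed touch distance in meters

    bool wasTouching;

    void Update()
    {
        bool touching =
            Vector3.Distance(rightHand.position, leftWrist.position) < touchThreshold;

        // Toggle on the frame the touch begins, not continuously while held.
        if (touching && !wasTouching)
            controlPanel.SetActive(!controlPanel.activeSelf);
        wasTouching = touching;
    }
}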

4.2.2 The Task

The user is expected to complete a virtual task that he is guided through by the virtual agent. After briefly introducing the virtual environment and characters, the agent asks the user to enable the main control panel and try manipulating the first virtual character, then suggests changing his virtual outfit. After that, the user is encouraged to experiment with all the available options. Finally, the user should complete the interactive task of kicking a virtual ball towards the agent; upon hitting the agent, the task is completed and the user is free to further explore the environment.
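Since the guide advances the experience step by step, the flow can be modeled as a small ordered state machine, as in the sketch below; the state names and the advance method are illustrative, not taken from the actual implementation.

// Sketch of the guided task flow as an ordered state machine.
public enum TaskState
{
    Intro,          // agent introduces the environment and characters
    ControlPanel,   // user enables the panel and manipulates the first character
    ChangeOutfit,   // user changes the virtual outfit
    FreeExperiment, // user experiments with all available options
    KickBall,       // user kicks the virtual ball at the agent
    Done            // agent goes idle, user explores freely
}

public class GuidedTask
{
    public TaskState State { get; private set; }

    public GuidedTask()
    {
        State = TaskState.Intro;
    }

    // Called when the event marking the current state occurs, e.g. the
    // ball's collider reporting a hit on the agent during KickBall.
    public void AdvanceIfCompleted(TaskState completed)
    {
        if (completed == State && State != TaskState.Done)
            State = State + 1; // states are declared in order, so move to the next
    }
}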


Chapter 5

Evaluation

5.1 The Survey

The survey questionnaire included items that evaluate the realism and effect of augmented immersion, as well as believability and spatial presence. The survey subjects were selected randomly from diverse backgrounds. The subjects were introduced to the augmented reality application and expected to follow the descriptions and instructions provided by the virtual guide to accomplish a given task. The survey questions focused on how believable the virtual environment was and how the interactions and the system's perception of movement felt. It also asked about the effect on the experience of the virtual agent guiding the user. The experiment procedure was explained to the users beforehand, along with a vaguer description of the task and the environment. The task, as presented by the virtual guide, required the test subjects to explore the environment by following the given description; events marked by the agent would trigger the state advancement of the task. The final state required the user to kick a virtual ball at the agent; upon succeeding, the virtual guide goes idle, leaving the subject to explore the surrounding environment freely. Answers to the questions are given on a 5-point scale ranging from "strongly disagree" to "strongly agree".

5.2 The Results

Several conclusions can be derived from the survey. The majority of the subjects agreed that the experience felt realistic and that they achieved immersion in the virtual environment with all the characters, which suggests that the set-up was successful and can be used in future applications. However, the virtual guide's helpfulness fell short: subjects stated that the guide distracted from the task, showing the need for better human-computer interaction to allow the program to simulate a human more convincingly. The subjects agreed that the character animations looked normal and that the control panel interactions were natural; they also agreed that virtual interactions showed neither delay nor premature response, which was due to the optimization of the pipeline for smoother execution. Despite a few comments on the head display's pixelation, most users agreed that the display's quality and consistency were acceptable, though room for improvement in the form of better cameras and higher quality displays remains. The extensive amount of wiring limited movement, but the subjects agreed that the hardware was comfortable.

The main point of disagreement was the body movement tracking. The task of kicking the ball proved challenging because the virtual body, responsible for all interactions with the real world, did not align perfectly with the real body. While this introduced no difficulty when interacting with the control panel using the virtual hand, kicking the ball with the virtual feet proved very difficult. The subjects commented that the difference between the virtual and real bodies was the main reason for failure to complete the task, and that the model needed better alignment. Some users also commented on the environment: the experiment was carried out in a closed room with no fixed settings, making some virtual characters pass through walls or real objects. The survey thus shows that while better equipment and programming would certainly help, the main point that requires perfection is alignment: full body tracking aligned the user correctly, but video see-through technology introduced other problems such as scaling, displacement and the need for better limb positioning.

The proposed solution is to design the real body measurements into the virtual model, so that the virtual and real bodies share the same height and dimensions, and likewise to position the virtual replicas of the head display cameras according to the real ones, since the real cameras sit about 4 centimeters in front of the user's actual eyes, which should be accounted for. This makes the whole virtual set-up an exact replica of the user in real life, including all the mounted hardware. Another conclusion is that designing the virtual environment into the real one beforehand seemed crucial. To solve the environment merging problem without using markers or designing a fixed environment, depth mapping can be implemented, using image processing algorithms for efficiency and minimal delay, to build a depth map of the real world. A depth map represents the position of each object by its intensity in a video stream, usually captured by a depth camera. The depth map can be used to create a 3D model of the view in front of the camera, efficiently turning the captured real terrain into a virtual model, so that collisions and interactions between the real and the virtual can be easily handled.
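As a sketch of the proposed depth-mapping step, the helper below unprojects a single depth pixel into a camera-space 3D point using assumed pinhole intrinsics (fx, fy, cx, cy); a real implementation would run this per pixel to build a mesh of the room for collision handling.

using UnityEngine;

public static class DepthUnprojection
{
    // u, v: pixel coordinates; depth: distance along the view axis in meters;
    // fx, fy, cx, cy: assumed pinhole intrinsics of the depth camera.
    public static Vector3 PixelToCameraSpace(
        float u, float v, float depth,
        float fx, float fy, float cx, float cy)
    {
        float x = (u - cx) * depth / fx;
        float y = (v - cy) * depth / fy;
        return new Vector3(x, y, depth);
    }
}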


Chapter 6

Summary

Augmented reality concerns merging the real and virtual worlds to enhance the user's perception of the surrounding environment. This paper introduced an approach to augmented reality using full body tracking and a video see-through head mounted display. The selected tracking hardware was the Xsens motion capture suit, the selected head mounted display was the Oculus Rift, and the hardware integration took place in the Unity game engine. A virtual model was designed to reflect the user's body in the virtual environment, matching the user's motion, and was fitted with the Rift's camera controller to act as the point of view into the virtual world. Merging the virtual and real worlds required capturing the real view in front of the user and playing it back to him via the Rift: two cameras were fitted on top of the Rift, and their output was used as the background video for the virtual scene.

To evaluate the system, a showcase application was designed in which the user had several virtual characters to interact with and a virtual guide to explain the surroundings and clarify the required task. Several users took part as test subjects and were given a questionnaire to evaluate the experience. The main conclusion is that an exact replication of the user's real body in the virtual environment is crucial, such that the user's real and virtual bodies completely align. Since the virtual limbs were responsible for all interactions within the augmented reality, any slight miscalibration or noticeable difference would cause confusion and failure of the user to complete the task. Another conclusion is the need to control the surrounding environment so that the virtual characters do not walk through real walls or objects. Since the reason for using full body tracking was to eliminate the need to design the virtual environment to fit a specific real place, the suggested approach to the problem was depth mapping: using a depth camera to produce a depth map of the real world that can be transformed into a 3D space and built into the virtual world.

6.1 Future Work

As part of future work towards better immersion in augmented reality set-ups, several aspects can be improved or introduced. A device that could save a lot of processing time and hardware issues is a head-worn display with embedded cameras for video see-through, where the device has two cameras already positioned over the eyes that stream directly to its own screens, and where the position and zoom of the camera feed are designed into the device. A useful addition is depth cameras: these help with building a depth map of the real world and can also be used to calibrate the user's real body dimensions, allowing hybrid tracking alongside full body tracking for optimal registration; the tracked body can be fed into a gesture recognizer to react to the user's movements and postures. A fully wireless set-up is easy to achieve and would greatly improve the user's mobility and comfort. Finally, extending augmented reality to another sense would improve body immersion; this can be achieved with sound augmentation, playing surrounding sounds back to the user through headsets and microphones after adding the virtually created sounds.

Bibliography

[1] Ronald Azuma, Yohan Baillot, Reinhold Behringer, Steven Feiner, Simon Julier, and Blair MacIntyre. Recent advances in augmented reality. Computer Graphics and Applications, IEEE, 21(6):34-47, 2001.

[2] Ronald T. Azuma et al. A survey of augmented reality. Presence, 6(4):355-385, 1997.

[3] Hirokazu Kato and Mark Billinghurst. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In Augmented Reality, 1999 (IWAR'99), Proceedings of the 2nd IEEE and ACM International Workshop on, pages 85-94. IEEE, 1999.

[4] Paul Milgram, Haruo Takemura, Akira Utsumi, and Fumio Kishino. Augmented reality: A class of displays on the reality-virtuality continuum. In Photonics for Industrial Applications, pages 282-292. International Society for Optics and Photonics, 1995.

[5] Ye Pan, William Steptoe, and Anthony Steed. Comparing flat and spherical displays in a trust scenario in avatar-mediated interaction. In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems, pages 1397-1406. ACM, 2014.

[6] Daniel Roetenberg, Henk Luinge, and Per Slycke. Xsens MVN: Full 6DOF human motion tracking using miniature inertial sensors. Xsens Motion Technologies BV, Tech. Rep., 2009.

[7] Bruce Thomas, Benjamin Close, John Donoghue, John Squires, Phillip De Bondi, Michael Morris, and Wayne Piekarski. ARQuake: An outdoor/indoor augmented reality first person application. In Wearable Computers, The Fourth International Symposium on, pages 139-146. IEEE, 2000.


List of Figures

2.1 User wearing the proposed AR set-up, consisting of the motion capture suit and video see-through head display

2.2 The skeleton tracked by the motion capture suit, used within Unity over a virtual model

3.1 The camera set-up installed over the Oculus Rift display

4.1 An overview of the virtual scene populated by the virtual characters and models within Unity

4.2 The home pad of the main control panel, used to manipulate the virtual objects


List of Algorithms

3.1 Integration of the real Logitech camera into the virtual OVRCamera's Unity execution pipeline
