Department of Science and Technology / Institutionen för teknik och naturvetenskap
Linköping University / Linköpings universitet
SE-601 74 Norrköping, Sweden

LiU-ITN-TEK-A--11/005--SE

Development of an Open-Source API for Augmented Reality for the Android SDK

Patrik Arthursson
Yin Fai Chan

2011-02-17

LiU-ITN-TEK-A--11/005--SE

Development of an Open-Source API for Augmented Reality for the Android SDK

Master's thesis in media technology (examensarbete i medieteknik) carried out at the Institute of Technology, Linköping University

Patrik Arthursson
Yin Fai Chan

Supervisors: Jimmy Jonasson, Mikael Karlsson
Examiner: Matt Cooper

Norrköping, 2011-02-17


Upphovsrätt

Detta dokument hålls tillgängligt på Internet – eller dess framtida ersättare – under en längre tid från publiceringsdatum under förutsättning att inga extraordinära omständigheter uppstår.

Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner, skriva ut enstaka kopior för enskilt bruk och att använda det oförändrat för ickekommersiell forskning och för undervisning. Överföring av upphovsrätten vid en senare tidpunkt kan inte upphäva detta tillstånd. All annan användning av dokumentet kräver upphovsmannens medgivande. För att garantera äktheten, säkerheten och tillgängligheten finns det lösningar av teknisk och administrativ art.

Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman i den omfattning som god sed kräver vid användning av dokumentet på ovan beskrivna sätt samt skydd mot att dokumentet ändras eller presenteras i sådan form eller i sådant sammanhang som är kränkande för upphovsmannens litterära eller konstnärliga anseende eller egenart.

För ytterligare information om Linköping University Electronic Press se förlagets hemsida http://www.ep.liu.se/

Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Patrik Arthursson, Yin Fai Chan


Abstract

Augmented reality has in recent years become very popular in commercial areas such as entertainment and advertising. The fastest growing field right now is augmented reality for mobile devices, because their rapidly increasing performance can now manage the heavy operations that only desktop computers used to handle. So far, traditional standard markers have been used, but in this paper we take a look at a different technique: natural feature tracking. The final product is an open-source augmented reality API for Android, freely available online.


Contents

1 Introduction
  1.1 Combitech
  1.2 Purpose and goal
  1.3 Our task and expected outcome
  1.4 Timeplan

2 Background
  2.1 Augmented reality
    2.1.1 Definition
    2.1.2 Field of use
      2.1.2.1 Advertising
      2.1.2.2 Entertainment
      2.1.2.3 Information and navigation
    2.1.3 Tracking
      2.1.3.1 Marker Tracking
      2.1.3.2 Natural feature tracking
  2.2 Development environment
    2.2.1 Open source
    2.2.2 Android
      2.2.2.1 Android software development kit (SDK)
      2.2.2.2 Dalvik virtual machine (Dalvik VM)
      2.2.2.3 Native development kit (NDK)
    2.2.3 Development tools and software
      2.2.3.1 Eclipse IDE
      2.2.3.2 CMake
      2.2.3.3 Simplified wrapper and interface generator (SWIG)

3 Augmented reality on Android smartphones
  3.1 Hardware
    3.1.1 Restrictions of mobile devices
      3.1.1.1 Hardware restrictions
      3.1.1.2 OpenGL ES
  3.2 OpenCV
    3.2.1 OpenCV for Android
    3.2.2 Camera calibration with OpenCV
  3.3 Related work
    3.3.1 Studierstube
    3.3.2 Qualcomm AR (QCAR) SDK
    3.3.3 AndAR, Android augmented reality
    3.3.4 PTAM, Parallel tracking and mapping
    3.3.5 Uniqueness of ARmsk

4 Design of ARmsk
  4.1 Implementation of AR
    4.1.1 Feature detection
      4.1.1.1 Features from accelerated segment test (FAST)
      4.1.1.2 Speeded up robust features (SURF)
      4.1.1.3 Center surround extremas (CenSurE/STAR)
      4.1.1.4 Detector of choice
    4.1.2 Feature description
    4.1.3 Descriptor matching
    4.1.4 Pose estimation
      4.1.4.1 Homography
      4.1.4.2 Perspective-n-point (PnP)
    4.1.5 3D rendering
  4.2 API design
    4.2.1 ARmsk API

5 Building an application with ARmsk API

6 Marketing of ARmsk
  6.1 Naming the project
  6.2 Promotional website
    6.2.1 Site structure
    6.2.2 Wordpress
  6.3 Version control
  6.4 Social networks

7 Discussion
  7.1 The task
  7.2 OpenCV
    7.2.1 OpenCV for Python
    7.2.2 Building OpenCV
    7.2.3 Outdated examples
  7.3 Performance
    7.3.1 JNI export

8 Conclusion
  8.1 Future work


List of Figures

1.4.1 A time chart describing the estimated time consumption for different parts of the master thesis.
2.1.1 Reality-Virtuality Continuum proposed by Ronald Azuma.
2.1.2 Tracking of markers.
2.1.3 Tracking of natural features.
3.1.1 An HTC Desire.
4.1.1 The AR processing pipeline for ARmsk. The numbers in the right corner of each step are section numbers in this paper.
4.1.2 a) The marker. b) A sample frame from the camera.
4.1.3 Features detected on the (a) marker and in the (b) camera stream.
4.1.4 Matches between the marker and current frame.
4.1.5 a) The red circles are outliers removed by RANSAC. b) With the help of the homography matrix the orientation of the marker in the stream can be calculated.
4.1.6 The final rendering with the marker's pose estimation.
4.2.1 The structure of the ARmsk API.
5.0.1 Building order of ARmsk.
5.0.2 Application work flow.
6.2.1 http://www.armsk.org, the promotional website for ARmsk.


List of Tables

3.1 Comparison between AR projects.
4.1 Comparison of detectors in terms of rotation-, scale- and illumination-invariance.
4.2 Comparison of feature detectors in terms of repeatability and speed.


Chapter 1

Introduction

In this paper we present an open-source markerless Augmented Reality API for Android, named Augmented Reality Markerless Support Kit, or ARmsk. This thesis was realised at Combitech AB in Linköping, Sweden.

1.1 Combitech

Combitech AB is an independent consulting company within the Saab group, with services in engineering, environment and security. Combitech has recently started a research department for visualization and simulation on handheld devices, named Reality Center. ARmsk is one of the first projects started under the new department.

1.2 Purpose and goal

At the time when the specification for this master thesis was written, there was no open-source toolkit or alternative that uses natural features as markers. There was AndAR (http://code.google.com/p/andar/), which did augmented reality for smartphones, but it was developed on top of ARToolKit (http://hitl.washington.edu/artoolkit/), which cannot handle anything other than traditional black and white markers. What is more, before these markers can be used they need to be trained so that the algorithms can recognize them. What people call markerless is natural feature recognition in pictures and video in real time, which is our ultimate goal performance-wise. However, robustness and stability will be prioritized over speed and performance for this thesis. Natural features, markers, robustness, stability and speed are terms that will be described later on.

What we would like to accomplish is to make things easier for future developers working with augmented reality on Android, and to present an API where the developer or user does not have to be immersed in computer vision theory and coding to be able to produce augmented reality with natural feature tracking.

1.3 Our task and expected outcome

Due to the time constraint, producing a fully functional, bug-free API is not possible, but what we want to deliver is at least an alpha version that is usable enough to help developers produce augmented reality applications for Android. Since the API will not even be at version 1.0, there will be bugs, not all features will be in place, and performance will be lacking.

We wish for ARmsk to continue to be developed, become more stable and feature-packed, and eventually reach an official release, all to contribute to the augmented-reality-on-Android community. For this to happen, ARmsk needs to be put out there in the open source community in an appealing and promoting package, including a website where news, pictures, videos and updates will be published.

Additionally, the idea is to include, with the first version of the API, an Android application that can be used for demonstrating the API. This application will be updated alongside the API as new features come along.

1.4 Timeplan

For this project we had an initial timeplan in which we specified the workflow and time consumption. Parts of the project will be carried out simultaneously.

• Research, 2 weeks: Study the Android SDK, augmented reality on Android and the structure of open-source projects.

• Implementation, 15 weeks: Includes development of the API, the application and the community.

  – API/Application, 12 out of 15 weeks: This part was divided into two elements: design and development.

  – Community, 3 out of 15 weeks: Create a webpage and publish project-related material online.

• Thesis & project diary, 3 weeks: Write the thesis during the last three weeks, and update the project diary during the project.

Figure 1.4.1: A time chart describing the estimated time consumption for different parts of the master thesis.


Chapter 2

Background

2.1 Augmented reality

2.1.1 Definition

Augmented Reality (AR) is a term for applications that add virtual information to real-life physical environments. There are in general two types of AR: projection-based systems, where transparent information is projected in front of the viewer, and camera-based systems, where the environment is captured with a camera, processed and augmented. This paper focuses on and discusses implementation of the latter on embedded systems.

There is no official definition of the term 'Augmented Reality', but it is common in the field of AR to use Ronald Azuma's definition[1]. It states that an AR system must have the following characteristics:

1. Combines real and virtual

2. Interactive in real time

3. Registered in 3D

The first point states that there needs to be a mixture of both virtual reality and reality itself. Almost all movies nowadays use virtual information, or computer-generated imagery (CGI), and blend it with real-life footage for special effects. This is however not regarded as AR, since movies are not interactive, as the second point requires. The third and last point states that it has to be registered in 3D, as in some live sport broadcasts on TV. AR thereby excludes, for example, weather forecasts, because only 2D planar effects are used.

Figure 2.1.1: Reality-Virtuality Continuum proposed by Ronald Azuma.

In Figure 2.1.1, on the far left side is the Real Environment and on the other side the Virtual Environment. Between these extremes lies Mixed Reality (MR), which includes both AR and Augmented Virtuality (AV). AV is the opposite of AR: it merges real information into a virtual environment.

The term Augmented Reality is believed to have been coined in 1990 by Thomas Caudell, an employee at Boeing at that time. In 1992 L.B. Rosenberg developed one of the first functioning AR systems, called Virtual Fixtures, at the U.S. Air Force Armstrong Labs, and demonstrated its benefits for human performance. [2, 3]

2.1.2 Field of use

AR has formerly been difficult to apply for practical purposes on mobile devices because AR technology is very computationally heavy. But with current technology, with faster and more powerful phones and smarter algorithmic solutions, AR can be applied in most areas.

2.1.2.1 Advertising

Advertising is undoubtedly the fastest growing field within AR. Since the mid 00's, marketers have promoted products via interactive AR applications. The applications were mainly run on stationary computers with a recording camera. For example, at the 2008 LA Auto Show, Nissan unveiled the concept vehicle Cube and presented visitors with a brochure which, when held against a camera, showed several versions of the vehicle (http://nissan.t-immersion.com/). Another example is Burger King, which launched a Flash-based web advertising campaign in 2009, where people could hold up a dollar bill in front of the camera and then watch the campaign offers without installing any software beforehand (http://cargocollective.com/jeffteicher#191861/BK-AR-Banner).

The new and faster smartphones open up for AR advertisement on handheld devices. An example of this is an AR application which LG made available for the release of their new Android phone LG Ally (http://www.phonearena.com/news/Augmented-Reality-with-the-LG-Ally-and-IM2-Comic-Book_id11576/). The application is one of the first smartphone-based applications which use markerless tracking.


2.1.2.2 Entertainment

AR can be found in the entertainment sector too. There are various applications that use AR technology, but there are not many commercialized products at the moment. However, in the gaming industry games are being released that incorporate AR, such as Ghostwire (http://www.ghostwiregame.com/) for the Nintendo DSi in 2010, and for the Nintendo 3DS there are already a couple of releases scheduled. There have also been a few applications developed during research processes, like [4][5][6], which include a racing, a tennis and a train game.

2.1.2.3 Information and navigation

Head-up displays (HUDs) were one of the first fields where AR got fully integrated; they were used early on to display transparent information in front of jet pilots' eyes, with full interactivity, including eye pointing. HUDs are also used in new generations of cars, which project useful information onto the driver's windshield. Another type of interactive AR that has become popular in recent years is AR for smartphones. A great example is LayAR (http://www.layar.com/), an AR information application for both Android and iPhone OS. LayAR uses the mobile phone's camera, compass, GPS and accelerometer to identify the user's location, orientation and field of view. It then retrieves data based on those geographical coordinates and overlays that data over the camera view. The overlaid data can also be customized to show specific data to the user's liking, such as nearby gas stations or restaurants.

2.1.3 Tracking

One thing that all AR applications must use is some kind of tracking technique to determine the pose and position of the camera relative to the virtual 3D information. There are in general two different methods to achieve good and stable results for AR: tracking with markers and tracking of natural features.


Figure 2.1.2: Tracking of markers.

2.1.3.1 Marker Tracking

A classic marker is in general a black and white 2D image that is placed where the tracking is supposed to take place. This high-contrast type of marker is easy to find and track with basic image processing and does not require many operations. The camera position is determined by tracking the outer black border and the orientation by tracking the inner black figure. This technique is implemented and optimized in the commonly used ARToolKit library. Tracking of markers is however a bit outdated nowadays, since you actually need a printed black and white marker to produce AR. Now there are novel techniques that use regular images, like logos and symbols, as tracking targets instead of markers. This will be described in the next section. Figure 2.1.2 shows AR that uses marker tracking (image taken from http://www.hitl.washington.edu/artoolkit/).

2.1.3.2 Natural feature tracking

Features are a rather general concept in computer vision, and what is to be called a feature in an image is highly dependent on what is relevant information or a point of interest for the subsequent operations. Features can be edges, ridges, blobs or corners and vary from a single point to a region of points. The features relevant for this project are most often found in the form of isolated points, continuous curves or connected regions in the image. Using a good algorithm for feature detection is highly important, since these points are most likely used as a starting point, and the following algorithms will only perform as well as the feature detector does. Good features are features with high repeatability, which in short means that the same point can be found in another image. Repeatability is an important concept and will be discussed more thoroughly in section 4.1.1.

There are many different types of feature detectors that perform differently for different kinds of features. Below, a couple of common feature detectors and their classifications are listed:

• Sobel (edges)

• Harris & Stephens / Plessey (edges & corners)

• FAST (corners)

• Difference of Gaussian (corners & blobs)

• Determinant of Hessian (corners & blobs)

• MSER (blobs)

Natural feature tracking, or markerless tracking, uses key features detected in every frame from the camera and matches them with pre-specified features. This type of tracking is computationally heavy and had therefore not been used for embedded systems until recent years. Figure 2.1.3 shows an example of how natural feature tracking can be used; in this case the fingertips are the tracking targets (image taken from http://ilab.cs.ucsb.edu/projects/taehee/HandyAR/HandyAR.html).

Figure 2.1.3: Tracking of natural features.


2.2 Development environment

2.2.1 Open source

ARmsk is open source software, published under the GNU GPLv3 license (http://www.gnu.org/licenses/gpl.html). Open source means that anyone has the freedom to read, use, modify and redistribute software source material. An open source software project can be published under many different licences which grant recipients the freedoms otherwise protected by copyright laws. This allows developers to alter code until it suits their needs. There are many advantages with open source since any developer may contribute to a project. When many developers are joined together, software development and progress are facilitated and many issues and bugs are found faster. As long as the communication works, open source software projects can be developed very quickly and efficiently. There are several sites and tools that can manage and support an open source software project, including maintaining communication and version control.

2.2.2 Android

The Android platform is an open source operating system developed and released by Google. It is mostly found in mobile devices, e.g. smartphones and tablets, and comes with a lot of complete features and functionality such as multi-tasking, mail, web and media management. Android is based on the Linux 2.6 kernel, which is used as the hardware abstraction layer because it has a proven driver model and a lot of existing drivers. It provides memory management, process management, a security model, networking and a lot of core operating system infrastructure that is robust and proven to work over time. The Android platform for mobile devices intends to be a complete stack that includes everything from the operating system through middleware up to applications. Developing for Android is free, which has allowed Android to grow very fast since its release, both when it comes to great new applications and when it comes to smartphone manufacturers choosing Android as the operating system for their products. For developers, Google has provided tools that make it easier to develop for Android: the Android SDK and the Android NDK.

2.2.2.1 Android software development kit (SDK)

The Android SDK is a set of tools meant to aid developers when they are developing Android applications and the Android platform itself. Version 1.0 of the SDK (http://developer.android.com/sdk/) was released in September 2008 and has since been updated together with each release of the Android operating system. Version 1.0 was the first stable release of the Android platform and allowed developers to prepare applications for commercially available smartphones. The SDK is available for Windows, Mac OS X and Linux and includes tools as well as an Android smartphone emulator to run, test and debug applications. The emulator is standalone and can be run in order to give users and developers a chance to interact with the operating system on Android handsets. The emulator is commanded through something called the Android Debug Bridge (ADB) via the terminal or command line. Android includes a debug monitor and runs a Dalvik virtual machine to run and compile Java, the main development language for Android applications.

2.2.2.2 Dalvik virtual machine (Dalvik VM)

The Dalvik Virtual Machine is developed especially for Android to meet the needs of running in an embedded environment where battery, memory and CPU are limited. The Dalvik VM uses registers as storage instead of stacks and runs DEX files, which are byte-code resulting from converting .class and .JAR files at runtime. When these files are converted to .dex they become a much more efficient byte-code that can run very well on small processors. They use memory very efficiently and the data structures are designed to be shared across processes whenever possible. The Dalvik VM uses a highly optimized byte-code interpreter. The end result is that it is possible to have multiple instances of the Dalvik VM running on a device at the same time, one in each of several processes, efficiently.

2.2.2.3 Native development kit (NDK)

Java is the only supported programming language for creating Android applications. However, it is possible to combine Java with C/C++, the native language for Android smartphones, through JNI. The provided NDK (http://developer.android.com/sdk/ndk/) contains the complete toolchain for cross compilation. It is based on the GNU Compiler Collection and GNU make. With those tools developers are able to create shared libraries in the Executable and Linkable Format (ELF) used by Linux. Currently there are only a few libraries that are officially supported. Among those are libc, libm, libz, liblog and the OpenGL ES 1.1 libraries.

2.2.3 Development tools and software

2.2.3.1 Eclipse IDE

The Eclipse IDE for Java developers (http://www.eclipse.org) is used for developing and building Android projects, which is also the recommendation of the Android development team. The Android-specific functionality is enabled through the Android Development Tools (ADT) plug-in. It allows the creation of a mobile user interface via a visual editor. Managing and editing the C/C++ files and libraries, as well as makefiles for CMake and SWIG interface files, is also done in Eclipse. Throughout the project, Eclipse Galileo was used.


2.2.3.2 CMake

Building the native parts of ARmsk is done in the terminal or command line with CMake (http://www.cmake.org); in this project version 2.8-2 was used. CMake reads makefiles that specify building details and generates executable files, and in this case libraries, from source code. Once built, CMake automatically figures out which files it needs to update, based on which source files have changed. It also automatically determines the proper order for updating files, in case one non-source file depends on another non-source file. This means that the whole program does not need to be recompiled when a few source files change. CMake is not limited to any specific language.

2.2.3.3 Simplified wrapper and interface generator (SWIG)

SWIG (http://www.swig.org) is a software development tool that connects programs or libraries written in C and C++ with different high-level programming languages. SWIG is used in this project to connect the native library of ARmsk, written in C/C++, with Java for Android. The functions to be wrapped are specified with a SWIG interface file, which needs to be added to the makefile to be included in the build. SWIG wrappers are generated during the build and are linked to the native files. It is then possible to call native classes and native functions from Java, or whichever language the wrappers are specified for, with the same syntax as if the functions were written in that very language. The version used for this project is SWIG 2.0.0.


Chapter 3

Augmented reality on Android smartphones

3.1 Hardware

An HTC Desire (figure 3.1.1, http://www.htc.com/www/product/desire/overview.html) with Android 2.2 is used for testing during the development period. The Desire is equipped with a 3.7 inch AMOLED display with 480 x 800 pixels resolution, a 1.0 GHz Snapdragon processor, 576 MB of RAM and a 5.0 megapixel color camera with video recording. The device is connected to a MacBook Pro via USB and works faster than the emulator; it is also far more convenient by being a portable camera.

Figure 3.1.1: An HTC Desire.


3.1.1 Restrictions of mobile devices

3.1.1.1 Hardware restrictions

There is a fundamental difference between a desktop workstation and a mobile handheld device, and that is the hardware. In mobile devices, where size is a vital factor, there is just no way to have the same amount of processing power. Though in later years the hardware of mobile devices has developed at a rapid pace, gradually catching up with desktop computers. The phone used for testing has 512 MB ROM (which has a separate partition reserved for the OS), and 1.0 GHz of processing power is still far behind a modern computer.

Most mobile phone CPUs do not have a unit that solely calculates floating points, a Floating Point Unit (FPU), in contrast to desktop CPUs. This forces the compiler to emulate the floating-point calculations rather than calculate directly on hardware, which is approximately 40 times slower than the corresponding integer calculation[7]. According to [25] a well-written smartphone application runs around 10 times slower than on a normal computer. While most CPUs for mobile phones do not have parallel units that execute processes, there is the option to use multi-threading or interleaving for operation acceleration.

Due to the hardware limitations there are many techniques and operations that are totally infeasible to compute on current-generation mobile phones. These might need to be rewritten, approached in a different way or simplified to suit the available resources.

3.1.1.2 OpenGL ES

Android supports OpenGL ES 1.0 (http://www.khronos.com/opengles/), a 3D graphics library that is a stripped-down version of the OpenGL 3D API for desktops. ES is short for embedded systems, and the library is specifically tailored for mobile devices. OpenGL ES 1.0 corresponds to version 1.3 of the original OpenGL library. That it is stripped down does not really mean that it lacks functionality; rather, it includes only the most used functions to minimize redundancy. In the OpenGL ES 1.x versions both fixed-point and floating-point profiles are included; however, floating-point support is only applied at the API level, which means that the pipeline is merely defined to be fixed-point based. OpenGL ES 2.0 will support floating point only. The biggest difference from desktop OpenGL is the total exclusion of glBegin and glEnd in the API for embedded systems: OpenGL ES does not manage single vertices, instead you have to send references to arrays of vertices.

Another restriction is that all textures must be square, with the size being a power of two. This restriction not only applies to the embedded version of OpenGL, but also to the standard edition until version 1.3, as stated by the API documentation [8, p. 1103]. However, this applies only to the function glTexImage2D, which is used to initially transfer the texture image data to the OpenGL driver. When updating the texture through the function glTexSubImage2D you may provide images of any size after all.
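As an illustration of this restriction, the following sketch (our code against the OpenGL ES 1.x headers, not taken from ARmsk; texture-unit setup and error handling omitted) allocates a square power-of-two texture once with glTexImage2D and then uploads each camera frame into its top-left corner with glTexSubImage2D:

```cpp
#include <GLES/gl.h>

// Allocate a square power-of-two texture large enough for the camera frame.
// Called once; frameW/frameH are the camera preview dimensions (assumed RGBA data).
GLuint createFrameTexture(int frameW, int frameH, int* texSizeOut)
{
    int texSize = 1;
    while (texSize < frameW || texSize < frameH)
        texSize <<= 1;                         // next power of two covering the frame

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    // NULL data: only reserve storage of the restricted (square, power-of-two) size.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, texSize, texSize, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);

    *texSizeOut = texSize;
    return tex;
}

// Called per frame: the sub-image may have any size that fits inside the texture.
void uploadFrame(GLuint tex, int frameW, int frameH, const void* rgbaPixels)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, frameW, frameH,
                    GL_RGBA, GL_UNSIGNED_BYTE, rgbaPixels);
}
```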

Other characteristics of the embedded edition of OpenGL are not relevant for this paper. They can be found in [8, p. 739] and [9].

3.2 OpenCV

OpenCV (http://opencv.willowgarage.com/) is an open source computer vision library filled with programming functions that mainly aim at real-time processing. The library lies under the open source BSD license (http://www.opensource.com/licenses/bsd-license/) and is free for both academic and commercial use. It was developed by Intel and is now supported by Willow Garage. OpenCV was officially launched in 1999 as a research initiative of Intel to advance processor-intense applications. Early goals of OpenCV were to improve computer vision research and to provide open and optimized code for this, and also to spread knowledge of computer vision by providing a stable infrastructure that could be further built on and developed. All this for free and available for commercial use, without requiring those uses to be open or free themselves. OpenCV reached its first version 1.0 release in 2006 and is now a wide and large library that contains more than 500 optimized functions.

The code is written in C and thus portable to a selected number of other platforms, such as DSPs. To make the code more approachable and to reach a wider audience, wrappers have been developed for the more popular programming languages such as Java, Python, C# and Ruby. In OpenCV version 2.0, released in 2009, a C++ interface was introduced. It is backward compatible with C; however, all the new functions and algorithms are written for the new interface. We chose to use the OpenCV libraries primarily because most of the functionality needed for AR is already implemented. Not only that, there was also a certain OpenCV port for Android readily available.

3.2.1 OpenCV for Android

There are a couple of projects out there that port parts of OpenCV to Android. Recently, however, an Android port has officially been integrated into the OpenCV library and can now be found on the official OpenCV wiki (http://opencv.willowgarage.com/wiki/Android/). This port was formerly known as android-opencv and is the port that has been used as the foundation for ARmsk. More about this, and exactly how OpenCV and the Android port are used, will be discussed in detail in section 4.2.1.


3.2.2 Camera calibration with OpenCV

OpenCV has built-in methods for camera calibration, where it is possible to dynamically estimate the intrinsic camera parameters and lens distortion for a camera in use. The intrinsic parameters encompass focal length, image format and principal point. These are expressed with a 3x3 camera matrix, A.

$$A = \begin{pmatrix} \alpha_x & \gamma & u_0 \\ 0 & \alpha_y & v_0 \\ 0 & 0 & 1 \end{pmatrix}$$

The parameters $\alpha_x = f \cdot m_x$ and $\alpha_y = f \cdot m_y$ represent the focal length in pixels, where $m_x$ and $m_y$ are scale factors. $\gamma$ represents the skew coefficient between the x and y axes. $u_0$ and $v_0$ are the coordinates of the principal point, which would ideally be in the center of the image.

The camera matrix can be estimated by taking multiple pictures from different angles of a calibration rig, an object with known geometry and easily detectable features. A common calibration rig is a black and white chessboard-like pattern since it has very distinctive edges and corners.
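A minimal sketch (our code, not ARmsk's) of how such a calibration could be done with OpenCV's C++ API, assuming a set of grayscale chessboard views has already been captured; the helper name and board dimensions are assumptions:

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <vector>

// Estimate the intrinsic camera matrix A from several views of a chessboard rig.
cv::Mat calibrateFromChessboards(const std::vector<cv::Mat>& grayImages,
                                 cv::Size boardSize,   // inner corners, e.g. 9x6
                                 float squareSize)     // edge length of one square
{
    std::vector<std::vector<cv::Point2f> > imagePoints;
    std::vector<std::vector<cv::Point3f> > objectPoints;

    // Ideal 3D corner positions of the rig (z = 0 plane), reused for every view.
    std::vector<cv::Point3f> corners3d;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            corners3d.push_back(cv::Point3f(x * squareSize, y * squareSize, 0.f));

    for (size_t i = 0; i < grayImages.size(); ++i) {
        std::vector<cv::Point2f> corners2d;
        if (cv::findChessboardCorners(grayImages[i], boardSize, corners2d)) {
            imagePoints.push_back(corners2d);
            objectPoints.push_back(corners3d);
        }
    }

    cv::Mat A, distCoeffs;               // A is the 3x3 intrinsic matrix
    std::vector<cv::Mat> rvecs, tvecs;   // per-view extrinsics (unused here)
    cv::calibrateCamera(objectPoints, imagePoints, grayImages[0].size(),
                        A, distCoeffs, rvecs, tvecs);
    return A;
}
```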

3.3 Related work

3.3.1 Studierstube

Studierstube (http://studierstube.icg.tu-graz.ac.at/) is a software framework for the development of AR and VR applications. The framework was originally created to develop the world's first collaborative AR application. Later the focus changed to support for mobile AR applications. Studierstube produces AR with natural feature tracking at a close-to-realtime framerate and can also detect and track multiple targets[20]. As the number of tracked targets increases to 5-6, including 3D rendering, the framerate drops slightly, but the performance is still smooth and very impressive. The framework is developed and written from scratch and embodies the last decade of research from the Graz University of Technology (TU Graz). Not long ago one of the main researchers behind Studierstube joined Qualcomm and has been working on a free AR SDK for Android, named Qualcomm AR. Studierstube for PC is freely available for download under the GPL, while Studierstube ES for mobile phones is available commercially. The variant for mobile phones builds on the scene graph library Coin3D and features the device management framework OpenTracker. Studierstube has during its research presented different useful techniques to achieve natural feature tracking on embedded devices in real time. One of them is PhonySIFT[24], a modified version of the commonly used SIFT[14] algorithm.


3.3.2 Qualcomm AR (QCAR) SDK

The QCAR SDK (http://developer.qualcomm.com/ar) was developed by Qualcomm in 2010 and is free for developers to use in their AR applications. It is available for download on their site after a registration, but it is closed source. QCAR allows developers to upload pictures online that will be used as tracking targets. These get processed and optimized to be used together with the SDK. The uploaded target gets rated depending on the number of features and how well the features are distributed over the image. The processed resources can then be downloaded as a .dat file and added to the AR project. Applications based on QCAR perform really well and are able to deliver AR in real time, despite doing markerless tracking. The SDK currently supports the Android 2.1 operating system and above. Qualcomm provides developers with tools and resources to create Android AR applications but does not provide a channel for commercial distribution. AR apps may instead be distributed in the Android Marketplace, subject to the AR SDK license terms.

3.3.3 AndAR, Android augmented reality

AndAR is an open source AR framework for Android released under the GNU General Public License. The framework is based on the open ARToolKit library and was developed by Tobias Domhan as his master's thesis. Since it uses ARToolKit, it is limited to tracking with markers. AndAR might be used as a foundation for AR projects and is capable of displaying 3D models on black and white AR markers.

3.3.4 PTAM, Parallel tracking and mapping

PTAM (http://www.robots.ox.ac.uk/~gk/PTAM/) is a camera tracking system for AR. It requires no fiducial markers, pre-made maps, known templates or inertial sensors. It is an implementation of the method described in the paper [10]. This type of tracking for AR is called extensible tracking, which basically is a system that tracks in scenes without any prior map with the use of a calibrated hand-held camera. In a previously unknown scene, without any known objects or initialisation target, the system builds a 3D map of the environment. This is done by connecting features from two images which have been translated horizontally or vertically. Once a rudimentary map is built, it is used to insert virtual objects into the scene, and these should be accurately registered to real objects in the environment.

3.3.5 Uniqueness of ARmsk

ARmsk is closely related to all the projects in section 3.3 and shares a lot of their functionality. However, there are a few aspects that make ARmsk unique:


                  Free   Open source   Track natural features   Local marker training
Studierstube ES   No     No            Yes                      No
QCAR              Yes    No            Yes                      No
AndAR             Yes    Yes           No                       No
ARmsk             Yes    Yes           Yes                      Yes

Table 3.1: Comparison between AR projects.

PTAM is not included in the comparison since it produces AR with a completely different tracking technique: it does not track markers, but information in the environment instead. Local marker training means that the developer can define the marker, or tracking target, on the local device during run time. ARmsk is unique in the sense that it is a completely free open source project, it provides a solution for tracking of natural features, and it can train markers locally.


Chapter 4

Design of ARmsk

4.1 Implementation of AR

In the following section the approach for implementing augmented reality for mobile devices using different known techniques is described. Robustness is prioritized over speed; therefore methods that are too computationally heavy for a modern mobile device will be used. Too computationally heavy means that the mobile device is not likely to produce a real-time framerate. Five major steps make up the pipeline of our AR implementation, as shown in figure 4.1.1.

Figure 4.1.1: The AR processing pipeline for ARmsk. The numbers in the right corner of each step are section numbers in this paper.

For each run, the pipeline takes two images as input. The first is a static marker image that is processed through steps 1 and 2 during initialization; in step 3, this image is used as the search target in the iterative matching process. The second is an incoming camera frame that runs through the pipeline and gets matched to the marker, pose estimated and then augmented with 3D information, which is done in steps 1 to 5. This is executed for every frame.


Figure 4.1.2: a) The marker. b) A sample frame from the camera.

4.1.1 Feature detection

The first step in the AR process is to detect useful features in the image. As there are several feature detection algorithms available, it is necessary to find one that suits the requirements of the project at hand. The implementation foremost needs to be robust; being fast is desirable, but comes second. By looking at [18] it is possible to see how the detectors perform on a mobile device. However, all the detectors have parameters that can be varied to affect the result, and none of these are documented. Therefore it is only possible to take these numbers as an approximate measurement of how well the methods work on a mobile device. A more thorough tweak of all the detectors is needed to muster the highest repeatability and speed possible and to see which detector actually suits our problem best. Repeatability is a measure of how well a feature detector performs in finding the same feature in another image. It is determined by a combination of three invariances: rotation, scale and illumination. All of the following detectors are found in the OpenCV library.

4.1.1.1 Features from accelerated segment test (FAST)

FAST[11, 12] is an extremely fast corner detector with high illumination-invariance, which is highly desirable for mobile devices. Unfortunately FAST lacks rotation- and scale-invariance, resulting in low repeatability. There are known problems with noise for FAST, which can be reduced by setting an appropriate threshold for the detector, though the threshold needs to be set differently for each case.
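As a hedged illustration of tuning this threshold with OpenCV's 2.x C++ API (the helper and the suggested value range are our assumptions):

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <vector>

// Detect FAST corners; raising the threshold suppresses weak (often noisy) corners.
std::vector<cv::KeyPoint> detectFastFeatures(const cv::Mat& gray, int threshold)
{
    cv::FastFeatureDetector detector(threshold);   // e.g. 20-40 depending on the scene
    std::vector<cv::KeyPoint> keypoints;
    detector.detect(gray, keypoints);
    return keypoints;
}
```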


4.1.1.2 Speeded up robust features (SURF)

The SURF[13] detector, or Fast-Hessian detector, is based on the determinant of the Hessian matrix. It detects blob-like structures at locations where the determinant is a maximum. The SURF detector is robust, provides good repeatability and is relatively fast thanks to the use of integral images[22]. SURF uses, just like SIFT, an image pyramid as scale space and sub-samples pixels between levels of the pyramid. This provides relatively good repeatability when scale differs. SURF can also handle a great deal of luminance variance, which makes it stable even when the ambient lighting is uneven. SURF is rotation-invariant.

4.1.1.3 Center surround extremas (CenSurE/STAR)

The CenSurE[15] feature detector, also known as STAR, computes features at the extrema of center-surround filters over multiple scales. Unlike SIFT and SURF, STAR uses the original image resolution for each scale. This makes the STAR detector more exact in scale-invariance and raises the chances of feature repeatability. The features are an approximation to the scale-space Laplacian of Gaussian and can almost be computed in real time using integral images. The STAR detector is sensitive to changes in illumination, but not to rotations.

4.1.1.4 Detector of choice

Detector   Rotation-invariance   Scale-invariance   Illumination-invariance
FAST       Low                   Low                High
SURF       Medium                Medium             High
STAR       Medium                High               Low

Table 4.1: Comparison of detectors in terms of rotation-, scale- and illumination-invariance.

Detector   Repeatability   Speed
FAST       Low             High
SURF       High            Low
STAR       High            Medium

Table 4.2: Comparison of feature detectors in terms of repeatability and speed.

Tables 4.1 and 4.2 compare the discussed feature detectors, with numbers fetched from [18]; STAR is the highest rated among the contenders.

FAST is tremendously fast and finds a huge amount of features. This was our initial selection of feature detector, but there are repeatability issues with this detector, and in the later stages of our implementation these issues cause matching instability. Although it is the detector that finds the most feature points, and does so very fast, it picked up a considerable amount of noise as features even after tweaking the threshold parameter.

The SURF feature detector has very good performance in terms of repeatability, but despite the fact that SURF is faster than SIFT, it is the slowest feature detector of all the ones tested. It is very stable, but there are other detectors that are equally stable yet faster, such as the STAR feature detector. SURF would be our detector of choice if stability were the only concern.

STAR is very good and fast but lacks in illumination-invariance. This can be troublesome when trying to do AR where the markers are subjected to strong lighting and where reflections are prone to occur, for instance in an outdoor environment. In spite of that, after examining the performance, STAR is regarded as the top choice.
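As a minimal illustration (our sketch against the OpenCV 2.x C++ feature API, not the actual ARmsk source), detecting keypoints with the chosen STAR detector can look like this:

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <vector>

// Detect STAR (CenSurE) keypoints in a grayscale image, e.g. the marker image.
std::vector<cv::KeyPoint> detectStarFeatures(const cv::Mat& gray)
{
    cv::StarFeatureDetector detector;           // default parameters
    std::vector<cv::KeyPoint> keypoints;
    detector.detect(gray, keypoints);           // one cv::KeyPoint per detected feature
    return keypoints;
}
```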


Figure 4.1.3: Features detected on the (a) marker and in the (b) camera stream.

4.1.2 Feature description

The second step is to create image descriptors for the image. Image descriptors describe the visual information in sections of the image, in our case around each detected feature. The descriptors are used during the matching process to find corresponding features in other images. Since features that can differ in pose, angle and lighting conditions need to be found, it is necessary to use an algorithm that generates descriptors that are scale-, rotation- and illumination-invariant. In OpenCV there are only two different descriptor algorithms implemented: SIFT and SURF. The newer SURF descriptor outperforms SIFT in almost every aspect[13] and is an obvious choice for us to implement.

The SURF descriptor extracts 20x20 pixel patches for each feature and calculates the gradient grid for each patch. Descriptors can then be calculated using the gradient grid with the following procedure:

• Each 20x20 grid is divided into subregions of 5x5 gradients, resulting in a total of 16 subregions.

• For each subregion, a four-dimensional descriptor vector is calculated, where every set of four elements consists of:

– The absolute sum of differences (SAD) in x-direction.

– The SAD in y-direction.

– The sum of magnitudes in x-direction.

– The sum of magnitudes in y-direction.

This means that each descriptor, with 16 subregions and their respectively associated vectors, has a total length of 64 elements. There is also an extended version of the SURF descriptor with 128 elements, which additionally calculates the sums of differences in the positive and negative x- and y-directions, resulting in an eight-dimensional vector per subregion and making the descriptor 128 elements wide. The extended version provides higher accuracy, but due to the higher dimensionality the matching is slower[28]. Since robustness is prioritized, the extended version is implemented.
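A sketch of this step using OpenCV's 2.x C++ API (the octave-related constructor arguments are assumptions; only the extended flag matters here):

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <vector>

// Compute extended (128-element) SURF descriptors for previously detected keypoints.
cv::Mat describeWithSurf(const cv::Mat& gray, std::vector<cv::KeyPoint>& keypoints)
{
    // The last argument enables the extended 128-dimensional descriptor variant.
    cv::SurfDescriptorExtractor extractor(4 /*octaves*/, 2 /*octave layers*/, true /*extended*/);
    cv::Mat descriptors;                        // one 128-element row per keypoint
    extractor.compute(gray, keypoints, descriptors);
    return descriptors;
}
```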

4.1.3 Descriptor matching

After successfully extracting the image descriptors from the images, we need to match them with the ones extracted from the marker during initialization to find corresponding features. One way of doing this is by using kd-trees[16], a space-partitioning data structure. The kd-tree is a binary tree in which every node is a k-dimensional point, or in our case a descriptor. By randomizing the kd-tree structure we improve the effectiveness of the representation in higher-order dimensions[23].

A set of four randomized kd-trees, also called a forest, is created with the descriptors from the marker image. The forest is then searched using the k-nearest neighbor algorithm (k-NN)[26] for each descriptor in the camera stream image. By constructing a forest the precision of the nearest neighbors is improved[27]. If the search is limited to 64 leaves per tree, computational power is saved while still getting acceptable matches. The k-NN search algorithm is already implemented in OpenCV and the results from the search are pairs of nearest neighbors of descriptors. The distances between nearest neighbors are checked to see if they are small enough to actually be neighbors and thus be labeled as a match. If the distances are too large, the points are most likely not a match.
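An illustrative sketch of this matching step through OpenCV's FLANN wrapper (exact class names vary between OpenCV versions; this sketch assumes the cv::flann::Index interface, and the k = 2 query and 0.6 distance ratio are our assumptions, not values taken from ARmsk):

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/flann/flann.hpp>
#include <vector>
#include <utility>

// Match frame descriptors against marker descriptors (both CV_32F, one row per feature).
void matchDescriptors(const cv::Mat& markerDescriptors,
                      const cv::Mat& frameDescriptors,
                      std::vector<std::pair<int, int> >& matches)
{
    // Forest of four randomized kd-trees built over the marker descriptors.
    cv::flann::Index index(markerDescriptors, cv::flann::KDTreeIndexParams(4));

    cv::Mat indices(frameDescriptors.rows, 2, CV_32S);
    cv::Mat dists(frameDescriptors.rows, 2, CV_32F);
    // Two nearest neighbours per query, visiting at most 64 leaves per search.
    index.knnSearch(frameDescriptors, indices, dists, 2, cv::flann::SearchParams(64));

    for (int i = 0; i < frameDescriptors.rows; ++i) {
        // Keep the pair only if the best neighbour is clearly closer than the second best.
        if (dists.at<float>(i, 0) < 0.6f * dists.at<float>(i, 1))
            matches.push_back(std::make_pair(i, indices.at<int>(i, 0)));
    }
}
```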


Figure 4.1.4: Matches between the marker and current frame.

4.1.4 Pose estimation

4.1.4.1 Homography

Our matching implementation does not provide perfect matches. Instead, it generates a few outliers, i.e. points that are matched incorrectly. These outliers should not be included in the homography calculation. To remove them, an iterative method called Random Sample Consensus (RANSAC)[17] is used to filter out the features that do not fit the model. This does not remove all outliers entirely, but most of them.

When the matches between the incoming image and the marker mostly consist of inliers, the perspective transformation, also known as the homography, can be calculated. In other words, the homography describes how the marker has changed its orientation when shown in the camera. This means that for any given point in the marker image, it is possible to find its position in the camera stream image by transforming it with the calculated homography matrix.
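A brief sketch of this step with OpenCV's C++ API (the 3-pixel reprojection threshold is an assumption):

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <vector>

// Estimate the marker-to-frame homography from matched point pairs; RANSAC rejects outliers.
cv::Mat estimateHomography(const std::vector<cv::Point2f>& markerPoints,
                           const std::vector<cv::Point2f>& framePoints)
{
    // 3.0 is the maximum reprojection error (in pixels) for a pair to count as an inlier.
    return cv::findHomography(cv::Mat(markerPoints), cv::Mat(framePoints), CV_RANSAC, 3.0);
}
```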


Figure 4.1.5: a) The red circles are outliers removed by RANSAC. b) With the help of the homography matrix the orientation of the marker in the stream can be calculated.

4.1.4.2 Perspective-n-point (PnP)

Next, a square plane is created in 3D space and oriented in the center of the marker. The sides of the plane are equal to the height of the marker. A square is used because it simplifies the calculations for the OpenGL transformations later on. The x and y values are transformed with the homography matrix to find the plane's position in the camera stream image. We have the camera model:

$$z \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = A \, [R \mid T] \begin{pmatrix} x_w \\ y_w \\ z_w \\ 1 \end{pmatrix}$$

where $[u\; v\; 1]^\top$ represents a 2D point in pixel coordinates and $[x_w\; y_w\; z_w\; 1]^\top$ a 3D point in world coordinates. A is the camera matrix (see section 3.2.2), R is the rotation and T is the translation. R and T are the extrinsic parameters, which denote the coordinate system transformation from 3D world coordinates to 3D camera coordinates. Since the 3D-to-2D point correspondences are known, together with the camera matrix, the extrinsic parameters can be found, and they describe the change of position of the 3D plane. This is called solving the PnP problem, and the algorithm for this is also implemented in OpenCV. The result of solving the problem is the transformation matrix.
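A minimal sketch of this step with OpenCV's solvePnP (the helper and parameter names are ours; the distortion coefficients come from the calibration in section 3.2.2):

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <vector>

// Recover the extrinsic parameters (rotation and translation) of the square marker plane.
void estimatePose(const std::vector<cv::Point3f>& planeCorners3d, // square in marker space
                  const std::vector<cv::Point2f>& planeCorners2d, // corners mapped by the homography
                  const cv::Mat& cameraMatrix,                    // intrinsic matrix A
                  const cv::Mat& distCoeffs,                      // lens distortion coefficients
                  cv::Mat& rvec, cv::Mat& tvec)
{
    // rvec is a Rodrigues rotation vector; cv::Rodrigues(rvec, R) yields the 3x3 matrix R.
    cv::solvePnP(cv::Mat(planeCorners3d), cv::Mat(planeCorners2d),
                 cameraMatrix, distCoeffs, rvec, tvec);
}
```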


4.1.5 3D rendering

An augmented reality application is a combination of 2D (camera stream) and 3D graphics. In an Android application there is no way for them to be combined directly, even though Android has APIs for both of them. Only by drawing the camera stream images as an OpenGL texture the size of the screen is it possible to render 3D onto 2D. In the end only the 3D API is used in practice, because the camera API is not designed to simply provide a raw stream of byte arrays. The camera API always has to be connected to a surface on which it can directly draw the stream. However, it is not possible to use the OpenGL surface as a preview surface for the camera, as that would cause conflicts with both threads (camera and OpenGL) trying to access the surface simultaneously.

Furthermore, the preview surface must be visible on the screen, otherwise the preview callback will not be invoked. The way around this is to put an OpenGL surface on top of the preview surface. The preview surface is there and would be visible if it were not covered by the OpenGL surface. The camera stream is thus drawn both on a surface that is not visible to the user and on an OpenGL texture. Compatibility is more important than avoiding this overhead, because there is no other way to circumvent this API design decision.

So lastly, after solving the PnP problem, the resulting 4x4 transformation matrix is sent back to Java through JNI as a float array. The reason for returning a float array is that the matrix loading function in OpenGL only accepts matrices in the form of a one-dimensional array. The array is used to set the orientation of the 3D object, which is rendered on an OpenGL layer on top of the camera stream.
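For illustration, a JNI export on the native side could look like the sketch below (the package, class and variable names are hypothetical, not the actual ARmsk symbols); the returned 16-element array can then be handed to the matrix loading function on the Java side.

```cpp
#include <jni.h>

// Assumed to be filled (in column-major order) by the native pose-estimation step.
extern float g_transform[16];

extern "C"
JNIEXPORT jfloatArray JNICALL
Java_org_example_armsk_NativePipeline_getTransformation(JNIEnv* env, jclass)
{
    // Copy the 4x4 transformation matrix into a Java float[16].
    jfloatArray result = env->NewFloatArray(16);
    if (result != NULL)
        env->SetFloatArrayRegion(result, 0, 16, g_transform);
    return result;
}
```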


Figure 4.1.6: The final rendering with the marker's pose estimation.

4.2 API design

API design is hard, but there is an abundance of information about theories and guidelines for it. The ARmsk API is not going to be big, so most of the guidelines suit it quite well. The primary guidelines for the API design are:

The API must be, in prioritized order:

1. Absolutely correct, no errors anywhere

2. Easy to use

3. Easy to learn

4. Fast enough

5. Small enough

This can be summed up as correctness, simplicity and efficiency. It is important to design for evolution and prepare for future work as well. An API should be minimalistic, i.e. it should only contain functionality that is really needed and used. When an API is small there are fewer things to keep track of and it becomes easier to learn. At the same time, this API should include the main AR features and capabilities, so that developers can get started easily. It was decided that the API should include marker management, marker detection and pose estimation. As for packages, a small project with fewer than 30 classes should be housed in a single package, and should the API be completely rewritten, it is a good idea to choose an entirely different package name.

4.2.1 ARmsk API

ARmsk is designed to be an API for markerless AR applications for Android. The idea is to produce AR with only the Java API and let all the native calls be done without ever being exposed. Figure 4.2.1 shows the ARmsk architecture.

ARmsk is largely built upon an OpenCV port for Android called android-opencv (AO), mentioned earlier in section 3.2.1, which has since been announced as the official OpenCV port for Android. In the sample applications provided by AO, SWIG and CMake are introduced for connecting native C++ with Java and Android. AO takes care of the camera as well as image acquisition, and turns the images into workable formats for OpenCV, such as converting them into the correct colorspace, into grayscale and into OpenGL 2D textures. The port provides an image pool that could work like a buffer, but for now it can only house a single image. Every frame is placed in this image pool to be converted into the right workable format, and can then be accessed later for calculations, operations and drawing. Lastly the frame is passed on to be used as an OpenGL texture applied to the view, whether it has been operated on or not. This is how the camera preview works and is also the reason why the preview has a slight lag compared to an ordinary camera preview, as it actually is an OpenGL 3D layer on top of the preview surface. This is the solution to the constraint mentioned in section 4.1.5. The preview runs on a native processor, which works much like a thread, continuously running in the background and capturing frames. This processor is a native class of AO and has also been incorporated into ARmsk for running the processing pipeline.

An Android application uses parts of AO through ARmsk to access the camera. Through JNI the ARmsk library is called to get the transformation matrix needed to set the orientation and render 3D objects with OpenGL ES in Java.


Figure 4.2.1: The structure of the ARmsk API.


Chapter 5

Building an application with ARmsk API

ARmsk needs to be built (compiled) before anything else, as it mostly contains OpenCV functions written in C++. CMake is needed to read the enclosed makefiles and for the build commands to work. Building ARmsk means building twice: first the OpenCV library and then the ARmsk native processing files. Building the OpenCV library takes quite some time because the whole library is included in ARmsk, but fortunately it only has to be done once. Once built, the ARmsk shared library is ready to be used; in Eclipse ARmsk is imported as a library project. Then it is time to build the native processing files of ARmsk. This is done in the same manner, but it is important to have SWIG installed, since there are SWIG interface files placed together with the ARmsk native files, one for each C++ file. These interface files are included in the build instructions of the makefile. Compiled native files are placed in a JNI folder, while the SWIG wrappers of those files are automatically generated and placed in a folder inside the JNI folder. Lastly the build is imported as an Android application project in Eclipse and linked to ARmsk/OpenCV as the library.



Figure 5.0.1: Building order of ARmsk.

ARmsk is a markerless AR API for Android and provides an object-oriented Java API. The cornerstones of the API are native functionality for camera preview, marker management, marker matching, and access to a transformation matrix which is used to set the orientation and position of a 3D rendering. User interface and graphics are up to each developer building an application on top of ARmsk, as no such things are included. However, the API does include everything else needed to start up AR; all of it is built inside the library as native code, hidden from the ordinary developer. As long as an Android project has everything included and is linked to the ARmsk library correctly, it is easy to call every native function of the API from Java. For ARmsk and AR to work on Android, at least two layers need to be put on the application surface: a camera layer to get frames and an OpenGL layer for OpenCV operations and rendering. A good idea is to split the second layer into an operation layer and a rendering layer for better structure, three layers altogether. The camera and texturing layers are included as native classes provided by ARmsk, and need to be instantiated and added in Java as views. The third rendering layer can simply be a Java-instantiated OpenGL ES view. With the layers set up, all the preparations for AR are done and what is left is to prepare a marker and start the ARmsk process.


Figure 5.0.2: Application work flow.

The marker management's main functionality is to set a marker, and in ARmsk there are two ways to do this. The first is to set a marker with the camera: take a picture and call a native marker-setting function. The second is to choose an already existing picture and send it to ARmsk to set it as the marker. Setting a marker basically means finding the marker's features, extracting feature descriptors and saving them for matching and tracking when processing AR.
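In native terms, the marker-setting step could look roughly like the sketch below (assuming the OpenCV 2.x C++ interface and SURF as one possible detector/descriptor; the struct and function names are illustrative, not ARmsk's actual code):

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <vector>

// Hypothetical container for a marker: its keypoints, descriptors and size.
struct Marker {
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat descriptors;
    cv::Size size;
};

// Detect the marker's features and extract their descriptors so they can be
// matched against every camera frame later.
void setMarker(const cv::Mat& markerImageGray, Marker& marker)
{
    cv::SurfFeatureDetector detector(400.0);   // Hessian threshold
    cv::SurfDescriptorExtractor extractor;

    detector.detect(markerImageGray, marker.keypoints);
    extractor.compute(markerImageGray, marker.keypoints, marker.descriptors);
    marker.size = markerImageGray.size();
}
```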

Marker matching as well as updating the transformation matrix are functions in ARmsk's main class and are initiated on a native processor which needs to be implemented into a class in Java. Running them in the processor will constantly update the matching and transformation matrix values. The matrix is loaded into OpenGL to transform a 3D object rendered on the rendering layer.
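For reference, the per-frame matching against the stored marker could look roughly like this on the native side (a hedged sketch reusing the hypothetical Marker struct above; a brute-force matcher is used here for brevity, while ARmsk's actual pipeline may use a different matcher):

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <vector>

// Match the current frame against the stored marker descriptors and estimate
// the marker-to-frame homography with RANSAC. Returns false when there are
// too few correspondences to compute a homography.
bool matchFrame(const Marker& marker,
                const std::vector<cv::KeyPoint>& frameKeypoints,
                const cv::Mat& frameDescriptors,
                cv::Mat& homography)
{
    cv::BruteForceMatcher<cv::L2<float> > matcher;
    std::vector<cv::DMatch> matches;
    matcher.match(marker.descriptors, frameDescriptors, matches);

    if (matches.size() < 4)
        return false;  // a homography needs at least four point pairs

    std::vector<cv::Point2f> markerPts, framePts;
    for (size_t i = 0; i < matches.size(); ++i) {
        markerPts.push_back(marker.keypoints[matches[i].queryIdx].pt);
        framePts.push_back(frameKeypoints[matches[i].trainIdx].pt);
    }

    homography = cv::findHomography(cv::Mat(markerPts), cv::Mat(framePts), CV_RANSAC);
    return !homography.empty();
}
```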


Chapter 6

Marketing of ARmsk

6.1 Naming the project

A good project name can really help a project grow and be adopted faster than it would have with a less good name. According to Karl Fogel's definition [21, p. 21], a good open source project name should:

• Give some idea of what the project does, or at least be related to it in an obvious way.

• Be easy to remember.

• Be unique.

• If possible, be available as a domain name in the .com, .net and .org top-level domains.

With this in mind we started brainstorming and came up with the name ‘ARmsk’. ARmsk is short for Augmented Reality Markerless Support Kit and it meets all the above criteria quite well. It includes the letters ‘AR’ to relate the project to the augmented reality field. Since the name is only five letters long it should be easy enough to remember, and there are no other projects with a similar name (that we could find).

6.2 Promotional website

The ARmsk promotional web site's main function is to present a clear and welcoming overview of the project, and to bind together the vital tools for online open source development, such as version control, a bug tracker, discussion forums etc. The web site also provides all the necessary documentation and examples for newcomers to get started using and further developing ARmsk.



Figure 6.2.1: http://www.armsk.org, the promotional website for ARmsk.

6.2.1 Site structure

In short, the site contains seven sections:

• ARmsk: The presentation page, where visitors arrive when they visit armsk.org, .net or .com. Presents a brief summary and the mission statement of the project.

• Blog: The project blog; updates users and developers about upcoming events, releases and other project-related information.

• Features: Lists the application and API features, and shows how far the project has come in the implementation work.

• Documentation: An archive of documents containing examples, step-by-step guides and other project-related documentation.


• Community: Describes how developers can contribute to the project and directs them to the different resources of OSS development (such as SVN, issue tracker etc.).

• Download: Presents how and where to download the source code and other project-related items.

• Credits: Lists all the people and companies that have helped and contributed to the project during the development period.

6.2.2 Wordpress

The promotional page uses Wordpress version 3, an open source Content Management System (CMS) powered by PHP and MySQL. Wordpress is often used as a blog publishing application and is widely used by major sites all over the world. Since Wordpress provides all the basic services we need, it is a great, fast and simple online solution.

6.3 Version control

Google Code was chosen for hosting the ARmsk open source software project. It is widely used for open source software projects, and since it is so widespread it is a good place to start growing the project. Hosting is free and the source code is easily accessible, through either Subversion checkout or web browsing, for anyone interested. Adding informative wikis, mailing lists and discussion sections for developers to bring up issues and bugs are also options.

6.4 Social networks

To promote ARmsk and increase the number of visitors to the project site, affiliations with social networks such as Facebook and Twitter have been made. These are good ways of keeping users and developers updated about events, releases and other project-related news. Links to these social networks are integrated into the promotional website.


Chapter 7

Discussion

The first problem with the project was defining and constraining the task considering the time limitation of the thesis. Design and implementation take time, and doing it for something entirely new makes estimating the time hard. The project was constrained to a set amount of functionality for the API, enough to be called an alpha release.

7.1 The task

Understanding how AR is produced was the main task of our project, but any breakdown of the technology, even though it is popular, is really scarce. The most comprehensive description of AR that was found, and that was proven to work, is presented in [24]. The processing pipeline of ARmsk is based on the presented model. This model is also quite new and has since been further developed in terms of speed and stability, making it even more suitable for ARmsk. However, the methods used in [24] are not implemented in OpenCV, and a course of action had to be decided. Time was spent searching for and reading about the different methods, among many others, that might be used and implemented for AR. After weighing the time constraint and the effort needed to create a stable implementation of the methods mentioned in [24], it was decided that using OpenCV, with its already heavily optimized functions, was a better choice. The decision was therefore to implement the methods available in OpenCV, even though they are not quite as suitable for embedded devices. Even with the basic structure ready, it was fairly hard to know which functions in OpenCV would be useful. Most helpful was reading the documentation and searching for and reading examples using OpenCV, which would often give a push in the right direction. The documentation for the over 500 functions is categorized by purpose, and locating a function that does almost the desired operation might lead to finding one that actually does.



7.2 OpenCV

7.2.1 OpenCV for Python

OpenCV was very good with its vast library, which fortunately contained almost every function needed to create AR, but it had its downsides. The version of OpenCV used was the C++ interface, and it was very hard to debug; there were problems understanding the errors throughout the whole project. Although probably slower, OpenCV is also provided in Python, which is supported by SWIG and might be easier to work with than C++. Python has better code readability and smaller source, resulting in faster development. The reason ARmsk used the C++ version, though, is that the AO project had good samples written in C++ and it was the language we had worked with more prior to the project. Additionally, Python is slower than C++ in most cases and consumes more RAM, which makes it less preferable for embedded systems.

7.2.2 Building OpenCV

Initially there was trouble getting the OpenCV libraries built together with android-opencv. For the build to work, both SWIG and CMake needed to be installed on the system. Then came understanding how to write makefiles that include both Android and SWIG build specifications. Studying the samples provided in AO proved to be very useful, and the included makefiles served as templates for the ARmsk makefiles as well.

7.2.3 Outdated examples

Many of the existing examples of OpenCV functions that are available online are based on older versions of OpenCV, which made the implementation somewhat cumbersome. Due to the big changes in version 2.0, the examples were not of much use with the new interface. They simply did not match the functionality that was sought after, and trying to combine new and old OpenCV versions did not work up to expectations, especially the conversion between new classes and older, outdated classes.

7.3 Performance

The alpha version of the ARmsk API works, although with performance issues and with a few bugs that make the demonstration application crash occasionally. The reason for the crashes is not entirely clear, but they happen mostly either right when the AR process is started or when the marker is totally out of sight, so it might have to do with invalid feature values. The performance issue is a more substantial one and is caused by different factors in every step of the AR process. So far, most of the functions used in ARmsk are from OpenCV and they are heavily optimized. To get even more performance one would have to switch them out entirely or modify them to be optimal for their task. Also, with every release new techniques are presented and implemented in the OpenCV library which are faster and even more stable, and which could be used. There are also many great methods not included in OpenCV that could substitute the ones used by ARmsk and perform even better, but including them is not possible within the scope of this project as the time limitation would not allow it.

7.3.1 JNI export

The goal of the open source project is to make things easier for the future developer, i.e. they do not need to bother with digging into native code. Every part of ARmsk is coded and put together into a native library, except for the last part, the rendering. This part could also have been incorporated into the native code, but the decision was made to export it to Java, so that the ordinary developer can focus solely on Java programming for the Android SDK and create his or her own 3D rendering. This actually became a problem, since there was trouble exporting float arrays from native code to Java through the JNI. In the current solution, each element of the transformation array is fetched one by one from Java. Crossing between Java and native code creates a performance overhead, so going back and forth is not desirable at all. It would be interesting to see the performance gain from sending over the whole transformation array instead of the 16 elements one by one.
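As an illustration of that possible improvement, a single JNI call could copy all 16 floats at once (a hedged sketch; the Java class, method and variable names are hypothetical and not ARmsk's actual interface):

```cpp
#include <jni.h>

// Hypothetical: the 4x4 transformation matrix, stored column-major and
// filled by the AR pipeline each frame.
float g_transform[16];

// Hypothetical native method: copies the whole transformation matrix into a
// Java float[] in one JNI call instead of fetching the 16 elements one by one.
extern "C" JNIEXPORT jfloatArray JNICALL
Java_org_armsk_ARmsk_getTransformation(JNIEnv* env, jclass)
{
    jfloatArray result = env->NewFloatArray(16);
    if (result == NULL)
        return NULL;  // out of memory
    env->SetFloatArrayRegion(result, 0, 16, g_transform);
    return result;
}
```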


Chapter 8

Conclusion

This paper has investigated the possibilities of Augmented Reality on Android smartphones as well as the technology behind AR. The Android platform has been introduced and its capabilities have been presented. Furthermore, the restrictions encountered on mobile devices have been documented.

An AR API, called ARmsk, has been developed during the work of this paper, and an approach for natural feature tracking that allows robust pose estimation from planar targets on smartphones has been presented. The API is built on top of the commonly used computer vision library OpenCV and is one of few APIs that provide markerless AR on Android. Additionally, a demonstration application using ARmsk, capable of rendering basic 3D models on top of a chosen image marker, was developed. This application shows what ARmsk can do and that today's smartphones are capable of being used for AR purposes.

The API is released under the GNU GPL and is publicly available on the project's website. This allows developers to use the API and software as a foundation for their very own markerless AR applications. As we write this report the ARmsk project is by no means over. We hope for the project to continue to grow and become a much more polished and powerful API. The main goal of the project is to provide a free open source augmented reality API that performs in real-time, where the user has to input a minimal amount of information to have an AR application up and running on Android. The implementation is far from optimized and the future work will be to improve the API's performance and stability.

8.1 Future work

The following is a list of what future work will include:

• To achieve a more stable and higher framerate, a dynamically set threshold for the detector can be implemented to limit the number of feature points taken into the calculation. In the current implementation the framerate can vary a lot depending on the number of found features. According to [24], a set of around 150 feature points per frame is sufficient for producing stable AR while retaining real-time performance.

• Implementing tracking of the matched feature points, instead of recalculating them for every frame, would result in much greater performance. The cycle of detecting and matching would then only be rerun once the feature points are no longer feasible for tracking. This would mean that for every frame, while the points are trackable, steps 1 to 3 in the ARmsk pipeline would be skipped. These steps are by far the most processing-heavy ones, and with them out of the calculation, real-time AR is close. Techniques for tracking, such as pyramidal optical flow [19], are already included in OpenCV; see the sketch after this list.

• For AR to work robustly there is a need for a good marker that has many distinctive features evenly distributed over the whole image. This is not always easy for the human eye to discern, which is why a marker evaluation function could be implemented where markers are rated. This can be done by counting the number of detected features and calculating their geometric placement and distribution to generate a rating based on the results. This function could then be called by a user to preview a marker and see whether it is any good for AR.

• ARmsk is currently tuned to the camera hardware specifications of an HTC Desire. To support more devices, a camera calibration tool needs to be implemented to dynamically set the camera parameters (as described in section 3.4).

• At the moment ARmsk contains the whole OpenCV library, which is unnecessary since only small parts of the library are being used. When ARmsk reaches a final release of version 1.0, all of the unused OpenCV methods and classes will be removed from the project to reduce building time and, especially, storage for applications.

• Finally, when ARmsk can generate AR in real-time, a further development would be to allow detection and tracking of multiple markers. Depending on the method used, near real-time AR can be achievable for many targets as well, as seen in [20].
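As an illustration of the tracking idea in the second item above, OpenCV's pyramidal Lucas-Kanade optical flow [19] could be used roughly as in the sketch below (assumed function and variable names, not ARmsk code; in a full implementation the corresponding marker points would have to be filtered the same way to preserve the marker-to-frame correspondences):

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

// Track feature points from the previous grayscale frame into the current
// one with pyramidal Lucas-Kanade optical flow. Points whose status flag is
// zero were lost; when too few remain, detection and matching must be rerun.
bool trackPoints(const cv::Mat& prevGray, const cv::Mat& currGray,
                 std::vector<cv::Point2f>& points)
{
    std::vector<cv::Point2f> nextPoints;
    std::vector<uchar> status;
    std::vector<float> error;

    cv::calcOpticalFlowPyrLK(prevGray, currGray, points, nextPoints,
                             status, error);

    std::vector<cv::Point2f> kept;
    for (size_t i = 0; i < status.size(); ++i)
        if (status[i])
            kept.push_back(nextPoints[i]);

    points = kept;
    return points.size() >= 4;  // enough correspondences left for a homography?
}
```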


Bibliography

[1] R. Azuma. A survey of augmented reality. Presence, 6:355-385, 1995. http://www.cs.unc.edu/~azuma/ARpresence.pdf.

[2] L. B. Rosenberg. The Use of Virtual Fixtures As Perceptual Overlays to Enhance Operator Performance in Remote Environments. Technical Report AL-TR-0089, USAF Armstrong Laboratory, Wright-Patterson AFB, OH, 1992.

[3] L. B. Rosenberg. "The Use of Virtual Fixtures to Enhance Operator Performance in Telepresence Environments". SPIE Telemanipulator Technology, 1993.

[4] O. Oda, L. J. Lister, S. White, and S. Feiner. Developing an augmented reality racing game. In INTETAIN '08: Proceedings of the 2nd international conference on INtelligent TEchnologies for interactive enterTAINment, pages 18, ICST, Brussels, Belgium, 2007. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering). ISBN 978-963-9799-13-4.

[5] A. Henrysson, M. Billinghurst, and M. Ollila. Face to face collaborative AR on mobile phones. In ISMAR '05: Proceedings of the 4th IEEE/ACM International Symposium on Mixed and Augmented Reality, pages 80-89, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2459-1. DOI: http://dx.doi.org/10.1109/ISMAR.2005.32.

[6] D. Wagner. Handheld Augmented Reality. PhD dissertation, Graz University of Technology, Institute for Computer Graphics and Vision, October 2007. http://studierstube.icg.tu-graz.ac.at/thesis/Wagner_PhDthesis_final.pdf.

[7] D. Wagner and D. Schmalstieg. ARToolKitPlus for pose tracking on mobile devices. In CVWW'07: Proceedings of the 12th Computer Vision Winter Workshop, pages 139-146, Graz University of Technology, Institute for Computer Graphics and Vision, February 2007. http://www.icg.tu-graz.ac.at/Members/daniel/Publications/ARToolKitPlus.

[8] R. S. Wright, B. Lipchak, and N. Haemel. OpenGL SuperBible: Comprehensive Tutorial and Reference. Addison-Wesley, fourth edition, 2007.



[9] T. Olson. Polygons in your pocket: Introducing OpenGL ES. http://www.opengl.org/pipeline/article/vol003_2/.

[10] G. Klein and D. Murray. Parallel Tracking and Mapping for Small AR Workspaces. In ISMAR '07: Proceedings of the 6th IEEE/ACM International Symposium on Mixed and Augmented Reality, pages 225-234, Nara, Japan, 2007. DOI: http://dx.doi.org/10.1109/ISMAR.2007.4538852.

[11] E. Rosten and T. Drummond. Fusing points and lines for high performance tracking. In IEEE International Conference on Computer Vision, pages 1508-1511, October 2005. DOI: http://dx.doi.org/10.1109/ICCV.2005.104.

[12] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. In European Conference on Computer Vision, pages 430-443, May 2006. DOI: http://dx.doi.org/10.1007/11744023_34.

[13] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. "SURF: Speeded Up Robust Features". Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346-359, 2008.

[14] D. G. Lowe. Object recognition from local scale-invariant features. In ICCV, 1999.

[15] M. Agrawal, K. Konolige, and M. R. Blas. CenSurE: Center Surround Extremas for Realtime Feature Detection and Matching. Lecture Notes in Computer Science, Volume 5305, pages 102-115, 2008. DOI: http://dx.doi.org/10.1007/978-3-540-88693-8_8.

[16] J. L. Bentley. Multidimensional Binary Search Trees Used for Associative Searching. Communications of the ACM, 18(9):509-517, September 1975.

[17] M. A. Fischler and R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. In Communications of the ACM, pp. 381-395, June 1981.

[18] T. Tuytelaars and K. Mikolajczyk. Local Invariant Feature Detectors: A Survey. Foundations and Trends® in Computer Graphics and Vision, Vol. 3, No. 3, pp. 177-280, 2008. http://dx.doi.org/10.1561/0600000017.

[19] J.-Y. Bouguet. Pyramidal Implementation of the Lucas Kanade Feature Tracker: Description of the Algorithm. Intel Corporation Microprocessor Research Labs, 2000.

[20] D. Wagner, D. Schmalstieg, and H. Bischof. "Multiple target detection and tracking with guaranteed framerates on mobile phones". In 2009 8th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 57-64, 2009.


[21] K. Fogel. Producing Open Source Software: How to Run a Successful Free Software Project. http://www.producingoss.com/en/producingoss.pdf.

[22] F. Crow. Summed-area tables for texture mapping. In SIGGRAPH '84: Proceedings of the 11th annual conference on Computer graphics and interactive techniques, pp. 207-212, 1984.

[23] M. Muja and D. G. Lowe. Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration. In International Conference on Computer Vision Theory and Applications (VISAPP'09), 2009.

[24] D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, and D. Schmalstieg. Pose Tracking from Natural Features on Mobile Phones. In The 7th IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2008), 2008.

[25] D. Wagner and D. Schmalstieg. Making augmented reality practical on mobile phones, part 1. IEEE Comput. Graph. Appl., 29(3):1215, 2009. ISSN 0272-1716. DOI: http://dx.doi.org/10.1109/MCG.2009.46.

[26] T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Trans. Inform. Theory, IT-13(1):21-27, 1967.

[27] S. H. Yen, C. Y. Shih, T. K. Li, and H. W. Chan. Applying Multiple KD Trees in High Dimensional Nearest Neighbor Searching. In International Journal of Circuits, Systems and Signal Processing, pp. 153-160, 2010.

[28] H. Bay, T. Tuytelaars, and L. V. Gool. SURF: Speeded Up Robust Features. In Lecture Notes in Computer Science, Volume 3951, pages 404-417, 2006. DOI: http://dx.doi.org/10.1007/11744023_32.