
Visual-Topological Mapping: An approach for indoor robot navigation

Master Thesis Project

Jorge Davila Chacon (s1846965)

Faculty of Mathematics and Natural Sciences
Artificial Intelligence Department
University of Groningen

June 2011


Internal supervisors:

Prof. Dr. Lambert Schomaker (ALICE, University of Groningen, the Netherlands)
Dr. Tijn van der Zant (ALICE, University of Groningen, the Netherlands)

External supervisor:

Dr. Peter Ford Dominey (SBRI, Institut National de la Santé et de la Recherche Médicale, France)


Abstract

Using visual features for robotic navigation in natural and changing environments remains an unsolved problem. This thesis describes the use of a nearest-neighbor algorithm on salient visual features for indoor mapping. The mapping procedure consists of defining an arbitrary number of vector fields, or paths, in a domestic environment, pointing towards a set of target locations. Along these specified paths, the robot constantly estimates its most probable location, adjusts its orientation and moves forward a predefined distance, in order to converge on a target location. A SURF (Speeded-Up Robust Features) implementation was used for object detection and scene recognition. For every location in a path, the perceived scene was represented by the set of detected objects and their particular geometric properties relative to the robot's body. A first experiment confirmed that the deviation from the trained paths, due to odometric error, decreased as the number of trained intermediate locations increased. A second experiment showed promising results when testing the effectiveness of nearest-neighbor search for estimating the robot's location. Finally, in a third experiment, a satisfactory convergence rate of the robot towards predefined target locations was observed under real-world circumstances.

Keywords: topological robot-mapping, visual object-recognition, nearest neighbor, vector-field navigation.


Preface

The following pre-introductory paragraphs describe the research goals of the scientists supporting this thesis project, and explain how the objectives they share originated it. An acknowledgments section is also included.

Shared efforts towards long-living robots

This thesis project originated from the commencement of a collaboration between the Artificial Intelligence and Cognitive Engineering Institute (ALICE) at the University of Groningen (RUG) in the Netherlands, and the Stem-Cell and Brain-Research Institute (SBRI), a branch of the Institut National de la Santé et de la Recherche Médicale (INSERM) in France. In order to clarify the objectives of the present work, it is useful to know the research objectives of both institutes.

The ALICE institute's objective is concisely stated by its research director, Lambert Schomaker:

The research in Artificial Intelligence and Cognitive Engineering is inherently interdisciplinary in nature. With Computing Science, an interest in formal modeling is shared, where these models may be symbolic, statistical or hybrid in nature. With Psychology and Biology an interest in natural cognition is shared, where the goal is to describe fundamental qualities of cognition in such an explicit way that computational models of perceptual, cognitive, and behavioral control functions can be constructed. (Source: http://www.rug.nl/ai/onderzoek/index)

While working at the ALICE institute, Tijn van der Zant, together with Thomas Wisspeintner, published in 2005 the paper "RoboCup X: A Proposal for a new League where RoboCup goes real world". This paper was the ideological foundation of the RoboCup@Home league. RoboCup@Home is part of the RoboCup initiative (http://www.robocup.org/about-robocup/a-brief-history-of-robocup/), and it is the largest international competition for autonomous domestic-service robots. Its main objective is:

... to develop service and assistive robot technology with high relevance for future personal domestic applications. [...] A set of benchmark tests is used to evaluate the robot's abilities and performance in a realistic non-standardized home environment setting. Focus lies on the following domains but is not limited to: Human-Robot-Interaction and Cooperation, Navigation and Mapping in dynamic environments, Computer Vision and Object Recognition under natural light conditions, Object Manipulation, Adaptive Behaviors, Behavior Integration, Ambient Intelligence, Standardization and System Integration. (Source: http://www.ai.rug.nl/robocupathome/)

The Cortical Networks for Cognitive Interaction (CNCI) team, from SBRI, represents the other side of the collaborative group. Their research is based on Artificial Intelligence (AI) and Embodied Cognition paradigms. It is described by Peter Ford Dominey, the CNCI research director, as follows:

Our previous research has addressed sensory-motor integration and higher cognition including spatial integration and language comprehension through a variety of methods including human functional imagery and clinical neuroscience, neural network simulation and robotics. [...] One of the crucial issues that has emerged in this research is the realization that the nature of internal representations will be determined by the physiological structure of the system and its direct link with the nervous system, thus bringing us towards the "embodied cognition" stance. In this context we will develop a "hybrid" embodied cognitive system for a humanoid robot, the iCub. (Source: http://www.sbri.fr/teams/human-and-robot-interactive-cognitive-systems.html)

Intelligent robots @Home


The link between the RoboCup@Home research goals [34] and the iCub robot (http://www.robotcub.org/index.php/robotcub) can be found by exploring the Cooperative Human-Robot Interaction Systems (CHRIS) project. CHRIS is a European effort that supported the CNCI research on embodied cognitive systems. The CHRIS project addresses:

... the problem of a human and a robot performing co-operative tasks in a co-located space, such as in the kitchen where your service robot stirs the soup as you add the cream. [...] The project is based on the essential premise that it will be ultimately beneficial to our socioeconomic welfare to generate service robots capable of safe co-operative physical interaction with humans. The key hypothesis is that safe interaction between human and robot can be engineered physically and cognitively for joint physical tasks requiring co-operative manipulation of real world objects. (Source: http://www.chrisfp7.eu/)

During the first semester of 2010, as a research member of the CNCI team, van der Zant integrated the CHRIS project research goals with the RoboCup@Home competition. He created a team to participate for the first time with a humanoid robotic platform. One of the requirements for the competition was a navigation system, and students at the ALICE institute were offered the opportunity to join the venture.

The first stage of this thesis project was conducted in Lyon, France, at the SBRI. The second and third stages were carried out in Groningen, the Netherlands, at the ALICE institute. The different stages originate from significant changes in the hardware and software architecture. It will often be useful to explain the work realized in the earlier stages to better understand the later ones. They will be referred to as the first, second and third stages of the thesis project during the rest of this document.

Acknowledgments

This work was carried out under the internal supervision of Prof. Dr. Lambert Schomaker, Director of Research at the ALICE institute, and Dr. Colin Martijn van der Zant, researcher at the same institute and founding member of the RoboCup@Home competition. As an external supervisor, I was supported by Dr. Peter Ford Dominey, Research Director at SBRI.

Both RUG and INSERM greatly supported this thesis project and made the empirical research possible by providing me with academic guidance, access to physical equipment and facilities, and travel expenses when required.

Concerning the grant that allowed me to take part in the master's program at RUG, I benefited from the agreement between the Benemérita Universidad Autónoma de Puebla (BUAP) and the Erasmus Mundus External Cooperation Window - Mexico (ECW-Mexico). I want to thank both institutions for the opportunity to study at a university of excellence.

I want to give special thanks to some friends who supported me academically at various stages during my studies: Olexandr Murov, Herke van Hoof, and the members of the BORG (http://www.ai.rug.nl/crl/People/People) and Radical Dudes (https://sites.google.com/site/pfdominey/robocup@home) teams. They all made my time in Europe one of the most memorable experiences of my life. Thank you for giving me a deeper insight into the diversity of cultures around the world.

Finally, I am deeply grateful to my family for their support in every aspect during these two years. They clarified my thoughts in moments of uncertainty, and gave me the mettle to materialize a dream.

In summary, for all the moral, intellectual and economic support, I want to express my gratitude to the aforementioned institutions, researchers, friends, colleagues and family. Behind RUG, INSERM, BUAP and ECW-Mexico lies a vast amount of work that makes it possible to spread high-quality education around the globe. You are creating a better future for society by extending education opportunities without frontiers.

Jorge Davila Chacon

July 2011, Groningen, the Netherlands.


Contents

List of Figures
List of Tables

1 Introduction
  1.1 Robotic-navigation systems
  1.2 Animal navigation
  1.3 Behavior-based robotics
  1.4 Technical challenges for autonomous robot mobility
  1.5 Research goals
  1.6 Project overview

2 Background
  2.1 Current approaches for autonomous mobility
  2.2 Visual-topological navigation
  2.3 Robot localization
  2.4 Obstacle avoidance

3 Platform
  3.1 First stage: Nao with low computing power
    3.1.1 First stage hardware platform
    3.1.2 First stage software platform
  3.2 Second stage: Nao with high computing power
    3.2.1 Second stage hardware platform
    3.2.2 Second stage software platform
  3.3 Third stage: Pioneer with distributed computing power
    3.3.1 Third stage hardware platform
    3.3.2 Third stage software platform

4 An Algorithm for Visual-Topological Mapping
  4.1 Experimental setup
    4.1.1 Robot niche
    4.1.2 Lighting conditions
    4.1.3 Obstacles cluttering
    4.1.4 Landmarks location
    4.1.5 Camera properties
  4.2 Machine learning techniques
  4.3 Visual navigation algorithm
    4.3.1 Object detection: Spikenet
    4.3.2 Object detection: SURF
    4.3.3 Calculation of an object's angle
    4.3.4 Detect surroundings
    4.3.5 Location estimation
      4.3.5.1 Location disambiguation
    4.3.6 Correcting orientation

5 Experiments
  5.1 Experiment 1: Path formation
    5.1.1 Conclusion for path formation
  5.2 Experiment 2: Location estimation
    5.2.1 Conclusion for location estimation
  5.3 Experiment 3: Visual navigation
    5.3.1 Elevator path
      5.3.1.1 Conclusion for elevator path
    5.3.2 Lounge path
      5.3.2.1 Conclusion for lounge path

6 Discussion
  6.1 Future work
  6.2 Conclusion

References

List of Figures

2.1 Edge detection
2.2 Certainty grid
2.3 Potential field
3.1 Nao with additional mini-PC
3.2 Nao hardware diagram
3.3 Additional mini-PC
3.4 Used graphical user interfaces
3.5 Nao mounted on Pioneer platform
4.1 Projection on camera sensor
4.2 Nao's camera angle
4.3 Spikenet detections
4.4 SURF detections
4.5 Full images considered as objects
4.6 Object detection components
4.7 Example path
5.1 Trained "U" path
5.2 6-locations path
5.3 11-locations path
5.4 Estimation of the robot's location in a vector field
5.5 Stored vector field
5.6 Robot trajectory
5.7 Confidence limits for Elevator path
5.8 Lounge path
5.9 Confidence limits for Lounge path

List of Tables

3.1 Nao specifications
3.2 Additional mini-PC specifications
3.3 Desk PC specifications
3.4 Robot's laptops specifications
3.5 Pioneer specifications
4.1 Default SURF parameters
5.1 Visual navigation results
5.2 Optimized navigation results

Chapter 1

Introduction

As a starting point, the importance of navigation algorithms for autonomous systems is addressed. Afterwards, an overview of some navigation methods found in nature is provided, followed by an explanation of the influence of animal navigation on artificial systems and of the current challenges for robotic navigation. Finally, the research goals and the project overview are detailed.

1.1 Robotic-navigation systems

The goal of providing autonomy and physical awareness to intelligent systems has been pursued for decades. More specifically, the case of a robot discovering and describing its environment has been a challenging task. This is a relevant goal, as it could provide means for exploring environments that are harmful or unreachable for humans. Autonomous mobility is also a requirement when it is not possible or desirable to instruct a robot, in real time, about which path to follow. This could be the case in natural disasters or interplanetary exploration. On the other hand, robot experimentation could provide insights into the way animals navigate dynamic environments in real time.

In the case of indoor scenarios, one important goal is to achieve a smooth interaction between robots and people. In typical human environments, such as homes, offices, hospitals or elderly homes, a robot can be useful for entertainment purposes, collaborating in stress reduction or even for safety vigilance. Examples include alerting medical personnel in case a human suffers an accident, or patrolling a neighborhood during the night. Basic tasks expected from a robot companion include localizing spots in its surroundings, after spoken commands or through exploration, and moving objects between different locations. A reliable navigation system is a priority when being realistic about the deployment of robots in human niches.

Achieving an accurate map description with a robot may be based on different kinds of measurements, such as distances traveled, turning angles, or the topology of landmarks detected in the environment. All these measurements rely on a variety of sensors like odometers, cameras or proximity sensors (sonars or range finders). The economic resources available to different research programs are reflected in the variety of sensors used, affecting in this way the approach to be taken. What remains constant across the different navigation approaches found in the literature is the common goal of providing a robot with a fast and accurate system to describe its environment [30].

Before providing further insight into the technical challenges for robot mobility, it is useful to have an overview of the navigation strategies observed in some animal species.

1.2 Animal navigation

Many navigation behaviors found in the animal kingdom depend on reference landmarks that provide a compass sense in the environment [22]. For example, the desert ant Cataglyphis bicolor uses the relative sizes and angles of visual landmarks for navigation. These ants do not retrace their return journey to the nest by following a pheromone trail. Once they reach a point near the nest, they start searching for it in a spiral-like motion of constantly widening circles [15]. This is an example of the functionality of dead reckoning in animal navigation, and it suggests the hard-coding of exploratory behaviors by evolutionary means. Sometimes a dead-reckoning behavior is the best solution for navigation.

Landmarks can be represented in several ways in nature. Studies on the navigation systems of some bees (Apis mellifera) and birds (Passerina cyanea) show the use of the sun's movement, forest border lines and individual trees as reference landmarks for navigation [17, 33]. Another example involves human navigation without the use of compasses or geometrical maps. The inhabitants of the Polynesian islands were able to navigate successfully over several thousand kilometers of open sea between Tahiti and Hawaii [33].

The variety of landmarks used to achieve such an odyssey includes the sun, the moon, night stars, and unique wind and water patterns determined by near, although not necessarily visible, islands. From a canoe the view reaches around 15 km, but clouds may form above islands and make them visible from much greater distances. This is because land heats air masses faster than the surrounding water, creating rising air currents that form clouds when cooling down. Furthermore, these white clouds reflect a green light when formed above atolls or coral islands encircling a lagoon.

In all the cases mentioned above, the common denominator is the formation of canonical paths. These paths greatly simplify the navigation of animals between their nest and a food source, that is, between their current location and a target location. These paths may be learned or "hard-coded", depending on the animal. Nehmzow concisely explains the advantage of canonical paths:

The use of unchangeable (canonical) paths improves navigation reliability greatly: the task of navigating to a goal location is simplified to one of staying on a (possibly learned) path. [22]

For some animals, these paths seem to be acquired throughout their lives (e.g. the honey bee) [22]. For others, these paths must be genetically preprogrammed. For example, first-year juvenile terns do not rely on the example of an adult to find their migratory paths [22].

As seen in the previous examples, the animal niche plays a fundamental role in navigation. The specific characteristics of different environments determine the nature of the objects to be used for orientation and path traversal. Nevertheless, the intelligent use of reference landmarks would not make successful navigation possible if the environment were not favorable. Nehmzow relates this principle to autonomous robotics:

This principle applies to robotics as well: we believe that robot performance (and therefore navigation as one aspect of robot performance) always has to be seen with respect to the robot's environment. A general purpose robot does not exist. [22]

Mobile robots are no exception, and the characteristics of the environment are also an important consideration when designing navigation algorithms. In the case of indoor domestic environments, an environment filled with visual landmarks such as decorative objects and furniture may be assumed. This topic is elaborated further in section 4.1.

Another important factor for the interaction between robots and the real world is reaction time. Behavior-based robotics is a promising alternative for the successful deployment of robots in dynamic environments. This topic is addressed in the following section.

1.3 Behavior-based robotics

In the past decades, alternatives have been studied to provide machines with an animal-like interaction with real environments. By the end of the 1940s, Norbert Wiener led the development of cybernetics. Arkin defines the field and concisely explains the importance of feedback loops for reproducing natural behaviors:

... a marriage of control theory, information science, and biology that seeks to explain the common principles of control and communication in both animals and machines [...] Ashby and Wiener furthered this view of an organism as a machine by using the mathematics developed for feedback control systems to express natural behavior. This affirmed the notion of situatedness, that is, a strong two-way coupling between an organism and its environment. [3]

In the case of autonomous robots, situatedness requires a fast response loop in order to cope with the environment's dynamics. A reactive system is therefore essential to control the robot's behavior. Reactive control involves a tight coupling between perception and action. More specifically, in the context of robotic behaviors, reactive control aims to produce a timely motor response in dynamic and unstructured environments.

Due to the scientific experience accumulated over the past years, the advantages provided by reactive hardware and software architectures for autonomous robotics are now clear. In our case, this approach was taken in the third stage of the project by developing a behavior-based software architecture, in which a navigation system was required. In the following section, the technical challenges for robotic navigation systems are explained in more detail.

1.4 Technical challenges for autonomous robot mobility

As a starting point, it is useful to mention some of the main difficulties encountered when trying to provide a robot with physical awareness. Certainly, the creation of effective and efficient navigation algorithms requires an appropriate interpretation of the information provided by the sensors. But it is also necessary to consider the perceptual limitations of the chosen sensors. These limitations make it necessary for a robot to explore its surroundings. The noise affecting the sensed information needs to be captured in the sensors' noise models, which may have unknown and/or variable distributions [29]. In the case of motion, rotational and translational odometric errors accumulate, making the sensed data less reliable.

The robot niche generates important constraints on the robot's mobility. It is necessary to consider both static physical constraints (stairs, different rooms, fireplaces) and dynamic objects perceived as static (animals, chairs, other robots). Finally, it is fundamental to understand how a robot perceives its environment from its particular embodiment. A small biped robot and a tall wheeled robot require different strategies to interact with the world.

Problems with traditional topological algorithms arise from the assumption that the environment is static. Conservative approaches constrain the use of a robot to artificial niches [30]. Likewise, the use of active distance-based sensors such as laser range finders imposes strict spatial limitations. Other methods capable of handling dynamic environments assume a lack of occlusions from an aerial perspective [19]. In a typical domestic setting these algorithms tend to fail. They require extensive retraining (precise maps must be updated constantly), are costly to implement, or are simply not suitable for most indoor environments. Items such as some furniture and plants can change position frequently. Wall lines might be obscured from view only to reappear after a while. Navigation algorithms should be able to deal with changing environments and retrain automatically when new information is available.

Approaches using temporal consistency or odometry mostly rely on Bayesian models [1, 2, 10, 23, 35]. Their drawback is that the information required for this modeling becomes unavailable as soon as the user turns off the robot, or moves it from one room to another.

Some visual approaches to navigation can be found in the literature. One alternative is the use of omnidirectional cameras [6, 36]. They have the disadvantage of reduced image resolution at relatively short distances compared to monocular cameras.

The algorithm described in this thesis shows that navigation can be implemented in a dynamic environment by building a visual-topological map using salient image features. This is a map describing the spatial and geometric relations between several visual landmarks in a given environment. As the system relies on several landmarks, it is possible to take some of them away without affecting the navigation behavior. The underpinnings of this approach will be addressed in the following chapters.

Now that the motivations for exploring visual-topological navigation have been introduced, it is possible to expose the general objective and research questions of the project.

1.5 Research goals

The general objective of this project was to implement and explore the robustness of the proposed visual-topological mapping algorithm for indoor navigation with the robot Nao. Consequently, the following research questions emerged:

1. Is it possible to implement a visual-topological mapping algorithm with the robot Nao, relying on its embedded motor, sensing and computing capabilities?

2. Is it possible to improve the mapping algorithm after customizing it to Nao's incorporated sensing devices, and with the aid of additional computational power?

3. Does this algorithm provide the robot Nao with sufficient capabilities to perform SLAM (Simultaneous Localization and Mapping) [27] in an indoor location in a reasonable amount of time, relative to the RoboCup@Home [34] competition requirements?


1.6 Project overview

The proposed mapping algorithm relies on the visual detection of landmarks. Two different approaches were tested for object detection. The first was a commercial application based on artificial neural networks [11]. The second was a SURF implementation based on [4].

An algorithm was implemented to calculate spatial and geometric properties of the detected objects. Such properties include the angle of the objects relative to the robot's body, and abstract features dependent on the distance and rotation of the objects. Assuming a sufficiently large number of visual landmarks, these features allow the robot to describe any location in a room as a vector of distinct values.

The environmental constraints under which the robot is required to navigate are defined by the RoboCup@Home rulebook. It is assumed that a typical human indoor environment can provide the robot with enough visual landmarks to differentiate several locations in a room. It is hypothesized that the difference between locations is large enough to generate a fine-grained vector field. Such a vector field should allow the robot to move from any location near the trained vectors towards a target location.
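As a minimal illustration of this scheme, consider the following sketch (hypothetical data and names, not the project's actual code), in which each trained location is a feature vector paired with a trained heading, and the robot orients itself by nearest-neighbor search:

```python
import numpy as np

# Trained vector field: each entry pairs a feature vector describing a
# location (e.g. per-landmark angles and apparent sizes) with the heading
# (radians) the robot should take from that location towards the target.
trained_features = np.array([
    [0.10, 0.52, 0.33],   # location A descriptor
    [0.80, 0.41, 0.95],   # location B descriptor
    [0.45, 0.12, 0.60],   # location C descriptor
])
trained_headings = np.array([0.0, 1.57, 3.14])

def estimate_heading(observed):
    """Return the trained heading of the nearest stored location."""
    distances = np.linalg.norm(trained_features - observed, axis=1)
    nearest = np.argmin(distances)
    return trained_headings[nearest]

# The robot perceives a scene, builds its descriptor and orients itself.
print(estimate_heading(np.array([0.42, 0.15, 0.58])))  # -> 3.14 (location C)
```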

Three experiments were carried out to test the performance of the mapping algorithm. The first experiment was performed in a constrained environment, in order to observe the relation between the density of the vector field and the accumulation of odometric error. The second experiment tested the performance of the nearest-neighbor statistic for estimating the location of the robot in a room. The last experiment tested the robustness of the mapping algorithm in the real world.

The results of the experiments are detailed in chapter 5. A global discussion of the project's results and possible future work is presented in chapter 6.


Chapter 2

Background

Many of the procedures for indoor autonomous robot mobility found in the literature consist of three steps: mapping, localization and navigation [30]. Navigation, in turn, can be subdivided into path planning and obstacle avoidance. This chapter provides an overview of the current challenges for autonomous robot mobility, followed by a more detailed description of the alternatives we propose to tackle each of its required components.

2.1 Current approaches for autonomous mobility

A large proportion of the navigation algorithms found in the scientific literature rely on probabilistic models of the robot's motion control (movements commanded to the robot) and localization (its actual position after moving). A topological SLAM algorithm in [26] used a Partially Observable Markov Decision Process (POMDP) to model the probability distribution of the position of a robot. In the experiments in [31], this approach was adapted to perform hybrid topological-metric SLAM with a laser scanner. However, POMDPs are not well suited for online mapping, since they are learned through an offline process [25] or built manually from prior information concerning the environment's geometry, appearance and topology. For example, in some methods [26, 31] the environment geometry is standardized, e.g. room shape, floor distribution and hall intersection angles.

Inference methods have been investigated in [24] and [23] to create topological SLAM algorithms. Topologies are built in a space of relational maps. When sufficient information is collected, maps are correlated by commanding the robot to perform extra actions [24] or measurements [23], in order to estimate the validity of each individually created map. Nevertheless, the acquisition of large amounts of sensor information and the complexity of sampling through all the images do not allow such methods to map environments with more than 15 nodes.

Other topological SLAM approaches are based on global appearance from omnidirectional-camera images [6, 16, 32, 36]. Images are described by regions or points of interest, and the similarity distance between images in a given dataset defines the nodes in the map. Images with a distance between them below a chosen threshold, i.e. "similar images", are considered to originate from the same location in the environment. These appearance-based approaches efficiently segment the environment by recognizing spots from distant locations with omnidirectional images. However, none of them performs online or in real time. Either the input images are processed in an offline step [6, 16, 36], or, when using large datasets, the computation of image similarity becomes intractable for a real-time implementation [32].

The approach taken by [14] presents a real-time framework performing vision-based topological SLAM using one monocular camera. Their approach is based on the bag-of-words method [9]: the collected images are measured by a set of elementary features, denominated visual words, and stored in a dictionary. This dictionary is constructed by grouping visual words, i.e. similar visual descriptors in the images. Once a dictionary is built, the images are represented according to the frequency of the words contained in them. These representations, or visual-word vectors, are taken from the visual vocabulary built offline, and a voting process finds the previous images closest to the current one. Nevertheless, the authors do not mention whether the algorithm's performance was tested under occlusion circumstances, e.g. when a chair or a person blocks some region of interest in the image.
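For concreteness, a minimal bag-of-words sketch is shown below (an illustration under simplified assumptions, not the cited framework's implementation): local descriptors are quantized to their nearest visual word, and images are compared by word-frequency histograms.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantize local descriptors to their nearest visual word and
    return the normalized word-frequency histogram of the image."""
    # Pairwise distances between descriptors and vocabulary words.
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = np.argmin(d, axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

# Toy data: a vocabulary of 4 "words" and two images with random descriptors.
rng = np.random.default_rng(0)
vocab = rng.random((4, 8))
img_a = bow_histogram(rng.random((50, 8)), vocab)
img_b = bow_histogram(rng.random((60, 8)), vocab)

# Images whose histograms are close are assumed to show the same place.
print(np.linalg.norm(img_a - img_b))
```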

The requirements of the RoboCup@Home competition demand a navigation system capable of performing in dynamic indoor environments. The algorithms described above cannot tackle such a challenge. For this reason, an algorithm combining the advantages of visual and topological mapping was considered an appropriate approach.

2.2 Visual-topological navigation

The main motivation for building a visually based navigation system is the large accumulation of odometric error in the humanoid robot Nao, relative to the odometric error found in wheel-based robotic platforms. When a Nao robot is commanded to move n meters forward, it will never cover exactly the specified distance, nor move in a straight line. The accumulated error depends on several factors, such as the floor material, dust, the robot's battery level and the number of hours it has been used during the day; the temperature of the robot's joints tends to increase faster than it decreases. Another important consideration is the use of passive sensors for navigation, due to energy constraints.

The advantages of a visual navigation system can be considered from both a practical and a scientific point of view. In comparison to metric approaches, topological maps provide a more compact representation that scales better with the size of the environment. From a scientific point of view, topological maps rely on a higher level of representation than metric maps, allowing for semantically transparent planning and navigation. At the same time, visual-topological navigation is biologically more plausible: it is closer to animal behavior [22], serving as a proof of concept.

A review was made of different SLAM implementations suitable for the embodiment of the robot Nao. There are integrated approaches, as in [20], which implements the creation of nodes and loop closing based on SIFT (Scale-Invariant Feature Transform) [21] image features.

The proposed paradigm for mapping an indoor area using the android Nao includes the collaboration of parallel projects on image processing and computer vision. Current cognitive and perceptual paradigms were considered as guidelines to provide the robot with the required physical awareness. It is known from the literature that context plays a fundamental role [12, 18]. Our method assumes a dynamic indoor human environment, where there is enough visual texture to define fixed regions of interest, e.g. furniture, decoration, windows, etc.

Finally, some algorithms performing localization based on visual features lack the implementation of navigation paths [13, 35]. In our approach, paths can be constructed by describing a vector field, where the direction of every vector corresponds to the desired orientation of the robot before moving towards a target location.

Using a nearest-neighbor implementation, the humanoid is able to find its way around a domestic environment, even when some landmarks have changed position or disappeared. This thesis discusses the use of the method in a controlled environment in detail, to demonstrate the workings of the algorithm. Since the real world is changing all the time, it is difficult to assert the exactness of the method. Nevertheless, when tested in natural environments it performed satisfactorily, in the sense that the robot reached the location it was commanded to find.

2.3 Robot localization

In robotics there are two common and related problems concerning localization [7]: the Kidnapped-robot and the Wake-up-robot problems. They are important for testing a robot's capability to recover from catastrophic localization failures. The kidnapped robot refers to the situation where an autonomous robot is moved to an arbitrary location while operating, and then has to infer its current location. The wake-up robot refers to the case where the robot is taken to an arbitrary location and put into operation, and then has to infer its current location without any prior information. Notice that the Wake-up problem is the special case of the Kidnapped problem where the robot is "told" that it was moved. These two problems create significant issues for many localization systems, and only a subset of localization algorithms can deal with the uncertainty created (http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume11/fox99a-html/node1.html).

Our proposed approach relies on the detection of objects, or visual landmarks. It is assumed that, from different locations, the objects perceived by the robot have different geometric properties, e.g. size and orientation. The algorithm was tested under different levels of control over the environment, and proved to be robust enough to tackle the challenges of both the Kidnapped- and the Wake-up-robot problems. As expected, when the landmarks are different at every location, the robot always infers its actual location correctly, even when it perceives only a single one. When different locations share some landmarks, at least one extra distinct landmark is needed to infer the current location correctly. Furthermore, the algorithm also showed satisfactory performance when the only difference between the trained locations was the perceived size of the visual landmarks. In this case the algorithm's accuracy increased with the number of objects associated with each location. This behavior will be explained in detail in section 5.2.
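A toy sketch of this disambiguation behavior (hypothetical landmark names, not the project's code) follows:

```python
# Locations sharing a landmark cannot be told apart by that landmark alone;
# one extra distinct landmark resolves the ambiguity.
locations = {
    "kitchen": {"plant", "painting"},
    "hallway": {"plant", "mirror"},
}

def candidate_locations(seen):
    """Return the trained locations consistent with the observed landmarks."""
    return [name for name, marks in locations.items() if seen <= marks]

print(candidate_locations({"plant"}))            # ['kitchen', 'hallway'] - ambiguous
print(candidate_locations({"plant", "mirror"}))  # ['hallway'] - resolved
```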

2.4 Obstacle avoidance

This component of navigation can be subdivided into two parts: collision avoidance and path recovery. Collision avoidance can be tackled with sensors like sonars, range finders or cameras. The goal is to prevent the robot from hitting any static or moving object in its environment. Different methods vary in their degree of reactivity, depending on the expected speed of the robot and the expected noise coming from the sensors [5]. Common algorithms for obstacle avoidance found in the literature include edge detection, certainty grids and potential field methods [5].

Edge detection extracts the vertical edges of obstacles. This information allows the robot to move around any of the detected edges. Depending on the sensors used, shortcomings of this method include poor directionality, frequent misreadings and specular reflections. The Hough transform is a common approach for edge extraction in images. An example can be seen in figure 2.1.

Figure 2.1: Edge detection - Example of a sensed image (left) and its Hough transform (right) showing the found edges.
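A rough illustration of this kind of edge extraction with OpenCV is sketched below (a sketch only; the file name and parameter values are placeholders):

```python
import cv2
import numpy as np

# Detect intensity edges first, then extract straight line segments with
# the probabilistic Hough transform.
img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder image
edges = cv2.Canny(img, threshold1=50, threshold2=150)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=30, maxLineGap=5)

# Keep the roughly vertical segments, the obstacle edges of interest here.
if lines is not None:
    for line in lines:
        x1, y1, x2, y2 = line[0]
        if abs(x2 - x1) < abs(y2 - y1):
            print("vertical edge:", (x1, y1), (x2, y2))
```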

A certainty grid is an adequate map representation for sensor data accumulation and fusion. This probabilistic method locates obstacles in a grid-type model. The world is modeled as a 2D array of squares, or cells. Every cell has a certainty value indicating the confidence that an obstacle lies inside it (see figure 2.2). This value is given by a probability function that depends on the sensor's properties. Each cell continuously updates its certainty value according to the sensor readings. After some time moving around, the robot can build an accurate map of a given area. There is a trade-off between accuracy and computational cost in this method, as both depend on the cell size.

Figure 2.2: Certainty grid - Example of the detection of an obstacle by a robot (left) and the obstacle representation in a certainty grid (right). The values in the grid cells represent the probability of containing an obstacle.
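A minimal sketch of the update idea (illustrative only; real implementations derive the update rule from a model of the sensor's noise):

```python
import numpy as np

GRID = np.full((20, 20), 0.5)  # prior: unknown occupancy everywhere

def update_cell(grid, row, col, hit, weight=0.2):
    """Nudge a cell's occupancy belief towards 1.0 when a sensor reading
    falls inside it (hit) and towards 0.0 when the cell was seen empty."""
    target = 1.0 if hit else 0.0
    grid[row, col] += weight * (target - grid[row, col])

# A sonar echo repeatedly placed in cell (5, 7) raises its certainty.
for _ in range(10):
    update_cell(GRID, 5, 7, hit=True)
print(round(GRID[5, 7], 2))  # ~0.95 after ten consistent readings
```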

In potential or vector field algorithms, the goal or target location applies an attractive force to the robot. At the same time, obstacles produce imaginary repulsive forces. These imaginary forces are represented by vector arrows distributed in the plane. The robot's movement direction is computed by adding all attractive and repulsive vector forces. An example of a computed path in a vector field is shown in figure 2.3.

Figure 2.3: Potential field - Example of a potential field containing two obstacles (circle and thick line). The small arrows represent the repulsion vectors around the obstacles and the attraction vectors around the goal. The green line shows the computed path from the starting point to the goal.
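In code, the resultant direction at a given position might be sketched as follows (toy gains and positions, not the project's implementation):

```python
import numpy as np

def resultant_force(pos, goal, obstacles, k_att=1.0, k_rep=0.5):
    """Sum one attractive force towards the goal and one repulsive
    force away from each obstacle (inverse-square falloff)."""
    force = k_att * (goal - pos)
    for obs in obstacles:
        away = pos - obs
        dist = np.linalg.norm(away)
        force += k_rep * away / dist**3  # repulsion grows near obstacles
    return force

pos = np.array([0.0, 0.0])
goal = np.array([5.0, 5.0])
obstacles = [np.array([2.0, 2.5])]
direction = resultant_force(pos, goal, obstacles)
print(np.degrees(np.arctan2(direction[1], direction[0])))
```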

In our approach, the modules used for obstacle avoidance differed between the last two stages of the project. During the second stage, the Nao's default behaviors included a reactive collision-avoidance module using input from its front sonars. For the third stage, a parallel project was developed by some of the faculty's students to implement an infrared range-finder collision-avoidance module.

In both cases, unless explicitly overridden, the navigation system gives priority to the collision-avoidance commands over the path-tracking commands that indicate where to move in order to reach the target location.

The path recovery in our method comes from a vector field built around the main path, which allows the robot to keep moving towards the target location. This approach successfully solves the problem, although it has the disadvantage of slowing down the training procedure.


Chapter 3

Platform

This chapter describes the changes made to the hardware and software platforms used during this research. The evolution of the visual-topological navigation system can be divided into three stages, according to the hardware and software architectures used. This chapter reveals the importance of developing the hardware and software architectures as a single parallel project.

3.1 First stage: Nao with low computing power

The choice of the android Nao was due to pragmatic reasons. It has been the standard platform of RoboCup since 2007, confirming it as a reliable robot for experimentation. Nevertheless, it had never been used in the RoboCup@Home league, and it was an exciting challenge to participate for the first time with a humanoid robot in the competition. Its robustness and reusability were also important factors. Dominey concisely summarizes the advantages of commercial hardware platforms:

The research strategy is to use high quality commercially available robot platforms, and software [...] in order to provide a high performance baseline system. From this baseline we then implement human-robot cooperation capabilities inspired by contemporary results in human cognitive development research. Because much of the technology that we use is off-the-shelf it is robust, versatile and reusable. (Source: http://www.sbri.fr/robocup-home.html)

The robot's external configuration was modified in order to increase its computing power. This modification consisted of a mini-PC attached to the front, an extra battery set attached to the back, and an extra unidirectional microphone. These changes did not impede the mobility of the robot, but they certainly decreased its walking stability. The computer attached to the front blocked the robot's sonars, making it impossible to use them for obstacle avoidance.

3.1.1 First stage hardware platform

The initial configuration of the robot platform can be seen in figure 3.1. It consisted of the following components:

Nao robot. A humanoid robot produced by the French company Aldebaran Robotics. Its detailed specifications are given in figure 3.2 and table 3.1.

Additional computer. A fit-PC2i mini-PC was attached to the front of the robot to supply the additional processing power required for computer vision. This hardware structure was adopted to give the robot autonomy during the RoboCup@Home competition, without depending on unstable and slower wireless communication. Its detailed specifications can be found in figure 3.3 and table 3.2.

Additional battery. An external battery set was attached to the back, in order to provide energy to the additional computer.

Additional microphone. This unidirectional microphone was incorporated to reduce the background noise detected during the competitions, and to increase the performance of the speech-recognition modules.

3.1.2 First stage software platform

During this stage, the modules developed by the team were written mainly in Python, with some in C++ and C#. The open-source libraries, middleware and commercial software packages included in the architecture are described in the following list:

YARP. Middleware from the CHRIS project, used to control the information traffic between the sensors' raw data, the CPUs and the robot's end effectors (http://eris.liralab.it/yarpdoc/index.html).


Figure 3.1: Nao with additional mini-PC - Hardware platform used during the first stage of the project. The unidirectional microphone reduced the background noise. The external computer provided immediate processing power for computer vision, without depending on unstable and slower wireless communication.


Figure 3.2: Nao hardware diagram - Localization of the principal body components of the Nao robot. With permission from Aldebaran Robotics.


Table 3.1: Nao specifications

Component         Specification
Height            580 mm
Weight            4.3 kg
Energy autonomy   15-45 min (1)
DoF               17 (distributed in joints)
Speakers          2 x 2 W
Microphones       4 (square array)
Networking        Ethernet/Wi-Fi (IEEE 802.11g)
Sonars            4 (2 emitters, 2 receivers)
Gyrometers        2 (1-axis)
Accelerometer     1 (3-axis)
IR                2 (1 emitter, 1 receiver)
Cameras           2 (VGA 640x480) (2)
Focus range       30 cm - infinity
Vision field      58 degrees on the diagonal
CPU               x86 AMD Geode 500 MHz
RAM               256 MB
OS                Linux 32-bit, x86 ELF

(1) Autonomy depends on use conditions such as the activity of the wireless card or LAN port, the current joint temperatures and the intensity of movement (standing or continuously walking).
(2) The cameras are arranged in a non-overlapping vertical array. Maximum 30 fps, depending on the chosen image resolution and communication channel. Wired communication is faster than wireless.


Figure 3.3: Additional mini-PC - Devices available in the mini-PC attached to the front of the Nao robot.

Table 3.2: Additional mini-PC specifications

Component    Specification
Dimensions   101 x 115 x 27 mm
Weight       0.370 kg
CPU          Intel Atom Z530 1.6 GHz
RAM          1 GB
OS           Linux Mint / Windows 7
Audio        Line-out, line-in, mic
Networking   Ethernet/Wi-Fi (IEEE 802.11g)
USB          4 USB 2.0 High-Speed ports


OpenCV. An open-source set of optimized libraries for image processing and the development of computer-vision techniques (http://opencv.willowgarage.com/wiki/Welcome). A brief feature-extraction sketch follows this list.

Choregraphe. A graphical user interface (GUI) from Aldebaran, included with the robot Nao. It facilitates the construction of behaviors, control of the robot's motion, and access to sensor data and the robot's operating system (OS). Choregraphe's text-to-speech and voice-recognition modules were used for human-machine communication. A snapshot can be seen in figure 3.4.

Spikenet. A commercial package implementing spiking artificial neural networks for visual object recognition (http://www.spikenet-technology.com/technology.htm). A snapshot can also be seen in figure 3.4.

3.2 Second stage: Nao with high computing power

During this stage a newer version of the Nao robot was used, and the complementary computing power was provided by desktop computers. The experiments were carried out as part of a robotics course, and the robot-autonomy constraint no longer applied. Therefore, it was possible to communicate with the robot via a long LAN cable.

It is also important to highlight the drastic changes in the software architecture between the first and second stages of the project. During the first stage, the architecture was mainly focused on integrating available commercial software with the modules developed by the team. During the second stage, the architecture evolved towards a more flexible and customized functionality than the one provided by YARP.

The changes made in this stage were the first steps towards a behavior-based software architecture. This approach should provide the robot with a more reactive interaction with the world. The transition was completed in the third stage, and it is explained in more detail in section 3.3.

On the other hand, these benefits came at the expense of spending a significant amount of time building the new structure. The development of the architecture was a continuous loop, in which new requirements were discovered along the way and modules had to be re-engineered on several occasions.


Figure 3.4: Used graphical user interfaces - GUIs used during the first stage of the thesis project: Choregraphe (top) and Spikenet (bottom). Choregraphe was also used during the second and third stages.


Table 3.3: Desk PC specifications

Component Specification

CPU Intel Quad Core 2.67GHz

RAM 3.4 GB

OS Linux 2.6.32-5-686 (Debian 6.0)

3.2.1 Second stage hardware platform

The Nao version used in the second and third stages was more recent than the one used in the first stage. Nevertheless, the changes between the two versions were software improvements. The main change in the hardware architecture was the increased external computing power. The specifications of the desktop computers used at this stage, instead of the mini-PC, are detailed in table 3.3.

3.2.2 Second stage software platform

For the customized architecture built at this stage, the programming language used was Python. As mentioned before, the middleware was completely developed by our team. YARP was replaced in order to solve some fundamental problems, including long delays in image transmission.

A Speeded-Up Robust Features (SURF) [4] implementation was used for object detection. In comparison to Spikenet, it is faster, more robust to light changes, and based on open source libraries. Specifically, we used the OpenCV libraries from Willow Garage (http://www.willowgarage.com/pages/about-us/overview). The detailed functioning of the SURF implementation is described in section 4.1.

3.3 Third stage: Pioneer with distributed computing power

This final stage is part of the current work being done at the Artificial Intelligence department of the University of Groningen. It will be referred to mainly in the discussion section. However, it is important to provide a description of this stage's hardware and software components, in order to understand the possibilities for future work on visual navigation.

The Nao robot was placed on top of a Pioneer platform from the company MobileRobots (http://www.mobilerobots.com/ResearchRobots/ResearchRobots.aspx). The idea was to ensure robust mobility, and to incorporate a set of five laptops in order to parallelize the computational processes (See figure 3.5). The Nao was mainly used for human-machine interaction: it has a more familiar appearance to people, and its embodiment allows it to produce human-like body language.

During this stage the final structure of the behavior-based software architecture was conceived. All the modules previously built were migrated to this new paradigm, and the underpinnings of the behaviors became completely different in nature. The details are explained in subsection 3.3.2.

3.3.1 Third stage hardware platform

The functionality of the new hardware components included in the last robot configuration is described in the following list (See tables 3.4 and 3.5 for the technical specifications):

HD-cameras. Used for visual mapping, object detection and person identification.

Kinect. Utilized for obstacle avoidance, object detection and person identification. It is an infrared range finder that provides a 3D point-cloud with a range of approximately 0.3 to 5 meters.

Unidirectional microphone. Required for speech recognition. It was located at a

height that could capture the voice coming from a person standing in front of the

robot.

Laptops. Five additional laptops provided the required computing power. The processing of the raw data coming from the sensors, and the execution of the designed behaviors, were done in parallel.

Pioneer. Wheeled robot platform (http://www.mobilerobots.com/researchrobots/researchrobots/PioneerP3DX.aspx) used to mobilize the processing units, the Nao robot and the sensors.


Figure 3.5: Nao mounted on Pioneer platform - Principal body components of the Nao-Pioneer robot:

Kinect. Infrared range finder providing a 3D point-cloud. Used for obstacle avoidance,

object detection and person identification.

Cameras. High definition monocular cameras used for visual mapping, object detection

and person identification.

Unidirectional microphone. Used for speech recognition.

Nao. Humanoid robot used for human-machine interaction.

Laptops. They provided computing power for the parallel processing of sensor data and

robot control.

Pioneer. Wheeled robot platform used for stable mobility.


Table 3.4: Robot’s laptops specifications

Component Specification

CPU Intel Core Duo T2400 1.83GHz

RAM 2 GB

OS Linux 2.6.32-31-generic (Ubuntu 10.04.2)

Table 3.5: Pioneer specifications

Component Specification

CPU Intel Pentium M 1.40GHz

RAM 1 GB

OS Linux 2.6.32-5-686 (Debian 6.0)

3.3.2 Third stage software platform

The modules developed for object detection and mapping were adapted to the behavior-based architecture. However, their core functionality remained the same. The main challenge was to obtain the same response, already proven to work in real environments, with much faster performance.

The details of the navigation algorithm are provided in chapter 4. Nevertheless, the navigation behavior is briefly described here to justify the changes made. This example helps to understand the reasons for the changes in the software platform between the second and third stages.

During the first stage, in order to move between two consecutive points in a room,

the robot performed the following steps:

1. Detect surroundings. While standing at a given location, the robot would move its head on the horizontal axis (head yaw) to "look around". The head yaw allowed a movement of 120 degrees. Every 30 degrees on the yaw axis, the robot would look at ±10 degrees of pitch and analyze the incoming image from its camera. The actual processing of each image took between 2 and 12 seconds, depending on the amount of texture found in the image. As moving the neck produced motion blur, a 2-second pause was needed at every step to stabilize the image. Overall, including the time required for moving the head, 20-80 seconds were needed at each location to detect the surroundings.

2. Estimate location. Depending on the detected objects (their size and orien-

tation), the robot would estimate its most probable location in the room with

respect to the trained locations.

3. Correct and walk. Finally, if required, it would correct its orientation by a variable number of degrees (according to the estimated location in the trained vector field) and walk a predefined distance (around 25 cm).

This process was repeated until the robot estimated that its current location was the target location in the room.

During the third stage the intention was to make navigation a reactive behavior, i.e. to have a fast loop between perception and action. A smooth, continuous movement of the robot was required until the target location was reached. For this reason, it was no longer necessary for the robot to correct its orientation by a variable number of degrees at different locations. Instead, it could continuously correct its orientation by a small fixed number of degrees (5 degrees) in the required direction. Then it would move forward a smaller distance than the one used in the second stage (5 cm).

Whenever the calculated orientation correction was smaller than a small range (±5 degrees), the robot would simply move straight forward the predefined distance. With the help of the distributed computing power, this more fine-grained motion could provide the required smoothness for a reactive behavior.
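The following minimal sketch illustrates this reactive rule; the deadband, turn size and step size are the values given above, while the function name and return convention are our own illustration:

def reactive_step(correction_deg, deadband_deg=5.0, turn_deg=5.0, step_cm=5.0):
    """Return (turn, distance) for the next motion command.

    Corrections inside the deadband produce a straight step; larger
    corrections produce a small fixed turn in the required direction.
    """
    if abs(correction_deg) <= deadband_deg:
        return 0.0, step_cm              # walk straight forward
    turn = turn_deg if correction_deg > 0 else -turn_deg
    return turn, step_cm                 # small fixed turn, then step forward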

The use of two cameras mounted on the Pioneer platform, instead of one on the Nao robot, eliminated the need of "looking around" at every location, saving valuable time. However, processing the incoming images from the HD cameras required around 3 seconds, even when the images were reduced to a size similar to the one used with the Nao camera. This lag may be due to the lower processing power of the laptops on the robot, in comparison with the desktop computers in the robot lab.

Of the five available laptops, only one had a graphics accelerator card that could speed up this processing. CUDA (Compute Unified Device Architecture) is a parallel computing architecture that enables dramatic increases in computing performance by harnessing the power of the GPU (Graphics Processing Unit) (http://www.nvidia.com/object/what_is_cuda_new.html). Nevertheless, the particular architecture of this card model was not capable of handling CUDA implementations.

Some feasible possibilities to speed up the navigation algorithm include:

1. Module optimization. Re-implementation of the algorithm in a lower level language (C++).

2. Parallel processing. Starting multiple instances of the same vision modules, where each works with a subset of the trained images. This division of the processing tasks allows parallel processing on multi-core CPUs (see the sketch after this list).

3. Planning algorithm. Implementing a planner module that splits a building into sub-areas. The planner would instruct the object detection modules to search only for the objects found in the sub-area where the robot is currently located.
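As a rough illustration of the second option, the sketch below splits the trained models over worker processes; detect_objects() is a hypothetical stand-in for the SURF matching pipeline, not a function from the thesis code:

from multiprocessing import Pool

def detect_in_subset(args):
    # Hypothetical worker: match one image against a subset of trained models.
    image, model_subset = args
    return [model for model in model_subset if detect_objects(image, model)]

def parallel_detect(image, models, workers=4):
    # Split the trained models across processes to use multi-core CPUs.
    chunks = [models[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        results = pool.map(detect_in_subset, [(image, c) for c in chunks])
    return [detection for chunk in results for detection in chunk]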

The reactive navigation system is part of the current work at the Artificial Intelligence department of the University of Groningen. Future-work possibilities for a reactive visual-topological navigation system are addressed in the last chapter of this document.


Chapter 4

An Algorithm for Visual-Topological Mapping

This chapter is dedicated to the technical explanation of the proposed visual-topological mapping algorithm. The design of the testing environments and the methodology used for the experiments are described. Finally, a description of each module of the mapping algorithm provides the necessary information to analyze the experimental results.

4.1 Experimental setup

A set of conditions remained constant during all the experiments performed in this project. They are described in the following subsections.

4.1.1 Robot niche

As the origin of the visual-navigation project is related to the RoboCup@Home competitions, the robot niche is clearly defined in the tournament rule book. It is intended to represent a typical indoor human environment. The main concept is defined in the rule book as follows:

As uncertainty is part of the concept, no standard scenario will be provided in the RoboCup@Home League. [...] The scenario is something that people encounter in daily life. It can be a home environment, such as a living room and a kitchen, but also an office space, garden, supermarket, restaurant etc. The scenario should change from year to year... [8]

The experiments described in the Experiments chapter were performed in two different kinds of scenarios. The first kind was a maze, where the robot's motion and the mapping algorithm were observed with high precision. The second kind consisted of real human indoor environments: a corridor towards an elevator, and a lounge room with kitchen and living room facilities.

4.1.2 Lighting conditions

During the first stage of the thesis project, the lighting conditions at the SBRI laboratory remained constant throughout the day. During the second stage, the lighting conditions at the ALICE laboratory were variable.

The lighting conditions varied within the same range during the training and testing phases. In order to focus on the performance of the navigation algorithm, the experiments were carried out between the same hours of the day. In this way the performance of the vision algorithm remained at the same level.

4.1.3 Obstacle cluttering

During the first and second stages of the project, the environment was free of moving obstacles. The study of the visual-topological mapping required avoiding the influence of the obstacle-avoidance behavior. Collision avoidance was implemented during the third stage; its integration into a full navigation system is work in progress.

4.1.4 Landmark locations

The environment for the first experiment was designed according to the Nao robot's embodiment. The visual landmarks were located at a height that allowed the robot to detect them at distances between 30 and 250 cm, without being disturbed by the lights on the ceiling.

4.1.5 Camera properties

All models of the objects and visual features were built with the camera integrated at the top-front of the Nao's head. The chosen image size was 640 x 480 pixels. The navigation system can infer the angle of a detected object, relative to the X-Y axes of the image, independently of the image size (See figure 4.1).

Figure 4.1: Projection on camera sensor - Projection of an object in the camera

sensor. The dashed square represents the perceived world and the opposite square the

camera sensor-plate. The point where all the projections converge represents the center of

the camera lens. The distance from this point to the sensor-plate is the focal-distance.

The origin of the arc projected on the sensor plate of the camera is at the center of the camera lens. The origin of the X-Y coordinates is located at the center of the image. Such coordinates are given in pixel units.

Once the focal distance is obtained, the angle of the detected object's centroid can be computed. The detailed explanation of these computations can be found in subsection 4.3.3. For example, if the centroid of a detected object is in the center of the image, its coordinates will be (0,0) and both of its X and Y component angles will be 0 degrees.

The maximum angle on the Y axis, for the Nao camera, is depicted in figure 4.2.

4.2 Machine learning techniques

The proposed visual-topological mapping belongs to the category of supervised learning algorithms. Specifically, two modules of the mapping algorithm are suitable for the use of machine learning algorithms: object detection and location estimation.


Figure 4.2: Nao's camera angle - Nao's camera maximum angle along the Y axis direction. With permission from Aldebaran Robotics.

The success of the mapping algorithm was measured at two levels in the experiments: the success rate for estimating the exact location of the robot, and the success rate for reaching the target location.

During the first stage of the project, Spikenet [28] was used for visual object detection. Spikenet is an artificial-neural-network algorithm that allows the construction of object models for later recognition. During the second and third stages, a SURF [4] implementation was used instead for visual object detection. The SURF implementation used for this project is based on the OpenCV libraries for computer vision.

For location estimation, a nearest neighbor implementation was used to categorize the input vectors. Initially such vectors consisted of binary values indicating the presence/absence of the trained objects. Later, the input vectors were constructed from abstract values describing the distance and rotation of the detected objects.

The methodology used for object detection and location estimation is explained in detail in subsections 4.3.2 and 4.3.5.

4.3 Visual navigation algorithm

The visual navigation method consists of behaviors and sub-behaviors coupled in a sequential loop. The implementation can be summarized in the following pseudocode:

while searching_location:
    detect_surroundings()  # Composed of three sub-behaviors
    estimate_location()
    if current_location == target_location:
        searching_location = False
    correct_orientation()

1. detect_surroundings() Detection of trained objects in the space surrounding the robot. This module is composed of three sub-behaviors.

(a) neck_motion() Searching behavior that allows the robot to perceive its surroundings by moving the neck from left to right.

(b) store_detections() Detection of trained objects using computer vision techniques.

(c) compute_properties() Computation of the geometric properties of the detected objects.

2. estimate_location() Estimation of the robot's current location in the trained path, or vector field.

3. correct_orientation() Correction of the robot's orientation to match the position in the estimated current location.

Each of the elements in the previous steps is described in detail in the following subsections. The store_detections() method was in charge of object detection. Both of the object detection approaches introduced in the Platform chapter, Spikenet and SURF, are explained in the following subsections. They are explained separately in order to compare their capabilities and justify the transition.

4.3.1 Object detection: Spikenet

During the first stage of the project, the object detection process was achieved with Spikenet [11]. Spikenet is fundamentally an event-driven algorithm that efficiently models large networks of spiking neurons by processing only the output of firing neurons. The set of parameters that could be specified for the neural network included a confidence threshold for considering positive detections, and a color space (RGB or B&W) specifying the dimension of the vector space.

The construction of object models was done manually during an off-line training process. The complete procedure is described in the following sequence:


1. Collection of images from the environment. First, a dataset of images from the required room was gathered. The robot camera was used to collect the images, by turning the robot (in standing position) 360 degrees at spots distributed all around the room. This approach intended to collect the images required for building paths between any two points inside the room. Nevertheless, further experimentation revealed it was sufficient to collect images close to the desired path, with the robot turning its head through a range of 120 degrees instead of 360 while acquiring images.

2. Creation of models from objects. After the images were collected, they were opened with the Spikenet Graphical User Interface (GUI) in order to manually encircle the desired objects in each image. The objects were assigned descriptive names such as TV, Computer, Microwave, etc. The range of model sizes and rotations to be generated was also specified. For example, Spikenet could search for a model between 50-200% of the size of the original model, and within a rotation of -12 to +12 degrees in steps of 6 degrees.

3. Continuous detection of objects. Then a script was run to initialize the robot architecture, and the object-models file was loaded into the Spikenet object detection module. Every incoming image was then searched for the presence of any of the built models. In case of a detection, the location of the centroid of the detected object, its relative size with respect to the trained model, and its rotation angle on the image plane were calculated. This computation was done for every detection (in case of multiple simultaneous detections) and sent to the navigation modules. An example of Spikenet detections can be seen in figure 4.3.

4.3.2 Object detection: SURF

SURF (Speeded-Up Robust Features) [4] is a feature detector and descriptor. It has the advantage of being an on-line method, invariant to scale and rotation. A K-Nearest Neighbors (K-NN) implementation from OpenCV was used for matching the detected keypoints or descriptors. SURF has shown more robustness when tested against other popular algorithms, including SIFT (Scale-Invariant Feature Transform) and GLOH (Gradient Location and Orientation Histogram) [4].
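As a rough sketch of this pipeline (assuming an OpenCV build that ships SURF; recent versions expose it in the contrib module as cv2.xfeatures2d.SURF_create, while OpenCV 2.x exposed cv2.SURF), keypoint extraction and K-NN descriptor matching could look as follows; the file names are placeholders:

import cv2

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=300)

# Placeholder images: one trained object model and one incoming camera frame.
img_model = cv2.imread("trained_object.png", cv2.IMREAD_GRAYSCALE)
img_frame = cv2.imread("incoming_frame.png", cv2.IMREAD_GRAYSCALE)
kp_m, des_m = surf.detectAndCompute(img_model, None)
kp_f, des_f = surf.detectAndCompute(img_frame, None)

# K-NN matching (k=2) with Lowe's ratio test to keep distinctive matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des_f, des_m, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]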


Figure 4.3: Spikenet detections - Visualization of detections made on incoming images

from Nao’s camera. Notice the low resolution of the used images. This resolution was

required to handle the expensive computations required by Spikenet.

Initially, images were manually trimmed to contain only the desired objects (See figure 4.4). Then all the images of the same object were moved to a folder with a descriptive name, such as fridge, computer, sofa, etc. This process was tedious and slow. The creation of a GUI was considered to speed up this process. Nevertheless, an alternative approach that could be implemented much faster was sound and worth trying: to consider every full image from the training dataset as a single object. No trimming would be needed anymore.

Figure 4.4: SURF detections - Visual detection of trained objects using the SURF implementation. Notice the algorithm's robustness against rotation. The lines in the bottom-left corner show some of the projections of the object's keypoints on the X (horizontal) and Y (vertical) image axes.


A drawback was that the trimmed images had an average size of 100 x 150 pixels, whereas the full images were 640 x 480 pixels, making their processing more computationally expensive.

An example of SURF detections considering full images as single objects can be seen in figure 4.5. The figure depicts two consecutive images taken by the robot, with the neck pitch at -10 and 10 degrees (top-left and bottom-left images respectively). The neck yaw angle is the same in both images. The detected objects are surrounded by squares, and some of them were found in both images.

Figure 4.5: Full images considered as objects - Visualization of detections consid-

ering full images as single objects. The images on the left are the incoming images while

navigating. The images on the right show the detected objects, or what the robot “re-

members” from the trained images. The squares on the left images surround the objects

detected. The deformation in the squares occurs when the detected object is rotated (See

the dashed squares in the bottom images). The dots are located on the keypoints found

by the SURF algorithm.

The SURF method from OpenCV requires the specification of some parameters. For the experiments described in this work the default values were used, as they provided satisfactory performance. The parameters are described next, and the default values can be found in table 4.1:

Table 4.1: Default SURF parameters

Parameter          Value
Descriptors size   64
Hessian threshold  300
Octaves            3
Octave layers      4

1. Descriptors size. Basic descriptors include 64 elements each, and extended descriptors 128 elements each. Increasing this parameter enhances accuracy at the cost of computational power.

2. Hessian threshold. Only features with a Hessian larger than the specified value are extracted.

3. Octaves. Number of octaves to be used for extraction. With each successive octave the feature size is doubled.

4. Octave layers. Number of layers within each octave.

5. Minimum matches threshold. The minimum number of keypoints in the current detection matching a stored model for a detection to be considered positive.
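As a hedged sketch, the table 4.1 defaults map onto the OpenCV constructor roughly as shown below (again assuming the contrib SURF module); MIN_MATCHES is a hypothetical value for the user-specified minimum matches threshold:

import cv2

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=300,
                                   nOctaves=3,
                                   nOctaveLayers=4,
                                   extended=False)  # 64-element descriptors

MIN_MATCHES = 10  # hypothetical user-specified threshold

def is_positive_detection(good_matches):
    # A trained object counts as present when enough keypoints match it.
    return len(good_matches) >= MIN_MATCHES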

The construction of object models from the image dataset was done only once, during the training of a path. The created models were stored at the end of the first training. In subsequent training and testing runs the models were loaded instead of being built again. Loading the models was 10 times faster than creating them at every run. The complete procedure is described next:

1. Collection of images from the environment. Similar to the procedure followed for Spikenet, a dataset of images was gathered. This time the area for collecting images was reduced to the path to be trained, instead of locations spread around the entire room. The robot camera was used to collect the images. The robot turned its head 120 degrees, in steps of 30 degrees from left to right, while in the default standing position from Choregraphe.

The spots for taking images were distributed along the path to be trained. The distance between them was twice the distance the robot would walk while navigating. That is, if during the testing phase the robot was required to walk 25 cm every time it looked around to calculate its location, then the images were taken every 50 cm.

2. Creation of models from objects. The models were built with the SURF implementation by creating a set of descriptors for each of the collected images. The training process was performed in the initializer of the object detection module. The system then continuously extracted the SURF features of the incoming images and compared them with the trained models.

3. Continuous detection of objects. After the robot moved its head, the SURF module had to wait around 1 second to process a stable image. Without this pause the image was blurry, and the SURF module was not able to detect salient features in blurry images. Salient features depend on the contour lines found by the Hough transform; in blurry or swept images, the Hough transform finds few, if any, contour lines.

The system concludes the presence of a trained object if the number of detected features matching a particular model is above a user-specified threshold. In the case of several trained objects matching the same detected SURF features, it displays all of them.

4.3.3 Calculation of an object’s angle

For the calculation of the angle of a detection in an incoming image, it is necessary to know the camera's focal distance. This value is not included in the hardware documentation. Nevertheless, it is mentioned that the camera has a fixed focal distance, and that the camera's diagonal angle is 58 degrees. Knowing the camera's diagonal angle and the image size (in pixels), the focal distance can be calculated with the following procedure:

h = \sqrt{\left(\frac{x}{2}\right)^2 + \left(\frac{y}{2}\right)^2} = \sqrt{\frac{x^2}{4} + \frac{y^2}{4}} = \sqrt{\frac{x^2 + y^2}{4}} = \frac{\sqrt{x^2 + y^2}}{2}    (4.1)

Where h is the hypotenuse (the distance in pixels from the center of the image to a corner), and x and y are the lengths of the image's X and Y axes (See figure 4.6). With the value of the hypotenuse and the angle of the image diagonal, the focal distance is computed as follows:

\tan\theta = \frac{h}{f_{dist}}    (4.2)

Where θ is half of the angle of the image diagonal, i.e. the angle from the center of the image to the corner, and f_{dist} is the camera's focal distance. By substituting equation 4.1 into equation 4.2 we obtain:

f_{dist} = \frac{\frac{\sqrt{x^2 + y^2}}{2}}{\tan\theta} = \frac{\sqrt{x^2 + y^2}}{2\tan\theta}    (4.3)

Figure 4.6: Object detection components - Angle of the X and Y components of the

detected object centroid. The blue square represents the camera sensor-plate. The object

is represented by the circle and the origin of the angle is measured from the focal point of

the camera lens.


Once the focal distance is known, it is possible to calculate the angle of any pixel along the X (horizontal) or Y (vertical) axis. For the calculation of the angle θY of a given pixel along the Y axis, the following method was used:

\tan\theta_Y = \frac{Y - y}{f_{dist}}, \qquad \theta_Y = \arctan\frac{Y - y}{f_{dist}}    (4.4)

By substituting equation 4.3 in equation 4.4 we obtain:

\theta_Y = \arctan\frac{Y - y}{\frac{\sqrt{x^2 + y^2}}{2\tan\theta}}    (4.5)

Where Y is the total distance from the bottom of the image to the given pixel (in pixels), and y is half the length of the Y axis (also in pixels). Therefore, the computed angle can take positive and negative values. The same procedure was followed for the calculation of the angle θX of a given pixel along the X axis.

Finally, the current head yaw and pitch angles were added to the centroid's X and Y angles on the image, respectively. This computation provided the objects' angles relative to the robot's front.
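A minimal numeric sketch of equations 4.3 and 4.4, assuming the Nao's 640 x 480 image and 58-degree diagonal angle (the function names are ours):

import math

def focal_distance_px(width, height, diagonal_angle_deg=58.0):
    # Equation 4.3: half-diagonal in pixels divided by the tangent of half
    # the diagonal angle.
    h = math.sqrt(width ** 2 + height ** 2) / 2.0
    theta = math.radians(diagonal_angle_deg / 2.0)
    return h / math.tan(theta)

def pixel_angle_deg(coord_px, half_axis_px, f_dist):
    # Equation 4.4: angle of a pixel along one image axis, measured from the
    # image center; positive or negative depending on the side.
    return math.degrees(math.atan((coord_px - half_axis_px) / f_dist))

f = focal_distance_px(640, 480)         # roughly 722 pixels
theta_y = pixel_angle_deg(400, 240, f)  # vertical angle of a pixel at Y = 400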

4.3.4 Detect surroundings

The creation of a vector field, or "path", consists of dividing the route that the robot should learn into a discrete number of intermediate points. At each point the robot "looks around" and finds which of the learned objects are present. The detection of several objects facilitates the disambiguation between multiple locations that share landmarks with similar properties.

The "look around" behavior made use of the full mobility range of the neck's yaw (horizontal) and pitch (vertical). The behavior was performed by placing the robot, in the default standing position, at each location, and making it move its head from left to right. The yaw movement went from -60 to 60 degrees, in steps of 30 degrees. At each of the yaw steps the robot searched for models at 10 and -10 degrees of head pitch. In total the "look around" behavior consisted of 10 head positions. This behavior can be summarized in the following pseudo-algorithm:


# Values taken from the description above (5 yaw steps of 30 degrees
# covering -60 to 60, looking at +/-10 degrees of pitch at each step).
head_yaw_start  = 60    # Initial head angle
head_yaw_steps  = 5     # Number of horizontal movements
head_yaw_change = 30    # Angle to move the head at every step
head_pitch_up   = -10   # Angle to look up
head_pitch_down = 10    # Angle to look down

def look_around():
    head_yaw_position = head_yaw_start
    for _ in range(head_yaw_steps):
        set_head_yaw(head_yaw_position)
        set_head_pitch(head_pitch_up)
        detect_objects_in_image()
        set_head_pitch(head_pitch_down)
        detect_objects_in_image()
        head_yaw_position -= head_yaw_change
    store_detected_objects()
    compute_geometric_properties_in_detections()

After looking around, the robot relates the detected objects to the current location and stores them as a point in a Euclidean vector space. In such a space, each dimension corresponds to the properties described in subsection 4.3.2. The storage of a location where the robot detected two objects, a painting and a computer, would look as follows:

painting_23: body_x=0.7215, body_y=-0.1789, cov_x=7560.8802,
             cov_y=8944.1634, cov_xy=-5521.9053
computer_17: body_x=0.7564, body_y=-0.0618, cov_x=9383.1553,
             cov_y=736.9513, cov_xy=-501.8768

The number after an object's name is the specific instance detected. body_x and body_y are respectively the horizontal and vertical angles of the image centroid, in relation to the robot's body. cov_x, cov_y and cov_xy are the values of the covariance matrix of the object's keypoints, with respect to their projections on the image's X and Y axes (See figure 4.4).


4.3.5 Location estimation

A location is any point on the floor of a given room where the robot can walk. It is assumed that from any location in a room, it is possible for the robot to detect salient features in the incoming images from its camera. Such salient features, or keypoints, are generated by the texture produced in the images by furniture, decoration or the geometry of the room itself (lines between the floor, walls, ceiling and windows). The little "dots" created by the SURF algorithm represent the keypoints found on each detected object (See figure 4.4).

The following properties can be computed from the keypoints describing a trained object or image:

1. Variance. Variance of the keypoints' projections on the X and Y axes of the image.

2. Covariance. Covariance correlating the keypoints' projections on the X and Y axes of the image.

3. Centroid. Pixel coordinates of the keypoints' center of mass.

4. Angles. Angles of the keypoints' center of mass: relative to the camera lens, i.e. in the image, and relative to the robot's body.

5. Square coordinates. Pixel coordinates of the corners of a square fitted around the keypoints.

Each of the object's properties represents a different dimension in its own vector space. When storing the properties of the objects that the robot detected at a given location, it is said that the location was stored. Finally, when storing a set of locations in a vector field between two desired points in a room, it is said that a path was trained between those two points. The locations in the area at the end of the trained path are called target locations. The rest of the locations in the trained path are called path locations. (The terminology used to describe the mapping procedure was considered adequate for our algorithm, and is not standard in the scientific literature.)

During the testing phase, the robot should manage to re-run a trained path by looking at the properties of the objects it can detect from its current location. Using this information it has to find its most probable location in the trained path. Then it has to correct its orientation according to the location found to be the closest, and walk the same distance that there was between the locations in the trained vector field.

The representation of every location as a set of vectors allows the robot to estimate its most probable position when situated at a random place near the stored vector field. The vector describing the robot's current detections lives in an n-dimensional Euclidean space, where the dimensionality n depends on the number of features chosen to describe every object. For every detected object in a current detections vector, the nearest neighbor algorithm can find the instance of the same object in the trained location vectors with the smallest Euclidean distance. Formally, the Euclidean distance is measured as follows:

d(\mathbf{p},\mathbf{q}) = d(\mathbf{q},\mathbf{p}) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + \dots + (q_n - p_n)^2} = \sqrt{\sum_{j=1}^{n}(q_j - p_j)^2}    (4.6)

Where q is the vector describing a detected object, p is its counterpart stored in a trained location, and n is the total number of such features. Therefore, the most probable location will have the smallest average distance between the currently detected objects and their counterparts stored in the vector field. The formal representation of this method is given in the following formula:

\arg\min \frac{d(\mathbf{q},\mathbf{p}_1) + d(\mathbf{q},\mathbf{p}_2) + \dots + d(\mathbf{q},\mathbf{p}_k)}{k} = \arg\min \frac{\sum_{i=1}^{k} d(\mathbf{q},\mathbf{p}_i)}{k} = \arg\min \frac{\sum_{i=1}^{k}\sqrt{\sum_{j=1}^{n}(q_j - p_{ij})^2}}{k}    (4.7)

Where k is the number of detected objects. For each detected object, the robot looks up the closest match in the stored location vectors using only the keypoints' covariance matrix. To be more precise, the variance of an object's keypoints along the X and Y axes of the image will reflect the perceived size and orientation of the object. The location of keypoints in an image depends on its texture. When considering full images as objects, the distribution of the keypoints is uncorrelated. Therefore, the x and y components of the keypoints were considered independent, and the covariance matrix was calculated using an unbiased estimator:

\sigma_{ij} = \frac{\sum_{n=1}^{N}(x_{in} - \mu_i)(x_{jn} - \mu_j)}{N - 1}    (4.8)
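For illustration, NumPy's covariance routine uses the same unbiased (N - 1) estimator; the keypoint coordinates below are made up:

import numpy as np

# Hypothetical (x, y) pixel coordinates of the SURF keypoints of one object.
keypoints = np.array([[120.0, 80.0], [135.0, 95.0], [128.0, 60.0],
                      [150.0, 88.0], [142.0, 72.0]])

cov = np.cov(keypoints, rowvar=False)  # unbiased estimator, as in eq. 4.8
cov_x, cov_y, cov_xy = cov[0, 0], cov[1, 1], cov[0, 1]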

The values of the keypoints' covariance matrix are descriptive features. They reflect the appearance of visual landmarks at specific locations, irrespective of the robot's orientation when reaching those locations (or places near them). The example in sub-subsection 4.3.5.1 clarifies the specific selection of these features.

In summary, if the robot is commanded to bring a drink, it will search for its current location in the paths it trained to go to the kitchen. Furthermore, inside those paths it will only compare the objects it is currently detecting with the locations containing those same objects. Even if a stored location contains more or fewer objects, it will consider only the matching objects. This makes the system robust against occlusion or displacement of some of the objects it was trained to perceive at a given location.
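A minimal sketch of this matching scheme (equations 4.6 and 4.7), assuming each trained location is stored as a dictionary from object names to feature vectors; this data layout is ours, for illustration only:

import math

def euclidean(q, p):
    # Equation 4.6: distance between a detection and its stored counterpart.
    return math.sqrt(sum((qj - pj) ** 2 for qj, pj in zip(q, p)))

def estimate_location(detections, trained_path):
    # Equation 4.7: the most probable location minimizes the average distance
    # over the objects present in both the current detections and the stored
    # location (unmatched objects are simply ignored).
    best_location, best_score = None, float("inf")
    for location, stored in trained_path.items():
        distances = [euclidean(vec, stored[name])
                     for name, vec in detections.items() if name in stored]
        if distances:
            score = sum(distances) / len(distances)
            if score < best_score:
                best_location, best_score = location, score
    return best_location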

4.3.5.1 Location disambiguation

Let us assume a path on a regular street, going towards a "STOP" traffic sign and then walking around it. If we saw it far away at the end of the street, it would look small and the keypoints describing it would be close to each other (with little variance on the X and Y axes). On the other hand, if we were standing close in front of it, it would look big in our field of view and its keypoints would be spread all over the perceived image (with large variance on the X and Y axes). Finally, if we were standing close with the "STOP" sign on our right, it would look like a long and thin vertical blob (with little variance on the X axis and large variance on the Y axis). Hence, it would look the same if we were standing close with the "STOP" sign on our left. How could both scenarios be disambiguated if no other landmarks are available?

It is appealing to use the angle of the object in the field of view to differentiate between the two cases mentioned above. At the locations 2 and 3 meters in front of the "STOP" sign, we would perceive it at the same angle. In such cases the location estimation is not misled by the use of the object's angle in the robot's field of view; it is enough to use the variance of the keypoints. Nevertheless, the locations near the sign require us to turn in order to walk around the "STOP" sign. In this case the use of angles may mislead the estimation of the current location. Why?

In the case of a location 0.5 meter in front of the sign (let us call it location B), we want to turn, for example, to the left. Hence, when training this location in the path we placed the robot facing 45 degrees to the left. Due to the nature of the training method, when the robot reaches location B, even if it perceives the sign with the right size (keypoints covariance), it may estimate the location 1 meter behind (let us call it location A) to be closer. At location A the sign was trained to be in front of the robot (as currently perceived), whereas location B was trained with the sign on its right side (as the robot was facing 45 degrees to the left). The discrepancy between the currently perceived object angle and the angle stored at location B may be greater than the discrepancy between the currently perceived object size and the size stored at location A, thereby leading the nearest neighbor algorithm to find location A closer than location B. This confusion is avoided by feeding the nearest neighbor algorithm only with the covariance matrix of the keypoints.

4.3.6 Correcting orientation

At every location, the angles of the detected objects are used to re-align the robot according to the trained vector field. The robot is assumed to always perceive several objects and/or visual features from any location. Therefore, if a few objects have changed their position in the domestic setting, the average angle correction will not change significantly. This is a key feature of this algorithm.

An obstacle avoidance module will prevent the robot from colliding with any moving objects during navigation. Therefore, it is only important that the robot periodically corrects its orientation to avoid fixed obstacles (walls, large pieces of furniture). There is no need to follow a path with high precision: if the robot is aligned imprecisely towards the goal, this will not prevent it from converging to the target area.

During the training of a path, at every location the robot was positioned in the direction required to avoid static obstacles. Consider the example "U"-shaped path depicted in figure 4.7. The path starts at the circle and ends at the cross. The green squares represent the training locations and the pink stars represent the objects the robot could detect at each point.


Figure 4.7: Example path - Example “U” shape path that the robot learnt. It starts

at the circle and ends at the cross. The green squares represent the training locations and

the pink stars represent the objects it could detect at each point.

At the 3rd and 4th locations from the starting point, the robot was trained to turn 90 degrees to the left. This turning behavior was achieved by storing those locations with the robot facing in the direction it would have after finishing the turn.

During the testing phase, the robot was placed at the starting location. It "looked around" to determine which objects it could detect from that location. Once the angles of the detected objects were computed, they were compared with the objects corresponding to the first location in the trained path. The correction angle was the average difference between the angles of the matching objects. This angle allowed the robot to correct its current orientation and face the direction it had when training the path. The formal representation of this algorithm is as follows:

g(\mathbf{q},\mathbf{p}) = \frac{(q_1 - p_1) + (q_2 - p_2) + \dots + (q_k - p_k)}{k} = \frac{\sum_{i=1}^{k}(q_i - p_i)}{k}    (4.9)

Where q is the vector containing the detected objects' angles (relative to the robot's body), p is its counterpart stored in a trained location, and k is the number of detected objects.
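A minimal sketch of equation 4.9, assuming the angles are kept in dictionaries keyed by object name (our layout, for illustration):

def correction_angle(detected, trained):
    # Equation 4.9: average angular difference over the matching objects
    # (assumes at least one object matches).
    common = [name for name in detected if name in trained]
    return sum(detected[n] - trained[n] for n in common) / len(common)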

Finally, the robot walked a predefined distance between each location. This process was repeated until the final (target) location was reached.


Chapter 5

Experiments

This chapter describes the set of experiments performed to test the robustness of the visual-topological mapping algorithm. These tests explore the underpinnings of the computational methods and their effectiveness in real world environments.

5.1 Experiment 1: Path formation

As a starting point, the robot learnt to traverse a simple "U"-shaped path, depicted in figure 5.1. The robot started at the circle and finished at the cross. The squares represent the trained locations and the stars the positions of the objects it could recognize. During the training procedure, the robot was manually placed at each of the locations to be stored.

There are two effects that are explored in this first experiment:

1. Convergence of the robot to path and target locations. This experiment explores the consequences of varying the number of intermediate locations stored in a path between two points. In particular, the effect of accumulated odometric error between locations was observed.

2. Correction of the robot's orientation. It is necessary to test the effectiveness of the correction of the robot's orientation, given that the robot actually knows its current location in the path. The precise algorithm for the correction of the robot's orientation was detailed in subsection 4.3.6.

The objectives of this experiment require the following constraints to be observed:


Figure 5.1: Trained “U” path - “U” shape path that the robot has to memorize. The

robot starts at the circle and ends at the cross. The green squares represent the training

locations and the pink stars represent the objects it could detect at each point.

1. The robot was provided with its correct current location. Providing this information to the robot allowed testing the performance of the orientation-correction behavior. During the training phase each location was written as a line in a file, so the file contained as many lines as there were intermediate locations in the path. For example, at location 3 the robot would compare its current detections with the ones stored in line 3 of the trained path file, and so on.

2. The trained path was tested in a maze free of obstacles. In this way the variance of the Nao's position with respect to the trained positions could be observed at every location, without interference from obstacle avoidance.

3. Objects placed along the path were never occluded. For this experiment thirty objects were placed along the path. The only factors preventing the robot from detecting them would be having the objects further away than 1.5 m, perceiving them turned more than 40-50 degrees, or simply having them out of the field of view.

A high precision system, such as a zenithal camera orthogonal to the floor, was not available in the laboratory at the moment of running this experiment. Nevertheless, a set of images was taken at the end of each run. These images allowed the comparison of accumulated odometric error at every location in both paths.

Two paths were stored during the training procedure: the first one using 6 intermediate locations (90 cm between each), and the second one using 11 intermediate locations (45 cm between each).

The starting position of the robot was the same for all runs on both paths. The position of the robot was marked at each of the locations where it stopped, by drawing a line behind its feet. These lines showed the variation in the position of the robot (in distance and angle) between the training and testing phases.

Figures 5.2 and 5.3 show the variance in the position of the robot at every location on both paths. The robot re-ran the 6-locations path 3 times, and the 11-locations path 2 times.

Figure 5.2: 6-locations path - Variance at each location for the 6-locations path. The

white arrows show the training places and the robot orientation in them. The black arrows

indicate the robot position at each location during testing. Each foot was placed on top of

each arrow.

5.1.1 Conclusion for path formation

It was corroborated that the correction of the robot's orientation is a robust approach for traversing a path. If the output of the algorithm for location estimation is close enough to the actual location of the robot, it can be expected that the robot effectively adjusts its orientation.

Figure 5.3: 11-locations path - Variance at each location for the 11-locations path. The white arrows show the training places and the robot orientation at them. The black arrows indicate the robot position at each location during testing. Each foot was placed on top of each arrow.

It can be observed in figures 5.2 and 5.3 that the variance in the angle of the robot when arriving at the intermediate locations does not improve drastically between the 6- and the 11-locations paths. Nevertheless, the variance in the distance to the trained spots clearly decreased when increasing the number of training locations from 6 to 11.

This phenomenon is expected, as the error in the robot's orientation accumulates as the traveled intermediate distance increases. The results observed in this experiment are promising for accurately following a path, especially if the time for location estimation can be reduced. For example, if the current location of the robot were estimated 10 times per second, instead of every 45 cm, the precision for following a trained path should increase dramatically.


5.2 Experiment 2: Location estimation

During the first stage of the project, a small experiment was carried out to verify the robustness of our approach for solving the kidnapped robot problem. Its results provided a guideline for formulating the current location estimation experiment. We will call the first stage experiment "where am I?". Its steps are described as follows:

1. Assign objects to different rooms. The names and properties of several rooms were specified in advance in a file. Such rooms were represented by binary presence/absence (1 or 0 respectively) vectors describing the state of all the learned objects. Mostly the objects differed between rooms, although some rooms shared a few objects in common.

2. Estimate current location. Then the robot was shown a set of the learned objects. It would send a feature vector describing its current detections to the nearest neighbor module. The system would infer that the current location corresponded to the room vector closest to the current detections vector, and the robot would speak the room's name out loud.

3. Forget no longer detected objects. Finally, the robot had a "forgetting" period for every object, depending on a predefined time. If the robot did not detect an object after some seconds, it would adjust the corresponding value in its current detections vector, affecting in this way the output of the nearest neighbor module.

In the particular case of not detecting any object, the robot stated the need for more information to estimate its location. In a navigation system, when the robot finds itself in this state, it can start an exploring behavior supported by the obstacle-avoidance module.

This second experiment was intended as a deeper exploration of the functionality of the nearest neighbor method in the proposed mapping system. The trained location vectors and the current detections vector no longer consisted of binary values describing the status of all the trained objects at once. The vectors now contained the object properties described in subsection 4.3.4: body x, body y, cov x, cov y and cov xy.
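
For illustration, the sketch below shows one possible way to flatten such per-object properties into a single detections vector. The property names follow subsection 4.3.4, but the exact encoding (ordering, zero-filling for undetected objects) is an assumption of this example.

    import numpy as np

    # Properties stored per detected object; names are illustrative.
    PROPS = ("body_x", "body_y", "cov_x", "cov_y", "cov_xy")

    def detections_vector(detections, object_names):
        """Flatten per-object properties into one fixed-length vector.
        Undetected objects contribute zeros (an assumption of this sketch)."""
        vec = np.zeros(len(object_names) * len(PROPS))
        for name, props in detections.items():  # e.g. {"sofa": {"body_x": ...}}
            i = object_names.index(name) * len(PROPS)
            vec[i:i + len(PROPS)] = [props[p] for p in PROPS]
        return vec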


The robot calculated its most probable current location by searching in the vector space of a specific path. The output was the trained location vector with the smallest Euclidean distance to the current detections vector.

In the previous experiment (section 5.1), all of the robot's detections were stored in separate files during the training and testing phases. Therefore, it was possible to compare each of the detection vectors with the trained location vectors, and to find whether the nearest neighbor output was exactly the actual location of the robot, or one of the neighboring ones.

To provide homogeneity to the data, a new 9-location path was trained in the same maze. The path was tested 30 times. The trained locations file and the current detection files were now used as training and testing datasets respectively. Each of the 9 observations -or detection vectors- from the 30 runs was fed to the nearest neighbor algorithm. A classification was considered successful when:

1. It was exactly the location of the robot when that detection vector was stored.

2. It was n locations before or after the actual location of the robot when that detection vector was stored.

Using the nearest neighbor method defined in formula 4.7, and considering the constraints defined above, the success rate for the location estimation behaved as shown in figure 5.4.
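
The evaluation just described can be summarized in the following sketch, which assumes the 9 trained location vectors are stacked row-wise in an array and each test observation is labeled with the index of the location where it was recorded.

    import numpy as np

    def nearest_index(detection, trained):
        """Index of the trained location vector with the smallest Euclidean
        distance to the detection vector (the nearest neighbor output)."""
        return int(np.argmin(np.linalg.norm(trained - detection, axis=1)))

    def success_rate(observations, trained, n):
        """Fraction of observations classified within n locations of the true
        one; observations is a list of (true_index, detection_vector) pairs."""
        hits = sum(abs(nearest_index(d, trained) - i) <= n
                   for i, d in observations)
        return hits / len(observations)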

When zero neighbors are considered for success, i.e. only when the output of the nearest neighbor algorithm is identical to the current location, the success rate has its lowest value. When n neighbors adjacent to the current location are considered successful outputs, the success rate increases dramatically.

The different curves specify the object properties removed before sending the detections vector to the nearest neighbor algorithm. The object properties that remained in use in all four curves were the variances of the keypoints' X and Y components.

5.2.1 Conclusion for location estimation

It can be seen in figure 5.4 that even in the case of 0 neighbors, the algorithm's performance is well above the randomness threshold. The accuracy improves further with the use of the first pair of neighbors. These results provide the ground for the following experiment, where the path formation and location estimation methods are tested together.

Figure 5.4: Estimation of the robot's location in a vector field - The graph shows the effect of increasing the number of neighboring locations -relative to the actual position of the robot- considered as successful estimations. When zero neighbors are considered for success, i.e. only when the output of the nearest neighbor algorithm is identical to the current location, the success rate has its lowest value. When n neighbors adjacent to the current location are considered successful outputs, the success rate increases dramatically. The different curves specify the object properties removed before sending the detections vector to the nearest neighbor algorithm. All four curves used the individual variances of the keypoints' X and Y components for the estimation of the robot's location. In a different setting, the use of angles may decrease the success rate more dramatically than in this experiment. Sub-subsection 4.3.5.1 describes an example of a setting sensitive to the use of angles for location estimation.

It is possible that these optimistic results are due to the particular configuration of the maze used for this experiment. While the robot was walking along the path, some of the closest objects at its sides were only detectable from specific locations.

On the other hand, these results prove the effectiveness of the method when there are enough objects spread across the environment to disambiguate one location from the rest. The results of this experiment showed a satisfactory performance in an artificial environment. The time had now come to test the mapping algorithm in the real world.

5.3 Experiment 3: Visual navigation

At this point it was possible to test the combined performance of the robot’s visual

perception, the correct orientation behavior, and the location estimation provided by

the nearest neighbor algorithm. Two real scenarios were used for testing purposes. The

first one was a path towards an elevator with a few visual landmarks in the surroundings.

The second path was built in a lounge with kitchen and living room facilities.

Both scenarios were used to test the convergence of the robot to the target location. An optimized version of the navigation algorithm was tested in the second scenario. This optimization made it possible to conduct four times as many tests.

The training procedure for both scenarios was similar to the procedure used in the first experiment (section 5.1). The main difference was that the full images coming from the robot's camera were considered as objects; the images were no longer manually trimmed. The procedure consisted of the following steps:

1. First, a set of images was taken with the robot at several points along the path to be created: 431 images for the elevator path and 352 for the lounge path.

2. After creating models of the set of images, the path -or vector field- was trained. The robot was manually placed at different locations facing the desired direction, forming the vector field leading to the target location (see figure 5.5). 84 locations were trained for the elevator path and 64 for the lounge path.

3. Finally, the robot could be placed at any point near the vector field, and it would start moving towards the indicated target area.

4. The stopping criterion was defined by the set of trained locations around the target area. Such locations were stored as target locations and the rest of the trained points as path locations. If the current detections vector was closest to one of the target location vectors, the robot assumed it had reached the target area and stopped (a sketch of the resulting estimate-orient-advance cycle follows this list).
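
A minimal sketch of one such navigation cycle is given below. The robot interface calls (look_around, turn_to, walk_forward) are hypothetical placeholders, and the 0.45 m step length is borrowed from the first experiment for illustration.

    import numpy as np

    def nearest_location(detections, locations):
        """Index of the trained location whose vector is closest (Euclidean)
        to the current detections vector."""
        dists = [np.linalg.norm(detections - loc["vector"]) for loc in locations]
        return int(np.argmin(dists))

    def navigation_step(robot, locations):
        """One estimate-orient-advance cycle; True once a target is reached."""
        detections = robot.look_around()          # hypothetical perception call
        i = nearest_location(detections, locations)
        if locations[i]["is_target"]:
            return True                           # stopping criterion met
        robot.turn_to(locations[i]["heading"])    # align with the stored vector
        robot.walk_forward(0.45)                  # advance a predefined distance
        return False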

The robot described trajectories with some deviations from the shortest line between the starting and target points, as seen in figure 5.6. A comparison of the characteristics and results of both experiments can be seen in table 5.1.

Figure 5.5: Stored vector field - Area where the vector field was trained. The elevator door was the target location.

Figure 5.6: Robot trajectory - One of the zig-zag paths described by the robot during the testing phase of the Elevator path.


Table 5.1: Visual navigation results

                              Elevator path     Lounge path
    Trained objects           431               352
    Vector field locations    84                64
    Vector field density      9 locations/m2    6-14 locations/m2
    Convergence to target     0.8               0.8

5.3.1 Elevator path

In the first scenario, the robot was trained to reach the elevator door from one of the corridors of the Artificial Intelligence department. This location was especially challenging for the SURF algorithm: the walls lacked enough visual texture from decoration to generate keypoints, or regions of interest. Nevertheless, the SURF algorithm created enough models of the scene images to guide the robot to the goal.

The path locations had no ordered sequence, so it was not possible to measure the location-estimation success rate as in the second experiment (section 5.2). Nevertheless, this procedure provided the desired speed and flexibility for training. The success rate was now defined as the number of times the robot converged to the target location, relative to the total number of trials.

The robot converged to the target location in 4 out of 5 trials (see table 5.1). The single unsuccessful trial resulted from situational factors: while taking a corner, the robot got an arm stuck and, consequently, lost its balance and fell. The small number of trials is due to the time required for training and testing the mapping algorithm. Each location-estimation step took approximately 1.5 minutes to complete, and 40-50 steps were needed to reach the goal location. This meant around 60-75 minutes per run.

5.3.1.1 Conclusion for elevator path

Even though the result shown in figure 5.7 seems satisfactory, it has low statistical significance due to the low number of trials, and the time required for each run was impractical. It is common to find small numbers of trials in robotics papers. Nevertheless, having only five trials yielded unreliable results.


Figure 5.7: Confidence limits for Elevator path - Confidence limits for a recognition

rate p=0.8 as a function of N=5 samples and level of confidence alpha=0.01.
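
Confidence limits like those in figures 5.7 and 5.9 can be approximated with exact (Clopper-Pearson) binomial limits, sketched below; this particular interval construction is an assumption of the example, not necessarily the one used to produce the figures.

    from scipy.stats import beta

    def clopper_pearson(k, n, alpha=0.01):
        """Two-sided exact binomial confidence limits for k successes in n
        trials at significance level alpha (99% confidence for alpha=0.01)."""
        lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
        upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
        return lower, upper

    print(clopper_pearson(4, 5))    # elevator path: 4 successes out of 5
    print(clopper_pearson(16, 20))  # lounge path: 16 successes out of 20

With only 5 samples the resulting interval is very wide, which illustrates the low statistical significance discussed above.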

For this reason, the training and testing procedures were improved by dynamically reducing the search space. When searching only for the objects near the estimated location of the robot, the look-around behavior was sped up 4-5 times. The complete description of this improvement is given in the following subsection.

5.3.2 Lounge path

For this scenario the training and testing procedures were improved: the nearest neighbor algorithm was used to reduce the search space of the robot while navigating. One of the trajectories described by the robot in this scenario can be seen in figure 5.8. The optimization procedure consisted of the following steps:

1. The first location estimation at the beginning of every run was performed in the same way as in the previous scenario. The robot compared its current detections against the full set of trained locations, while searching for all of the trained images in the incoming image. The output of the algorithm was the most probable location of the robot.

2. After the first location estimation, the robot calculated the l locations closest to the one found in the previous step, with l = L/4, where L is the total number of trained locations in the path and l is the size of the subset closest to the location estimated at the beginning of the run.


Figure 5.8: Lounge path - Example of one of the trajectories described by the robot when searching for the fuzzy target location Livingroom. The top four images show an uncluttered section of the trained path, where the robot did not require a dense vector field to successfully describe the desired route (6 locations/m2). In the bottom 10 images the environment was more cluttered, and the robot required a dense vector field to navigate smoothly (14 locations/m2).


3. For the second and subsequent location estimation steps, the robot only searched for the objects contained in the l closest locations, and it only compared its current detections vector against those same l locations.

4. Step three was repeated until one of the closest locations was a target location. In that case, all the target locations were included in the search space at every step, and the value of l was increased to l = L/2.

The reasons for including all the target locations and increasing l under the final-step conditions were experimental. Sometimes the robot was not able to find the objects at the end of the path; at other times it would find itself at a target location while still being somewhat far from the target area. What were the reasons for this behavior of the algorithm?

Sometimes the robot was not looking for some of the objects in the target area, because the l estimated closest locations did not contain them. For this reason, all the target locations were included in the search space as soon as one of the closest locations was a target location.

However, the inclusion of all the target locations gave a greater weight to the target locations in proportion to the included path locations. For this reason, the robot sometimes found itself at a target location while still being somewhat far from the target area. Therefore, the increment of l balanced the search space again.
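
The sketch below illustrates this search-space reduction. It assumes each trained location stores coordinates and a target flag, and that closeness between locations is measured between stored coordinates; both are assumptions of the example.

    import numpy as np

    def restrict_search_space(locations, positions, current_idx):
        """Keep the l = L/4 locations nearest to the current estimate; once a
        target location enters that subset, include all targets and grow l to
        L/2 (the final-step conditions described above)."""
        L = len(locations)
        dists = np.linalg.norm(positions - positions[current_idx], axis=1)
        order = np.argsort(dists)
        subset = set(order[:L // 4])
        if any(locations[j]["is_target"] for j in subset):
            subset = set(order[:L // 2])
            subset |= {j for j in range(L) if locations[j]["is_target"]}
        return sorted(subset)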

The parameters and performance of the mapping algorithm in the lounge path experiment can be seen in table 5.2. The first 10 runs used a SURF minimum-matches threshold of 7 and the last 10 runs a threshold of 4 (for a description of this parameter see subsection 4.3.2). The intention of this change was to test the effect of an increased number of detections on the mapping algorithm. There was no significant difference in the results when changing the value of this parameter.

5.3.2.1 Conclusion for lounge path

Of the 20 runs completed in this scenario, 4 were not considered successful. On 2 occasions the robot could not find the objects near the target area; this problem motivated the inclusion of all target locations and the increase of l explained in the fourth step of the optimization procedure. The other 2 unsuccessful runs were due to the robot walking towards a white shelf.


Table 5.2: Optimized navigation results

    Run   Success   Distance between   SURF matches   Total   Total time   Comments
                    locations (m)      threshold      steps   (min)
    1     good      0.25               7              18      13.00
    2     bad       0.25               7              29      21.00        Walked towards white objects shelf. Obstacle avoidance from sonars can prevent this fault.
    3     good      0.25               7              49      36.00
    4     bad       0.25               7              26      19.00        Reached target area but did not stop. Late and getting dark.
    5     good      0.25               7              24      24.28
    6     good      0.25               7              22      20.00
    7     good      0.25               7              17      16.08
    8     good      0.25               7              25      21.67
    9     bad       0.25               7              24      18.00        Reached target area but did not stop.
    10    good      0.25               7              17      16.18
    11    good      0.25               4              22      19.07        SURF threshold reduced for darker conditions.
    12    good      0.25               4              22      20.32
    13    good      0.25               4              19      21.09
    14    good      0.25               4              19      14.89        Finished a bit far.
    15    good      0.25               4              35      26.26
    16    good      0.25               4              21      22.57
    17    good      0.25               4              37      22.35
    18    good      0.25               4              20      20.83
    19    bad       0.25               4              15      18.74        Walked towards white objects shelf.
    20    good      0.25               4              18      20.17        Obstacle avoidance implemented.
    Average                                           23.95   19.925


The interruption of navigation when the robot is facing a flat surface can be overcome with obstacle avoidance. Obstacle-avoidance behavior was implemented at the end of the trials: initially it was not considered necessary for this experimental setup, but the experiments proved it complementary to the proposed visual-topological mapping algorithm.

Even though the number of trials increased, the success rate remained the same (see figure 5.9). Nevertheless, the confidence limits tightened, giving more validity to the success rate. A global discussion of the results from the previous experiments is given in the final chapter.

Figure 5.9: Confidence limits for Lounge path - Confidence limits for a recognition

rate p=0.8 as a function of N=20 samples and level of confidence alpha=0.01.


Chapter 6

Discussion

The experiments conducted in this research project explored the robustness of the proposed visual-topological mapping algorithm for indoor navigation with the robot Nao. The results suggest that the mapping algorithm is a promising approach that can be expanded into a full navigation system. Such a system can be achieved with a hybrid design that incorporates obstacle avoidance, exploration and dynamic motion planning.

The visual-topological mapping algorithm can be applied in settings other than the domestic one. This algorithm should work in any place where there are salient visual features and where the environment is not too uniform. The only places where traditional approaches should outperform this method are constrained (indoor) environments without enough texture in the surroundings, e.g. plain white walls with scarce decoration. In such circumstances, it is feasible to complement the visual-topological mapping with existing alternatives based on odometry, proximity sensors (sonars or range finders providing 3D point clouds) or visual detection of the environment geometry (detection of corridors).

During the RoboCup@Home championships, the RUG team will not only use the proposed visual algorithm, but will merge it with a more traditional approach using odometry and a 3D point cloud sensor. This is an attempt to overcome the possible difficulties described in the previous paragraph. Autonomous robots should be able to navigate in diverse indoor environments, including scenarios with plain and scarce decoration.


It is important to stress the robustness of this visual algorithm against changes in the environment. It assumes a dynamic, though smoothly changing, world. This means that some objects in the environment may change while the robot is still able to estimate its current location. At the moment of writing this thesis, the author did not find other cases in the scientific literature of visual algorithms coping with occlusion. On the other hand, an abrupt change in the environment's visual features leads to uncertainty about the current location. Some possibilities for studying the degree of robustness against visual occlusion are given in the following section.

As in any other research project, the adequacy of testing platforms and locations is reflected in the time required to obtain experimental results. During this project, the availability of robots increased from one Nao to five Naos, three wheel-based platforms and hardware for parallel processing. This significant improvement can be enhanced even further by a designated area with domestic facilities. Such an experimental setup would allow greater control over the environment (lighting, furniture, people walking around) and guaranteed access to testing facilities all year round.

6.1 Future work

Machine learning algorithms can be added at several steps of the visual-topological navigation algorithm: from exploring behaviors, improved localization and mapping, and automated discrimination between useful landmarks (a sofa, windows) and misleading objects (a suitcase, a person), to a fully automated visual SLAM. After all, the long-term goal in the field of Autonomous Robotics is to achieve the autonomy found in natural systems.

It is important to learn the location of useful regions for visual exploration: the robot should learn where to search for landmarks. In different settings, it could be more informative to search around the ceiling, the horizon or even the floor.

The effect of visual occlusion on the proposed algorithm is an important aspect that could be studied further. In the case of scene sections considered as single objects, what proportion of the field of view does the algorithm require to successfully estimate the robot's location? Further, what is the precise effect of the number of keypoints in a scene when testing different degrees of occlusion?


A possible alternative to the use of full images as objects is automatic clustering. At the moment of writing, research was being conducted on the use of salient visual features which do not necessarily correspond to a specific object. The robot builds visual models by applying clustering algorithms to the image keypoints, so there is no longer a need for tediously handcrafted training images.

Preliminary results show that a dendrogram algorithm performs very well for automatic clustering. Since this bottom-up procedure creates many more models than those used in this paper, the robot is expected to be more robust against changes in the environment. On the other hand, the increase in the number of models requires more time or computational power for processing the incoming images.
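
As an illustration of this direction, the following sketch clusters keypoint image coordinates with an agglomerative (dendrogram) procedure. The linkage method and distance threshold are assumptions of the example, since the preliminary work is not detailed here.

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    def cluster_keypoints(keypoints, max_dist=40.0):
        """Assign each keypoint to a candidate visual model by cutting the
        dendrogram built bottom-up from keypoint image coordinates."""
        tree = linkage(keypoints, method="single")   # agglomerative merging
        return fcluster(tree, t=max_dist, criterion="distance")

    # keypoints: an (N, 2) array of SURF keypoint positions in one image.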

In the coming months, automated pruning algorithms will be added to the system to reduce the number of vectors searched in the nearest neighbor space. There is no need to constantly estimate the robot's precise location in an entire building if the robot already knows it is located in a certain room or subsection. At the moment this procedure is not needed, but the necessity will arise when the robot starts moving in areas other than in and around our robot laboratory. The pruning algorithm is expected to keep the nearest neighbor vector space lean and searchable in real time.
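
One simple form such pruning could take is a two-level lookup, sketched below under the assumption that every trained location is tagged with the room it belongs to; the room signatures and the tagging scheme are hypothetical.

    import numpy as np

    def prune_then_search(detections, room_signatures, locations):
        """First pick the most likely room, then run the nearest neighbor
        search only over the location vectors stored for that room."""
        room = min(room_signatures,
                   key=lambda r: np.linalg.norm(room_signatures[r] - detections))
        candidates = [loc for loc in locations if loc["room"] == room]
        dists = [np.linalg.norm(detections - loc["vector"]) for loc in candidates]
        return candidates[int(np.argmin(dists))]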

The nearest neighbor success rate should have an optimum with respect to the densities of locations and landmarks in a given scenario; that is, with respect to the chosen distance between trained locations in a vector field and the spread of the landmarks in the room. Solving this optimization problem could greatly speed up the training procedure.

6.2 Conclusion

The algorithm described in this text provides a promising alternative for navigation in a natural and changing environment. It provides a method in which a humanoid robot uses a passive sensor to detect visual features, and such features allow the perception of trained objects for localization and mapping. These characteristics were observed through experimentation and successfully answered the initial research questions.

Is it possible to implement a visual-topological mapping algorithm with the robot Nao, relying on its embedded motor, sensing and computing capabilities? Certainly, the nature of computer vision algorithms requires more computing power than that found in the Nao robot. The initial attempt at navigating with low-resolution images proved insufficient for recognizing objects a few meters away from the robot.

Is it possible to improve the mapping algorithm after customizing it to the Nao's incorporated sensing devices, and with the aid of additional computational power? Making optimized use of the Nao's sensing devices improved the performance of the algorithm. Nevertheless, the main boost came from the use of additional computational power: the inclusion of external processors, mainly to handle the computer vision modules, allowed a greater exploitation of the robot's embodiment.

Does this algorithm provide the robot Nao with sufficient capabilities to perform SLAM in an indoor location in a reasonable amount of time, relative to the RoboCup@Home competition requirements? The main obstacle to fulfilling the navigation requirements of the @Home competition is the Nao's embodiment. The robot can explore only a limited portion of space at a time, and therefore requires valuable time to "look around" at every step of the navigation procedure. Additionally, the robot's motion proved too slow to traverse the required distances in the time provided at the competitions. Nevertheless, the mapping algorithm has proven to be a satisfactory approach that could be expanded into a complete SLAM algorithm.

The current use of a wheel-based platform (Pioneer) in the RUG robot team has overcome all of the above-mentioned difficulties. The Pioneer platform can meet the @Home competition time constraints: its displacement inside a domestic environment is fast, and it permits the incorporation of multiple cameras that can perceive several regions around the robot "at once". Finally, the Pioneer is capable of carrying the greater computational power required by computer vision algorithms.

A large amount of work clearly remains before navigation becomes a reality in commercially available domestic robots. For the particular case of visual-topological navigation, the expectations are high: machine learning techniques can be implemented in several modules, eventually leading to a fully automated procedure.

The creation of a localization and mapping system based on biological and cognitive principles has proven to be a good starting point for a full navigation system. Artificial intelligence certainly is a promising source of inspiration for tackling problems in complex and dynamic environments.
