Active Semantic Mapping for a Domestic Service Robot
Miguel Oliveira da Silva
Thesis to obtain the Master of Science Degree in
Electrical and Computer Engineering
Supervisors: Prof. Rodrigo Martins de Matos Ventura
Prof. Pedro Manuel Urbano de Almeida Lima
Examination Committee
Chairperson: Prof. João Fernando Cardoso Silva Sequeira
Supervisor: Prof. Rodrigo Martins de Matos Ventura
Members of the Committee: Prof. Francisco António Chaves Saraiva de Melo
October 2018
I declare that this document is an original work of my own authorship and that it fulfills all the
requirements of the Code of Conduct and Good Practices of the Universidade de Lisboa.
Acknowledgments
I would like to thank my parents for their friendship, encouragement and care over all these years, for always being there for me and for teaching me that success is a result of hard work.
I would like to thank Cristiana for her support and for always being available to help me in this work with her writing and communication skills.
I would also like to acknowledge my dissertation supervisors, Prof. Pedro Lima and Prof. Rodrigo Ventura, and Tiago Veiga, for their insight, support and sharing of knowledge, which made this Thesis possible.
I would also like to thank the SocRob team for sharing a lot of knowledge about several subjects related to robotics.
Last but not least, I thank all the friends and colleagues I have met in the last 5 years at the University, who helped me arrive at this point.
Thank you all.
Abstract
Title: Active Semantic Mapping for a Domestic Service Robot
Abstract: Domestic service robots need to deal with complex and dynamic environments. In order to interact with them, robots must keep an up-to-date representation of the relevant information. In this work, an architecture to solve that problem is presented, considering the uncertainty associated with that representation, given the stochastic and not fully observable characteristics of a domestic environment. The architecture needs to generate a semantic map of the domestic environment, keep it up to date and make use of it. A solution is presented to the agent's problem of driving its behavior so as to keep an updated probabilistic representation of the world state, and of using that information to carry out tasks. The architecture is composed of two parts: a Knowledge Representation Engine, which keeps a global belief about the world state and is responsible for generating and controlling the second part, the Decision Maker, which is responsible for the agent's behavior. The Knowledge Representation Engine uses ProbLog to maintain a probabilistic world representation, and it takes advantage of the inference process to generate the Decision Maker models and the world state. The Decision Maker is composed of a set of POMDPs, where each one is responsible for a partial representation of the global knowledge of the world and for making decisions, if required, in order to reduce the uncertainty about the world state and eventually reach a specific goal. The decision-making problem is divided into several subproblems to reduce the state space of each POMDP and to bypass the problem of finding the optimal policy of a large POMDP, given the poor scalability of existing solution algorithms.
Keywords
ProbLog; Semantic Mapping; POMDP; Decision Making; Knowledge Representation.
Resumo
Título: Mapeamento Semântico Activo para um Robô de Serviço Doméstico
Resumo: Os robôs de serviço doméstico precisam de lidar com ambientes complexos e dinâmicos. Para poderem interagir com eles, precisam de manter uma representação atualizada da informação que lhes é relevante sobre esses mesmos ambientes. Neste trabalho é apresentada uma arquitetura para solucionar esse problema, considerando a incerteza associada a essa representação, dadas as características de um ambiente doméstico. A arquitetura precisa de gerar um mapa semântico, mantê-lo atualizado e fazer uso do mesmo. Para isso, é apresentada uma solução para o problema do agente em decidir o seu comportamento de forma a manter uma representação probabilística atualizada do estado do mundo. A arquitetura é composta por duas partes: um mecanismo de representação de conhecimento, que mantém uma crença global sobre o estado do mundo, e um tomador de decisões, que é responsável pelo comportamento do agente. O mecanismo de representação de conhecimento usa o ProbLog, para ter uma representação probabilística do mundo, e tira partido do seu processo de inferência para gerar o modelo do tomador de decisões e o próprio estado do mundo. Por sua vez, o tomador de decisões é composto por um conjunto de POMDPs, onde cada um é responsável por uma representação parcial do conhecimento global do mundo e por tomar decisões, se necessário, a fim de atingir o objetivo do sistema. O problema de tomada de decisão é dividido em vários subproblemas para reduzir o espaço de estados de cada um e contornar o problema de encontrar a política ideal em POMDPs de grande dimensão, dada a baixa escalabilidade dos algoritmos existentes.
Palavras Chave
ProbLog; Mapeamento Semântico; POMDP; Tomada de decisão; Representação de conhecimento.
Contents
1 Introduction
 1.1 Motivation
 1.2 Problem Formulation
 1.3 Proposed Approach
 1.4 Contributions
 1.5 Document outline
2 Theoretical Background
 2.1 Semantic mapping
 2.2 Logic programming
 2.3 Probabilistic Logic Programming and ProbLog
 2.4 Decision Making Under Uncertainty
  2.4.1 Markov Decision Processes (MDPs)
  2.4.2 Partially Observable Markov Decision Processes (POMDPs)
   2.4.2.A Value Function
   2.4.2.B Point-Based Value Iteration (PBVI)
  2.4.3 POMDP with Information Rewards (POMDP-IR)
3 Related Work
4 Proposed method
 4.1 Architecture Description
  4.1.1 Knowledge Representation Engine
  4.1.2 Decision Maker
   4.1.2.A POMDP selection
   4.1.2.B Global knowledge representation update
 4.2 Semantic Mapping Application
5 Experimental Results
 5.1 Implementation
 5.2 World Model
 5.3 POMDP-IR Model
 5.4 Simulated Experiments
  5.4.1 Scenario 1
   5.4.1.A Static Model
   5.4.1.B Dynamic Model
   5.4.1.C Carrying out a task
  5.4.2 Scenario 2
   5.4.2.A Objects changing position
   5.4.2.B Incorrect observations robustness
  5.4.3 Performance Analysis
 5.5 Real scenario experiments
 5.6 Scalability Analysis
 5.7 Discussion
6 Conclusion
 6.1 Achievements
 6.2 Future Work
A P(Xk|Z) derivation
B World Model Files Examples
 B.1 Furniture Model example
 B.2 Objects Model example
C Experiment tables and figures
 C.1 Scenario 1 - Static Environment Model
 C.2 Scenario 1 - Dynamic Environment Model
 C.3 Scenario 2 - Objects Changing Position
 C.4 Scenario 2 - Wrong Observations Robustness
List of Figures
2.1 Example of a 2D map of an indoor environment. Figure adapted from [1]
2.2 MDP diagram
2.3 POMDP diagram
2.4 Example of a Value Function with two states. Figure adapted from [2]
3.1 The layered structure of the spatial representation in [3], showing the different levels of abstraction of the spatial knowledge. Figure adapted from [3]
4.1 Scheme of the architecture operation
4.2 POMDP model example
5.1 ISRoboNet@Home Testbed layout
5.2 Progression of objects distributions entropy in a static model
5.3 Progression of objects distributions entropy in a dynamic model
5.4 Progression of Hellinger distance until the robot reaches the goal, for two different configurations
5.5 Progression of Hellinger distance, with modifications in the location of the objects
5.6 Testbed and robot used for the real scenario experiments
5.7 Hellinger distance over time for each object, in the real behavior experiment
C.1 Hellinger distance considering wrong observations for objects in configuration 1
C.2 Hellinger distance considering wrong observations for objects in configuration 2
List of Tables
5.1 POMDP reward values and observation probability ranges for the different action types
5.2 Testbed scenarios used in the experiments
5.3 Location of the objects in the static model
5.4 Expected rewards for POMDP selection steps in a static model example
5.5 Location of the objects in the dynamic model
5.6 Expected rewards for POMDP selection steps in a dynamic model example
5.7 Location of the objects for experiments with the goal of moving the cocacola close to the pringles
5.8 Expected rewards for POMDP selection steps for carrying out a task
5.9 Location of the objects for each step
5.10 Comparison between living room actions, before (left) and after (right) the pringles changed location
5.11 Location of the objects for experiments with wrong observations
5.12 Mean Hellinger distance for different multiples of the expected false positive and false negative rates
5.13 Mean Hellinger distance for 100 steps for different object configurations in Scenario 1. KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, DT - Dining Table
5.14 Mean Hellinger distance for 100 steps for different object configurations in Scenario 2. KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, BS - Bookshelf, DT - Dining Table, B - Bed, NS - Night Stand
5.15 Mean Hellinger distance for 200 steps, changing the objects configuration in Scenario 1. KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, DT - Dining Table
5.16 Mean Hellinger distance for 200 steps, changing the objects configuration in Scenario 2. KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, BS - Bookshelf, DT - Dining Table, B - Bed, NS - Night Stand
5.17 Location of the objects in the real scenario experiment
5.18 Scalability analysis of different POMDP models used in the Decision Maker
C.1 State variables distributions and actions for a static model of the environment
C.2 State variables distributions in POMDP selection steps, for a dynamic model of the environment
C.3 State variables distributions in POMDP selection steps, with changes in the location of the objects
C.4 Expected rewards for POMDP selection steps in an experiment with objects changing location
C.5 Probabilities of observing each object in Scenario 2
List of Algorithms
2.1 Value iteration
Acronyms
COARSE Cognitive lAyered Representation of Spatial knowledgE
DTPDDL Decision-Theoretic Planning Domain Definition Language
ERL European Robotics League
LHM Least Herbrand Model
LP Logic Program
MDP Markov Decision Process
OWL-DL Web Ontology Language - Description Logic
PBVI Point-Based Value Iteration
POMDP Partially Observable Markov Decision Process
POMDP-IR Partially Observable Markov Decision Process with Information Rewards
PTLplan Temporal-Logic Progressive Planner
PWLC Piecewise Linear and Convex
ROS Robot Operating System
SLAM Simultaneous Localization and Mapping
TBM Task Benchmark
1 Introduction
1.1 Motivation
During the last decades, research in mobile and service-oriented robots has been growing, and many different algorithms have been developed that allow a robot to operate in realistic environments. Many mapping and navigation algorithms perform very well and are almost ready to be deployed on domestic service robots. For example, the problem of Simultaneous Localization and Mapping (SLAM) has been studied for the last 30 years and substantial progress has been made [4]. Nowadays, the SLAM problem can be considered solved, but most solutions do not allow the robot to understand and interpret its environment: they provide only a floor plan or a geometric map of the environment and localize the robot on it. When the goal is to have autonomous and intelligent robots, this kind of information is important, but it is not enough. When humans plan a task in an environment, their first step is to identify different regions, objects or the presence of other agents; in a house, for example, that means identifying the different rooms, furniture, objects and residents. In the same way, a domestic service robot that is permanently immersed in a domestic environment, and whose goal is to serve a human, should perceive the environment the way a human does, because it has to interact and communicate in human-compatible ways. For that purpose, the robot needs a skill that no geometric map can provide. As motivation, consider the Task Benchmark (TBM) Getting to know my home of the European Robotics League (ERL)1. In this task, also known as TBM1, the main goal is for the robot to acquire knowledge about the environment, organize it and use it for task planning. The robot needs to identify changes in the environment, such as object and furniture locations, detect whether a door is open or closed, or even detect the presence of an unknown object on the floor, considered as trash. Afterwards, it needs to move the objects that changed location back to their default positions, using the acquired knowledge.
1 https://www.eu-robotics.net/robotics_league/upload/documents-2018/ERL_Consumer_10092018.pdf. Accessed 1 Oct 2018
1.2 Problem Formulation
The main goal of this thesis is to explore an efficient way for a domestic service robot to accurately represent its knowledge about the environment in a semantic map and to keep it updated, in order to reduce the uncertainty about the world state. Eventually, it can also use that information to influence its behavior in order to carry out tasks. For this purpose, it is important that the robot interacts with its surroundings and updates its knowledge about them, which will also help the agent to perform such tasks. Domestic environments are non-deterministic and may have other agents interacting with them, so the robot is not able to get information about the full state of the environment at each point in time. Therefore, in order to behave in such environments, the robot needs to make decisions under uncertainty. In this work it is assumed that:
• The domestic environment is composed of a number of rooms, objects and possible locations for the objects (placements)
• Each object is always placed in one of the placements
• The robot has the perception capability to detect and recognize the objects of interest
Most of the time, this kind of environment is complex, given the large number of rooms and places and its dynamism. Hence, building a system able to make decisions in a large environment that keeps changing its layout and state is a difficult problem.
Summarizing, this work intends to answer the following questions: How to create and keep updated a knowledge representation of complex and dynamic environments? How to use that information to make decisions and eventually carry out tasks?
1.3 Proposed Approach
In order to solve the problem presented in Chapter 1.2, it is important to consider a probabilistic representation of the world, to be able to represent uncertainty. The goal of this approach is an architecture that keeps a global probabilistic representation of the world state but, in order to make decisions, splits the decision-making problem into multiple subproblems that are part of the Decision Maker structure. Each one is responsible for a partial representation of the global world and makes decisions inside that subworld. In the semantic mapping application, each subworld corresponds to a room. The decision-making mechanism used in each subworld is a Partially Observable Markov Decision Process (POMDP).
To coordinate and generate this Decision Maker structure, a Knowledge Representation Engine is necessary. It keeps an updated belief about the global world state, using a probabilistic logic programming language, and it uses that information to coordinate the Decision Maker structure. Using an inference process under uncertainty, the Knowledge Representation Engine is also able to fully generate each POMDP model of the Decision Maker, given the world model that it receives as input.
1.4 Contributions
This work presents an architecture for knowledge representation and decision making for systems that need to deal with uncertainty and complex environments. An instantiation of the architecture for the semantic mapping problem is proposed, able to keep a probabilistic representation of object locations in a dynamic environment while deciding the agent's behavior.
It also presents a contribution to the SocRob@Home project of the Institute for Systems and Robotics at Instituto Superior Técnico, using the developed framework to solve some of the problems posed in TBM1 of the ERL, explained in Chapter 1.1.
1.5 Document outline
This thesis is organized as follows: Chapter 2 introduces the concepts used in the presented work, as well as the nomenclature and notation used throughout this thesis; it explains the principles of semantic mapping, probabilistic logic programming and decision making under uncertainty. Chapter 3 presents a short description of related work on semantic mapping and decision-making problems. Chapter 4 presents the proposed approach to solve the problem formulated in Chapter 1.2, explaining the architecture of the solution as a whole and how to adapt it to the semantic mapping application. Chapter 5 demonstrates the architecture's performance in simulated experiments and a real scenario. Chapter 6 wraps up this work, discusses achievements and presents suggestions for future work.
2 Theoretical Background
2.1 Semantic mapping
There are many different ways and characteristics that can be used to describe the world and construct a map. A particularly important kind of map is a geometric representation of the environment. Geometric mapping methods focus on representing the spatial structure of the environment; they have been an active area of research in robotics [5] and many solutions to this problem exist. According to how each geometric mapping method perceives the environment, it can be classified as producing metric maps, topological maps or topometric maps [1]. Metric maps describe geometric characteristics of the environment, with a coordinate system representing the shape of the objects and rooms but without any interpretation or classification of the shapes (see Figure 2.1(a)). Topological maps represent a map as a graph, where the vertices represent places, points or regions and the edges are the connections or relations between them (see Figure 2.1(b)). Topometric maps combine metric and topological maps, merging the advantages of both: the accuracy of metric maps and the scalability of topological maps (see Figure 2.1(c)). These kinds of maps are also known as hybrid metric-topological maps [6].
For a robot, these geometric maps are important for navigation. They retain all the geometric features that the robot should be aware of in order to avoid obstacles and find possible paths. However, they do not provide more qualitative information, such as the kind of obstacles and whether they are fixed or not. This type of qualitative information is important to perform even simple tasks that require knowledge about more complex features of the environment. These geometric maps are navigation-oriented, which means they are useful only in the navigation context and fail to encode the semantics of the environment, which is also very important. For example, in a domestic environment, a robot needs to be endowed with the ability to understand the different functionality of each room, the difference between objects and walls, and the relation between some objects and some rooms. This kind of information is not provided by a geometric map; the solution for that lies in semantic mapping.
Figure 2.1: Example of a 2D map of an indoor environment: (a) metric map, (b) topological map, (c) topometric map. Figure adapted from [1]
A semantic map provides a qualitative description of the robot's environment, allowing the robot to get an augmented representation of what surrounds it, complementing the geometric knowledge with semantic knowledge from different sources. The word semantic is defined in the dictionary as "relating to meaning in language or logic" (Online Oxford English Dictionary, en.oxforddictionaries.com, accessed 14 Oct 2018), and for that reason it is expected that a semantic map represents the meaning given by a qualitative description of what is mapped.
A semantic map contains assignments of mapped features to classes that represent their meaning and characteristics. Furthermore, it is also possible to create relations between these classes and use the knowledge about them to give some reasoning skills to robots. In that way, the agent has a qualitative description of the environment that is closer to the human conception of the world, and a knowledge base used for reasoning. For example, a semantic map consisting of a metric map augmented with labels of the objects and rooms that are of interest to the robot, and of which it should be aware, allows it to accomplish tasks in a domestic environment like: "Robot, bring me a cup". A semantic map can give the robot a knowledge base to reason about the characteristics of a cup, where it can be, and how to get there, provided, of course, that the robot has a reasoning engine. Basically, a semantic map augments the navigation and task-planning skills and helps in human-robot interaction, since it provides a conception of the world close to the human one.
All of this semantic information that a robot can get from the environment grants it the ability to represent and reason about its surroundings in a semantic map, and can also be organized and divided into different modalities. The inference method used to reason about what is observed is crucial, and there is a lot of information that can be used from different sources, such as the geometry, general appearance and shape of places, recognized objects, the topology of the environment and human input. Many methods use only a single source to infer semantic information about a place, while other methods exploit multiple sources. Another important feature of semantic mapping techniques is temporal coherence. It is useful because the information acquired at a single point in time is not enough to provide evidence for a reliable categorization of places or objects. The degree of confidence is related to the time the information was acquired, because most environments are stochastic and dynamic.
Most of the time, a semantic map also incorporates a topological map, which can retain both geometric information about the arrangement of places and conceptual information about them.
2.2 Logic programming
Logic programming is a method of expressing knowledge in a formal language and solving problems by running inference processes on that knowledge. The basic objects in logic programs are variables, constants, functors and predicates [7, p. 40-41]. Variables are denoted by strings that start with uppercase letters; the others are also denoted by strings, but start with lowercase letters. A term is a variable, a constant or a functor of arity n applied to n terms, i.e. f(t1, ..., tn). An atom, or atomic sentence, is formed from a predicate of arity n applied to n terms, i.e. p(t1, ..., tn). A ground term is a term with no variables. A literal is an atom (positive literal) or a negated atom (negative literal). A clause is a disjunction of literals, and a unit clause is a clause with a single literal. A definite clause is a disjunction of literals of which exactly one is positive; it has the form h :- a1, ..., an, where h and the ai are atoms. A rule, also called a normal clause, has the form h :- l1, ..., ln; it is a universally quantified expression meaning l1 ∧ ... ∧ ln ⇒ h, where l1, ..., ln (the body of the rule) are literals and h (the head of the rule) is an atom. A rule without a body is a fact and represents an unconditional truth. An important concept in logic programming is the Herbrand base [8, p. 351], the set of all ground atoms that can be constructed using the predicates, functors and constants in the theory. Herbrand interpretations are subsets of the Herbrand base.
A Herbrand interpretation can be considered a model of a clause (which corresponds to a world that satisfies that clause) if, for every substitution θ such that the body of the clause is in the interpretation, the resulting head is in the interpretation as well. A substitution θ is a finite set of pairs V1/t1, V2/t2, ..., Vn/tn, where the Vi are distinct variables and the ti are the terms that will replace the respective variables; for example, applying θ = {V/v1} to male(V) yields male(v1). A Herbrand interpretation is a model of a logic program if it is a model of all clauses in the theory.
For negation-free Logic Programs (LPs), or definite clause programs, the model-theoretic semantics is given by the smallest Herbrand model, also known as the Least Herbrand Model (LHM), which is guaranteed to exist and to be unique. The main goal of an LP system is to check whether a given atom is true in the LHM.
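To make this semantics concrete, the following minimal sketch computes the LHM of a small ground definite clause program by naive forward chaining. The program, atom names and function are hypothetical illustrations (and the example assumes the program is already grounded), not part of any particular LP system:

```python
# Naive forward chaining over a ground definite clause program: repeatedly
# apply every rule whose body is already satisfied, until a fixed point is
# reached. For a negation-free ground program this fixed point is the LHM.

# Each rule is (head, [body atoms]); facts have an empty body.
program = [
    ("vertebrate(v1)", []),
    ("vertebrate(v2)", []),
    ("animal(v1)", ["vertebrate(v1)"]),
    ("animal(v2)", ["vertebrate(v2)"]),
]

def least_herbrand_model(rules):
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(atom in model for atom in body):
                model.add(head)
                changed = True
    return model

# All four atoms end up in the model, so e.g. animal(v1) is true in the LHM.
print(sorted(least_herbrand_model(program)))
```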
2.3 Probabilistic Logic Programming and ProbLog
The introduction of probabilities into logic programming allows it to encode the inherent uncertainty that is present in real-life situations. Probabilistic logic programs are logic programs in which some of the facts are annotated with probabilities, supporting probabilistic inference and learning. This Chapter presents a probabilistic logic programming language called ProbLog.
A ProbLog program has a set of ground probabilistic facts and a set of rules and non-probabilistic facts [9]. The latter are the same as in logic programming. A ground probabilistic fact is a fact f with no variables and probability p, and can be written as p::f. It is also possible to write an intensional probabilistic fact, which is syntactic sugar for compactly specifying an entire set of ground probabilistic facts. In Example 2.1, the statement 0.5::male(V) :- vertebrate(V) is an intensional probabilistic fact and is a compact way to write the ground probabilistic facts 0.5::male(v1) :- vertebrate(v1) and 0.5::male(v2) :- vertebrate(v2). ProbLog also allows the use of annotated disjunctions [10], like the sentence 0.15::bird(V); 0.09::mammal(V); 0.5::fish(V) :- vertebrate(V), with the structure p1::h1; ...; pn::hn :- body.
Example 2.1.
vertebrate(v1).
vertebrate(v2).
0.5::male(V) :- vertebrate(V).
The atoms in a ProbLog program can be divided into probabilistic atoms and derived atoms. The former are the atoms that appear in ground probabilistic facts, and the latter are the atoms that appear in the head of some rule of the logic program. It is also important to note that all the variables in the head of a rule should also appear in a positive literal in the body of the rule.
ProbLog supports inference in probabilistic logic systems, and different inference tasks can be considered [7] [9] (see the sketch after this list):
• SUCC(q), where q is a ground query. The task is to compute the success probability of the query q.
• MARG(Q|e), where Q is a set of ground atoms of interest (query atoms). The task is to compute the marginal probability distribution of each query atom q ∈ Q given the evidence e.
• MAP(Q|e), whose task is to find the most likely truth assignment q to the atoms in Q given the evidence e.
• MPE(U|e), where U is the set of all atoms in the Herbrand base that do not occur in e (unobserved atoms). The task is to find the most likely world of all the unobserved atoms given the evidence.
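As an illustration of the SUCC task, the program of Example 2.1 can be evaluated with ProbLog's Python interface. This is a minimal sketch assuming the problog Python package is installed; the added query atom is illustrative:

```python
from problog import get_evaluatable
from problog.program import PrologString

# The program of Example 2.1, extended with an illustrative query atom
# whose success probability we want (the SUCC task).
model = PrologString("""
vertebrate(v1).
vertebrate(v2).
0.5::male(V) :- vertebrate(V).

query(male(v1)).
""")

# Ground and compile the program, then run probabilistic inference.
result = get_evaluatable().create_from(model).evaluate()
for query, probability in result.items():
    print(query, probability)   # expected: male(v1) 0.5
```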
2.4 Decision Making Under Uncertainty
An agent, such as a robot or a human, acts based on observations taken from the environment, and there is a cycle between the agent and the world. Over time, the agent receives an observation of the world, chooses an action through some decision-making process and applies that action to the world, which the action affects, closing the cycle. For an intelligent agent, the decision-making process of choosing an action has the goal of achieving some objectives over time, given the set of observations and the knowledge about the environment.
Most agents, and clearly robots, need to deal with uncertainty during this cycle, due to the fact that the environment is uncertain, or in other words, not fully observable, non-deterministic or both [8, p. 42-45]. An environment is considered fully observable if the agent's sensors provide it with information about the full state at each point in time, or in other words, if the agent has access to all the aspects of the environment relevant to deciding which action to take. Consequently, an environment can be partially observable if part of the state is missing from the observation data, for example due to the occlusion of a small object by a bigger one, or if observations are noisy and inaccurate due to the sensors used. In a non-deterministic environment, the next state is not fully determined by the current state and the agent's action. For that reason, actions are characterized by their possible outcomes, and when these possible outcomes are characterized and quantified using probabilities, the environment is called stochastic.
Thus, an agent dealing with an uncertain environment may never know for certain what state it is in, considering uncertainty in perception, or where it will end up after performing a given action, considering uncertainty in action effects. The first is related to not fully observable environments, and the second to stochastic environments.
When an agent is dealing with uncertainty, it should be able to compare the plausibility of different statements, even if it is not sure about them. For example, even if a robot is not sure about an object's color, it should be able to represent that the belief in one color is stronger than, weaker than or equal to the belief in another color. For that reason, the agent may represent the degree of belief in a statement using some tool, the main one being probability theory.
For an agent like a domestic service robot, the problem of deciding which action to take at each time is a sequential decision problem, in which the agent is not interested in a single decision. The agent is interested in taking a series of decisions to solve a problem, as in search and planning problems, for example. A framework for sequential decision making in stochastic environments, under the assumption that the model is known and the environment is fully observable, is presented in Chapter 2.4.1: the Markov Decision Process (MDP). Furthermore, Chapter 2.4.2 presents a process where both types of uncertainty, in action effects and in perception, are considered. This model is called the Partially Observable Markov Decision Process (POMDP).
2.4.1 Markov Decision Processes (MDPs)
Assuming that the agent has perfect perception of the environment, which means that the state of the world is fully observable at any point in time, a Markov Decision Process (MDP) models the uncertainty about the effects of the agent's actions. In an MDP, at each time t, the agent chooses the action at based on observing state st and receives a reward rt for taking that action in that state.
An MDP can be described as a tuple 〈S, A, T, R〉 [2], where S is a set of states of the world, A is a set of actions, T is the probabilistic state transition function and R is the reward function. These sets can be finite or infinite, but in this Chapter only the finite case will be discussed.
The state transition function T(s, a, s′) represents the probability of ending up in state s′, given that the agent starts in state s and executes action a. It can also be written as Pr(s′|a, s). R(s, a) represents the expected reward received for taking action a in state s. The reward function depends only on the current state and action. In this model, it is also assumed that the transition depends only on the previous state and on the action taken, not on any earlier states or actions. An MDP can be represented as in Figure 2.2. The assumption associated with this property is the Markov assumption: the state at time t only depends on the state and action taken at time t − 1.
Figure 2.2: MDP diagram.
It is also important to define what the solution to this problem looks like, because it is already known that no fixed action sequence will solve the problem: the uncertainty about action effects can make the agent end up in a state different from the goal. For that reason, it is important to define a policy, denoted by π, where π(s) is the action specified by the policy π for state s. A policy is a description of the behavior of an agent, specifying what action the agent should take in any state it might reach. Two kinds of policies are considered: stationary and nonstationary [2]. A stationary policy considers that the choice of an action depends only on the state, independently of the time step. A nonstationary policy takes the time into account, and it is represented with a subscript t.
What is desired from a sequential decision process is that the agent acts to get the best performance. For an MDP, this performance is represented by an additive utility function of the long-term rewards. The quality of a policy is therefore measured by the corresponding expected utility, which for MDPs is often referred to as the value function Vπ. An optimal policy is a policy that yields the highest value function, and it is denoted by π∗. In order to find the optimal policy, it is important to define whether there is a finite horizon or an infinite horizon for decision making.
When dealing with a finite horizon of K steps, the agent should act to maximize the value function given by the sum of the rewards of the next K steps, presented in Equation (2.1).
$$E\left[\sum_{t=0}^{K-1} r_t\right] \quad (2.1)$$
In an infinite horizon, the number of steps is unbounded and the sum of the rewards can become infinite. One way to define the value function in the infinite-horizon case is to use a discounted model, with a discount factor γ between 0 and 1 and the value function of Equation (2.2).
$$E\left[\sum_{t=0}^{\infty} \gamma^t r_t\right] \quad (2.2)$$
In the value function of Equation (2.2), rewards in the present are worth more than rewards in the future. If γ is close to 0, future rewards are considered insignificant, and the closer the discount factor is to 1, the more effect future rewards have on current decision making. The discount factor ensures that the value function is finite as long as the rewards are also finite.
In the finite-horizon model, the optimal policy is typically nonstationary, because with a finite horizon the optimal action for a given state depends on time. For example, if the agent has a goal and a short horizon, it must head directly for the goal; with a longer horizon, the agent may act so as to avoid more uncertainty in the actions' results. The way the agent chooses its actions when it has a long journey ahead is generally different from the way it decides which action to take in the last step. Dynamic programming can be used to evaluate the utility of a policy π for t steps. Thus, in the finite-horizon model, the value function Vπ,t(s) is the expected utility of starting in state s and executing the policy π for t steps, given by the recursive Equation (2.3).
$$V_{\pi,t}(s) = R(s, \pi_t(s)) + \gamma \sum_{s' \in S} T(s, \pi_t(s), s')\, V_{\pi,t-1}(s') \quad (2.3)$$
The step t = 1 is the last step, and the respective value Vπ,1(s) = R(s, π1(s)) is just the expected reward for taking the action of policy π1. For a generic step t, the discounted value of the remaining t − 1 steps is added, considering all the possible states s′ reachable under the policy π and their respective likelihoods T(s, πt(s), s′).
In the infinite-horizon model, the agent always has the same time remaining, so it makes no sense to change the action strategy depending on time; this is why the optimal policy is stationary and the value function Vπ(s) is given by the unique simultaneous solution of the set of Equations (2.4).
$$V_\pi(s) = R(s, \pi(s)) + \gamma \sum_{s' \in S} T(s, \pi(s), s')\, V_\pi(s') \quad \text{for all } s \in S \quad (2.4)$$
This process of computing the value function from executing a policy is known as policy evaluation.
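Since Equations (2.4) are |S| linear equations in |S| unknowns, policy evaluation for a stationary policy can be performed exactly by solving a linear system. The following is a minimal sketch; the tabular NumPy array layout and the function name are assumptions of this example:

```python
import numpy as np

def evaluate_policy(T, R, policy, gamma):
    """Solve the linear system of Equations (2.4) for a stationary policy.

    T[s, a, s'] = Pr(s'|s, a); R[s, a]; policy[s] = action chosen in state s.
    """
    n = T.shape[0]
    states = np.arange(n)
    T_pi = T[states, policy, :]   # Pr(s'|s, pi(s)), an n x n matrix
    R_pi = R[states, policy]      # R(s, pi(s)), an n-vector
    # V = R_pi + gamma T_pi V  <=>  (I - gamma T_pi) V = R_pi
    return np.linalg.solve(np.eye(n) - gamma * T_pi, R_pi)
```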
Several methods can be used to find optimal policies for MDPs, but this Chapter presents the value iteration method, because it will also serve as the basis for finding policies for POMDPs in Chapter 2.4.2.
To get the optimal policy π∗ for the finite horizon, only a complete sequence of optimal value functions is needed, and π∗ is defined by Equation (2.5).
$$\pi^*_t(s) = \arg\max_a \left[ R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V_{\pi^*_{t-1}, t-1}(s') \right] \quad (2.5)$$
Considering that Vπ∗t−1,t−1 is the optimal value function for step t − 1, and that it is derived from policy π∗t−1 and value function Vπ∗t−2,t−2, this recursion continues down to the last step t = 1, where the optimal policy π∗1 is given by Equation (2.6).
$$\pi^*_1(s) = \arg\max_a R(s, a) \quad (2.6)$$
In infinite-horizon discounted models, the optimal stationary policy is independent of the starting state. It can be proven [8, p. 654-656] that the value of an optimal policy satisfies the Bellman Equation (2.7), and that the value iteration Algorithm 2.1 eventually converges to the unique solution of the Bellman equations for all s ∈ S.
$$V_{\pi^*}(s) = \max_a \left[ R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V_{\pi^*}(s') \right] \quad (2.7)$$
The initialization V0(s) need not be 0 if there is a guess of the optimal value function; in that case, the guessed value is used in an attempt to speed up convergence. Independently of the initialization, if |V0(s)| < ∞, value iteration can be proven to converge [8, p. 654-656]. The algorithm terminates when the maximum difference between two successive value functions is less than some ε, which can be chosen so as to bound the policy loss. The policy loss is the most the agent can lose by executing the near-optimal policy extracted from V′π∗ instead of the optimal policy.
Algorithm 2.1: Value iteration
t ← 0
V0(s) ← 0 for all s ∈ S
repeat
    t ← t + 1
    forall s ∈ S do
        Vt(s) ← maxa [ R(s, a) + γ ∑s′∈S T(s, a, s′) Vt−1(s′) ]
until |Vt(s) − Vt−1(s)| < ε for all s ∈ S
V′π∗(s) ← Vt(s)
Once V′π∗ is obtained, the near-optimal policy can be easily extracted using Equation (2.8).
$$\pi^*(s) = \arg\max_a \left[ R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V_{\pi^*}(s') \right] \quad (2.8)$$
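A minimal sketch of Algorithm 2.1 together with the policy extraction of Equation (2.8), again assuming a tabular model stored in NumPy arrays (the array layout and function name are assumptions of this example):

```python
import numpy as np

def value_iteration(T, R, gamma, eps=1e-6):
    """Algorithm 2.1 plus the policy extraction of Equation (2.8).

    T[s, a, s'] = Pr(s'|s, a); R[s, a]; gamma: discount factor in (0, 1).
    """
    n_states, n_actions, _ = T.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R(s, a) + gamma * sum_s' T(s, a, s') V(s')
        Q = R + gamma * T @ V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < eps:   # termination test of Algorithm 2.1
            V = V_new
            break
        V = V_new
    # Equation (2.8): extract the near-optimal policy from the converged V.
    policy = (R + gamma * T @ V).argmax(axis=1)
    return V, policy
```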
2.4.2 Partially Observable Markov Decision Processes (POMDPs)
In the previous Chapter 2.4.1, the environment was considered fully observable, and with that assumption the agent always knows which state it is in. However, most of the time, because of sensor limitations or noise, the state might not be perfectly observable; for that reason, Partially Observable Markov Decision Processes (POMDPs) take the state uncertainty into account. In POMDPs, there is also a probabilistic model of the chance of making a particular observation given the current state.
A POMDP can be described as a tuple 〈S, A, T, R, Ω, O〉 [2], where S, A, T and R are the same as described for MDPs in Chapter 2.4.1, Ω is a finite set of observations that the agent can experience and O is the observation function, which gives a probability distribution over possible observations, given an action and the resulting state. So, O(s′, a, o) can be defined as the probability of making observation o, given that the agent took action a and ended up in state s′, i.e. Pr(o|s′, a). A POMDP can be represented in a diagram, as presented in Figure 2.3.
When considering optimal decision making in POMDPs, a direct mapping of observations to actions is not sufficient. The agent should have a memory of its past history, so that it can choose actions successfully in partially observable environments. For that reason, the agent can keep an internal belief state b that summarizes all information about its past. The belief b that will be used is a probability distribution over all the states of the set S, because it is a sufficient statistic of the history, which means that extra data about past actions or observations would not supply any further information about the current state [11, p. 392]. The agent is responsible for updating this belief based on the previous belief
state, the last action and the current observation.
Figure 2.3: POMDP diagram
Considering b(s′) as the probability that the belief state b assigns to state s′, it is possible to compute b_a^o(s′), the new degree of belief in state s′ after taking action a and receiving observation o, through Equation (2.9).
$$b_a^o(s') = \Pr(s' \mid o, a, b) = \frac{O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)}{\Pr(o \mid a, b)} \quad (2.9)$$
The complete derivation of Equation (2.9) can be found in [2, p. 107]. After computing Equation (2.9) for all s′ ∈ S, it is possible to obtain the new belief state b_a^o. This process can be denoted as the belief update function UB(b, a, o), which has the new belief state b_a^o as its output.
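A minimal sketch of the belief update function UB(b, a, o) of Equation (2.9), assuming the same tabular T and O arrays as in the earlier sketches:

```python
import numpy as np

def update_belief(b, a, o, T, O):
    """UB(b, a, o): Equation (2.9), evaluated for every state s'.

    b: belief vector over S; T[s, a, s'] = Pr(s'|s, a); O[s', a, o] = Pr(o|s', a).
    """
    # Numerator of Equation (2.9): O(s', a, o) * sum_s T(s, a, s') b(s)
    unnormalized = O[:, a, o] * (b @ T[:, a, :])
    # Dividing by Pr(o|a, b), the sum of the numerators, normalizes the belief.
    return unnormalized / unnormalized.sum()
```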
A POMDP can be considered as an MDP in which the states are belief states, called a belief-state MDP. The set B of belief states of this MDP comprises its state space. The set of actions A remains the same, and the state transition function τ(b, a, b′) is now defined as in Equation (2.10).
$$\begin{aligned}
\tau(b, a, b') &= \Pr(b' \mid a, b) \\
&= \sum_{o \in \Omega} \Pr(b' \mid a, b, o)\, \Pr(o \mid a, b) \\
&= \sum_{o \in \Omega} \Pr(b' \mid a, b, o) \sum_{s' \in S} \Pr(o \mid s', a, b)\, \Pr(s' \mid a, b) \\
&= \sum_{o \in \Omega} \Pr(b' \mid a, b, o) \sum_{s' \in S} \Pr(o \mid s', a, b) \sum_{s \in S} \Pr(s' \mid a, b, s)\, \Pr(s \mid a, b) \\
&= \sum_{o \in \Omega} \Pr(b' \mid a, b, o) \sum_{s' \in S} O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)
\end{aligned} \quad (2.10)$$
where Pr(b′|a, b, o) is equal to 1 if b′ = b_a^o and 0 otherwise. The reward function for belief states can be written as ρ(b, a) and is given by Equation (2.11).
$$\rho(b, a) = \sum_{s \in S} b(s)\, R(s, a) \quad (2.11)$$
The belief-state MDP has a continuous belief space, since it is the space of all distributions over the finite state space, and for that reason solving a belief-state MDP is challenging. But if it is possible to obtain the optimal policy π∗(b) for it, it can be shown that this policy is also the optimal one for the original POMDP. The problem is that the method for solving MDPs presented in Chapter 2.4.1 is not directly applicable to this belief-state MDP, given the continuity of the belief space. A possible solution to this problem is presented in Chapter 2.4.2.B.
2.4.2.A Value Function
The quality of a policy π(b) in a belief-state MDP is measured by the value function V π(b), similarly to what is done for MDPs. The main goal is to maximize the expected rewards for each belief, following the optimal policy π∗, which is defined by the optimal value function V∗. The optimal value function satisfies the Bellman equation V∗ = HV∗:
the Bellman equation V ∗ = HV ∗:
V ∗(b) = maxa
[ρ (b, a) + γ
∑b′∈B
τ (b, a, b′)V ∗ (b′)
]
= maxa
[ρ (b, a) + γ
∑o∈Ω
p (o|a, b)V ∗ (boa)
].
(2.12)
It has been proved that the value function V(b) presents a particular structure, given the geometric characteristics of its form: the value function of a finite-horizon POMDP is Piecewise Linear and Convex (PWLC) and can be represented by a set of hyperplanes over the belief space:
$$V_t = \left\{ \alpha^i_t \right\}, \quad i = 1, \ldots, |V_t|, \quad (2.13)$$
where α_t^i is a vector with dimension equal to the number of states. It represents a hyperplane and defines the value function over a bounded region of the belief space. Each α-vector is associated with an action. Then, Vt(b) can be defined through the inner product presented in Equation (2.14).
$$V_t(b) = \max_{\alpha^i_t \in V_t} \alpha^i_t \cdot b \quad (2.14)$$
Given these characteristics of the value function Vt, the belief space can be divided into regions. The regions are defined by the upper surface of the α-vectors, because in each particular region the maximizing vector dominates the others, given the goal of maximizing the value function.
Figure 2.4 shows an example of a value function for a two-state problem, represented as a set of α-vectors.
Figure 2.4: Example of a Value Function with two states. Figure adapted from [2]
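Evaluating a value function represented this way reduces to the inner products of Equation (2.14). A minimal sketch, where the (k, |S|) array layout of the α-vectors and the function name are assumptions of this example:

```python
import numpy as np

def pwlc_value(b, alphas):
    """Equation (2.14): V_t(b) = max_i (alpha_t^i . b).

    alphas: (k, n) array of alpha-vectors; b: belief vector of length n.
    Returns the value at b and the index of the dominating alpha-vector,
    whose associated action would be the one to execute.
    """
    scores = alphas @ b
    i = int(np.argmax(scores))
    return scores[i], i
```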
2.4.2.B Point-Based Value Iteration (PBVI)
The limited scalability of value iteration algorithms for solving POMDPs is caused by the dimension of the problem, and it has led to several approximations to POMDP solving. In a problem with n states, POMDP planners must reason about belief states in a continuous space of dimension n − 1. Discretizing the belief space by selecting a small set of representative belief points B is the approach proposed by the Point-Based Value Iteration (PBVI) algorithm, presented in [12].
Point-based methods, using the approximations presented, can derive Equation (2.15) to compute the value function at each particular belief b:
$$V_{t+1}(b) = \max_a \left[ b \cdot \alpha^0_a + \gamma\, b \cdot \sum_{o \in \Omega} \operatorname*{arg\,max}_{g^i_{a,o},\, i} \left( b \cdot g^i_{a,o} \right) \right] = \max_{g^b_a} b \cdot g^b_a, \quad (2.15)$$
where
$$g^i_{a,o}(s) = \sum_{s'} \Pr(o \mid s', a)\, \Pr(s' \mid s, a)\, \alpha^i_t(s') \quad (2.16)$$
and
$$g^b_a = \alpha^0_a + \gamma \sum_{o \in \Omega} \operatorname*{arg\,max}_{g^i_{a,o},\, i} b \cdot g^i_{a,o} \quad (2.17)$$
The backup operator, which selects the maximizing vector for the belief b, becomes:
$$\text{backup}(b) = \operatorname*{arg\,max}_{g^b_a,\, a \in A} b \cdot g^b_a \quad (2.18)$$
The value function at each step is the union of all the vectors resulting from the backup of all the belief points in the set B.
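A minimal sketch of the point-based backup of Equations (2.16)-(2.18), under the same tabular-model assumptions as the previous sketches and taking α_a^0(s) = R(s, a), the immediate-reward vector:

```python
import numpy as np

def backup(b, alphas, T, O, R, gamma):
    """Point-based backup of Equations (2.16)-(2.18) at a belief point b.

    alphas: (k, n) array with the alpha-vectors of the previous stage.
    T[s, a, s'] = Pr(s'|s, a); O[s', a, o] = Pr(o|s', a); R[s, a].
    Returns the vector g_a^b of the maximizing action (Equation (2.18)).
    """
    n_states, n_actions, _ = T.shape
    n_obs = O.shape[2]
    best_vec, best_val = None, -np.inf
    for a in range(n_actions):
        # alpha_a^0(s) = R(s, a), the immediate-reward vector.
        g_ab = R[:, a].astype(float)
        for o in range(n_obs):
            # Equation (2.16): g_{a,o}^i(s) = sum_s' O(s',a,o) T(s,a,s') alpha_i(s')
            g_ao = T[:, a, :] @ (O[:, a, o][:, None] * alphas.T)   # (n, k)
            # Equation (2.17): keep the vector maximizing b . g_{a,o}^i
            g_ab = g_ab + gamma * g_ao[:, np.argmax(b @ g_ao)]
        value = b @ g_ab
        if value > best_val:
            best_vec, best_val = g_ab, value
    return best_vec
```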
There are several PBVI algorithms in the literature. One of them, called Perseus, is presented in [13]. This randomized PBVI algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved, with the important characteristic that a single backup may improve the value of more than just the respective belief point. Perseus backs up only a (randomly selected) subset of points in the belief set that is sufficient for improving the value of every belief point in B.
2.4.3 POMDP with Information Rewards (POMDP-IR)
In an active perception task, the goal is typically to increase the available information by reducing the uncertainty regarding the state. This means that the agent, considering the effects of its actions, must decide what actions to take in order to efficiently reduce the uncertainty about the state variables. A typical POMDP is a possible decision-theoretic model for active perception. However, reducing the uncertainty about the state is usually not expressed as the goal, but rather is a consequence of achieving it. For example, if the goal is to pick up an object, the agent may take actions that reduce its uncertainty about the object's location. However, rewarding an agent for reaching a certain level of belief is not easy to express in these typical POMDP models. For that purpose, the POMDP with Information Rewards (POMDP-IR) was presented in [14]. In a POMDP-IR, a reward for information gain is given, while keeping the characteristic of a classical POMDP of having PWLC value functions.
POMDP-IR introduces a new set of "information-reward" actions (prediction actions) into the problem definition, considering that the state space can be factored as presented in Equation (2.19).
$$S = X_1 \times X_2 \times \cdots \times X_k \times \cdots \times X_K \quad (2.19)$$
At each time step, the agent simultaneously chooses a normal action a_n and a prediction action a_k for each state variable X_k about which the agent wants to have low uncertainty. Prediction actions have the action space A_k = {commit, null}; they have no effect on states or observations, but may affect rewards. The reward function in the POMDP-IR is equal to the sum of the original reward function R of the POMDP and a reward R_k for each X_k, given by Equation (2.20).
$$R_k(b, a_k) = \begin{cases} P(X_k = x_k) \cdot r^{\text{correct}}_i - \left(1 - P(X_k = x_k)\right) \cdot r^{\text{incorrect}}_i & \text{if } a_k = commit \\ 0 & \text{if } a_k = null \end{cases} \quad (2.20)$$
At every time step, the agent can choose to either execute only a normal action, choosing a_k = null, or in addition also receive a reward for its belief over X_k, choosing a_k = commit. Thus, the expected reward of choosing commit is only higher than that of the null action when R_k(b, a_k) > 0, which implies
$$P(X_k = x_k) > \frac{r^{\text{incorrect}}_i}{r^{\text{correct}}_i + r^{\text{incorrect}}_i}. \quad (2.21)$$
If rewarding the agent for having a degree of belief P(X_k = x_k) of at least β is desired, then the relation between r_i^correct and r_i^incorrect must be set so that the expected reward of choosing commit is higher than that of the null action exactly when P(X_k = x_k) > β. The precise values of r_i^correct and r_i^incorrect depend on the model and on the original reward function R.
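A small sketch of the prediction-action reward of Equation (2.20) and the belief threshold of Equation (2.21); the function name and the numeric values are illustrative assumptions, not taken from the original model:

```python
def information_reward(p_xk, r_correct, r_incorrect, commit):
    """R_k(b, a_k) of Equation (2.20) for a single state variable X_k.

    p_xk: current belief P(X_k = x_k) in the predicted value x_k.
    """
    if not commit:          # a_k = null
        return 0.0
    return p_xk * r_correct - (1.0 - p_xk) * r_incorrect

# Per Equation (2.21), commit only pays off above the belief threshold
# r_incorrect / (r_correct + r_incorrect). For beta = 0.9, one choice is
# r_correct = 1 and r_incorrect = 9, since 9 / (1 + 9) = 0.9.
assert information_reward(0.95, 1.0, 9.0, commit=True) > 0
assert information_reward(0.85, 1.0, 9.0, commit=True) < 0
```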
3 Related Work
Work has been developed in this research area to give robots a better conception of the environment that surrounds them. This conception is related not only to the geometric characteristics of the environment, but also to its semantic information. Semantic information relates to cognitive interpretation capacities that humans have and that semantic mapping methods have brought to robots. The work in [1] presents an overview of what has been done in semantic mapping, for different types of environments and different types of applications, as explained in more detail in Chapter 2.1. This work will focus on semantic mapping for domestic indoor environments. In [15], the authors present a layered model of the world at different levels of abstraction: metric line map, navigation graph, topological map and conceptual map. The lower levels are derived from sensor input and are used for robot localization and navigation, while the higher levels provide a human-like categorization of the world. The metric map is obtained by SLAM. The navigation graph establishes a model of free space and its connectivity; some semantic information is added at this level by storing the detected objects and, using the label history, assigning the navigation nodes to one of the classes room, corridor or doorway. The topological map divides the nodes of the navigation graph into groups separated by doorway nodes. At the last level there is a conceptual map, where conceptual knowledge is encoded in Web Ontology Language - Description Logic (OWL-DL). With description-logic-based reasoning software operating on this knowledge representation, it is possible to infer new knowledge about the world that is neither perceived nor given verbally. However, this work does not provide the decision-making capabilities needed to perform tasks given the knowledge acquired about the environment.
In [16], another approach for semantic map representation, and a way to use that information in the performance of navigation tasks, is introduced. This approach uses two parallel hierarchical representations of the space: a spatial representation and a semantic one. The first is related to the sensor-based representations of the environment, and the second holds the symbolic representation of the space. The link between both uses the concept of anchoring. In each of the representations, the hierarchy is related to the level of detail of the information, and the level of abstraction is higher at higher levels. Making use of the anchoring connection between both representations, two kinds of inference were developed: based on recognized objects, the inference system is able to classify, semantically, the room where the object was recognized; and based on semantic information about rooms, the inference process deduces the probable location of a not previously seen object. The authors validate their approach by testing the learned model in the execution of navigation commands. Some of the authors of [16], in the article [6], using the semantic map representation explained above, present a task-planning process using a Temporal-Logic Progressive Planner (PTLplan), which is able to deal with partial observability and uncertainty; however, the knowledge representation system is Loom, which only supports declarative knowledge, allowing neither probabilistic annotations of the facts nor probabilistic inference.
In [17], the authors propose a formalization and standardization of the representation of semantic maps, and they make a proposal for the evaluation and benchmarking of semantic mapping methods. A "formalization of a minimal general structure of the representation that should be implemented in a semantic map" is proposed, where the representation is defined by a global reference system, a set of geometrical elements obtained as raw sensor data and a set of predicates that provide an abstraction of the geometrical elements. Based on the idea that a ground truth for semantic maps exists, the authors propose building a dataset to be shared by the scientific community, allowing a fair comparison between different semantic mapping methods.
The approach in [18] presents a 3D semantic mapping technique that uses a point cloud consisting of multiple 3D scans, obtained by 6D SLAM, to do scene interpretation and to label the basic elements of the scene, such as walls, ground and doors. Afterwards, the data is transformed into 2D images that are used to detect and localize objects, and after the object localization it is transformed back into the 3D data. For interpreting planes in the scene, a constraint solver in Prolog was used, but no further inference methods were employed.
In [3], a spatial knowledge representation is presented, called by the authors Cognitive lAyered Representation of Spatial knowledgE (COARSE). It is based on a layered representation with different levels of abstraction and is designed for representing complex, cross-modal, spatial knowledge while considering the uncertainty and dynamics of space, as presented in Figure 3.1. This representation is the main principle of the work presented in [19], where it is assumed that knowledge should be abstracted to keep the representations compact, allowing the robot to infer additional knowledge about the environment by combining background knowledge with observations. In order to characterize the space at a higher level of abstraction, the system assigns properties to places, such as objects, shape, size and appearance.
Figure 3.1: The layered structure of the spatial representation in [3], showing the different levels of abstraction of the spatial knowledge. Figure adapted from [3]
To represent the conceptual map, a probabilistic chain graph model is used, and its structure is adapted at runtime according to the state of the topological map. In order to perform inference, this model is first converted into a factor graph representation, and afterwards an approximate inference engine, Loopy Belief Propagation, is applied in order to cope with time constraints. However, this work only supports inference about unexplored concepts, such as objects or rooms, and it lacks inference about explored concepts, whose characterization can change given the fact that the environment is stochastic. The inference process also allows goal-oriented exploration to use a distribution of possible extensions to the known world.
In [20], the authors propose a representation of the semantic map, which they refer to as SOM+ (semantic object maps), using symbolic knowledge in description logic and keeping a spatiotemporal representation of object poses. Prolog predicates are also associated with it for the inference process. The SOM+ is an abstract representation of the environment that contains facts about objects and links objects to data structures, such as appearance models or other features used by the perception system to recognize the objects. The work was developed with the objective of making the robot able to interact with a small environment, more specifically a kitchen.
The authors in [21] present a system that allows acquiring new objects in the representation through continuous human-robot interaction. At the beginning, the robot is guided by a user in a recognition tour that allows an initial construction of the semantic map, but the robot is also able to acquire additional knowledge about the environment after the initial set-up, through multi-modal human-robot interaction. The behavior of the robot when interacting with humans and collecting information to update the semantic map is implemented using Petri Net Plans. Prolog is also used to store information about the topological graph of the environment, and for each object, predicates are created with information about the object's type, localization, position and properties, in order to perform inference on it.
In [22], probabilistic conceptual maps and probabilistic planning have also been combined for object search tasks, where the conceptual map is represented as the higher layer of the hierarchical knowledge representation in [3]. In order to do planning, a switching continual planner was presented, which switches between Decision-Theoretic Planning Domain Definition Language (DTPDDL) and classical modes of planning at different levels of abstraction.
The work most similar to what is proposed in this master thesis is [23], where the probabilistic representation of the semantic map is based on the probabilistic programming language ProbLog. There, probabilistic inference tasks were used to infer a query given an evidence, inferring the probability of an object being in a given place from a statement expressing the probability of observing an object in that place and an evidence (observation) confirming it. The work in [23] not only presents a probabilistic knowledge representation, but also a framework for planning under uncertainty based on a POMDP, computing approximate solutions in order to manage the scalability problems of POMDPs. The decision maker also takes into account phenomena that may affect the perception algorithm, such as errors in the vision algorithms and possible occlusions. In that work, a POMDP with Information Rewards (POMDP-IR) [14] is used. This framework rewards the agent for reaching a certain level of belief regarding a state feature: if more certain information about the state improves task performance, it is important to increase the available information by reducing the uncertainty regarding the state. In that paper, the work was developed for active cooperative perception, fusing sensory information with the goal of maximizing the amount and quality of perceptual information available to the system.
In [24], a solution for POMDPs is also presented for the case where an explicit measure of the agent's knowledge about the system, based on beliefs instead of states, is incorporated in the performance criterion. For that reason, the rewards are defined based on the acquired knowledge represented by belief states. This framework is called ρPOMDP. It is proved that if the reward function over beliefs ρ is convex, the corresponding belief-based value function is also convex; if ρ is PWLC and the initial value function is equal to 0, then the belief-based value function is also PWLC, and it is easy to adapt POMDP algorithms to solve ρPOMDPs.
4 Proposed Method
4.1 Architecture Description
As presented in Chapter 1.3, the proposed approach to the problem is to create a probabilistic knowledge representation of the world that is able to provide enough information for the agent to make decisions, and to keep it updated. For that reason, the architecture that was developed can be divided into two main parts and has the structure presented in Figure 4.1. The first part, designated the Knowledge Representation Engine, receives the world model as input and is responsible for the operation of the whole architecture, as explained in detail in Chapter 4.1.1. This part is also responsible for the full generation of the second part, the Decision Maker. The Decision Maker is composed of a set of POMDPs, where each one is responsible for holding a partial representation of the global knowledge of the world. If selected, a POMDP takes the role of deciding which actions the agent should take. For semantic mapping in a domestic environment, as proposed, a natural model is a Decision Maker with one POMDP per room: if the world model is a house with N rooms, the Decision Maker will have N POMDPs. The Decision Maker is explained in more detail in Chapter 4.1.2.
Given that the Decision Maker has multiple POMDPs, it is also necessary to choose, at each moment, which one takes the role of driving the behavior of the agent. The Knowledge Representation Engine is also responsible for that decision and, for that purpose, it analyses the value function of each POMDP given the current belief state. How this choice is made is explained in more detail in Chapter 4.1.2.A.
Summing up, the architecture needs to be initialized: the Knowledge Representation Engine generates a global belief about the world state in ProbLog and the different POMDPs, given the world model provided. Then, using that initial global belief, it analyses the value function of each POMDP created, to choose which one should drive the agent's behavior. The chosen POMDP keeps driving the agent's behavior, updating its internal belief given the sequence of action-observation pairs. This internal belief also keeps updating the global belief in the Knowledge Representation Engine, as explained in Chapter 4.1.2.B. When the POMDP starts taking the action of doing nothing, it means that the agent has already accomplished the goal, and that POMDP stops driving the agent's behavior. At this point, it returns its final internal belief to the Knowledge Representation Engine, updating the global world representation. Given the new updates in the global world representation, the Knowledge Representation Engine decides again which POMDP should be chosen, repeating the cycle.
Figure 4.1: Scheme of the architecture operation
4.1.1 Knowledge Representation Engine
The Knowledge Representation Engine, as it was explained before, is mainly responsible for the
architecture operation. It has the global world representation and chooses which POMDP should drive
the agent behavior, based on the current global belief. For that purpose, it starts by receiving an initial
world model, that in a semantic mapping context can be considered as a list of objects, furniture and
rooms with their characteristics, such as position, volume, size and others. That information is used to
create a set of facts in ProbLog. The Knowledge Representation Engine is also responsible for having a
representation of the interactions and relations between the world model components, considering the
uncertainties that are present in real-life models. In the semantic mapping context, it is necessary to
define the relationship between different objects, objects and furniture, furniture and rooms, etc. This
is possible to be done, defining a set of rules and probabilistic facts in ProbLog, which specifies the
23
behavior guidelines, and then taking advantage of the inference process of ProbLog. That information
will be useful to make some inference about the world state and to generate the POMDPs. The global
world representation presented in the Knowledge Representation Engine can be called global belief b,
representing the probability distribution over the set of possible world states S.
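Since the architecture interacts with ProbLog directly from Python, a minimal sketch of how a fragment of such a world model could be encoded and queried is shown below. The predicate names (object_at/2, in_room/2, object_in_room/2) and the probabilities are illustrative assumptions, not the thesis' actual model.

```python
# A sketch of a ProbLog world-model fragment, queried through the ProbLog
# Python package; predicates and values are illustrative only.
from problog.program import PrologString
from problog import get_evaluatable

model = PrologString("""
% Probabilistic facts: prior belief over an object's placement.
0.5::object_at(mug, kitchen_table); 0.5::object_at(mug, coffee_table).

% Deterministic facts: which placement belongs to which room.
in_room(kitchen_table, kitchen).
in_room(coffee_table, living_room).

% Rule: an object is in a room if it sits on a placement of that room.
object_in_room(O, R) :- object_at(O, P), in_room(P, R).

query(object_in_room(mug, kitchen)).
""")

# Inference returns a dict mapping each query term to its probability.
result = get_evaluatable().create_from(model).evaluate()
for term, prob in result.items():
    print(term, prob)  # object_in_room(mug,kitchen): 0.5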
As referred to before, the Knowledge Representation Engine is initially responsible for the full generation of the different POMDPs, dividing the global world representation into subworlds based on a division criterion defined a priori by the model. Each tuple $\langle \mathcal{S}_n, \mathcal{A}_n, T_n, R_n, \Omega_n, O_n \rangle$ that defines POMDP n, as explained in Chapter 2.4.2, is completely defined by the Knowledge Representation Engine. The goal of this division is to simplify the global world representation into a set of smaller worlds, in which it is easier to make decisions. For that reason, the dimension of $\mathcal{S}_n$ for each POMDP is smaller than that of $\mathcal{S}$, which considers all the possible world states.

The Knowledge Representation Engine considers that, at each time, a state S can be defined by the joint discrete probability distribution of a set $\mathcal{X} = \{X_1, X_2, \ldots, X_K\}$ of independent discrete random variables. Each state S in $\mathcal{S}$ can be defined as:

$$S = X_1 \times X_2 \times \ldots \times X_k \times \ldots \times X_K \quad (4.1)$$
Each variable $X_k$ is denominated a state variable and has a set of possible outcomes $D_k$ that corresponds to its domain. The dimension of the world state space $|\mathcal{S}|$ is then equal to the number of combinations of the domains $D_k$ of the state variables $X_k$:

$$|\mathcal{S}| = \prod_{k=1}^{K} |D_k|. \quad (4.2)$$
Each POMDP n of the Decision Maker has a set of states $\mathcal{S}_n$. Each state $S'$ in $\mathcal{S}_n$ is defined as the joint discrete probability distribution of a set $\mathcal{X}_n$ of independent discrete random variables. It is important to notice that

$$|\mathcal{X}_n| \leq |\mathcal{X}| \quad (4.3)$$

and that each variable $X'_k \in \mathcal{X}_n$ has a match with the variable $X_k \in \mathcal{X}$, because they represent the same feature in the world model. However, they are not the same, because they have different domains. The domain of $X'_k$ is $D'_k$ and it is adapted to the subworld of the respective POMDP. There is an important characteristic of the relation between the $X_k$ and $X'_k$ domains, given by Equation (4.4), because the subworld of each POMDP is restricted compared with the global world:

$$|D'_k| \leq |D_k|. \quad (4.4)$$

The conditions presented in Equations (4.3) and (4.4) are the reason why the dimension of $\mathcal{S}_n$ is smaller than that of $\mathcal{S}$.
To construct the domains $D'_k$ of the variables of a POMDP, the domain values that are not available in that specific subworld still need to be represented, because those values remain valid in the global world representation. For that reason, all those values can be aggregated into a single value that can be called, for example, none. It keeps representing those values without discriminating each one individually, minimizing the number of POMDP states, as desired. Every time the Knowledge Representation Engine needs to calculate the belief $b_n$ of a POMDP n, it needs to generate the new probability distribution of each state variable $X'_k$ of that POMDP. For that purpose, a function $f_{k,n}$ is considered for each state variable $X'_k$ of each POMDP, which associates each element of the domain $D_k$ with a single element of the domain $D'_k$:

$$f_{k,n} : D_k \to D'_k. \quad (4.5)$$

If $D_k = D'_k$, $f_{k,n}$ is an endofunction; however, the most common case is to have $D'_k \subseteq D_k \cup \{\textit{none}\}$, where none represents the set of elements $D_k \setminus D'_k$, and then

$$P(X'_k = x') = \begin{cases} \sum_{x \in D_k \setminus D'_k} P(X_k = x) & \text{if } x' = \textit{none} \\ P(X_k = x') & \text{otherwise.} \end{cases} \quad (4.6)$$
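A minimal sketch of how Equation (4.6) can be implemented, assuming that marginal distributions are represented as Python dictionaries mapping domain values to probabilities (the representation is an assumption for illustration):

```python
def project_belief(global_dist, local_domain):
    """Builds the marginal of X'_k for b_n (Eq. 4.6): global_dist maps each
    value of D_k to P(X_k = value); local_domain holds the values of D'_k."""
    local_dist = {"none": 0.0}
    for value, p in global_dist.items():
        if value in local_domain:
            local_dist[value] = p        # P(X'_k = x') = P(X_k = x')
        else:
            local_dist["none"] += p      # aggregate the mass of D_k \ D'_k
    return local_dist

# Example: a mug distributed over five placements, projected onto a room
# whose POMDP only contains two of them.
mug = {"kitchen_table": 0.4, "kitchen_cabinet": 0.2,
       "coffee_table": 0.2, "sideboard": 0.1, "dining_table": 0.1}
print(project_belief(mug, {"kitchen_table", "kitchen_cabinet"}))
# -> {'none': 0.4, 'kitchen_table': 0.4, 'kitchen_cabinet': 0.2}
```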
4.1.2 Decision Maker
The Decision Maker needs to deal with uncertainty in different aspects, such as observations and action results. As explained in Chapter 2.4.2, a POMDP is able to take that uncertainty into account and make decisions under those conditions; however, finding the optimal policy for large POMDPs is limited by the poor scalability of existing solution algorithms, and large state spaces are one important source of intractability. This problem can be minimized by dividing the decision making task into several POMDPs, where each one is responsible for making limited decisions, given the restricted sets of possible states, actions and observations of the subworld that it represents. Nevertheless, all of them together with the Knowledge Representation Engine can achieve an agent behavior close to the one given by the optimal policy obtained from a single POMDP representing the global world model. For that purpose, at each moment, it is important to have an engine able to select the POMDP that makes most sense to guide the agent's behavior, given the current global belief b and the agent's goal, as explained in Chapter 4.1.2.A.
When initialized, each POMDP n of the Decision Maker also needs to be solved, using a POMDP solver to compute the optimal policy $\pi^*$ that maps every possible belief $b_n$, in the belief space B, to an action a in the set $\mathcal{A}_n$ of possible actions that the robot can perform in that subworld. The POMDP solver computes an approximation of the optimal policy $\pi^*$, which is the one that maximizes the agent's expected total reward, given by the value function $V(b, \pi)$.
4.1.2.A POMDP selection
The POMDP selection needs to be done by the Knowledge Representation Engine, taking into account the current global belief b, which provides the information about the distribution over the possible world states. It starts by computing each POMDP belief $b_n$, as explained in Chapter 4.1.1. Then, those initial beliefs can be used to calculate the expected total reward of each POMDP n, using the value function $V_n(b_n, \pi^*)$ computed previously. Comparing the expected total rewards of the POMDPs, it is possible to use different selection criteria to choose the POMDP that should conduct the agent's behavior; those criteria are related to the main goal of the agent.

The different POMDP value functions can be compared because the model used by the Knowledge Representation Engine to generate them is the same. In other words, the reward values and the observation and transition probabilities are similar; the differences are only related to the specific characteristics of the subworld that each POMDP represents, and those characteristics are supposed to be reflected in the value function. The fact that the POMDP state sets are not the same is also a differentiating factor and influences the value functions as desired, in order to characterize each POMDP.
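A minimal sketch of this selection step, assuming each solved POMDP is available as a set of alpha-vectors over its local states (the interface is an assumption for illustration; the goal-dependent criteria are the ones described in Chapter 4.2):

```python
def evaluate_value(alpha_vectors, belief):
    """V(b) = max over alpha of sum_s alpha(s) * b(s), for a point-based
    (alpha-vector) approximation of the value function."""
    return max(sum(alpha[s] * p for s, p in belief.items())
               for alpha in alpha_vectors)

def select_pomdp(pomdps, global_belief, project, maximize=False):
    """pomdps: dict name -> list of alpha-vectors; project builds b_n from
    the global belief b for the named POMDP (cf. Eq. 4.6)."""
    values = {name: evaluate_value(alphas, project(global_belief, name))
              for name, alphas in pomdps.items()}
    # Active perception: pick the LOWEST expected reward (most to gain);
    # task execution: pick the HIGHEST (closest to accomplishing the task).
    pick = max if maximize else min
    return pick(values, key=values.get), values
```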
4.1.2.B Global knowledge representation update
Each time a POMDP is selected, it guides the agent's behavior, updating its internal belief $b_n$ with the collected information. At the same time, the updated internal belief of the POMDP also updates the global belief b of the Knowledge Representation Engine.

Both beliefs b and $b_n$ are defined as the joint probability distribution of the sets $\mathcal{X}$ and $\mathcal{X}_n$ of independent discrete random variables, respectively. Recalling the independence between the state variables in $\mathcal{X}_n$ and in $\mathcal{X}$, updating the belief b with the new belief $b_n$ is the same as updating the probability distribution of each variable $X_k$ given the probability distribution of the respective $X'_k$, and then calculating the joint probability distribution of the updated variables in the set $\mathcal{X}$. The probability distribution of a state variable $X_k$ remains the same when there is no corresponding $X'_k$ in the set $\mathcal{X}_n$. On the other hand, the probability distributions of the remaining variables $X_k$ are updated individually, taking into account the probability distribution $P(X'_k \mid Z)$ that comes from the POMDP n. This probability represents the distribution of the variable $X'_k$ given the observations Z that the agent collected. Considering $P(X_k)$ as the prior probability distribution of $X_k$, the posterior probability $P(X_k \mid Z)$ is given by Equation (4.7), where $f_{k,n}$ is the function that associates each element in $D_k$ with an element in $D'_k$, for the POMDP n. The complete derivation of Equation (4.7) is presented in Appendix A.
$$P(X_k \mid Z) = \frac{P(X_k)}{\sum_{x \in D_k} P(X'_k \mid X_k = x)\, P(X_k = x)}\, P(X'_k \mid Z), \quad \text{with } X'_k = f_{k,n}(X_k) \quad (4.7)$$

Considering that $P(X'_k \mid X_k)$ is given by Equation (4.8), Equation (4.7) can also be written as in (4.9).

$$P(X'_k \mid X_k) = \begin{cases} 1 & \text{if } X'_k = f_{k,n}(X_k) \\ 0 & \text{otherwise} \end{cases} \quad (4.8)$$

$$P(X_k \mid Z) = \begin{cases} P(X'_k \mid Z) & \text{if } X_k \in D_k \cap D'_k \\ \dfrac{P(X_k)}{\sum_{x \in D_k \setminus D'_k} P(X_k = x)}\, P(X'_k \mid Z) & \text{otherwise} \end{cases} \quad (4.9)$$
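A minimal sketch of Equation (4.9), again assuming dictionary-based distributions (the local posterior is expected to contain one entry per value of $D'_k$ plus the aggregated "none" entry):

```python
def update_global(global_dist, local_posterior, local_domain):
    """global_dist: P(X_k) over D_k; local_posterior: P(X'_k | Z) over
    D'_k plus "none"; local_domain: the values of D'_k."""
    # Prior mass of the values aggregated into "none" (i.e., D_k \ D'_k).
    mass_outside = sum(p for v, p in global_dist.items()
                       if v not in local_domain)
    updated = {}
    for value, prior in global_dist.items():
        if value in local_domain:
            # First case of Eq. (4.9): take the posterior directly.
            updated[value] = local_posterior[value]
        elif mass_outside > 0.0:
            # Second case: redistribute P(X'_k = none | Z) proportionally
            # to the prior over the aggregated values.
            updated[value] = prior / mass_outside * local_posterior["none"]
        else:
            updated[value] = 0.0
    return updated
```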
4.2 Semantic Mapping Application
The architecture designed can be applied in different contexts; semantic mapping in a domestic environment is the main motivation of the work presented. In semantic mapping, it is possible to consider a house configuration as the world model, with all its rooms, furniture, objects and their respective characteristics and relations. Using this architecture in a domestic environment, it is possible to have a semantic map representing the probability distribution of each object being placed over the considered placements (furniture where an object can be placed) or being located over the possible rooms. Knowing that, at each moment, the position of each object does not depend on the positions of the remaining objects and of the robot, the set of state variables $X_k$ can be the robot and the locations of the objects, and the domains $D_k$ can be the possible robot locations and the possible places where the objects can be located, respectively. Using this world model of the house, the Knowledge Representation Engine can generate a POMDP for each room, representing that subworld. In each POMDP, the possible states correspond to the different possible combinations of the locations of the objects and the robot location inside that room. The state variables $X'_k$ are the robot and object locations, considering of course that the domain $D'_k$ is a smaller set than $D_k$, because of the restrictions on the placements and robot locations. $D'_k$ can also take the value none, in order to represent the possibility of the robot or object being located in another room, where applicable.
The Knowledge Representation Engine also needs to define the possible actions and observations of each POMDP. In an architecture whose goal is to generate and keep updated a semantic map of the environment, it makes sense to have actions for moving the robot, an action for searching for objects and an extra action for doing nothing. This last action is chosen by the optimal policy when there is no longer much value in exploring that room. Then, the robot's behavior should stop being guided by that POMDP, and the POMDP evaluation and selection process should be repeated using the new information collected. The observation function of each POMDP can be represented as the possibility of observing an object or not, based on the object's characteristics. In the semantic mapping context, the main goal of the agent is to reduce the uncertainty about the objects' locations. In order to represent this goal in the POMDP model, the POMDP-IR framework presented in Chapter 2.4.3 is used, rewarding the agent for reaching a state with lower uncertainty about the location of the objects. For that reason, a reward $R_k$ for each state variable $X_k$ of an object is used.
Summarizing, each POMDP model of the Decision Maker for the semantic mapping application is defined by the following components (an illustrative sketch follows the list):
1. States and Transitions: The model considers one state variable for the robot and a state variable
for each object that can be located in the room. The robot and the object state variables repre-
sent the location of the robot and objects, respectively. The state transition model for the robot
represents the probabilities of it being located in a certain location, given the previous one and the
action taken.
2. Observations: The model has a binary observation variable for each object variable considered, indicating whether the object is observed by the perception module or not.
3. Domain Actions: There is one action for searching for objects, triggering the perception model,
one action for moving the robot to each placement and an action just to stop the robot, indicating
the end of a searching process in that room.
4. Prediction Actions: A prediction action variable for each object is considered, indicating whether
an object is believed to be in some location in the room, not found in this room, or null if there is
not enough information.
5. Rewards: Each reward value of taking an action, given the robot and object locations, depends on the environment and the desired agent's behavior. However, given the usage of information rewards, in general it makes sense to give higher rewards to the stop action than to the search object action, and higher rewards to the search object action than to the move actions, in order to represent the action effort.
6. Information Rewards: The information rewards considered depend on the desired degree of belief about the location of the objects.
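An illustrative summary of these components (not the generated solver file format; names and values are assumptions) for a room with two placements and two objects:

```python
# A sketch of the model components above, for one room's POMDP.
room_pomdp = {
    # 1. State variables: robot location plus one variable per object;
    #    "none" stands for "somewhere outside this room".
    "state_vars": {
        "robot":    ["kitchen_table", "kitchen_cabinet"],
        "cocacola": ["kitchen_table", "kitchen_cabinet", "none"],
        "pringles": ["kitchen_table", "kitchen_cabinet", "none"],
    },
    # 2. One binary observation variable per object (seen / not seen).
    "observation_vars": ["obs_cocacola", "obs_pringles"],
    # 3. Domain actions: one move action per placement, search, and stop.
    "domain_actions": ["goKitchenTable", "goKitchenCabinet",
                       "searchObject", "doNothing"],
    # 4. Prediction (information-reward) actions per object: commit to a
    #    location, commit to "not in this room" (none), or null.
    "prediction_actions": {"cocacola": ["kitchen_table", "kitchen_cabinet",
                                        "none", "null"]},
}
```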
An example of a POMDP model with two objects is presented in Figure 4.2, where the arrows repre-
sent the dependencies.
Figure 4.2: POMDP model example

The fact that objects can change location over time, given the naturally dynamic character of a domestic environment, is also represented in the model. The Knowledge Representation Engine updates the distribution of the global belief considering an exponential decay in the probability distribution of each object state variable $X_k$. For that purpose, the probability distribution of each variable $X_k$ is given by Equation (4.10), where $P_{\text{previous}}(X_k = x)$ is the value of the probability distribution of $X_k$ at the previous time step, $P_{\text{uniform}}(X_k = x)$ is the probability value of a uniform distribution and $\lambda$ is the decay rate.

$$P(X_k = x) = P_{\text{uniform}}(X_k = x) + \left[ P_{\text{previous}}(X_k = x) - P_{\text{uniform}}(X_k = x) \right] e^{-\lambda t} \quad (4.10)$$
Then, when the Knowledge Representation Engine receives new information about the probability distribution of the state variables in a specific room, the update is done as presented in Equation (4.7), where $P(X_k)$, the previous probability distribution of the state variable $X_k$, has already been updated by the exponential decay presented in Equation (4.10).
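A minimal sketch of Equation (4.10), assuming distributions stored as dictionaries:

```python
import math

def decay(previous, lam, t):
    """Relaxes a distribution towards uniform (Eq. 4.10); previous maps
    values to probabilities, lam is the decay rate, t the elapsed time."""
    uniform = 1.0 / len(previous)
    return {v: uniform + (p - uniform) * math.exp(-lam * t)
            for v, p in previous.items()}

# Example with the parameters used later in the simulated experiments:
# a mean lifetime of 5 time slots (lam = 0.2) after one move action (t = 1).
print(decay({"kitchen_table": 0.9, "coffee_table": 0.1}, lam=0.2, t=1))
# -> roughly {'kitchen_table': 0.83, 'coffee_table': 0.17}
```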
In the semantic mapping application, the agent has two different kinds of goals, which are related to two different kinds of POMDP selection criteria:
1. Generate and keep updated a semantic map of the environment
If the goal of the architecture is just to generate and keep updated a semantic map of the environment, reducing the uncertainty about the location of the objects, the architecture only needs to deal with an active perception task. The agent aims to select actions that reduce its uncertainty about the world state. This goal can be represented as the ambition of maximizing the sum of the POMDP value functions: because they are PWLC functions, the expected reward is lower towards the center of the belief space. The higher the entropy (5.1) of the belief state, the closer to the middle of the belief space the system is, and the lower the subsequent expected reward. For that reason, maximizing the value function corresponds to minimizing the entropy of the belief state, which translates into a lower uncertainty about the objects' arrangement, as desired. Maximizing the sum of the value functions of all POMDPs means that the goal is to have an updated and confident semantic map of the global world, not just a very high confidence about the objects' arrangement in one room. In order to maximize this sum, the criterion of the Knowledge Representation Engine for choosing the POMDP is to choose the one with the lowest expected reward value for the belief b at that point. This can be explained by the convexity of the value functions: since each POMDP has the implicit goal of minimizing the entropy of the belief state, it will drive its value function close to the maximum, so maximizing the value function with the lowest initial value has the highest potential for a cumulative expected reward increase. The belief $b_n$ that results from the chosen POMDP will have a lower entropy than initially and, when it updates the global belief b, it will also decrease the entropy of b. In turn, this will decrease the entropy of the beliefs $b_n$ of each POMDP, increasing the cumulative expected rewards of each POMDP.
2. Carry out a task
In an architecture where the main goal is to carry out some task, such as finding a specific object or moving an object to a place, the agent needs a low uncertainty about the world state in order to be able to reach the goal. For that purpose, the agent can take actions that are not directly related to task accomplishment but have the intention of reducing the uncertainty about the environment. In this case, the criterion for choosing the POMDP is to choose the one with the highest expected reward value for the belief b at that point. A POMDP model for carrying out a task has as its basis the structure of the model presented before; however, it is necessary to add actions that help with the task accomplishment. Depending on the task, it may be necessary to add new state variables to the model, or just new possible values to the state variables, new observation variables and new rewards for the new states, which must be large enough to make the agent give priority to the task accomplishment.
5 Experimental Results
5.1 Implementation
In order to analyze, test and validate the architecture designed and presented in Chapter 4, it was necessary to implement it. The implementation is based on the Robot Operating System (ROS) framework, because this architecture is designed with the purpose of having a robotic application and ROS is a flexible framework for writing robot software. The Knowledge Representation Engine is implemented as a ROS node in Python, because Python allows importing ProbLog 1 as a package, in order to interact directly with it. The Decision Maker is implemented as multiple instances of a node, one responsible for each POMDP. To solve them, a Matlab implementation of the Symbolic Perseus 2 algorithm, able to solve POMDP-IR, is used. Symbolic Perseus is a point-based value iteration algorithm that is able to tackle large factored POMDPs. The Knowledge Representation Engine generates the POMDP files with all the information needed. These files can be opened in the software OpenMarkov 3, enabling a graphical representation of the POMDP model, and they are also used by the Matlab solver to compute, offline, an approximation to the optimal value function.

In the real environment experiments, it is necessary to use a perception model to detect and recognize objects in real time, so the experiments use a ROS wrapper of YOLOv3 4, trained with a dataset for the objects that need to be recognized.
1 https://dtai.cs.kuleuven.be/problog/
2 https://cs.uwaterloo.ca/~ppoupart/software.html
3 http://www.openmarkov.org/
4 https://github.com/pjreddie/darknet
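As a rough illustration of how the Knowledge Representation Engine can be structured as a ROS node, the following sketch shows a possible skeleton; the topic names, message types and method bodies are assumptions for illustration, not the actual interfaces used in this work.

```python
#!/usr/bin/env python
# A possible skeleton for the Knowledge Representation Engine node;
# topics, messages and method bodies are illustrative assumptions.
import rospy
from std_msgs.msg import String

class KnowledgeRepresentationEngine(object):
    def __init__(self):
        rospy.init_node("knowledge_representation_engine")
        # Announces which POMDP should currently drive the agent's behavior.
        self.selector_pub = rospy.Publisher("/selected_pomdp", String,
                                            queue_size=1)
        # Receives the final internal belief of a POMDP once it stops.
        rospy.Subscriber("/pomdp_final_belief", String, self.on_final_belief)

    def on_final_belief(self, msg):
        self.update_global_belief(msg.data)            # Eq. (4.7) updates
        self.selector_pub.publish(self.select_pomdp()) # re-run selection

    def update_global_belief(self, serialized_belief):
        pass  # placeholder: update the ProbLog model with the new belief

    def select_pomdp(self):
        return "living_room"  # placeholder: value-function comparison

if __name__ == "__main__":
    KnowledgeRepresentationEngine()
    rospy.spin()
```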
5.2 World Model
In order to test the architecture's operation in the semantic mapping context, it is important to start by defining the domestic environment used in the experimental results. The environment is an apartment based on the ISRoboNet@Home Testbed, a certified testbed of the ERL Consumer Service Robots challenge, used to benchmark domestic robot features and tasks. The testbed layout is presented in Figure 5.1.
Figure 5.1: ISRoboNet@Home Testbed layout
For each placement considered in the testbed, it is important to define its characteristics. The furniture characteristics considered in the presented experiments are the position in the 2D plane and the furniture area. Given the position of each placement, it is possible to compute the Euclidean distance between each pair of placements. The ProbLog model in the Knowledge Representation Engine uses this distance to define the rewards of taking the action of moving from one placement to another, for each POMDP. Each POMDP only considers the placements inside the respective room; the distance between outside and a placement inside the room is taken as the average of the distances to all the placements outside. The reward for moving from one place to another is actually a negative reward (penalty), representing the effort of moving the robot: the bigger the distance, the more negative the reward. The value range used for these rewards is [−0.7, −0.5]. Besides that, in the model presented, the distance between furniture is also used to define the low probability of observing an object placed where it is not, considering a higher probability when the object is placed in a piece of furniture close to the one where it was observed. The value range considered for this is [0.05, 0.1].

In the model presented, the furniture area is the area available to place objects and is used to define the POMDP rewards of searching for objects in that place. This reward is in the range [−0.35, −0.25]. It is negative for the same reason as the moving actions, and the smaller the area, the more negative the reward.
In addition to furniture, the world model also needs to define the objects that the robot needs to consider for semantic mapping, and their characteristics. For each object, the volume is given, which is used to define the probability of observing an object where it really is. This enables representing the fact that objects with smaller volume have a lower probability of being observed, because they are small and can be easily occluded. The value range of the observation probabilities for these cases is [0.8, 0.9], considering a probability of false negative observations in the range [0.1, 0.2], depending on the object. The distance between furniture is also used to define the probability of false positive observations, that is, the probability of observing an object where it is not located. The probability of a false positive observation when the object is not even in that room is equal to 0.01. If the object is located in the same room where the robot is searching, but not in the same placement, the probability of a false positive observation is in the range [0.05, 0.1], depending on the distance between the furniture where the robot is looking and the furniture where the object is placed. The reward values and the observation probability model are summarized in Table 5.1.
Action                                  Do Nothing   Search Object    Move Robot
Reward value                            0            [−0.35, −0.25]   [−0.7, −0.5]
Observation probabilities
  object location = robot location      0            [0.8, 0.9]       0
  object location ≠ robot location      0            [0.05, 0.1]      0
  object location = none                0            0.01             0

Table 5.1: POMDP reward values and observation probability ranges for the different action types
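The exact functional forms that map furniture distance and object volume to these probabilities are not specified beyond the ranges above; the following sketch shows one possible interpolation consistent with Table 5.1 (the linear forms are assumptions for illustration):

```python
def true_positive_prob(object_volume, vol_min, vol_max):
    """Larger objects are easier to see: interpolate in [0.8, 0.9]."""
    ratio = (object_volume - vol_min) / (vol_max - vol_min)
    return 0.8 + 0.1 * ratio

def false_positive_prob(object_in_room, distance, d_min, d_max):
    """0.01 if the object is in another room; otherwise closer furniture
    yields a higher false positive probability, within [0.05, 0.1]."""
    if not object_in_room:
        return 0.01
    ratio = (d_max - distance) / (d_max - d_min)
    return 0.05 + 0.05 * ratio
```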
An example of the world model files that the architecture receives as input is presented in Appendix B.
For the different experiments, two different scenarios for the testbed are considered, as presented
in Table 5.2. Scenario 1 considers 5 placements spread over 3 rooms and Scenario 2 considers 8
placements spread over 4 rooms.
             Kitchen                         Living Room                          Dining Room    Bedroom
Scenario 1   kitchen table, kitchen cabinet  coffee table, sideboard              dining table   -
Scenario 2   kitchen table, kitchen cabinet  coffee table, sideboard, bookshelf   dining table   bed, night stand

Table 5.2: Testbed scenarios used in the experiments
5.3 POMDP-IR Model
As mentioned in Chapter 4.2, in the semantic mapping application the model has a POMDP for each room, where the set of states contains all the combinations of the possible positions of each object and the robot inside that room. The set of possible actions has, for each placement, an action to make the robot go there (e.g. goSideboard). It also has one action to search for objects (searchObject) and another one to do nothing (doNothing). The set of observations is composed of a binary variable for each object, representing the probability of observing it or not.

The tool used to solve the POMDPs supports factored POMDPs, where the transition, observation and reward functions are defined in terms of the state variables $X_k$, action variables and observation variables, allowing compact factored representations.
In the experiments, the POMDP-IR information rewards used are $r_{correct} = 0.53$ and $r_{incorrect} = -4.78$, as proposed in [14], in order to get $\beta = 0.9$ and reward the robot for having a degree of belief about an object's location higher than 0.9.
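As a quick consistency check, assuming the standard POMDP-IR break-even condition (the expected information reward of the commit action, $\beta\, r_{correct} + (1 - \beta)\, r_{incorrect}$, equals the null action's zero reward exactly at the threshold), these values indeed yield:

$$\beta = \frac{-r_{incorrect}}{r_{correct} - r_{incorrect}} = \frac{4.78}{0.53 + 4.78} \approx 0.90$$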
Another assumption made in the experimental results presented, to simplify the model, is that the environment is static while the agent is making decisions inside a room. Basically, the state variable transitions T of each POMDP are deterministic: the model assumes that, while the robot is taking actions inside the room, the objects do not change position and the robot changes its position according to the move actions it takes. These are realistic assumptions, given that the objects are not expected to change location too often. In the time span in which the robot is exploring a room, one can consider that the objects do not change position, and even if a change happens without being modeled, the agent may detect it in the following searching episodes. On the other hand, nowadays there are reliable and accurate navigation algorithms that work well in this kind of domestic environment, so the problem of navigation can be separated from the decision-making task.

However, the dynamic characteristic of a domestic environment remains represented in the model, as explained in Chapter 4.2, through an exponential decay in the object state variables X in the Knowledge Representation Engine.
5.4 Simulated Experiments
In order to analyze the behavior of the architecture designed, some experiments can be done in a simulated environment; what needs to be simulated is the perception model and the actions of moving the robot. In Scenario 1, three simple cases are presented: one considering a static environment and two considering a dynamic one, but without changing the location of the objects. In Scenario 2, it is shown how the architecture deals with environment changes and with errors in the perception model. For the experiments with a dynamic model, as proposed in Chapter 4.2, an exponential decay in the probability distributions of the locations of the objects is considered, increasing the entropy over time. In the simulated experiments with a dynamic model of the environment, a mean lifetime $\lambda^{-1}$ of 5 time slots was considered, taking into account that the action of moving the robot corresponds to 1 time slot, the action of searching for objects corresponds to 0.3 time slots and the action of doing nothing corresponds to 0.1 time slots.
5.4.1 Scenario 1
5.4.1.A Static Model
In order to analyze the architecture's behavior in a simple scenario, it is considered that the environment is static and, for that reason, the positions of the objects do not change with time. At each time, the state variables' distributions are updated using t = 0 in Equation (4.10), ignoring the exponential decay.

In this first experiment, the objects and their respective locations are as presented in Table 5.3. An initial uniform distribution is considered for each object state variable over the 5 possible placements of Scenario 1. The probability distributions, the robot position and the consequent action, for each time step, are presented in Chapter C.1.
object      location
cocacola    dining table
pringles    kitchen table
mug gray    coffee table
mug black   kitchen cabinet

Table 5.3: Location of the objects in the static model
Figure 5.2 shows the progression of the entropy of the objects' distributions. The degree of uncertainty about the location of the objects can be quantified using the entropy E: for a discrete probability distribution $P = (p_1, \ldots, p_n)$, the entropy E is defined by Equation (5.1). For each state variable $X_k$, it decreases with time, as desired.

$$E(P) = -\sum_{i=1}^{n} p_i \log_n(p_i) \quad (5.1)$$
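A minimal sketch of Equation (5.1), using a base-n logarithm so that the entropy is normalized to [0, 1]:

```python
import math

def entropy(dist):
    """Normalized entropy (Eq. 5.1) of a discrete distribution given as a
    dictionary; uniform gives 1.0, degenerate gives 0.0."""
    n = len(dist)
    return -sum(p * math.log(p, n) for p in dist.values() if p > 0.0)

print(entropy({"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}))  # 1.0
print(entropy({"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}))      # 0.0
```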
At time step 0, the robot does not have any idea about the location of the objects, so the object distributions have maximum entropy, $E(X_k) = 1$. Then, for each time step in which the robot takes the action of searching for an object, the entropy decreases. When an object is seen, the entropy decreases abruptly, because the uncertainty about the object is also greatly reduced. The entropy of the cocacola state variable stays higher than the others because it is never observed by the robot, and after 14 steps the level of uncertainty about the world state is low enough that the agent always opts for the action doNothing thereafter.
Figure 5.2: Progression of objects distributions entropy in a static model
In Table 5.4, it is possible to analyze the expected rewards of each POMDP at each time step in which the Knowledge Representation Engine needs to select which POMDP should control the robot's behavior, i.e., which room the robot should explore. At step 0, the expected rewards of the kitchen and the living room POMDPs are roughly the same, because the number of placements and the initial belief are the same, given the initial uniform global distribution of the locations of the objects. The small difference is related to the placement characteristics of each room, making the agent start by exploring the living room. The dining room has just one placement, which is the reason for its higher expected reward: it can easily reach a lower uncertainty about the room state. At step 8, the living room already has a higher value, because the uncertainty about the room state was reduced by exploring it before. The same happens at step 15, with the increase in the expected reward of the kitchen POMDP.
step          0      8      15     16     17     18
kitchen       22.53  21.24  38.71  38.65  38.65  38.65
living room   21.18  40.21  40.93  40.93  40.93  40.93
dining room   34.46  34.43  38.13  38.13  38.13  38.13

Table 5.4: Expected rewards at the POMDP selection steps in the static model example
5.4.1.B Dynamic Model
In realistic domestic environments, the static model is not appropriate, because these environments are typically dynamic; for that reason, it is necessary to consider that the objects can change location, using the exponential decay with the parameters referred to in Chapter 5.4. For this experiment, a different configuration of the object locations is used, as presented in Table 5.5. This configuration remains the same throughout the experiment; however, since there is now a possibility that the locations of the objects have changed, a dynamic model is used.
object      location
cocacola    kitchen table
pringles    coffee table
mug gray    dining table
mug black   sideboard

Table 5.5: Location of the objects in the dynamic model
The initial distribution of each object state variable is uniform, as in the previous experiment, so the agent starts with maximum entropy. Henceforth, it is possible to verify in Chapter C.2 that the agent keeps moving between the living room and the kitchen and never visits the dining room. This can be explained by the fact that there is just one placement in that room, so the agent can infer whether the object is there or not by verifying the hypothesis of it being in any other placement, taking into account that the object needs to be in one of the considered placements. However, the entropy of the mug gray state variable stays higher than the remaining objects' entropies, as can be seen in Figure 5.3. After the system stabilizes, $P(X_{\text{mug gray}} = \text{dining table})$ keeps oscillating in the range [0.758, 0.797], giving a good confidence about the mug gray position and a stable entropy, even without observing it.
The expected rewards of the dining room are always higher than those of the remaining rooms, as presented in Table 5.6. After a room is explored and the uncertainty about the objects located in it decreases, the dynamic model increases the entropy of those objects' state variables while the agent is exploring another room. This increase is enough to make the expected reward of exploring the first room lower than the others. This justifies the behavior of alternating between exploring the kitchen and the living room, which can be observed in Figure 5.3. After some time, the system stabilizes, keeping a periodic entropy oscillation. At that point, the robot keeps taking the same decisions until some object changes place, alternating between exploring the kitchen and the living room and decreasing the entropy of the objects that are in the kitchen and in the living room, respectively.

Figure 5.3: Progression of objects distributions entropy in a dynamic model
step          0      10     20     28     36     43     50     58
kitchen       22.53  20.61  41.67  25.28  41.82  25.40  41.56  25.13
living room   21.18  41.85  22.00  41.38  23.39  41.46  23.53  41.56
dining room   34.46  34.57  34.50  34.56  34.52  34.56  34.52  34.56

step          66     73     80     88     96     103    110
kitchen       41.80  25.40  41.56  25.13  41.80  25.40  41.56
living room   23.35  41.46  23.53  41.56  23.35  41.46  23.53
dining room   34.52  34.56  34.52  34.56  34.52  34.56  34.52

Table 5.6: Expected rewards at the POMDP selection steps in the dynamic model example
Table 5.6: Expected rewards for POMDP selection steps in a dynamic model example
On the contrary, if the architecture decided to explore the dining room, the entropy of the mug gray
would decrease. However, the entropy of the remaining objects would increase and, therefore, also
the entropy of the global belief, taking higher values then if the behavior was the one presented by the
architecture.
5.4.1.C Carrying out a task
As referred to in Chapter 4.2, it is also possible to use the architecture to model situations where the agent has the goal of carrying out a specific task that requires a good knowledge about the world state. In this case, that knowledge is just a requirement for reaching the goal, and not the goal itself as before. Instead of just having the ambition of reducing uncertainty about the positions of the objects in general, in the experiments presented the main goal is to move the cocacola close to the pringles. So, it is necessary to add to each POMDP model the actions graspCocacola and releaseCocacola, so that the agent is able to change the cocacola's location. It is also necessary to add the possibility of the cocacola being placed in the robot's gripper: the domain of the cocacola state variable gets the new value gripper. It is also necessary to add large rewards for grasping the cocacola when it is not in the same placement as the pringles, and for releasing the cocacola when it is in the gripper and the robot is in the same placement as the pringles. In this experiment, it is also assumed that the actions of grasping and releasing the object always succeed and that the exponential decay has no effect on the probability of the object being in the gripper.
In order to test the architecture in this application, two different configurations of the locations of the objects are considered, as presented in Table 5.7.
configuration   object     location
1               cocacola   sideboard
                pringles   dining table
2               cocacola   kitchen table
                pringles   coffee table

Table 5.7: Location of the objects for the experiments with the goal of moving the cocacola close to the pringles
When the main goal of the agent is to carry out a specific task, such as moving the cocacola close to the pringles, the POMDP selection criterion is to choose the POMDP with the highest expected reward, as explained before. Table 5.8 presents the expected values at the POMDP selection steps for both configurations.
                Configuration 1                  Configuration 2
step            0      9      20     25/26/27    0      8      15     20/21/22
kitchen         36.63  17.72  16.65  19.19       36.63  63.87  17.65  19.19
dining room     28.49  11.98  17.79  21.17       24.49  43.27  8.47   19.20
living room     36.82  13.83  16.79  19.19       36.82  18.32  24.52  21.12

Table 5.8: Expected rewards at the POMDP selection steps when carrying out a task
In order to compare the probability distribution of each state variable with the deterministic distribution that corresponds to the true location of the object, the Hellinger distance between both is used. For two discrete probability distributions $U = (u_1, \ldots, u_n)$ and $V = (v_1, \ldots, v_n)$, the Hellinger distance is defined as

$$H(U, V) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{n} \left( \sqrt{u_i} - \sqrt{v_i} \right)^2}, \quad (5.2)$$

measuring, up to the $1/\sqrt{2}$ factor, the Euclidean distance between the square-root probability vectors of U and V, allowing us to quantify the similarity between two probability distributions. There are several methods to measure the difference between two probability distributions; this one is chosen to quantify the architecture's performance because it is an intuitive method, related to the Euclidean norm of the difference of the square-root vectors, and because of the characteristics of the distribution of the true location of the object: it is a degenerate distribution, so most of the domain values have probability zero, which in most other methods, such as the KL-divergence or the cross entropy, yields undefined terms given the logarithm used.
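A minimal sketch of Equation (5.2), comparing a belief with the degenerate distribution of the true location:

```python
import math

def hellinger(u, v):
    """Hellinger distance (Eq. 5.2) between two discrete distributions
    given as dictionaries mapping values to probabilities; it stays finite
    on zero-probability values, unlike KL-divergence or cross entropy."""
    keys = set(u) | set(v)
    s = sum((math.sqrt(u.get(k, 0.0)) - math.sqrt(v.get(k, 0.0))) ** 2
            for k in keys)
    return math.sqrt(s) / math.sqrt(2.0)

belief = {"kitchen_table": 0.7, "coffee_table": 0.2, "sideboard": 0.1}
truth = {"kitchen_table": 1.0}   # degenerate distribution: true placement
print(hellinger(belief, truth))  # ~0.40
```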
In Figure 5.4, it is possible to verify the Hellinger distance at each step, for both configurations. In Configuration 1, the robot finds the cocacola in the first room it visits, grasping it and keeping it in the gripper until it finds the pringles, which are in the dining room. After grasping the cocacola, the robot visits the kitchen, because there are two placements there, so intuitively the probability of finding the pringles there is higher, which implies that the expected rewards are also higher, as presented in Table 5.8. In Configuration 2, the robot finds the pringles first and then, as soon as it finds the cocacola, it goes back to the placement where the pringles were, in order to reach the goal.
Figure 5.4: Progression of the Hellinger distance until the robot reaches the goal, for two different configurations: (a) Configuration 1; (b) Configuration 2
5.4.2 Scenario 2
5.4.2.A Objects changing position
In order to analyze the behavior of the architecture under modifications of the objects' locations, an experiment in Scenario 2 is presented, considering three objects and three modifications of their locations, as presented in Table 5.9. After step 75, the mug gray is moved from the dining table to the night stand, in a different room. After step 120, the pringles are moved to the bookshelf, remaining in the same room. After step 175, all the objects are moved to the kitchen.
steps     object     location
0-75      mug gray   dining table
          cocacola   kitchen table
          pringles   coffee table
76-120    mug gray   night stand
          cocacola   kitchen table
          pringles   coffee table
121-175   mug gray   night stand
          cocacola   kitchen table
          pringles   bookshelf
176-260   mug gray   kitchen cabinet
          cocacola   kitchen table
          pringles   kitchen table

Table 5.9: Location of the objects for each step
In this experiment, the parameters of the exponential decay are the same as in Scenario 1; however, the scenario is bigger, which implies a higher entropy in the state variable distributions, because the robot has a bigger environment to explore, including a new room. Figure 5.5 presents, for each step, the Hellinger distance between the distribution of each object state variable obtained by the architecture and the distribution that corresponds to reality, a degenerate distribution with $P(X_k = x') = 1$, where $x'$ corresponds to the real position of the object at that instant. The Hellinger distance allows quantifying the similarity between two probability distributions. The robot decides to go to the living room in every two POMDP selections, as presented in Table C.4, because it has three placements and a large exponential decay is used in the architecture.
Figure 5.5: Progression of Hellinger distance, with modifications in the location of the objects
Analyzing the robot's behavior after the first change in the location of the objects at step 75, it can be noted that the robot is able to detect it and update the state variable distributions without affecting its behavior. This happens because it is the mug gray that changes location, and the robot finds its new location by chance, without noticing before that it was not in the previous one. At step 120, the pringles change location inside the same room, and the modification is detected when the robot goes to that room again and notices that the pringles are not in the previous place. Then, given the large uncertainty about the pringles' location, the agent keeps exploring the remaining placements in the room, finding them and reducing the uncertainty. For example, in the previous two times that the robot explored the living room, it did not decide to search for objects in all the placements. However, when it figured out that the pringles were not in the previous place, the increase in uncertainty made the agent decide to explore all the placements, as presented in Table 5.10.
step   robot location    action
110    kitchen cabinet   goCoffeeTable
111    coffee table      searchObject
112    coffee table      searchObject
113    coffee table      searchObject
114    coffee table      goBookshelf
115    bookshelf         searchObject
116    bookshelf         searchObject
117    bookshelf         doNothing

step   robot location    action
125    bed               goCoffeeTable
126    coffee table      searchObject
127    coffee table      searchObject
127    coffee table      goSideboard
128    sideboard         searchObject
129    sideboard         goBookshelf
131    bookshelf         searchObject
132    bookshelf         doNothing

Table 5.10: Comparison between living room actions, before (top) and after (bottom) the pringles changed location
At step 175, there is the last modification of the objects' locations, with all objects being placed somewhere in the kitchen. This modification is done after the robot leaves the kitchen and, for that reason, it also takes some steps until the robot is able to find those modifications. The robot keeps exploring the bedroom and the living room, with the uncertainty about the locations of the objects increasing, until the expected reward of the kitchen POMDP becomes the lowest one. Once the robot is in the kitchen and finds all the objects, the uncertainty decreases. The architecture is then able to keep the Hellinger distance at very low values, because it just needs to keep searching for the objects in that room, preventing the uncertainty about the locations of the objects from increasing. It is also possible to verify that, when the robot finds the objects and decides to do nothing, because the entropy is already approximately zero, in the next few POMDP selection steps the lowest expected reward values are in the living room. However, given the high confidence about the location of the objects, the first action taken by that POMDP is to do nothing, which means not moving the robot from the kitchen. This process is repeated until the uncertainty increases enough to make the kitchen POMDP be selected and choose actions that make the robot figure out whether the locations of the objects remain the same.
5.4.2.B Incorrect observations robustness
The architecture presented needs a perception model to construct the POMDP observations about the world state. For that reason, it is important to analyze how the system models perception. When the Knowledge Representation Engine needs to generate each POMDP, it also needs to define its observation function. In this case, it defines an observation model for each object state variable, specifying the probability distribution over the possible observations for each state variable, given the previous state and the action. The observation model used in the experiments, as explained in Chapter 5.2, defines these observation models for each state variable depending on the object that it represents, the robot location at that time, and the room furniture configuration. For Scenario 2, the observation model can be summarized as presented in Table C.5.
To verify whether the system is able to deal with incorrect observations, the observations in these simulated experiments are generated following the model implemented for each POMDP and presented in Table C.5. For two different configurations of the locations of the objects, presented in Table 5.11, the Hellinger distance over 400 steps is obtained, as presented in Figure C.1 and Figure C.2. It is possible to verify that the architecture is able to easily remove the effect of erroneous observations when they are generated at the same false negative and false positive rates that the observation function of each POMDP considers. These are denominated the expected false negative and false positive rates.
configuration   object     location
1               mug gray   bed
                cocacola   coffee table
                pringles   kitchen cabinet
2               mug gray   night stand
                cocacola   bookshelf
                pringles   sideboard
Table 5.11: Location of the objects for experiments with wrong observations
In order to verify whether the architecture is able to deal with incorrect observations at a higher rate than the one expected by each POMDP, Table 5.12 presents the mean Hellinger distance, over 400 steps and starting with a uniform distribution, for different false positive and false negative rates. Table 5.12 shows that the architecture keeps a good robustness to incorrect observations, even when false positive and false negative observations are generated at twice the rate expected by the POMDP model. In that case, the false negative rate reaches 40% for the mug gray and, even under those conditions, the mean Hellinger distance remains low.
                                   configuration 1               configuration 2
multiple of expected rates         cocacola  mug gray  pringles  cocacola  mug gray  pringles
no wrong observations              0.213     0.348     0.394     0.269     0.369     0.179
x1                                 0.239     0.382     0.409     0.258     0.347     0.181
x1.25                              0.229     0.373     0.405     0.248     0.378     0.140
x1.5                               0.253     0.370     0.407     0.260     0.365     0.232
x1.75                              0.246     0.357     0.404     0.281     0.396     0.202
x2                                 0.287     0.399     0.430     0.268     0.344     0.206
x2.5                               0.330     0.551     0.405     0.304     0.478     0.216
x3                                 0.486     0.535     0.542     0.353     0.687     0.285
Table 5.12: Mean Hellinger distance for different multiples of the expected false positive and false negative rates
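The corrupted observations used in these experiments can be generated in a few lines. The sketch below is an illustrative reading of the setup, where the error rates assumed by each POMDP's observation model are inflated by a multiplier before sampling:

import random

def sample_detection(visible, fp_rate, fn_rate, multiplier=1.0):
    """Sample a binary detection with inflated error rates.

    visible: whether the object is actually in view.
    fp_rate, fn_rate: rates assumed by the POMDP observation model.
    multiplier: 1.0 matches the model; 2.0 doubles the error rates,
    as in the x2 row of Table 5.12.
    """
    if visible:
        return random.random() >= min(1.0, fn_rate * multiplier)
    return random.random() < min(1.0, fp_rate * multiplier)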
5.4.3 Performance Analysis
In order to analyze whether the architecture is able to reduce the uncertainty about the environment in different scenarios and object configurations, the mean Hellinger distance is presented for different object configurations and scenarios. Table 5.13 and Table 5.14 present the mean Hellinger distance of three objects over 100 steps, for Scenario 1 and Scenario 2 respectively, considering 10 different random object configurations and an initial uniform distribution over the location of the objects.
experiment  1         2         3         4         5
cocacola    KT 0.213  DT 0.403  CT 0.159  KC 0.239  CT 0.234
mug gray    KT 0.214  KC 0.243  KT 0.256  KT 0.241  S 0.274
pringles    S 0.178   KT 0.212  CT 0.156  CT 0.169  KT 0.234

experiment  6         7         8         9         10
cocacola    CT 0.161  S 0.215   DT 0.403  KT 0.204  DT 0.396
mug gray    KT 0.322  DT 0.410  KT 0.215  CT 0.164  DT 0.403
pringles    S 0.198   DT 0.399  KC 0.240  CT 0.161  CT 0.174
Table 5.13: Mean Hellinger distance for 100 steps for different object configurations in Scenario 1.
KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, DT - Dining Table
experiment  1         2         3         4         5
cocacola    S 0.233   DT 0.630  S 0.229   KC 0.433  BS 0.208
mug gray    DT 0.658  KT 0.478  DT 0.659  KC 0.435  B 0.436
pringles    NS 0.474  CT 0.243  DT 0.651  S 0.237   KT 0.388

experiment  6         7         8         9         10
cocacola    BS 0.215  B 0.446   NS 0.219  S 0.233   KT 0.342
mug gray    B 0.443   CT 0.449  B 0.234   DT 0.644  KC 0.353
pringles    KC 0.389  KC 0.208  BS 0.253  S 0.232   CT 0.284
Table 5.14: Mean Hellinger distance for 100 steps for different object configurations in Scenario 2.
KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, BS - Bookshelf, DT - Dining Table, B - Bed, NS - Night Stand
Considering that a uniform distribution corresponds to a Hellinger distance of approximately 0.74 for Scenario 1 and 0.8 for Scenario 2, it is possible to verify that, for different configurations in both scenarios, the architecture is able to keep a reduced uncertainty about the environment. However, in Scenario 2 the mean Hellinger distance is larger than in Scenario 1 because the scenario is also more complex.
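These baseline values can be checked directly. Assuming, as the baselines suggest, that the distance is computed between the belief over an object's placements and a point mass on its true placement, the Hellinger distance of a uniform belief over n placements is sqrt(1 - 1/sqrt(n)), which reproduces the 0.74 and 0.8 figures for the 5 placements of Scenario 1 and the 8 placements of Scenario 2:

import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                         for a, b in zip(p, q)) / 2.0)

for n in (5, 8):  # placements per object in Scenario 1 and Scenario 2
    uniform = [1.0 / n] * n
    truth = [1.0] + [0.0] * (n - 1)  # point mass on the true placement
    print(n, round(hellinger(uniform, truth), 2))  # -> 0.74 and 0.8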
In order to analyze whether the architecture is also able to keep a reduced uncertainty about the location of the objects when the object configuration changes, Table 5.15 and Table 5.16 present the mean Hellinger distance over 200 steps, for Scenario 1 and Scenario 2 respectively, under those conditions. Each experiment starts with a random configuration of the location of the objects and an initial uniform distribution. Approximately halfway through, the location of at least two of the objects is modified.
experiment  1              2              3              4
cocacola    CT → KC 0.178  KC → KT 0.215  DT → CT 0.333  S → S 0.199
mug gray    KT → KT 0.198  DT → KC 0.285  DT → KT 0.354  S → CT 0.220
pringles    DT → S 0.303   DT → S 0.344   DT → S 0.331   KT → DT 0.292

experiment  5              6              7
cocacola    KT → DT 0.325  DT → S 0.310   S → DT 0.321
mug gray    S → DT 0.341   DT → CT 0.349  S → KC 0.188
pringles    S → KT 0.186   S → KC 0.242   DT → S 0.284

experiment  8              9              10
cocacola    CT → CT 0.162  KC → KT 0.274  S → CT 0.235
mug gray    CT → KT 0.257  CT → KC 0.261  S → DT 0.345
pringles    DT → S 0.315   CT → CT 0.172  KC → KT 0.208
Table 5.15: Mean Hellinger distance for 200 steps, changing the objects configuration in Scenario 1. Each cell shows the initial location, the final location and the mean Hellinger distance.
KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, DT - Dining Table
experiment  1              2              3              4
cocacola    BS → NS 0.320  B → CT 0.381   KT → B 0.344   BS → DT 0.449
mug gray    KC → KC 0.298  DT → KC 0.464  S → BS 0.283   NS → KC 0.447
pringles    KC → S 0.286   CT → KT 0.314  CT → NS 0.260  KT → CT 0.333

experiment  5              6              7
cocacola    NS → NS 0.332  CT → B 0.269   KC → NS 0.396
mug gray    CT → KT 0.374  DT → NS 0.469  S → S 0.206
pringles    NS → S 0.326   KT → CT 0.354  KC → KT 0.363

experiment  8              9              10
cocacola    KT → CT 0.316  KC → CT 0.429  B → CT 0.277
mug gray    S → B 0.414    DT → DT 0.613  B → NS 0.382
pringles    KC → KC 0.310  KT → B 0.416   S → KC 0.289
Table 5.16: Mean Hellinger distance for 200 steps, changing the objects configuration in Scenario 2. Each cell shows the initial location, the final location and the mean Hellinger distance.
KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, BS - Bookshelf, DT - Dining Table, B - Bed, NS - Night Stand
In the experiments presented in Table 5.15 and Table 5.16, it is possible to verify that the architecture is able to keep the uncertainty low. Objects located on the dining table show higher uncertainty because the dining room has a single placement and the agent prefers exploring the remaining rooms, indirectly reducing the uncertainty about that placement without ever observing it, while also reducing the uncertainty about the remaining objects.
5.5 Real scenario experiments
In order to test the architecture performance in a real scenario, the mbot was the robot used, pre-
sented in Figure 5.6(b). It is used by SocRob@Home team of Institute for Systems and Robotics as a
research tool to test and implement the work developed by the research community. The real scenario
used in the experiments is the ISRoboNet@Home testbed, presented in Figure 5.6(a).
Figure 5.6: (a) Testbed and (b) robot used for the real scenario experiments
In the real experiments, the exponential decay uses real elapsed time, with a mean lifetime λ−1 of 5 minutes. No assumptions are made about action durations: the real time the agent spends in each action is used. For the perception model, a ROS wrapper of YOLOv3 is used, as mentioned at the beginning of Chapter 5.
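The decay itself is simple to sketch. The snippet below is a minimal illustration under one plausible reading (an assumption, not necessarily the exact formulation used earlier in this work): between observations, the belief over an object's placements is relaxed toward a uniform distribution with weight exp(-t / λ−1):

import math

MEAN_LIFETIME_S = 5 * 60.0  # lambda^-1 of 5 minutes, as in the experiment

def decay_belief(belief, elapsed_s):
    """Relax a discrete belief toward uniform as real time passes.

    Assumed decay law: the current belief keeps weight exp(-t / lifetime)
    and the remaining mass is redistributed uniformly, so confidence in
    stale observations fades with a 5-minute mean lifetime.
    """
    w = math.exp(-elapsed_s / MEAN_LIFETIME_S)
    n = len(belief)
    return [w * p + (1.0 - w) / n for p in belief]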
The location of the objects is presented in Table 5.17, where it is possible to verify that, approximately halfway through the experiment, the cocacola is moved from the bed to the sideboard.
time (s)    object     location
0-1150      mug gray   dining table
            cocacola   bed
            pringles   coffee table
1151-2145   mug gray   dining table
            cocacola   sideboard
            pringles   coffee table
Table 5.17: Location of the objects in the real scenario experiment
The Hellinger distance for each object during the 35 minutes of the experiment is presented in Figure 5.7, and it is possible to verify that the architecture behaves similarly in the real scenario and in the simulated experiments. When the cocacola changes location, the robot manages to detect it and updates the cocacola distribution in only about 2 minutes. The architecture is also able to reduce the uncertainty about the pringles location after receiving 2 consecutive false negatives: given the increased uncertainty, the robot moves back to the coffee table and collects new observations before leaving the room.
To complement the experiments presented in this Chapter, some videos of the real scenario experiments are available on the SocRob@Home YouTube channel 5.
K: Kitchen, L: Living room, B: Bedroom, D: Dining room
[Three plots of the Hellinger distance against time (s), annotated with false positives, false negatives, POMDP selections and the sequence of rooms visited by the robot: (a) cocacola, (b) pringles, (c) mug gray]
Figure 5.7: Hellinger distance over time for each object, in the real scenario experiment
5https://www.youtube.com/playlist?list=PL8fxtCUfhUR1HcqrGb8WHZ-F0bRrhtk47. Accessed 14 Oct 2018
5.6 Scalability Analysis
When the Knowledge Representation Engine creates the POMDP models, it is necessary to find the optimal policy for each one so it can be used by the Decision Maker. The use of POMDPs in real-world problems, such as the one presented in this work, has been limited by the poor scalability of existing solution algorithms for finding the optimal policy of a finite-horizon discrete POMDP. Large policy spaces and large state spaces are two important sources of the scalability problem, and that is the main motivation for a Decision Maker with several POMDPs. In the experiments presented, a Matlab implementation of Symbolic Perseus, a point-based value iteration algorithm, is used to obtain the optimal policy of each POMDP. This implementation is not the most efficient; however, it enables us to observe the poor scalability of finding the optimal policy of a POMDP and to understand how the presented architecture mitigates the problem.
The number of states of a POMDP depends on the number of state variables X′k and on their respective domains D′k which, in the semantic mapping application, are the number of objects and the number of placements where each object can be located, respectively. The architecture presented reduces the number of state variables of each POMDP, considering that some of the objects cannot be located in some of the rooms, and reduces the domain of each state variable, considering just the placements present in that room plus the possibility of the object not being there.
Considering an alternative architecture, where the Decision Maker is composed of just one POMDP representing the whole environment, the set of state variables of that POMDP would be equal to the set of all state variables X. The number of POMDP states would be equal to ∏_{k=1}^{K} |Dk|, where K is the number of objects plus one, due to the Robot state variable, and |Dk| is the number of placements where the object or robot k can be located, considering that objects and the robot share the same set of placements. For the Decision Maker proposed in this architecture, the set of state variables Xn of each POMDPn is contained in X, considering just the objects that can be located in that room. The domain D′k of each state variable X′k ∈ Xn has the size of the number of placements in that room plus one, given the possibility of the object not being there. The relations between the number of states, observations, normal actions and prediction actions of a POMDP and the number of placements and objects are presented in Equation (5.3).
# states = (# placements + 1)^(# objects + 1)
# observations = 2^(# objects)
# normal actions = # placements + 2
# prediction actions = (# objects)^(# placements + 2)
(5.3)
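Equation (5.3) translates directly into a small sizing helper; the sketch below reproduces the rows of Table 5.18:

def pomdp_model_size(n_placements, n_objects):
    """Model sizes of one room POMDP, following Equation (5.3)."""
    return {
        "states": (n_placements + 1) ** (n_objects + 1),
        "observations": 2 ** n_objects,
        "normal actions": n_placements + 2,
        "prediction actions": n_objects ** (n_placements + 2),
    }

print(pomdp_model_size(3, 4))
# {'states': 1024, 'observations': 16, 'normal actions': 5,
#  'prediction actions': 1024}   # matches the last row of Table 5.18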
Table 5.18 presents, for POMDPs with different numbers of placements and objects, the runtime for finding an optimal policy together with the number of states, observations, normal actions and prediction actions. It is possible to verify that the problem becomes intractable as the number of placements and objects increases. The crux of using this architecture is that the runtime, instead of depending on the total number of objects and placements of the world model, depends only on the number of objects and placements of the most complex room.
# placements  # objects  # states  # observations  # normal actions  # prediction actions  runtime
1             2          8         4               3                 8                     33s
1             3          16        8               3                 27                    4m12s
1             4          32        16              3                 64                    31m
2             2          27        4               4                 16                    2h16m
2             3          81        8               4                 81                    8h30m
2             4          243       16              4                 256                   26h50m
3             2          64        4               5                 32                    3h30m
3             3          256       8               5                 243                   98h20m
3             4          1024      16              5                 1024                  526h46m
Table 5.18: Scalability analysis of different POMDP models used in the Decision Maker
The architecture deals with offline POMDPs, which means that the problem of finding the optimal policies only needs to be solved at the beginning of the architecture's operation. This could mislead one into thinking that the complexity and the runtime needed to find those policies are not a big problem. However, as explained before, a domestic environment is dynamic and, for that reason, even the world model changes over time: the objects and placements considered can change. Given a new world model, this architecture is able to generate the new POMDPs and then find the new optimal policies.
5.7 Discussion
In the simulated experiments, as well as in the real scenario experiments, the agent is able to behave actively so as to keep the level of uncertainty reduced, as desired. The architecture is able to deal with the dynamism of the environment, as verified in Chapter 5.4.1.B. In the results presented, whenever there was a modification in the state of the environment, the architecture was able to minimize the uncertainty about the world state in a short amount of time or steps and under different conditions: moving an object to a placement inside the same room or to a different room, moving one or multiple objects simultaneously, having the robot first observe the object in its new location or first observe that it is no longer in the previous one, etc.
The architecture performs well when dealing with incorrect observations, as verified in Table 5.12. Even when incorrect observations are generated at twice the rate expected by the model, the results still express a good performance of the architecture in representing the location of the objects.
It is also possible to verify that the approach for selecting the POMDP, given the global belief, can efficiently minimize the uncertainty in the location of all the objects. The architecture presents some interesting behaviors, such as deciding to explore the rooms with more placements more often when the goal is finding the location of all the objects, which makes perfect sense given the robot's purpose. Most of the time, the robot does not even visit the dining room, because it has just one placement and the robot can infer the probability of an object being there from knowing whether it is somewhere else.
The results obtained in the real scenario reinforce the simulated experiment results and fulfill the proposed objective of having a real robot able to keep an updated probabilistic representation of the environment and to use that information for decision making.
The scalability analysis presented in Chapter 5.6 shows that the approach of having a Decision Maker composed of several POMDPs, each one representing a room, can significantly reduce the complexity of finding the optimal behavior in a domestic environment. In the architecture presented, the complexity of decision making no longer depends on the number of placements in the whole world, but only on the number of objects and the maximum number of placements in a single room.
6 Conclusion
6.1 Achievements
In this work, an efficient architecture is presented that keeps a global representation of a complex world state while making decisions to reduce uncertainty and, eventually, to accomplish a goal. The application of the architecture in the semantic mapping context allows us to create a system that keeps an updated probabilistic representation of the location of the objects and can eventually use that information to carry out a task. The architecture is also robust to incorrect observations, as shown in the experimental results.
A method for bypassing the problem of finding the optimal policy of a large POMDP is presented, using multiple POMDPs and reducing the number of states of each one. However, this also requires a system to control the different POMDPs and to keep a representation of the global world state, presented in this work as the Knowledge Representation Engine.
Another important achievement presented in this work is an architecture responsible for the full generation of the POMDP models for the semantic mapping application, using just the world model to infer the states and the observation, transition and reward functions of each POMDP, avoiding the explicit declaration of the models.
The results obtained are another important achievement, supporting the purpose of the work and showing the desired performance, as is the implementation of the architecture in a real scenario with a real robot. The robot was able to keep moving autonomously inside the testbed, changing its behavior according to the location of the objects and its internal belief, and it maintained a low uncertainty about the state of the world, as intended.
6.2 Future Work
As a follow-up to this work, there are many interesting studies and experiments that could be done, such as:
• Using multiple robots, adapting the architecture so that multiple robots explore multiple rooms.
• Exploring the ProbLog inference process to add more inference to the Knowledge Representation Engine when updating the belief.
• Using Inverse Reinforcement Learning techniques to define the transition, observation and reward functions of each POMDP.
A P(Xk|Z) derivation
The probability distribution P(Xk|Z) depends on the function fk,n : Dk → D′k of POMDPn, which associates to each element of Dk a single element of D′k, on the prior P(Xk), and on the distribution P(X′k|Z) provided by one of the POMDPs in the Decision Maker. Dk and D′k are the sets of values of the variables Xk and X′k, respectively. Since fk,n is a deterministic function, the probability P(X′k|Xk) is given by Equation (A.1).

P(X′k|Xk) = 1 if X′k = fk,n(Xk), and 0 otherwise.    (A.1)
Then, the probability P(Xk|Z) can be derived as presented in (A.2).

P(Xk|Z) = Σ_{X′k} P(Xk, X′k|Z)
        = Σ_{X′k} P(Xk|X′k, Z) P(X′k|Z)
        = Σ_{X′k} P(Xk|X′k) P(X′k|Z)                           (Xk ⊥⊥ Z | X′k)
        = Σ_{X′k} [P(X′k|Xk) P(Xk) / P(X′k)] P(X′k|Z)          (Bayes rule)
        = P(Xk) P(X′k|Z) / Σ_{x∈Dk} P(X′k|Xk = x) P(Xk = x),   with X′k = fk,n(Xk)
                                                               (A.2)

In the last step, the indicator P(X′k|Xk) from Equation (A.1) collapses the sum to the single term X′k = fk,n(Xk), and the denominator expands P(X′k) under the prior.
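Equation (A.2) translates directly into code. The sketch below implements it for discrete distributions represented as dictionaries; the function and variable names are illustrative, and the prior is assumed to give nonzero mass to each image of fk,n:

def belief_from_pomdp(prior, posterior_prime, f):
    """P(Xk | Z) from Equation (A.2).

    prior:           P(Xk) over the global domain Dk
    posterior_prime: P(X'k | Z) over the POMDP domain D'k
    f:               the mapping f_{k,n}: Dk -> D'k
    """
    # Denominator of (A.2): P(X'k) under the prior.
    marginal = {}
    for x, p in prior.items():
        marginal[f[x]] = marginal.get(f[x], 0.0) + p
    # Each global value inherits the posterior mass of its image in D'k,
    # split proportionally to the prior inside that image.
    return {x: prior[x] * posterior_prime[f[x]] / marginal[f[x]]
            for x in prior}

# Example: a living room POMDP that maps every other room to "none".
prior = {"coffee table": 0.4, "sideboard": 0.4, "kitchen table": 0.2}
f = {"coffee table": "coffee table", "sideboard": "sideboard",
     "kitchen table": "none"}
posterior = {"coffee table": 0.9, "sideboard": 0.05, "none": 0.05}
print(belief_from_pomdp(prior, posterior, f))
# {'coffee table': 0.9, 'sideboard': 0.05, 'kitchen table': 0.05}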
B World Model Files Examples
B.1 Furniture Model example
name,room,x,y,area
"kitchen_table","kitchen","5.648537","-1.205729","2.2"
"kitchen_cabinet","kitchen","4.665875","-0.266732","2.5"
"dining_table","dining_room","7.473658","-1.135156","2.0"
"coffee_table","living_room","6.496987","-3.889012","0.5"
"sideboard","living_room","7.631555","-3.9","0.5"
B.2 Objects Model example
name,category,distribution,volume
"cocacola","drink","uniform",0.355
"pringles","snack","uniform",0.375
C Experiment tables and figures
C.1 Scenario 1 - Static Environment Model
Probability columns: coffee table, dining table, kitchen cabinet, kitchen table, sideboard

step 0 | robot: out | action: goCoffee table
mug gray   0.200  0.200  0.200  0.200  0.200
mug black  0.200  0.200  0.200  0.200  0.200
cocacola   0.200  0.200  0.200  0.200  0.200
pringles   0.200  0.200  0.200  0.200  0.200

step 1 | robot: coffee table | action: searchObject
mug gray   0.200  0.200  0.200  0.200  0.200
mug black  0.200  0.200  0.200  0.200  0.200
cocacola   0.200  0.200  0.200  0.200  0.200
pringles   0.200  0.200  0.200  0.200  0.200

step 2 | robot: coffee table | action: searchObject
mug gray   0.860  0.011  0.011  0.011  0.108
mug black  0.049  0.243  0.243  0.243  0.221
cocacola   0.032  0.248  0.248  0.248  0.225
pringles   0.025  0.249  0.249  0.249  0.227

step 3 | robot: coffee table | action: goSideboard
mug gray   0.984  0.000  0.000  0.000  0.015
mug black  0.011  0.259  0.259  0.259  0.214
cocacola   0.004  0.260  0.260  0.260  0.215
pringles   0.003  0.261  0.261  0.261  0.215

step 4 | robot: sideboard | action: searchObject
mug gray   0.984  0.000  0.000  0.000  0.015
mug black  0.011  0.259  0.259  0.259  0.214
cocacola   0.004  0.260  0.260  0.260  0.215
pringles   0.003  0.261  0.261  0.261  0.215

step 5 | robot: sideboard | action: searchObject
mug gray   0.996  0.000  0.000  0.000  0.003
mug black  0.012  0.312  0.312  0.312  0.052
cocacola   0.005  0.320  0.320  0.320  0.034
pringles   0.003  0.323  0.323  0.323  0.027

step 6 | robot: sideboard | action: searchObject
mug gray   0.999  0.000  0.000  0.000  0.001
mug black  0.011  0.326  0.326  0.326  0.011
cocacola   0.004  0.330  0.330  0.330  0.004
pringles   0.003  0.331  0.331  0.331  0.003

step 7 | robot: sideboard | action: doNothing
mug gray   0.999  0.000  0.000  0.000  0.000
mug black  0.010  0.329  0.329  0.329  0.002
cocacola   0.004  0.332  0.332  0.332  0.001
pringles   0.003  0.332  0.332  0.332  0.000

step 8 | robot: sideboard | action: goKitchen table
mug gray   0.999  0.000  0.000  0.000  0.000
mug black  0.010  0.329  0.329  0.329  0.002
cocacola   0.004  0.332  0.332  0.332  0.001
pringles   0.003  0.332  0.332  0.332  0.000

step 9 | robot: kitchen table | action: searchObject
mug gray   0.999  0.000  0.000  0.000  0.000
mug black  0.010  0.329  0.329  0.329  0.002
cocacola   0.004  0.332  0.332  0.332  0.001
pringles   0.003  0.332  0.332  0.332  0.000

step 10 | robot: kitchen table | action: searchObject
mug gray   0.999  0.000  0.000  0.000  0.000
mug black  0.014  0.465  0.424  0.094  0.003
cocacola   0.006  0.487  0.444  0.062  0.001
pringles   0.000  0.010  0.095  0.895  0.000

step 11 | robot: kitchen table | action: goKitchen cabinet
mug gray   0.999  0.000  0.000  0.000  0.000
mug black  0.016  0.523  0.436  0.021  0.004
cocacola   0.007  0.536  0.447  0.009  0.001
pringles   0.000  0.000  0.011  0.989  0.000

step 12 | robot: kitchen cabinet | action: searchObject
mug gray   0.999  0.000  0.000  0.000  0.000
mug black  0.016  0.523  0.436  0.021  0.004
cocacola   0.007  0.536  0.447  0.009  0.001
pringles   0.000  0.000  0.011  0.989  0.000

step 13 | robot: kitchen cabinet | action: searchObject
mug gray   1.000  0.000  0.000  0.000  0.000
mug black  0.000  0.015  0.979  0.006  0.000
cocacola   0.011  0.881  0.094  0.013  0.002
pringles   0.000  0.000  0.001  0.999  0.000

step 14 | robot: kitchen cabinet | action: doNothing
mug gray   1.000  0.000  0.000  0.000  0.000
mug black  0.000  0.000  0.999  0.001  0.000
cocacola   0.012  0.960  0.013  0.013  0.002
pringles   0.000  0.000  0.000  1.000  0.000

steps 15-18 | robot: kitchen cabinet | action: doNothing
(distributions identical to step 14)

Table C.1: State variables distributions and actions for a static model of the environment
C.2 Scenario 1 - Dynamic Environment Model
Probability columns: coffee table, dining table, kitchen cabinet, kitchen table, sideboard

step 0 | robot: out | action: goCoffee table
mug gray   0.200  0.200  0.200  0.200  0.200
mug black  0.200  0.200  0.200  0.200  0.200
cocacola   0.200  0.200  0.200  0.200  0.200
pringles   0.200  0.200  0.200  0.200  0.200

step 10 | robot: dining room | action: goKitchen table
mug gray   0.005  0.327  0.332  0.332  0.005
mug black  0.003  0.003  0.000  0.000  0.994
cocacola   0.003  0.328  0.333  0.333  0.003
pringles   0.994  0.003  0.000  0.000  0.003

step 20 | robot: kitchen | action: goCoffee table
mug gray   0.170  0.648  0.004  0.008  0.170
mug black  0.100  0.095  0.003  0.003  0.801
cocacola   0.000  0.003  0.003  0.994  0.000
pringles   0.801  0.095  0.003  0.003  0.100

step 28 | robot: dining room | action: goKitchen table
mug gray   0.011  0.762  0.109  0.114  0.004
mug black  0.003  0.003  0.000  0.000  0.994
cocacola   0.003  0.083  0.088  0.825  0.003
pringles   0.994  0.003  0.000  0.000  0.003

step 36 | robot: kitchen | action: goCoffee table
mug gray   0.106  0.782  0.004  0.007  0.101
mug black  0.088  0.083  0.003  0.003  0.825
cocacola   0.000  0.003  0.003  0.994  0.000
pringles   0.825  0.083  0.003  0.003  0.088

step 43 | robot: dining room | action: goKitchen table
mug gray   0.007  0.797  0.093  0.096  0.007
mug black  0.003  0.003  0.000  0.000  0.994
cocacola   0.003  0.076  0.081  0.837  0.003
pringles   0.994  0.003  0.000  0.000  0.003

step 50 | robot: kitchen | action: goCoffee table
mug gray   0.095  0.797  0.007  0.007  0.095
mug black  0.081  0.076  0.003  0.003  0.837
cocacola   0.000  0.003  0.003  0.994  0.000
pringles   0.837  0.076  0.003  0.003  0.081

step 58 | robot: dining room | action: goKitchen table
mug gray   0.007  0.758  0.114  0.114  0.007
mug black  0.003  0.003  0.000  0.000  0.994
cocacola   0.003  0.097  0.102  0.796  0.003
pringles   0.994  0.003  0.000  0.000  0.003

step 66 | robot: kitchen | action: goCoffee table
mug gray   0.103  0.783  0.004  0.007  0.103
mug black  0.088  0.083  0.003  0.003  0.825
cocacola   0.000  0.003  0.003  0.994  0.000
pringles   0.825  0.083  0.003  0.003  0.088

step 73 | robot: dining room | action: goKitchen table
mug gray   0.007  0.797  0.093  0.096  0.007
mug black  0.003  0.003  0.000  0.000  0.994
cocacola   0.003  0.076  0.081  0.837  0.003
pringles   0.994  0.003  0.000  0.000  0.003

step 80 | robot: kitchen | action: goCoffee table
mug gray   0.095  0.797  0.007  0.007  0.095
mug black  0.081  0.076  0.003  0.003  0.837
cocacola   0.000  0.003  0.003  0.994  0.000
pringles   0.837  0.076  0.003  0.003  0.081

step 88 | robot: dining room | action: goKitchen table
mug gray   0.007  0.758  0.114  0.114  0.007
mug black  0.003  0.003  0.000  0.000  0.994
cocacola   0.003  0.097  0.102  0.796  0.003
pringles   0.994  0.003  0.000  0.000  0.003

step 96 | robot: kitchen | action: goCoffee table
mug gray   0.103  0.783  0.004  0.007  0.103
mug black  0.088  0.083  0.003  0.003  0.825
cocacola   0.000  0.003  0.003  0.994  0.000
pringles   0.825  0.083  0.003  0.003  0.088

step 103 | robot: dining room | action: goKitchen table
mug gray   0.007  0.797  0.093  0.096  0.007
mug black  0.003  0.003  0.000  0.000  0.994
cocacola   0.003  0.076  0.081  0.837  0.003
pringles   0.994  0.003  0.000  0.000  0.003

step 110 | robot: kitchen | action: goCoffee table
mug gray   0.095  0.797  0.007  0.007  0.095
mug black  0.081  0.076  0.003  0.003  0.837
cocacola   0.000  0.003  0.003  0.994  0.000
pringles   0.837  0.076  0.003  0.003  0.081

Table C.2: State variables distributions in POMDP selection steps, for a dynamic model of the environment
C.3 Scenario 2 - Objects Changing Position
Probability columns: bed, bookshelf, coffee table, dining table, kitchen cabinet, kitchen table, night stand, sideboard

step 0 | robot: out
mug gray   0.125  0.125  0.125  0.125  0.125  0.125  0.125  0.125
cocacola   0.125  0.125  0.125  0.125  0.125  0.125  0.125  0.125
pringles   0.125  0.125  0.125  0.125  0.125  0.125  0.125  0.125

step 10 | robot: living room
mug gray   0.197  0.007  0.007  0.195  0.195  0.195  0.197  0.007
cocacola   0.199  0.004  0.004  0.197  0.197  0.197  0.199  0.004
pringles   0.000  0.002  0.990  0.002  0.002  0.002  0.000  0.002

step 20 | robot: bedroom
mug gray   0.004  0.085  0.085  0.246  0.246  0.246  0.004  0.085
cocacola   0.002  0.082  0.082  0.250  0.250  0.250  0.002  0.082
pringles   0.002  0.067  0.664  0.066  0.066  0.066  0.002  0.067

step 31 | robot: living room
mug gray   0.097  0.005  0.005  0.257  0.261  0.261  0.097  0.017
cocacola   0.096  0.003  0.003  0.261  0.265  0.265  0.096  0.011
pringles   0.002  0.002  0.991  0.002  0.000  0.000  0.002  0.002

step 40 | robot: kitchen
mug gray   0.184  0.083  0.083  0.360  0.005  0.005  0.184  0.096
cocacola   0.002  0.000  0.000  0.002  0.002  0.993  0.002  0.000
pringles   0.055  0.056  0.720  0.055  0.002  0.002  0.055  0.056

step 50 | robot: living room
mug gray   0.225  0.005  0.002  0.360  0.082  0.082  0.225  0.018
cocacola   0.070  0.002  0.002  0.071  0.071  0.712  0.070  0.002
pringles   0.000  0.002  0.991  0.002  0.002  0.002  0.000  0.002

step 59 | robot: bedroom
mug gray   0.002  0.076  0.074  0.435  0.155  0.155  0.013  0.090
cocacola   0.002  0.060  0.060  0.113  0.113  0.591  0.003  0.060
pringles   0.002  0.056  0.720  0.055  0.055  0.055  0.002  0.056

step 70 | robot: living room
mug gray   0.096  0.005  0.005  0.379  0.197  0.197  0.103  0.018
cocacola   0.093  0.003  0.003  0.162  0.165  0.474  0.093  0.008
pringles   0.002  0.002  0.991  0.002  0.000  0.000  0.002  0.002

step 79 | robot: kitchen
mug gray   0.161  0.074  0.073  0.430  0.004  0.004  0.168  0.086
cocacola   0.002  0.000  0.000  0.002  0.002  0.993  0.002  0.000
pringles   0.055  0.056  0.720  0.055  0.002  0.002  0.055  0.056

step 89 | robot: living room
mug gray   0.202  0.004  0.002  0.407  0.080  0.080  0.208  0.016
cocacola   0.070  0.002  0.002  0.071  0.071  0.712  0.070  0.002
pringles   0.000  0.002  0.991  0.002  0.002  0.002  0.000  0.002

step 96 | robot: bedroom
mug gray   0.002  0.000  0.000  0.002  0.002  0.002  0.992  0.000
cocacola   0.003  0.053  0.053  0.109  0.109  0.618  0.003  0.053
pringles   0.002  0.049  0.755  0.048  0.048  0.048  0.002  0.049

step 103 | robot: living room
mug gray   0.052  0.002  0.002  0.052  0.051  0.051  0.790  0.002
cocacola   0.054  0.002  0.002  0.139  0.142  0.562  0.054  0.043
pringles   0.002  0.002  0.991  0.002  0.000  0.000  0.002  0.003

step 110 | robot: kitchen
mug gray   0.091  0.052  0.052  0.091  0.004  0.004  0.656  0.052
cocacola   0.002  0.000  0.000  0.002  0.002  0.993  0.002  0.000
pringles   0.048  0.049  0.755  0.048  0.002  0.002  0.048  0.049

step 118 | robot: living room
mug gray   0.130  0.003  0.002  0.126  0.059  0.059  0.583  0.038
cocacola   0.054  0.002  0.002  0.055  0.056  0.775  0.054  0.002
pringles   0.000  0.002  0.991  0.002  0.002  0.002  0.000  0.002

step 125 | robot: bedroom
mug gray   0.002  0.000  0.000  0.002  0.002  0.002  0.993  0.000
cocacola   0.003  0.052  0.052  0.095  0.095  0.650  0.003  0.052
pringles   0.002  0.049  0.755  0.048  0.048  0.048  0.002  0.049

step 133 | robot: living room
mug gray   0.064  0.002  0.002  0.064  0.063  0.063  0.740  0.002
cocacola   0.068  0.007  0.002  0.137  0.140  0.570  0.068  0.007
pringles   0.007  0.922  0.021  0.012  0.010  0.010  0.007  0.011

step 140 | robot: kitchen
mug gray   0.102  0.052  0.052  0.102  0.004  0.004  0.630  0.052
cocacola   0.002  0.000  0.000  0.002  0.002  0.993  0.002  0.000
pringles   0.052  0.717  0.063  0.055  0.002  0.002  0.052  0.056

step 149 | robot: living room
mug gray   0.148  0.010  0.004  0.144  0.073  0.073  0.545  0.004
cocacola   0.067  0.002  0.002  0.068  0.068  0.726  0.067  0.002
pringles   0.001  0.988  0.002  0.002  0.002  0.002  0.001  0.002

step 156 | robot: bedroom
mug gray   0.002  0.000  0.000  0.002  0.002  0.002  0.993  0.000
cocacola   0.003  0.053  0.053  0.106  0.106  0.625  0.003  0.053
pringles   0.002  0.754  0.049  0.048  0.048  0.048  0.002  0.049

step 164 | robot: living room
mug gray   0.064  0.002  0.002  0.064  0.063  0.063  0.740  0.002
cocacola   0.068  0.007  0.002  0.146  0.149  0.551  0.068  0.008
pringles   0.002  0.988  0.002  0.002  0.001  0.001  0.002  0.002

step 171 | robot: kitchen
mug gray   0.102  0.052  0.052  0.102  0.004  0.004  0.630  0.052
cocacola   0.002  0.000  0.000  0.002  0.002  0.993  0.002  0.000
pringles   0.048  0.754  0.049  0.048  0.002  0.002  0.048  0.049

step 180 | robot: living room
mug gray   0.148  0.010  0.004  0.144  0.073  0.073  0.545  0.004
cocacola   0.067  0.002  0.002  0.068  0.068  0.726  0.067  0.002
pringles   0.001  0.988  0.002  0.002  0.002  0.002  0.001  0.002

step 190 | robot: bedroom
mug gray   0.002  0.105  0.096  0.298  0.196  0.196  0.012  0.096
cocacola   0.002  0.063  0.063  0.112  0.112  0.584  0.002  0.063
pringles   0.002  0.701  0.060  0.059  0.059  0.059  0.002  0.060

step 201 | robot: living room
mug gray   0.091  0.003  0.018  0.313  0.236  0.236  0.098  0.006
cocacola   0.087  0.002  0.008  0.163  0.163  0.488  0.087  0.003
pringles   0.132  0.005  0.022  0.235  0.235  0.235  0.132  0.004

step 211 | robot: bedroom
mug gray   0.002  0.073  0.084  0.290  0.237  0.237  0.002  0.075
cocacola   0.002  0.072  0.076  0.184  0.184  0.408  0.002  0.072
pringles   0.002  0.077  0.090  0.251  0.251  0.251  0.002  0.076

step 220 | robot: living room
mug gray   0.075  0.005  0.016  0.303  0.261  0.261  0.075  0.005
cocacola   0.075  0.003  0.010  0.219  0.219  0.397  0.075  0.003
pringles   0.076  0.003  0.010  0.278  0.278  0.278  0.076  0.003

step 228 | robot: bedroom
mug gray   0.002  0.059  0.068  0.291  0.258  0.258  0.004  0.059
cocacola   0.002  0.058  0.063  0.227  0.227  0.364  0.003  0.058
pringles   0.002  0.057  0.063  0.273  0.273  0.273  0.002  0.057

step 237 | robot: living room
mug gray   0.073  0.004  0.013  0.291  0.270  0.270  0.075  0.004
cocacola   0.073  0.003  0.008  0.242  0.246  0.352  0.073  0.003
pringles   0.073  0.002  0.007  0.278  0.282  0.282  0.073  0.002

step 245 | robot: kitchen
mug gray   0.002  0.000  0.000  0.002  0.993  0.002  0.002  0.000
cocacola   0.002  0.000  0.000  0.002  0.002  0.993  0.002  0.000
pringles   0.002  0.000  0.000  0.002  0.002  0.993  0.002  0.000

step 246 | robot: living room
mug gray   0.004  0.000  0.000  0.004  0.986  0.004  0.004  0.000
cocacola   0.003  0.000  0.000  0.003  0.003  0.986  0.003  0.000
pringles   0.003  0.000  0.000  0.003  0.003  0.986  0.003  0.000

step 247 | robot: living room
mug gray   0.003  0.001  0.001  0.003  0.986  0.004  0.003  0.001
cocacola   0.003  0.001  0.001  0.003  0.003  0.986  0.003  0.001
pringles   0.003  0.001  0.001  0.003  0.003  0.986  0.003  0.001

step 252 | robot: kitchen
mug gray   0.002  0.000  0.000  0.002  0.993  0.002  0.002  0.000
cocacola   0.002  0.000  0.000  0.002  0.002  0.993  0.002  0.000
pringles   0.002  0.000  0.000  0.002  0.002  0.993  0.002  0.000

step 253 | robot: living room
mug gray   0.003  0.000  0.000  0.003  0.986  0.004  0.003  0.000
cocacola   0.003  0.000  0.000  0.003  0.003  0.986  0.003  0.000
pringles   0.003  0.000  0.000  0.003  0.003  0.986  0.003  0.000

step 254 | robot: living room
mug gray   0.003  0.001  0.001  0.003  0.986  0.004  0.003  0.001
cocacola   0.003  0.001  0.001  0.003  0.003  0.986  0.003  0.001
pringles   0.003  0.001  0.001  0.003  0.003  0.986  0.003  0.001

step 259 | robot: kitchen
mug gray   0.002  0.000  0.000  0.002  0.993  0.002  0.002  0.000
cocacola   0.002  0.000  0.000  0.002  0.002  0.993  0.002  0.000
pringles   0.002  0.000  0.000  0.002  0.002  0.993  0.002  0.000

step 260 | robot: living room
mug gray   0.003  0.000  0.000  0.003  0.986  0.004  0.003  0.000
cocacola   0.003  0.000  0.000  0.003  0.003  0.986  0.003  0.000
pringles   0.003  0.000  0.000  0.003  0.003  0.986  0.003  0.000

Table C.3: State variables distributions in POMDP selection steps, with changes in the location of the objects
step         0       10      20      31      40      50      59      70      79
bedroom      19.506  19.733  31.252  22.358  20.553  20.641  30.401  22.437  20.815
kitchen      19.848  20.516  20.052  20.076  31.077  22.579  20.911  21.001  31.367
dining room  25.506  25.616  25.631  25.669  25.574  25.628  25.650  25.674  25.603
living room  13.217  29.050  14.121  28.005  14.515  29.530  14.379  28.448  14.533

step         89      96      103     110     118     125     133     140     149
bedroom      20.689  31.438  22.834  21.550  21.651  31.597  22.121  21.361  21.337
kitchen      22.632  21.833  21.936  31.380  23.366  22.146  21.438  31.246  22.794
dining room  25.647  25.480  25.535  25.467  25.527  25.474  25.535  25.469  25.540
living room  29.812  15.711  26.857  16.142  27.429  15.799  23.022  16.543  30.069

step         156     164     171     180     190     201     211     220     228
bedroom      31.544  22.343  21.406  21.338  30.638  19.416  31.577  20.160  31.226
kitchen      21.895  21.486  31.262  22.798  20.782  20.290  20.209  20.212  20.176
dining room  25.479  25.543  25.472  25.540  25.594  25.549  25.543  25.552  25.553
living room  16.071  30.045  16.846  30.127  14.430  26.998  13.782  27.564  14.276

step         237     245     246     247     252     253     254     259     260
bedroom      20.346  31.788  30.685  29.593  31.785  30.683  29.591  31.785  30.683
kitchen      20.169  31.742  29.654  28.586  31.736  29.669  28.601  31.736  29.669
dining room  25.558  31.785  31.234  30.688  31.789  31.238  30.692  31.789  31.238
living room  28.017  28.642  28.642  28.642  28.637  28.637  28.637  28.637  28.637

Table C.4: Expected rewards for POMDP selection steps in an experiment with objects changing location
C.4 Scenario 2 - Wrong Observations Robustness
room         robot location   object location  cocacola  pringles  mug gray
living room  coffee table     coffee table     0.873     0.9       0.8
                              sideboard        0.099     0.099     0.099
                              bookshelf        0.1       0.1       0.1
                              none             0.01      0.01      0.01
             sideboard        coffee table     0.099     0.099     0.099
                              sideboard        0.873     0.9       0.8
                              bookshelf        0.087     0.087     0.087
                              none             0.01      0.01      0.01
             bookshelf        coffee table     0.1       0.1       0.1
                              sideboard        0.087     0.087     0.087
                              bookshelf        0.873     0.9       0.8
                              none             0.01      0.01      0.01
             out              -                0         0         0
kitchen      kitchen cabinet  kitchen cabinet  0.873     0.9       0.8
                              kitchen table    0.096     0.096     0.096
                              none             0.01      0.01      0.01
             kitchen table    kitchen cabinet  0.096     0.096     0.096
                              kitchen table    0.873     0.9       0.8
                              none             0.01      0.01      0.01
             out              -                0         0         0
bedroom      bed              night stand      0.093     0.093     0.093
                              bed              0.873     0.9       0.8
                              none             0.01      0.01      0.01
             night stand      night stand      0.873     0.9       0.8
                              bed              0.093     0.093     0.093
                              none             0.01      0.01      0.01
             out              -                0         0         0
dining room  dining table     dining table     0.873     0.9       0.8
                              none             0.01      0.01      0.01
             out              -                0         0         0

Table C.5: Probabilities of observing each object in Scenario 2
[Plots of the Hellinger distance over 400 steps, with false positive and false negative observations marked:
(a) cocacola (8 false positives, 3 false negatives)
(b) pringles (0 false positives, 2 false negatives)
(c) mug gray (3 false positives, 12 false negatives)]
Figure C.1: Hellinger distance considering wrong observations for objects in configuration 1
[Plots of the Hellinger distance over the experiment steps, with false positive and false negative observations marked:
(a) cocacola (8 false positives, 2 false negatives)
(b) pringles (3 false positives, 11 false negatives)
(c) mug gray (0 false positives, 8 false negatives)]
Figure C.2: Hellinger distance considering wrong observations for objects in configuration 2