
DEGREE PROJECT IN MECHANICAL ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2018

A Combination of Object Recognition and Localisation for an Autonomous Racecar

JONATHAN CRESSELL

ISAC TÖRNBERG

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF INDUSTRIAL ENGINEERING AND MANAGEMENT


Master of Science Thesis TRITA-ITM-EX 2018:193

A Combined Approach for Object Recognition and Localisation for an Autonomous Racecar

Jonathan Cressell

Isac Törnberg

Approved: 2018-06-19
Examiner: De-Jiu Chen
Supervisor: Lars Svensson
Commissioner: ÅF AB
Contact person: Tor Ericson

Abstract

With autonomous vehicles being a hot research topic, they have also become of interest in the world of motor sport. To run a vehicle autonomously, it needs to know its current pose and what the environment looks like. This thesis aims to solve this problem using SLAM and object detection with a 2D LiDAR and a camera as sensor input, looking at the performance in terms of accuracy and latency. The object detection problem was repurposed as an object recognition problem by utilising the 2D LiDAR for cone candidate extraction; the candidates were projected onto the camera image and verified by a Convolutional Neural Network (CNN). Two different CNN architectures were used, MobileNet and a minimalistic architecture with fewer than 7 layers. The best performing CNN, with four convolutional layers and two fully connected layers, reached a total of 87.3% accuracy with a classification time of 4.6ms on the demonstrator constructed. Three different SLAM algorithms were implemented: Pose Graph Optimisation, Rao-Blackwellized Particle Filter and Extended Kalman Filter (EKF). When tested on the demonstrator, the EKF solution showed the best results, with a mere 20mm average error in vehicle position and 39mm average error in cone position. Further, the end-to-end timing of the EKF algorithm was the fastest, at an average of 32ms.

The two best performing algorithms were combined for an evaluation, with the output of the CNN as input to the EKF. The performance was measured to an average error of 19mm for the vehicle position and 51mm for the cones. Further, the latency was only increased by the 4.6ms that the CNN required for classification, to a total of 36.54ms.


Master of Science Thesis TRITA-ITM-EX 2018:193

A Combination of Object Recognition and Localisation in an Autonomous Racecar

Jonathan Cressell

Isac Törnberg

Approved: 2018-06-19
Examiner: De-Jiu Chen
Supervisor: Lars Svensson
Commissioner: ÅF AB
Contact person: Tor Ericson

Summary

Autonomous vehicles are a current research topic and have recently also entered motor sport. To achieve autonomous driving, a vehicle needs to know its current position and what the surroundings look like. This thesis aims to solve that problem with SLAM and object detection using a 2D LiDAR and a camera as sensors, and to evaluate the result in terms of accuracy and latency.

The object detection problem was reformulated as an object recognition problem by using the 2D LiDAR sensor to extract cone candidates, which were then projected onto the camera image and verified by a Convolutional Neural Network (CNN). Two different CNN architectures were used, MobileNet and a minimalistic architecture with fewer than 7 layers. The best performing CNN, with four convolutional layers and two fully connected layers, reached a total accuracy of 87.3% with a classification time of 4.6ms on the demonstrator.

Three different SLAM algorithms were implemented: Pose Graph Optimisation, Rao-Blackwellized Particle Filter, and Extended Kalman Filter (EKF). When tested on the demonstrator, the EKF gave the best results with only 20mm average error for the vehicle position and 39mm for the cone positions. The end-to-end timing of the same algorithm was also the shortest, at an average of 32ms. The two best performing algorithms were combined for evaluation, with the output of the CNN as input to the EKF. The performance was measured to an average error of 19mm for the vehicle position and 51mm for the cones. Further, the latency increased only by the time required by the CNN to classify the cones, 4.6ms, to a total of 36.54ms.


Nomenclature

CNN Convolutional Neural Network

DNN Deep Neural Network

EKF Extended Kalman Filter

GPS Global Positioning System

IC Individual Compatibility

ICNN Individual Compatibility Nearest Neighbour

ICP Iterative Closest Point

IMU Inertial Measurement Unit

JCBB Joint Compatibility Branch and Bound

LiDAR Light Detection and Ranging

mAP Mean Average Precision

MRPT Mobile Robot Programming Toolkit

NN Nearest Neighbour

PDF Probability Density Function

PGO Pose Graph Optimisation

RBPF Rao-Blackwellized Particle Filter

RMSE Root Mean Square Error

ROS Robot Operating System


SEIF Sparse Extended Information Filter

SIR Sampling Importance Re-sampling

SLAM Simultaneous Localisation and Mapping

SOTA State of the Art

SSD Single Shot MultiBox Detector

UKF Unscented Kalman Filter

vESC Vedder Electronic Speed Controller


Contents

1 Introduction
    1.1 Background
    1.2 Problem Description
    1.3 Scope
    1.4 Research Question
    1.5 Methodology
    1.6 Safety and Risks
    1.7 Ethics
    1.8 Sustainability
    1.9 Division of Work
    1.10 Report Outline

2 Simultaneous Localisation and Mapping
    2.1 Introducing the SLAM problem
        2.1.1 The SLAM Problem Formulation
    2.2 Solving the SLAM problem
        2.2.1 Filtering Techniques
        2.2.2 Optimisation Techniques
        2.2.3 Peripherals
        2.2.4 Data Association
    2.3 State of the Art
        2.3.1 Pose Graph Optimisation
        2.3.2 Rao-Blackwellized Particle Filter
        2.3.3 Extended Kalman Filter
    2.4 Related Work
        2.4.1 A Platform for Indoor Localisation, Mapping, and Data Collection using an Autonomous Vehicle
        2.4.2 MIT and Hypha Racecar
        2.4.3 A Comparison of Data Association Techniques for Simultaneous Localization and Mapping
    2.5 Implementation
        2.5.1 Pose Graph Optimisation
        2.5.2 Rao-Blackwellized Particle Filter
        2.5.3 Extended Kalman Filter

3 Object Recognition
    3.1 Computer Vision and Object Recognition Overview
        3.1.1 Deep Learning Era
        3.1.2 Convolutional Neural Networks
    3.2 State of the Art
        3.2.1 R-CNN and Fast R-CNN
        3.2.2 YOLO: You only look once
        3.2.3 SSD: Single Shot MultiBox Detector
        3.2.4 MobileNet
    3.3 Related Work
        3.3.1 Real-time Traffic Cone Detection for Autonomous Vehicle
        3.3.2 Cone Detection: Using a Combination of LiDAR and Vision-Based Machine Learning
    3.4 Implementation
        3.4.1 Cone Candidate Extraction
        3.4.2 Transfer Learning
        3.4.3 CNN
        3.4.4 Training Data

4 Demonstrator
    4.1 Demonstrator
    4.2 Software Architecture
        4.2.1 Robot Operating System

5 Results
    5.1 SLAM
        5.1.1 Accuracy
        5.1.2 Latency
    5.2 Object Recognition
    5.3 Combined Performance

6 Discussion and Conclusions
    6.1 SLAM
    6.2 Object Recognition
    6.3 Combined Performance

7 Future work
    7.1 SLAM
    7.2 Object Recognition
    7.3 Combined Performance

A Confusion matrix

Bibliography


Chapter 1

Introduction

This chapter introduces the problem, scope of the project, research questions and surrounding circumstances.

1.1 Background

Autonomous vehicles (AVs) are currently a hot topic and can, to no surprise, be found high up in the Gartner hype curve, seen in Figure 1.1, currently on their way towards the adoption stage. Companies in infrastructure, safety, automotive and service industries are all working towards autonomous vehicles [1]. The autonomous vehicle market is forecast to grow at a compound annual growth rate of 36.9% between 2017 and 2027, eventually becoming a $126.8 billion market according to [2]. These advancements and expectations have placed a heavy burden on various research fields such as sensors, perception and machine learning. One major research area is Simultaneous Localisation and Mapping (SLAM), which addresses a problem considered fundamental in robotics. A SLAM solution is a key component for creating a moving autonomous system in an unknown environment and has been realised in a number of different ways [3].


Figure 1.1: Gartner hype curve 2017 [4].

The topic of autonomous vehicles has spread even to motor sport. The Formula Student racing series is a yearly set of events where student teams from around the globe compete using student-built race cars. Formula Student Germany introduced a Driverless Vehicle class in 2017, where the participants' vehicles are supposed to drive through a circuit outlined by cones, illustrated in Figure 1.2. In 2018, KTH Formula Student [5] aims to be the first Swedish team to present a functional solution in this competition. In cooperation with ÅF AB, who wish to improve their competence in the AV area, the SLAM and object recognition problems are addressed in this master thesis.


Figure 1.2: Formula Student Autonomous Drive [6].

1.2 Problem Description

The Formula Student competition comes with a set of rules and limitations. Firstly, the event will be held outdoors on a tarmac surface and the circuit will be outlined with small traffic cones distributed at a maximum distance of 5m from each other, with yellow cones for the inner border and blue cones for the outer border of the circuit. Larger orange cones are placed at the start, finish and time keeping lines. A description of the track can be seen in Figure 1.3. The borders of the circuit are painted, and there may be further markings that are not part of the circuit. Participants are not provided with any further map data prior to the event and no modification of the circuit is allowed, nor is additional placement of landmarks. Other than the above, there are no limitations regarding solutions to the problem. The goal is to complete ten laps around the circuit in the shortest time possible [7], fully autonomously. This can be done by running an object avoidance algorithm during the first lap while mapping the track, and in the following laps using the obtained map and current position in a trajectory planner to obtain the optimal racing line. Mapping the track and providing accurate real-time positioning using SLAM and object recognition was set as the goal of this master thesis.


Figure 1.3: Formula Student Germany track drive for driverless [7].

1.3 Scope

The objects to be recognised were traffic cones in an unknown indoor environment. Two cones at the start/finish line were larger, and cones on the left and right hand sides of the track had different colours. The model for motion control was provided beforehand and was not a subject of this thesis. A scale demonstrator was constructed to test the system and obtain results. Implementation and analysis were limited to the autonomous racecar scenario.

1.4 Research Question

The SLAM problem has seen a great number of different solutions over the years. With each solution having different characteristics, there is no one-size-fits-all algorithm available. In a strictly defined scenario such as the one described in Section 1.2, it is interesting to know which aspects will affect the choice of SLAM algorithm and how the different algorithms will perform under the given circumstances. These questions are summarised as the first research question below.

• How does the choice of SLAM algorithm impact latency and accuracyfor mapping and localisation in an autonomous racecar?

The SLAM solution relies on the sensors' ability to accurately identify relevant landmarks in the vicinity of the robot. While there are several ways of achieving this, the great majority of solutions utilise either camera or laser scans, which in turn are well suited to identify the cones used as landmarks in this scenario. A leading method in object recognition using a camera feed is the Convolutional Neural Network, an extension of the traditional Neural Network, which is well suited for identifying cones in a high speed racing environment. This leads to the research question below.

• How does the choice of Convolutional Neural Network architecture forcone recognition with a small training set impact accuracy and latencyin an autonomous racecar?

1.5 Methodology

This thesis begins with a qualitative study of related literature to obtain required knowledge on the subject and identify state of the art approaches to solving the problems in question. After deciding on which approaches to pursue, implementation is attempted experimentally, starting from solutions available with open source licenses. These open source solutions are then modified to suit the scope of this thesis. A demonstrator is built in parallel to use as a test platform for the implementations. The results are gathered in a quantitative manner using the demonstrator on a pre-determined test track. The best performing algorithms are then combined to provide an understanding of how the object recognition and SLAM algorithms perform together.

1.6 Safety and Risks

The risks involved with the solutions presented in this work are related to those of the implementation, since the SLAM and object recognition algorithms alone do not output anything that can cause harm to the environment. However, these algorithms are commonly combined with techniques that allow the robot or vehicle to move autonomously. In such cases, it would be necessary to verify the integrity of the SLAM output map upon which, for example, a trajectory planner bases its output. Further, if an exploration algorithm uses the object recognition output to determine the existence of obstacles, this would also require an extra layer of safety. Such improvements could be redundant algorithms or sensors which would prevent the robot or vehicle from inflicting harm on the environment. Further, depending on the implementation, standards such as ISO 26262 [8] would require investigation. This subject has been further investigated in a neighbouring master thesis between ÅF AB and KTH Formula Student [9].

1.7 Ethics

A number of ethical questions arise with the application of SLAM algorithms and object recognition in AVs. Many of these dilemmas are based on variants of the well known trolley problem [10]. If an accident cannot be avoided, how is the vehicle to choose whether it should hit a wall and sacrifice those in the vehicle, or hit a group of people with unknown consequences [11, 12]? According to six surveys performed on residents of the USA, there is a clear ethical dilemma with AVs and dangerous traffic situations [12]. One of the surveys showed that while it is seen as morally correct to program an AV to sacrifice the passengers for the sake of saving a greater number of pedestrians, the respondents themselves would rather not buy such a vehicle.

In the case of accidents there will also be an issue regarding liability. Should the owner, the vehicle producer, or even the government that permitted the vehicle's use in traffic be held responsible for the outcome [13]? This also has to be decided before AVs can become reality on common roads.

An object recognition system can become an ethical problem in its application; for example, facial recognition software can act racist [14]. This can happen even though the intention is not to create a biased or racist system, and can be caused by naive assumptions or by a biased data set. A well performing system can still cause ethical issues, since there is a possibility that it might violate people's personal lives by gathering data about them if they are recognised and tracked by a system. The same applies here: even though the intention of the system is not to track a person, the data is still there and can, in the worst case, be exploited by another system or entity.

1.8 Sustainability

The algorithm itself has very little impact on sustainability without an application, as such autonomous vehicles are again discussed. AVs can be beneficial in the attempt to reach global goals such as EU 2020 [15] and Transport 2050 [16]. From a general perspective, AVs will have a large impact on social, environmental and economic sustainability [17]. In social aspects, AVs are likely to reduce traffic related deaths and reduce the number of jobs in the transport sector. Environmentally, AVs can improve energy efficiency by for example car sharing, where the increased utilisation of each vehicle would reduce the total required number of vehicles in an area [18]. From an economic perspective, automotive manufacturers risk lower sales and their customers may change from end consumers to car sharing companies or even governing bodies that aim to improve infrastructure. Further, on a private economy level, transport related expenses may generally decrease. However, autonomous vehicles do not only introduce issues but will eventually reduce stress, improve productivity as well as reduce crash rates and insurance costs. They also enable shared vehicle possibilities and provide independent mobility for non-drivers such as adolescents and people with disabilities [19].

1.9 Division of Work

The work was divided between Isac and Jonathan, where Isac was responsible for object recognition and Jonathan for the SLAM solution. Remaining work was performed jointly.


1.10 Report Outline

The report is divided into 7 chapters. The first chapter introduces the problem as well as the scope and project outline. The second and third chapters focus on the SLAM and object recognition problems, with a brief background, state of the art solutions and the algorithm implementation in each chapter respectively. The following chapter explains the hardware demonstrator implementation as well as the software in a system overview. Chapter 5 presents the results from the demonstrator and Chapter 6 discusses the results as well as the thesis as a whole, and draws a number of conclusions with answers to the research questions. Finally, the future work in Chapter 7 describes what could be improved, spin-off ideas, as well as suggestions on continued work.


Chapter 2

Simultaneous Localisation and Mapping

This chapter introduces the subject of Simultaneous Localisation and Mapping from a theoretical perspective, presents state of the art solutions and a brief description of the implementations.

2.1 Introducing the SLAM problem

A fundamental problem in the field of autonomous vehicles and robotics in general is the subject of localisation; to allow a robot or vehicle to identify its own position and orientation, further called pose, using onboard sensors. A common solution to this problem is to use a Global Positioning System (GPS), but for applications where GPS is sub-optimal, unreliable or unavailable, alternative solutions are required [20, 21]. Odometry, inertial measurements and similar solutions that could be suggested suffer from accumulative errors, and even the smallest drift will cause the pose to deviate greatly over time, thus other methods are necessary [22].

The localisation problem can be solved by using sensors to scan the environment in which the robot or vehicle is operating. By identifying range and bearing towards suitable landmarks in the vicinity, the robot's position can be acquired. However, this requires a map onto which the position can be projected. If this map is determined beforehand and available to the robot, the problem is solved; however, when this is not the case, a map of the environment has to be created during operation. A chicken and egg problem occurs where the robot location is required to create the map and the map is required to identify the location. This has come to be known as the Simultaneous Localization and Mapping (SLAM) problem. A visual description of the problem can be seen in Figure 2.1, where the white poses and yellow landmarks are ground truth, and the grey marks present the SLAM output.

Figure 2.1: The SLAM problem [23].

Solving the SLAM problem comes down to a matter of fusing data from multiple sources. A simple system would include a Light Detection and Ranging (LiDAR) device, which outputs laser range scans, together with wheel odometry on a robot. The range finder would output observed features in the environment and the odometry would output the movement of the robot. A combination of these allows for creating a map of where the observations are located along with the trajectory traversed by the robot.


There are several solutions to the SLAM problem with one aspect in common: the solutions use probabilistic estimates. By predicting the robot's next pose using control input and previous pose data, then correcting this by using the predicted pose along with observation data, it is possible to simultaneously create a map of the environment and identify the robot's location.

2.1.1 The SLAM Problem Formulation

As mentioned above, the SLAM problem is solved using probabilistic estimates, with the input commonly being measurements and control signals and the output being the estimated map and robot pose. Here follows a mathematical formulation of the problem. The sets of control inputs $u_{1:t}$ and observations $z_{1:t}$ are considered known, where:

$$u_{1:t} = \{u_1, u_2, \dots, u_t\}$$
$$z_{1:t} = \{z_1, z_2, \dots, z_t\},$$

followed by the set of past poses $x_{0:t}$ and the actual environment map $m$ that are considered unknown, where

$$x_{0:t} = \{x_0, x_1, x_2, \dots, x_t\}.$$

The control inputs and observations are used to estimate the map and robot pose by the following probabilistic approach using Bayes' rule:

$$p(x_{0:t}, m \mid z_{1:t}, u_{1:t}).$$

The above equation is known as the full SLAM problem and has commonly not been possible to solve in real time due to the computational complexity with respect to the number of variables. By disregarding the previous poses of the robot in the problem formulation and attempting to approximate only the current pose based on the latest sensor data, Online SLAM is defined in accordance with Equation 2.1:

$$p(x_t, m \mid z_{1:t}, u_{1:t}) \qquad (2.1)$$
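The online posterior in Equation 2.1 is typically maintained recursively; as a reference, a standard textbook form of that recursion (not reproduced in this thesis) combines a motion-model prediction with a measurement correction:

$$p(x_t, m \mid z_{1:t}, u_{1:t}) \propto p(z_t \mid x_t, m) \int p(x_t \mid x_{t-1}, u_t)\, p(x_{t-1}, m \mid z_{1:t-1}, u_{1:t-1})\, dx_{t-1},$$

where $p(x_t \mid x_{t-1}, u_t)$ is the motion model and $p(z_t \mid x_t, m)$ the measurement model. The filtering techniques below are different ways of approximating this recursion.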


Even simplified, this proves to be a difficult problem to solve given that the environment and robot pose are both unknown, since the estimates for the map and the pose are correlated. This results in initial error propagation, but as the number of identified landmarks increases and previously identified landmarks are observed again, the error can be greatly reduced. This is in itself a difficult issue called data association, which will be investigated further at a later stage in this report. Successful data association can result in loop closure when returning to a previously traversed area, adjusting the map to better mirror reality.

2.2 Solving the SLAM problem

A great number of methods have been conceived to solve the SLAM problem since it was formulated. These solutions are generally either filter or optimisation based and thus inherit characteristics from the method used.

2.2.1 Filtering Techniques

Known as Online SLAM techniques, filter based solutions incrementally update only the current pose of the robot together with the map, as discussed in Section 2.1. There are a number of different filters in common use; the most common ones are discussed below.

Extended Kalman Filter

Branched from the Kalman Filter [24], which is limited to linear cases, the Extended Kalman Filter (EKF) was developed to allow computation of non-linear systems by means of linearisation using a first-order Taylor expansion [25]. The EKF has been shown to perform well as long as the linearisation is made around the true value of the state vector [26], which is in practice not entirely possible since this is the value to be estimated. If the actual true value is not included in the estimate, the uncertainty will not converge [27]. However, since the estimates are mostly close enough to the true value, the EKF can be applied as long as the maps do not grow too large [23]. The EKF will calculate the Jacobian of the state vector in each update, resulting in a time complexity quadratic in the state vector size. A solution to this was presented by [28], where submaps of the recently updated area are used instead of the entire map, and correlations between submaps are maintained but not processed on every update.
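As a reference, the standard EKF prediction and correction step (textbook form, with notation not taken from this thesis) can be written as:

$$\bar{\mu}_t = f(\mu_{t-1}, u_t), \qquad \bar{P}_t = F_t P_{t-1} F_t^{\top} + Q_t$$
$$K_t = \bar{P}_t H_t^{\top} \left(H_t \bar{P}_t H_t^{\top} + R_t\right)^{-1}$$
$$\mu_t = \bar{\mu}_t + K_t\left(z_t - h(\bar{\mu}_t)\right), \qquad P_t = (I - K_t H_t)\,\bar{P}_t,$$

where $f$ and $h$ are the motion and measurement models, $F_t$ and $H_t$ their Jacobians, and $Q_t$ and $R_t$ the process and measurement noise covariances. In EKF-based SLAM the state contains both the robot pose and all landmark positions, which is why the covariance update above scales quadratically with the size of the map.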

Unscented Kalman Filter

In order to compensate for the limitations of the EKF linearisation method, a new version of the Kalman Filter was introduced by [29], called the Unscented Kalman Filter (UKF). By selecting a number of sigma points in the vicinity of the expected value and passing these through a non-linear function to then calculate an estimate, the linearised estimate will be greatly improved for non-linear functions. This however comes with greatly increased complexity, making implementation more difficult.

Information Filter

Another filter technique that can be utilised is the Sparse Extended Information Filter (SEIF), which is an information form of the EKF [30] that exploits the computational properties of sparse matrices. The implication of using SEIF is essentially faster computation at the cost of enforcing sparsity in the information matrix and therefore disregarding some of the correlations between landmarks [31], reducing the accuracy.

Particle Filtering

Rao-Blackwellized particle filters (RBPF) were utilised in a solution presented by [32] which, unlike the Gaussian distribution that Kalman filters use, apply a probability distribution represented by a finite set of samples or hypotheses, also called particles. A high density of particles in an area corresponds to a high probability and a low density corresponds to a low probability. This can be used to represent any multi-modal distribution, if enough samples are given, which makes particle filters suitable for a wide range of estimation problems. The capacity to track multi-modal beliefs and include non-linear motion and measurement models makes the performance of particle filters particularly robust [33]. However, this comes at the cost of high computational requirements, making implementation more difficult.


A commonly used factorisation of the joint posterior presented in Section 2.1, known as Rao-Blackwellization, allows computation of the robot pose, or trajectory, given only the observation and odometry measurements. This can in turn be used to compute the map. The posterior is estimated using a particle filter, where each particle represents a potential trajectory of the robot with an individual map associated to each particle [32]. Using the estimated posterior, or pose, the technique "mapping with known poses" can be applied to analytically calculate the map [34].

Since each particle will not maintain a correct trajectory and map, hence the need for multiple particles, adjustments must be made in order to refresh the set of particles and prevent the uncertainty of the algorithm from diverging. For this purpose, a Sampling Importance Re-sampling (SIR) filter is commonly applied [35]. When used together with a Rao-Blackwellized particle filter, the map is incrementally updated when measurements and observations become available, by updating the set of particles representing the vehicle pose and surrounding map. This has been explained as a four step process consisting of sampling, importance weighting, re-sampling and map estimation. The new set of particles is obtained from the existing set by sampling from a proposal distribution [35, 34]. The new particles are then assigned weights, as the proposal distribution is in general not accurate. The number of particles used in an application is often static; as such, the particles with the lowest importance weights are discarded to leave room for the more accurate particles.
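A minimal Python sketch of the four-step SIR loop described above is given below. The function names `motion_model`, `measurement_likelihood` and `update_map` are hypothetical placeholders for an actual implementation, and this is an illustration of the general technique rather than the code used in this thesis.

```python
import numpy as np

def sir_rbpf_step(particles, weights, maps, u, z,
                  motion_model, measurement_likelihood, update_map):
    """One SIR update for a Rao-Blackwellized particle filter.

    particles: (N, 3) array of poses [x, y, theta], one per particle
    weights:   (N,) importance weights
    maps:      list of N per-particle maps
    u, z:      latest control input (odometry) and observation
    """
    n = len(particles)

    # 1. Sampling: draw each new pose from the proposal (here: the motion model)
    particles = np.array([motion_model(p, u) for p in particles])

    # 2. Importance weighting: correct for the mismatch between proposal and target
    weights = weights * np.array(
        [measurement_likelihood(z, p, m) for p, m in zip(particles, maps)])
    weights = weights / np.sum(weights)

    # 3. Re-sampling: keep particles in proportion to their weights
    #    (systematic resampling, triggered only when the effective sample size is low)
    n_eff = 1.0 / np.sum(weights ** 2)
    if n_eff < n / 2:
        positions = (np.arange(n) + np.random.uniform()) / n
        idx = np.searchsorted(np.cumsum(weights), positions)
        particles, maps = particles[idx], [maps[i] for i in idx]
        weights = np.full(n, 1.0 / n)

    # 4. Map estimation: "mapping with known poses" for each surviving particle
    maps = [update_map(m, p, z) for m, p in zip(maps, particles)]
    return particles, weights, maps
```

The effective-sample-size check in step 3 corresponds to the adaptive re-sampling idea mentioned later in Section 2.3.2, which avoids discarding good particles unnecessarily.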

2.2.2 Optimisation Techniques

In opposition to the simplified Online SLAM approach, optimisation techniques attempt to solve the Full SLAM problem, where the entire trajectory of the robot is updated rather than only the current pose. The Full SLAM approach can thus be advantageous depending on the purpose of the SLAM application. A common solution using optimisation, or smoothing, is the pose graph optimisation technique.

Pose Graph Optimisation

In Pose Graph Optimisation, the SLAM problem can be seen as a graph with nodes representing the poses of the robot over time as well as landmarks, and edges between nodes as constraints between poses, as seen in Figure 2.2. The constraints are built up by observations and control signals to the robot.

Figure 2.2: A PGO visualisation. The dotted lines represent constraints, stars are landmarks and circles are poses.

Since the raw measurements from sensors include some kind of noise, the graph has to be optimised against this by minimising the error. This means that the configuration of robot poses is matched to best fit the constraints. Graph based SLAM can therefore be seen as two separate parts: the graph construction, also called the front-end, and the graph optimisation, also called the back-end [36]. The constraints between poses can be updated when revisiting a previously traversed area by loop closing, reducing the uncertainty of the graph. Since all poses are linked together with constraints, this update would be very costly. However, by introducing a hierarchical structure between poses, where closely linked poses are grouped together in several stages, the computational cost of each update can be reduced enough to realise good enough performance for an online application [37]. A visualisation of the constraint reduction can be seen in Figure 2.3.

Figure 2.3: The reduction of constraints in hierarchical PGO [38].
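The back-end optimisation can be summarised as a non-linear least-squares problem over the node configuration; a standard formulation (not reproduced from this thesis) is:

$$x^{*} = \arg\min_{x} \sum_{\langle i,j\rangle} e_{ij}(x_i, x_j)^{\top}\, \Omega_{ij}\, e_{ij}(x_i, x_j),$$

where $e_{ij}$ is the difference between the relative pose predicted by nodes $x_i$ and $x_j$ and the relative pose given by the constraint, and $\Omega_{ij}$ is the information matrix of that constraint.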

2.2.3 Peripherals

A number of commonly occurring peripherals are used along with most SLAM implementations. For example, there are a number of standardised mapping styles, where grid maps and feature maps are the most common in the two-dimensional case.

Map Styles

SLAM algorithms output maps in different styles depending on what is preferred by the user and the suitability to the algorithm. Two of the major methods are Occupancy Grid Maps, often called only Grid Maps, and Feature Maps. The former is a matrix where cells are filled when observed by the robot. A cell which maps to a position in reality that does not contain an object is labelled as zero, whereas a cell which maps to an object will be labelled with a one. Doing this results in a map of zeros and ones, often displayed in white and black, as seen in Figure 2.4.


Figure 2.4: An occupancy grid map.

Grid maps are often implemented with algorithms that rely heavily on LiDAR and scan matching methods, as the laser scans can almost directly be inserted into the map.
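As an illustration of how directly a laser scan maps onto a grid, the sketch below marks the cells hit by a scan given a known pose. The function name, argument layout and 5cm default resolution are illustrative assumptions rather than code from this thesis.

```python
import numpy as np

def insert_scan(grid, pose, ranges, angles, resolution=0.05, origin=(0.0, 0.0)):
    """Mark the grid cells hit by a laser scan as occupied (1).

    grid:       2D numpy array of 0s (free/unknown) and 1s (occupied)
    pose:       robot pose (x, y, theta) in the map frame
    ranges:     measured ranges from the LiDAR, one per beam
    angles:     beam angles relative to the robot heading
    resolution: cell size in metres
    origin:     world coordinates of grid cell (0, 0)
    """
    x, y, theta = pose
    # End point of each beam in world coordinates
    ex = x + ranges * np.cos(theta + angles)
    ey = y + ranges * np.sin(theta + angles)
    # Convert to grid indices and mark as occupied
    ix = ((ex - origin[0]) / resolution).astype(int)
    iy = ((ey - origin[1]) / resolution).astype(int)
    valid = (ix >= 0) & (ix < grid.shape[1]) & (iy >= 0) & (iy < grid.shape[0])
    grid[iy[valid], ix[valid]] = 1
    return grid
```

A complete implementation would additionally trace the free cells along each beam (for example with a line-drawing algorithm), which is omitted here.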

Algorithms that rely on features in the environment often apply feature map solutions. These maps are less intuitive, containing the positions and uncertainties of the landmarks or features that have been detected in the environment. An example can be seen in Figure 2.5.

Figure 2.5: A feature map.


Features are saved as a middle point with an uncertainty, visually displayed as an ellipse and often magnified for improved user experience.

Scan Matching

Scan matching techniques follow the development of range finders and are especially effective with 360° solutions, as the larger scanning area gives a higher number of points to use in matching. Iterative Closest Point (ICP) is a commonly used algorithm for scan matching methods. It aims to find the transformation between a laser scan and a grid map section by minimising the square errors between the two. It is iterative in the aspect that it will recalculate the error while it decreases; as such, a decent initial guess is required for it not to get stuck in the first local minimum [39]. In practice, this means that a new scan cannot be too far from a previous scan, unless there is other information that allows the scan matcher to identify the initial guess, such as reliable odometry.
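A compressed sketch of point-to-point ICP between two 2D point sets is shown below, using the closed-form SVD alignment in each iteration. It is illustrative only, assumes a reasonable initial alignment as discussed above, and is not the scan matcher used in this work.

```python
import numpy as np

def icp_2d(source, target, iterations=20):
    """Align `source` points (N, 2) to `target` points (M, 2).

    Returns a rotation R (2x2) and translation t (2,) such that
    R @ source.T + t approximately matches target.
    """
    R, t = np.eye(2), np.zeros(2)
    src = source.copy()
    for _ in range(iterations):
        # 1. For every source point, find the closest target point
        d = np.linalg.norm(src[:, None, :] - target[None, :, :], axis=2)
        closest = target[np.argmin(d, axis=1)]

        # 2. Compute the rigid transform minimising the squared error (SVD)
        mu_s, mu_t = src.mean(axis=0), closest.mean(axis=0)
        H = (src - mu_s).T @ (closest - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:          # guard against reflections
            Vt[-1, :] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_t - R_step @ mu_s

        # 3. Apply the incremental transform and accumulate it
        src = (R_step @ src.T).T + t_step
        R, t = R_step @ R, R_step @ t + t_step
    return R, t
```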

2.2.4 Data Association

With a purpose similar to that of scan matching, data association is commonly implemented with algorithms using feature maps. Upon returning to an already traversed location and identifying previously mapped landmarks, uncertainties regarding vehicle pose and mapped landmarks can be reduced by associating new observations with already measured landmarks. However, with the above uncertainties and sensor uncertainties, it is not trivial to identify which mapped landmark corresponds to the measured landmark at a given time, as shown in Figure 2.6. If the uncertainties are too large, there is a risk of associating an observation with the wrong landmark, something that may cause the algorithm to diverge.

Figure 2.6: Within the robot pose uncertainty, the measurements can often be associated in more than one way [23].

Individual Compatibility Nearest Neighbour

A common and fairly straightforward method of solving the data association problem can be found in the Nearest Neighbour (NN) filter. Individual Compatibility Nearest Neighbour (ICNN) judges the compatibility between an observation and a mapped landmark by the minimal Mahalanobis distance [40] and will commonly achieve an accuracy around 70% with millisecond computation times [41, 42]. The Individual Compatibility (IC) test evaluates the uncertainty distribution of the landmark and measurement against a threshold χ2 distribution value, used to determine the plausibility of a data association.

The low computational time requirement of NN comes from the fact that it is linear in the size of the map, since it will at most perform mn tests, where m is the number of map features and n is the number of measurements. Aside from the computational requirements, ICNN is not considered sufficiently robust to be used in most applications, since obtaining a faulty association may result in the SLAM algorithm diverging. However, if the cumulative error of measurements and estimates is smaller than the distance between the features and the number of spurious measurements is small, the ICNN algorithm can be considered adequate to solve the data association problem [43].
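A minimal sketch of the ICNN decision for a single measurement is given below. It assumes the filter has already produced an innovation vector and innovation covariance for each candidate landmark (hypothetical helper inputs, not code from this thesis), and uses the chi-square gate described above.

```python
import numpy as np
from scipy.stats import chi2

def icnn_associate(innovations, innovation_covs, prob=0.95, dof=2):
    """Return the index of the individually compatible landmark with the
    smallest Mahalanobis distance, or None if no candidate passes the gate.

    innovations:      list of innovation vectors nu_i = z - z_hat_i
    innovation_covs:  list of innovation covariance matrices S_i
    prob, dof:        chi-square gate, e.g. 95% for a 2D range-bearing measurement
    """
    gate = chi2.ppf(prob, df=dof)
    best, best_d2 = None, np.inf
    for i, (nu, S) in enumerate(zip(innovations, innovation_covs)):
        d2 = float(nu @ np.linalg.solve(S, nu))   # squared Mahalanobis distance
        if d2 < gate and d2 < best_d2:            # individual compatibility + nearest neighbour
            best, best_d2 = i, d2
    return best
```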

Joint Compatibility Branch and Bound

The problem of spurious measurements being associated with map features can be limited by reconsidering the established associations [43]. The Joint Compatibility Branch and Bound (JCBB) algorithm considers all measurements in one update [42] by using the IC test for a matrix rather than a single value, thus expanding the ICNN concept to cover all possible data associations across the measurement set [21].


This expansion improves accuracy, though at the cost of computation times that are increased by up to two orders of magnitude compared to ICNN [43]. Since the computational effort increases with the number of landmarks, it is often considered applicable only for smaller environments with fewer features.

2.3 State of the Art

A study of the State of the Art (SOTA) applications available as open source with the Robot Operating System (ROS) was made to identify relevant software. Here follow those that implement some of the algorithms presented in the theory.

2.3.1 Pose Graph Optimisation

Google Cartographer [44] was developed as a solution for mapping indoor environments in 2D and 3D using SLAM, by mounting the hardware as a backpack to be carried by a person. Cartographer uses pose graph optimisation and a local-to-global map setup to allow mapping of very large areas in real time. The LiDAR scan data is inserted into a local submap and matched using a scan matching algorithm based on the Ceres solver [45], built for solving complicated optimisation problems. When the robot has moved out of the local submap area, this submap is then sent to the global map for fixation and loop closure. This process is then repeated for each submap that is completed. This solution prevents the scan-to-scan accumulative error which will occur when using a single map and scan matching. Cartographer is also able to make use of landmarks to improve loop closure, and uses IMU data to stabilise the LiDAR scan data.

2.3.2 Rao-Blackwellized Particle Filter

As previously mentioned, the Rao-Blackwellized Particle Filter can be tuned to be suitable as a SLAM solution. There are numerous applications available; two of them are presented below.


Mobile Robot Programming Toolkit

The Mobile Robot Programming Toolkit (MRPT), developed at the University of Málaga since 2004 [46], provides several different configurations for both grid and feature maps. Other than SLAM solutions, MRPT contains a great number of solutions for various issues mobile robots may face. An improved particle filter implementation that achieves real-time performance by taking into account the accuracy of the robot's sensors and applying an adaptive re-sampling technique, which assists in maintaining a reasonable level of particle variety, was suggested by [35]. This improved technique is implemented in the MRPT RBPF solution, which also uses ICP scan matching to create grid maps, similar to Gmapping [47], which is likewise based on the solutions presented by [35, 34] and discussed in further detail below.

Gmapping

Another RBPF implementation is Gmapping, developed by the authors of [35, 34], which uses LiDAR scan data and odometry to map the environment into a grid map. Gmapping implements the same adaptive re-sampling technique as MRPT's RBPF, as well as an approach for calculating an accurate proposal distribution that improves the prediction step, increasing overall performance [47]. Loop closure is achieved by re-evaluating which particle is the most likely one, by matching scans of the area, the estimated position and the maps of all particles when the robot returns to a previously traversed area.

2.3.3 Extended Kalman Filter

As one of the first and most fundamental solutions to the SLAM problem, Extended Kalman Filters are still viable for feature map solutions when combined with accurate and efficient data association techniques.

Mobile Robot Programming Toolkit

The implementation of the EKF in MRPT is not far from the theoretical background presented in Section 2.2.1 and allows for data association using ICNN and JCBB, based on either Maximum Likelihood [48] or Mahalanobis distance [49]. Important for feature based SLAM techniques is the accurate evaluation of sensor uncertainties, as these are used for data association; thus the ability to adjust this is also included.

Hector SLAM

Developed at the Darmstadt University of Technology for an autonomous search and rescue robot competition series [50], Hector SLAM is a lightweight grid mapping SLAM algorithm using only LiDAR scan data. Since it requires no odometry or input other than the LiDAR scans, it can be easily configured and deployed on lightweight platforms. The algorithm is based on an EKF for the state estimation and uses a scan matching method derived from a Gauss-Newton approach originating from computer vision [51]. With low computational requirements and simple configuration, it is a modern and viable method for running primarily indoor SLAM systems.

2.4 Related Work

This section presents work related to the SLAM topic of this thesis, as well as some work regarding the demonstrator.

2.4.1 A Platform for Indoor Localisation, Mapping, and Data Collection using an Autonomous Vehicle

State of the art open source SLAM applications were evaluated by [22] as a part of a master thesis project to build an indoor localisation device. Algorithms were evaluated on loop closure, map accuracy, performance and trajectory using the same input data. While Gmapping and Cartographer had similar performance, the reliability of Cartographer made it the recommended algorithm. Little discussion was made regarding the nature of the SLAM algorithms used in the applications, though the performance results are a good pointer towards which applications to investigate.

2.4.2 MIT and Hypha Racecar

Scale autonomous race cars have been implemented by [52] and [53] as part of university learning projects. RC cars are rebuilt to house sensors and embedded processing units, and run SLAM algorithms to navigate through unknown indoor environments. Both projects focus on accurate navigation and high movement speeds, which makes the available documentation valuable to this project, though the mapping process is rarely discussed. Further, the open source code provided has proved useful when creating a demonstrator similar to the RC race cars used in the mentioned projects.

2.4.3 A Comparison of Data Association Techniques for Simultaneous Localization and Mapping

A master thesis was written on the subject of comparing data association methods [21] with SLAM. EKF and FastSLAM (an RBPF implementation) were used with each data association algorithm, where ICNN and JCBB were two of the algorithms evaluated using tests made in simulation and on real world data sets. It was shown that JCBB provides greater accuracy at the cost of longer computation times, and while ICNN results in lower accuracy and shorter computation times, ICNN can perform almost equally to JCBB in areas with sparse feature density, since the risk of making a faulty association is reduced when features are further apart. The number of spurious measurements also affects ICNN to a greater extent than JCBB, where a higher number will result in lower accuracy and the risk of forcing the SLAM algorithm to diverge due to erroneous associations. This work gave a good idea of which data association methods to investigate, and what to expect of them.

2.5 Implementation

The following SLAM algorithms were selected for comparison due to their inherently different characteristics. The algorithms were implemented in ROS using combined parts of various open source solutions, the main parts of which are presented in Section 2.3. These solutions were stripped and adjusted to suit the application at hand, as described below.

2.5.1 Pose Graph Optimisation

Pose Graph Optimisation was implemented with grid maps by fusing laser scans and ICP based scan matching, IMU data and odometry. It also incorporated landmark support for loop closure to further improve performance. As previously mentioned for grid mapping in general, the matching of map resolution to LiDAR resolution is a sensitive point for optimal performance, together with the scan matching update frequency relative to the laser scan frequency. Further, the algorithm was tuned by adjusting the parameters regarding local update frequency, the weighting towards adjusting previous map insertions and the score required to update the map. The scan matching was tuned by adjusting the weights for translation and rotation, to determine whether the odometry readings or the scan matching result should be trusted. The parameters for PGO were chosen as presented in Table 2.1. The scores are a measure of how certain the scan matching needs to be for a scan to be inserted into the map. The map resolution was set to 5cm, which mirrors the uncertainty of the LiDAR at ~4m range, which was evaluated to be slightly higher than that specified by the data sheet [54].

Table 2.1: PGO parameters

Parameter          Value   Unit
Map resolution     0.05    m
Min. Score         0.65    -
Update frequency   30      Hz
SM Trans. weight   0.1     -
SM Rot. weight     0.2     -

2.5.2 Rao-Blackwellized Particle Filter

The Particle Filter solution was implemented as a Rao-Blackwellized Particle Filter with grid mapping using laser scans, ICP scan matching and odometry. As with the PGO, it was extended to make use of landmarks for improved loop closure. Additionally, the adaptive method for adjusting the number of particles discussed in Section 2.2.1 was used to improve performance. The parameters were chosen as presented in Table 2.2. The linear distance and angular thresholds determine the maximum robot movement allowed before the map is updated. The min/max particle numbers are used in the adaptive re-sampling method.


Table 2.2: RBPF parameters

Parameter          Value   Unit
Map resolution     0.05    m
Min. Score         54      -
Linear Dist. th.   0.02    m
Linear Ang. th.    1       deg
Min Particles      30      -
Max Particles      150     -

2.5.3 Extended Kalman Filter

The Extended Kalman Filter algorithm was implemented as a lightweight solution to the Online SLAM problem. The EKF solution was implemented with a feature map, a cone detection algorithm using laser scans, and odometry. EKF solutions rely heavily on the data association methods used; as such, a comparison between Individual Compatibility Nearest Neighbour and Joint Compatibility Branch and Bound was made, splitting the EKF results into two. The EKF data association was implemented with the IC χ2 and Mahalanobis IC thresholds shown in Table 2.3.

Table 2.3: Extended Kalman Filter parameters

Parameter      Value   Unit
IC χ2 th.      0.95    -
Maha IC th.    0.90    -
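As an illustration only, and assuming 2-DOF range-bearing measurements (an assumption not stated in the table), the probability thresholds in Table 2.3 can be converted into squared Mahalanobis distance gates:

```python
from scipy.stats import chi2

# Converting the Table 2.3 probability thresholds into squared-distance gates
print(chi2.ppf(0.95, df=2))  # ~5.99, gate for the IC chi-square threshold
print(chi2.ppf(0.90, df=2))  # ~4.61, gate for the Mahalanobis IC threshold
```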


Chapter 3

Object Recognition

This chapter discusses the object recognition problem and presents a number of solutions with implementations.

3.1 Computer Vision and Object Recognition Overview

Machine vision is a research topic that has grown more and more in recent years and is now heavily researched. One of the first papers written on machine vision, in 1963, is the PhD thesis Machine perception of three-dimensional solids [55], which describes a computer program that is able to transform a photograph into a line drawing and then finally into a 3D representation. In the 1990s there were a lot of different approaches to object recognition, such as colour histograms [56], eigenfaces [57] and feature-based methods, which were thoroughly researched. The area was then continuously expanded in the 2000s as parts-and-shape models were introduced, as well as bags of words. It was during this time that researchers started utilising machine learning in computer vision, and deep learning models such as Restricted Boltzmann Machines [58] and Deep Belief Networks [59] were created.

3.1.1 Deep Learning Era

The springboard to where deep learning currently is in computer vision came in 2012, when the first Convolutional Neural Network reduced the error rate by almost half in the ILSVRC competition, an object recognition challenge [60], beating all the traditional algorithms [61]. Since then deep learning algorithms have been used more and more in computer vision, resulting in other well performing image recognition nets such as VGG Net, GoogleNet and Microsoft ResNet, which all used different variants of architectures for improving accuracy and performance [62].

3.1.2 Convolutional Neural Networks

The first implementation of convolutional neural networks (CNNs) in computer vision was in the late 1990s [63], but CNNs gained traction in the computer vision field in 2012 after the ILSVRC competition. What differentiates a CNN from an ordinary neural network is the assumption that the input is formatted as an image, i.e. with a height, width and number of channels, which makes it possible to use certain properties within the network. For larger images, e.g. 200x200x3, each neuron in an ordinary neural net would require 120 000 different weights. With a CNN this is not the case, as only a small region is connected to each neuron [62] when using convolutional layers.

The convolutional layers consist of learnable filters, also called kernels. These filters have a width, height and depth, and slide over the input using the stride as step length. For each step across the input, convolutions are done with the input and the kernel, resulting in a single number. An example of a convolutional layer in 2D, meaning the kernel only has a width and height, can be seen in Figure 3.1.


Figure 3.1: Convolutional layer.

As Figure 3.1 shows, the result is smaller than the input. To get the same size output as the input, zero padding can be used. This means the input matrix is padded with zeroes, which makes the output larger. An example of this can be seen in Figure 3.2.



Figure 3.2: Example of zero padding.
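The behaviour illustrated in Figures 3.1 and 3.2 can be summarised in a few lines of NumPy. This is only an illustrative sketch of the sliding-window operation with stride and zero padding, not part of the thesis implementation.

import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    # Convolution as used in CNN layers (no kernel flipping): slide the kernel
    # over the zero-padded input and sum the element-wise products at each
    # position. Output size: (H + 2*padding - kH) // stride + 1.
    if padding > 0:
        image = np.pad(image, padding, mode="constant")
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.arange(25).reshape(5, 5)
k = np.ones((3, 3))
print(conv2d(x, k).shape)             # (3, 3): the output shrinks without padding
print(conv2d(x, k, padding=1).shape)  # (5, 5): zero padding preserves the size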

In CNNs, down sampling layers are usually used. This means the input to the layer is down sampled into a smaller size, reducing the spatial size. This is done in a similar way to the convolutional layers, except that the mathematical operation for each stride is different. One way to do this is by using max pooling, where each subregion is processed with a max operation, outputting the maximum number in that subregion [62]. An example of this can be seen in Figure 3.3, where a stride of 2 is used.


Figure 3.3: Example of max pool layer.
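A corresponding sketch of the max pooling operation in Figure 3.3, again in illustrative NumPy and assuming a square pooling window:

import numpy as np

def max_pool2d(x, size=2, stride=2):
    # Max pooling: for each size x size subregion (stepped by the stride),
    # keep only the largest value. With size = stride = 2 the spatial
    # resolution is halved, as in Figure 3.3.
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

print(max_pool2d(np.arange(16).reshape(4, 4)))  # 2x2 output of subregion maxima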

A CNN can be built in many different ways and there are a lot of different layers which can be used. The layers mostly used are the following [62]:

• Input - Raw input data to the CNN; in the case of image processing it will consist of width, height and the number of colour channels.

• Convolutional layers - The convolutional layers use filters that are convolved with the image. These layers can detect different features such as edges or curves and use four parameters which do not change during training, also known as hyperparameters: number of filters, size of the filters, stride and amount of zero padding.

• Rectified linear unit layers - The ReLU layers apply an activation function to each element, e.g. setting all negative elements to zero.

• Pooling layers - The pooling layers perform down sampling of the width and height, with several different types of down sampling where max pooling is the most popular.

• Fully connected layers - The fully connected layers are like normal layers used in neural networks, where each node is connected to all the nodes in the previous layer. These layers are mostly used at the end of the network for classification.

3.2 State of the Art

In this section some of the most recent and popular algorithms in object recognition and object detection are presented.

3.2.1 R-CNN and Fast R-CNN

R-CNN is based on a Region-based Convolutional Network which combines the following insights according to [64]:

• "One can apply high-capacity CNNs to bottom-up region proposals in order to localize and segment objects"

• "When labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost"

First, R-CNN uses a process called selective search to find region proposals, which are then run through a CNN for feature extraction. The features are then classified using a support vector machine, SVM. The basic idea of R-CNN can be seen in Figure 3.4.

The R-CNN achieves good accuracy but it suffers from several drawbacks. One is the multi-stage training process requiring separate training for the CNN and the SVM. R-CNN is also expensive in both time and space, which results in a slow object detection reaching 0.022 fps [65]. Fast R-CNN improves on R-CNN by implementing single-stage training, achieving higher mean average precision, mAP, and running the detection faster, at 3.3 fps excluding the time for object proposals. This is done by utilising shared convolutions between proposals, essentially meaning the CNN only has to run once on every image [65].


Figure 3.4: R-CNN overview.

3.2.2 YOLO: You only look once

YOLO differs from R-CNN in that it is not a traditional image classifier repurposed for object detection. YOLO uses a single neural net both for localisation of boundary boxes and for recognition. This means that the whole network can be evaluated and trained for detection performance at once. The network divides the image into a total of S × S grid cells, where each cell is responsible for detection if an object's centre is within that cell. B boundary boxes for each cell are then predicted and scored. In the case of the VOC 2007 data set, where 20 different classes in a scene should be recognised [66], the parameters were set to S = 7 and B = 2, resulting in a total of 98 boundary boxes to be scored. The network reached up to 45 fps with 63.4% mAP on the VOC 2007 data set, and Fast YOLO reached up to 155 fps with 52.7% mAP [67]. Since the first version of YOLO there have been updates with YOLO v2/YOLO9000 [68], which introduced the possibility of an accuracy and fps tradeoff, and later on YOLO v3 [69], which mainly focused on increasing the accuracy of the network.

3.2.3 SSD: Single Shot MultiBox Detector

SSD is a method for detecting objects using a single deep neural net, relying on features from multiple layers for classification. With SSD a high accuracy can be reached, even on images with low resolution, together with simple end-to-end training. The differences between the SSD and YOLO architectures can be seen in Figure 3.5. The SSD algorithm manages fast, high accuracy detection, 59 fps with 74.3% mAP on the VOC2007 [70] test data, where Faster R-CNN performs at 7 fps. As SSD performs very well on large objects but suffers on detection of small objects, there is still room for improvements [71].


Figure 3.5: SSD and YOLO architecture [71].

3.2.4 MobileNet

MobileNet is a set of object recognition architectures created to be able to run computer vision applications on embedded and mobile devices. The MobileNet architecture is built first and foremost to optimise for latency, but also for size, in order to fit the limited resources available. This is done by creating lightweight deep neural networks, DNNs, utilising depthwise separable convolutions, which essentially means that the computations, as well as the model size, can be drastically reduced. This is achieved by separating the convolutions into two steps: convolutions for each input channel, which are then combined with pointwise convolutions, see Figure 3.6. MobileNet also introduces a parameter called the width multiplier, α, to easily change the input channel size uniformly and reduce or increase the resources needed [72]. The total number of layers is 28 when counting pointwise and depthwise layers separately.



Figure 3.6: Left: Normal convolutions with batchnorm and ReLU. Right: Depthwise separable convolutions with batchnorm and ReLU [72].
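The two blocks in Figure 3.6 can be sketched in Keras-style TensorFlow as shown below. The layer choices mirror the figure; the code is illustrative and not taken from the MobileNet implementation used in the thesis.

import tensorflow as tf

def standard_block(filters):
    # Left in Figure 3.6: 3x3 convolution -> batch norm -> ReLU.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(filters, 3, padding="same", use_bias=False),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
    ])

def depthwise_separable_block(filters):
    # Right in Figure 3.6: 3x3 depthwise convolution -> BN -> ReLU, followed
    # by a 1x1 pointwise convolution -> BN -> ReLU. The separable block needs
    # roughly 1/N + 1/9 of the multiplications of the standard 3x3 block,
    # where N is the number of output channels [72].
    return tf.keras.Sequential([
        tf.keras.layers.DepthwiseConv2D(3, padding="same", use_bias=False),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Conv2D(filters, 1, padding="same", use_bias=False),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
    ])

x = tf.random.normal([1, 32, 32, 16])
print(standard_block(64)(x).shape)             # (1, 32, 32, 64)
print(depthwise_separable_block(64)(x).shape)  # (1, 32, 32, 64)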

3.3 Related Work

There is previously conducted work related to the detection of cones using computer vision. Two projects with different approaches to the problem of identifying cones follow.

3.3.1 Real-time Traffic Cone Detection for Autonomous Vehicle

In [73], radar data was used to focus the camera and find interesting image patches that correspond to potential cones. First, chamfer matching was used for shape detection on the image patches. The chamfer matching used a distance image created from a binary image of the shape of a cone. This distance image was then matched to the template of the cone using rotation, translation and re-sizing. Canny edge detection was used for detecting edges; even though more accurate algorithms exist, these were not used because of the real-time performance requirement. To choose the threshold for classification, a data set of 400 positive and 600 negative samples was used to create a probability density function, PDF, giving an accurate threshold. To deal with false positives when only using chamfer matching, a simple colour segmentation was then used to check if the pixels correspond to those created with the PDF [73].


3.3.2 Cone Detection: Using a Combination of LiDAR and Vision-Based Machine Learning

A combination of 3D LiDAR and a vision system based on a small CNN with an input size of 128×128 was used in [74] to detect cones. The LiDAR data was filtered by looking at the intensities, where cones show a high intensity due to their highly reflective surface. The scan data was also filtered so that only cones in front of the car were considered. The potential cone candidates from the LiDAR data were then transformed into image patches, which were classified by a CNN that had been trained on over 24000 cone images. The CNN reached a total of 95.4% accuracy [74].

3.4 Implementation

The cone detection was implemented by using an image classifier on image patches instead of object localisation on a full frame, similar to the methods used in [74] and [73]. This was done to reduce the computation time and complexity of the cone detection, to be able to run it in real-time on an embedded system. The flow chart of the full cone classification algorithm can be seen in Figure 3.7 and is explained below.

3.4.1 Cone Candidate Extraction

The image patches used for cone classification were acquired by utilising the laser scan data, which was processed to identify possible cone objects. The process consisted of transforming the laser scan into a point cloud residing in 2D Euclidean space. This was done to be able to use Euclidean clustering to extract objects with a size similar to a cone. The extracted clusters were then projected into the rectified camera frame by utilising a pinhole camera model and the camera matrix [75].
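The projection step can be sketched as follows in Python/NumPy. The extrinsic transform, camera matrix and example point are made-up placeholders, not the calibration used in the thesis.

import numpy as np

def project_to_image(point_lidar, T_cam_lidar, K):
    # Transform a cone candidate from the LiDAR frame to the camera frame and
    # project it with the pinhole model: pixel = K * [X, Y, Z]^T / Z.
    p = np.append(point_lidar, 1.0)      # homogeneous LiDAR point
    X, Y, Z, _ = T_cam_lidar @ p          # point expressed in the camera frame
    u, v, w = K @ np.array([X, Y, Z])
    return u / w, v / w                   # pixel coordinates in the rectified image

K = np.array([[700.0,   0.0, 640.0],      # placeholder intrinsics
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])
T = np.eye(4)                             # a real T_cam_lidar handles mounting offsets
print(project_to_image(np.array([0.5, 0.1, 2.0]), T, K))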

3.4.2 Transfer Learning

Figure 3.7: Flow chart of the cone classification algorithm.

The first cone detection algorithm was implemented using transfer learning on the existing image classifier type called MobileNets [72], mentioned in Section 3.2.4. By utilising transfer learning, a model that has been trained on e.g. ImageNet, which consists of 1.2 million images over 1000 categories, can be used to learn to recognise new categories in an efficient way. This means the full architecture does not need to learn to find features in images; instead the final layers can be retrained to classify a new class of choice [62]. The transfer learning was implemented using the Tensorflow machine learning framework [76] on four different MobileNets. The input image was resized to a square image of either 128×128 pixels or 224×224 pixels depending on which net was used. For the training process cross-entropy loss was used, evaluating the error of the network for each training iteration. The MobileNet classifier was chosen for its relatively high accuracy and speed suitable for embedded systems.
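A minimal sketch of such a transfer learning setup, expressed in today's tf.keras API (the thesis used the TensorFlow retraining tooling of the time; the optimiser and other settings below are illustrative assumptions):

import tensorflow as tf

# Load a MobileNet pre-trained on ImageNet, freeze the feature extractor and
# retrain only a new softmax head for the four cone classes. The input size
# (128x128) and width multiplier alpha correspond to one of the four nets
# evaluated in the thesis.
base = tf.keras.applications.MobileNet(
    input_shape=(128, 128, 3), alpha=1.0,
    include_top=False, weights="imagenet", pooling="avg")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(4, activation="softmax"),  # blue, yellow, orange, no cone
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",       # cross-entropy loss, as in the text
              metrics=["accuracy"])
# model.fit(...) would then be run on the labelled cone image patches.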

3.4.3 CNN

The second cone detection algorithm was implemented as a small CNN. As cones are objects with few features when compared with more feature-complex objects such as faces or animals, and a lightweight network was required, a minimalistic network architecture was chosen. Each convolutional layer was followed by a max pool layer and a ReLU activation function, and the result was classified by fully connected layers. The input image was resized to a square image as mentioned in Section 3.4.2, but only to the size of 128×128 pixels, and cross-entropy loss was again used for training. The small CNN architecture was chosen for fast performance and for the hypothesis that few layers are enough for classification of cones.

A small hyperparameter search was conducted by varying the number of convolutional layers, fully connected layers and learning rate.
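A sketch of what such a network and search loop can look like in Keras-style TensorFlow. The filter counts, dense-layer width and optimiser are placeholders; only the layer pattern and the searched values follow the text and Table 5.6.

import tensorflow as tf

def make_cone_cnn(num_conv=4, num_fc=2, learning_rate=1e-3, num_classes=4):
    # Input: 128x128x3 image patches. Each convolutional layer is followed by
    # ReLU and 2x2 max pooling; fully connected layers classify the patch.
    model = tf.keras.Sequential()
    filters = 16
    for _ in range(num_conv):
        model.add(tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(tf.keras.layers.MaxPooling2D(2))
        filters *= 2
    model.add(tf.keras.layers.Flatten())
    for _ in range(num_fc - 1):
        model.add(tf.keras.layers.Dense(128, activation="relu"))
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Hyperparameter grid as in Table 5.6: layer counts and learning rates.
for lr in (1e-3, 1e-4, 1e-5):
    for conv in (2, 3, 4):
        for fc in (1, 2):
            model = make_cone_cnn(num_conv=conv, num_fc=fc, learning_rate=lr)
            # model.fit(...) on the training patches, then evaluate on the validation set.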

3.4.4 Training Data

The training data was gathered using the demonstrator, described in Section 4.1, by saving the cone candidate image patches from the LiDAR and camera. This was done both in an indoor office environment with artificial lighting and in an outdoor environment with natural lighting. The image patches gathered from the demonstrator were also complemented by taking images of cones with the camera of a Samsung Galaxy S7 from different views, scales, lighting and scenery, both with and without occlusions, in an outdoor environment. All the images were then labelled into the following classes: blue cone, orange cone, yellow cone and no cone. Examples of training data can be seen in Figures 3.8, 3.9 and 3.10.

Figure 3.8: Indoor image patch from LiDAR candidate.

Figure 3.9: Outdoor on grass, taken with smartphone camera.

Figure 3.10: Outdoor on asphalt, taken with smartphone camera.

For the training data, 120 images were gathered for each class and each data set was then augmented using Augmentor [77] to increase the amount of data. The images were augmented by flipping them vertically, varying contrast, varying brightness and introducing distortions, increasing the data size to 480 images for each class.
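A sketch of such an Augmentor pipeline is shown below. The directory name, probabilities and magnitudes are illustrative assumptions; the thesis only describes the operations qualitatively, and method names follow the Augmentor API.

import Augmentor

# One pipeline per class directory; run once for each of the four classes.
p = Augmentor.Pipeline("training_data/blue_cone")      # placeholder path
p.flip_left_right(probability=0.5)                     # mirror flip about the vertical axis
p.random_contrast(probability=0.5, min_factor=0.7, max_factor=1.3)
p.random_brightness(probability=0.5, min_factor=0.7, max_factor=1.3)
p.random_distortion(probability=0.5, grid_width=4, grid_height=4, magnitude=4)
p.sample(480)                                          # generate 480 augmented images for the class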

Page 50: A Combination of Object Recognition and Localisation for ...1234088/FULLTEXT01.pdf · Localsation for an Autonomous Racecar Jonathan Cressell Isac Törnberg Approved 2018-06-19 Examiner
Page 51: A Combination of Object Recognition and Localisation for ...1234088/FULLTEXT01.pdf · Localsation for an Autonomous Racecar Jonathan Cressell Isac Törnberg Approved 2018-06-19 Examiner

Chapter 4

Demonstrator

This chapter presents the hardware and software solutions used to construct a demonstrator for testing the previously presented implementations.

4.1 Demonstrator

The hardware demonstrator was based on the components used in other small scale autonomous vehicles such as the MIT Racecar [52] and F1/10 [78], as well as on JetsonHacks, which has written tutorials for the Jetson boards and the MIT Racecar [79]. The main components of the demonstrator can be seen in Table 4.1 and differ only slightly from those of the aforementioned platforms.

Table 4.1: Main parts for the demonstrator

Type              Product
RC Car            Himoto Tanto BL 1:10
Embedded system   Nvidia Jetson TX2 Dev kit
Stereo camera     Stereolabs ZED
LiDAR             RPLidar A2M8
IMU               Sparkfun MPU-9250 9-DoF
Motor driver      vESC 4.12

The demonstrator was in short an RC car with an overhead construction housing the components. Platforms were laser cut from 4mm acrylic sheets. The Jetson TX2 development board handled communication with all devices and ran all software. The hardware communication can be seen in the flow chart in Figure 4.1.

The positioning of the sensors was made to resemble that of the MIT Racecar and F1/10. This created a compact car with the sensors aligned in the vertical plane, making the transformations between sensors simpler, and placed the centre of gravity as low as possible to allow for higher speed manoeuvres.


Figure 4.1: Hardware flow chart.

The RC car was equipped with a brushless DC motor, which allowed sensorless odometry by measuring the back EMF [80]. This was achieved using the Vedder Electronic Speed Controller (vESC) [81], which also controls both the motor and the steering servo. Movement was also measured using an Inertial Measurement Unit (IMU), the nine degree of freedom MPU-9250 [82], where the accelerometer and gyroscope were used. Visual data was obtained using a Stereolabs ZED stereo camera [83], and the RPLidar A2 from Slamtec [84] provided 360° 2D laser scan data. All sensors were sampled at 15Hz to match the upper limit of the LiDAR. The final demonstrator can be seen in Figure 4.2.


Figure 4.2: The Demonstrator.

4.2 Software Architecture

The software architecture was constructed as shown in Figure 4.3. The dotted lines represent data streams which were used in different SLAM algorithms.



Figure 4.3: Software block diagram.

All implementations were made in ROS, which is presented below, to allow for rapid prototyping and good scaling and modularity. Exceptions to this were the implementations made in Tensorflow.

4.2.1 Robot Operating System

The software implementation is based on ROS, a publisher-subscriber style environment which allows for simple construction of otherwise complex systems [85]. ROS has a large open source community, which enables fairly simple and rapid software prototyping.

A ROS system is built up of nodes which communicate using a publisher-subscriber model [86], where the different parts of the system, the nodes, share information through messages. This allows a transparent system where nodes require no knowledge of each other as long as the messages follow a standard. Further, nodes can be written in different programming languages depending on what is preferable for the application. This loose coupling between nodes improves robustness, as each node can be run in a stand-alone state. Further, as long as the message format is maintained, the workings of each node can be freely modified; as such, modularity is greatly favoured by the publisher-subscriber model. Nodes can also be decoupled completely and run on separate systems, communicating with messages over, for example, Wi-Fi.
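As an illustration of the model, a minimal rospy node that both publishes and subscribes is shown below. The topic name and message type are arbitrary examples, not taken from the thesis software.

#!/usr/bin/env python
import rospy
from std_msgs.msg import String

def callback(msg):
    # Runs whenever a message arrives on the subscribed topic.
    rospy.loginfo("received: %s", msg.data)

def main():
    rospy.init_node("example_node")
    rospy.Subscriber("chatter", String, callback)
    pub = rospy.Publisher("chatter", String, queue_size=10)
    rate = rospy.Rate(15)                # 15 Hz, matching the sensor rate used
    while not rospy.is_shutdown():
        pub.publish(String(data="hello"))
        rate.sleep()

if __name__ == "__main__":
    main()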

Transforms

A robot often consists of a number of sensors, actuators and a computation unit. These are seldom placed in the exact same position and will thus have differing coordinate systems. A range finder may be placed at the top of the robot while an IMU would sit at the rotational centre. The ROS community has created a standard for these different coordinate frames [87]. Transforms between statically mounted sensors are fairly simple: the difference in position and orientation is all that is required, and the rest is solved by a ROS package [88]. The transform system can be set up as presented in Figure 4.4.

Figure 4.4: Transform setup [89].
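For the statically mounted sensors, such a fixed transform can be broadcast once, for example with tf2 in Python. The frame names and offsets below are placeholders, not the demonstrator's actual mounting values.

#!/usr/bin/env python
import rospy
import tf2_ros
from geometry_msgs.msg import TransformStamped

def main():
    rospy.init_node("static_tf_example")
    broadcaster = tf2_ros.StaticTransformBroadcaster()
    t = TransformStamped()
    t.header.stamp = rospy.Time.now()
    t.header.frame_id = "base_link"
    t.child_frame_id = "laser"
    t.transform.translation.x = 0.10     # 10 cm forward of base_link (placeholder)
    t.transform.translation.z = 0.20     # 20 cm above base_link (placeholder)
    t.transform.rotation.w = 1.0         # identity rotation
    broadcaster.sendTransform(t)         # latched: published once, valid forever
    rospy.spin()

if __name__ == "__main__":
    main()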

In the 2D situation, base_footprint can be removed and odom will connect directly to base_link. This connection is dynamic and is defined as the change in position and orientation for every update. As previously stated, odometry is continuous but will eventually drift, which is the point of using SLAM. The transformation between map and odom is derived from the transformation between map and base_link, which is dependent on the algorithm. Data association, loop closing and scan matching techniques will adjust the map and the base_link accordingly, resulting in non-continuous behaviour, which is paired with the continuous odometry.


Chapter 5

Results

This chapter contains the test methods and results of the previously presented implementations.

5.1 SLAM

SLAM performance was evaluated on map and pose accuracy as well as latency. The demonstrator was driven around a track lined by yellow and blue cones, with larger orange cones marking the start and finish point. The path traversed by the robot and the positions of the cones were logged as ground truth by marking the position of the LiDAR on the surface of the track, as the LiDAR is the origin of the robot coordinate system. Sensor data was recorded using Rosbag to be played back through the various algorithms, for a fair comparison. Several laps were completed to validate the recording used for the results presented below. The total track length was 32.64m, lined with 18 blue, 13 yellow and 2 orange cones. The precision of the ground truth, obtained using aerial imaging, was evaluated to 1mm.

5.1.1 Accuracy

Upon a completed lap, the traversed path provided by the SLAM algorithm was compared to the ground truth, resulting in the maps shown below. The black line and black square markers are the ground truth; the orange are the SLAM results.

The result from the PGO algorithm can be seen in Figure 5.1.


Figure 5.1: PGO overlaid to the ground truth. The orange line is the algorithm, black is ground truth.

The largest pose error seen in Figure 5.1 was 108mm, with the resulting mean error at 29mm. All cones were identified by the algorithm, with the largest cone error being 162mm and the mean error 50mm.

Further to the Online SLAM solutions, the result from the RBPF based algorithm can be seen in Figure 5.2. The path shown is an average of the best performing particles.

Figure 5.2: RBPF overlaid to the ground truth.


The maximum pose error was 132mm with a mean of 55mm. For the cone positions the maximum was 101mm and the mean was 34mm. Two cones were however missed by the algorithm. Finally, an EKF solution using only features was tested, as seen in Figures 5.3 and 5.4.

Figure 5.3: EKF JCBB overlaid to the ground truth.

Figure 5.4: EKF ICNN overlaid to the ground truth.

Divided into two tests for the different data association methods, the EKF using ICNN resulted in a maximum pose error of 786mm and a mean error of 197mm. Cone position error peaked at 86mm with a mean of 39mm. For the EKF using JCBB, the resulting largest pose error was 455mm with a mean of 136mm, with cone positions erring up to 115mm with an average of 51mm. JCBB missed one cone.

The above presented results are summarised in Table 5.1.

Table 5.1: SLAM Error of a finished run.

Error      Pose avg   Pose max   Cone avg   Cone max   Unit
PGO        29         108        50         162        mm
RBPF       55         132        34         101        mm
EKF JCBB   136        455        51         115        mm
EKF ICNN   197        786        39         86         mm

Further, by tracking the pose in relation to nearby landmarks in real-time rather than after a completed run, the data in Table 5.2 was obtained.

Table 5.2: SLAM Error in real time.

RT Error   Pose avg   Pose max   Unit
PGO        27         101        mm
RBPF       36         124        mm
EKF JCBB   29         84         mm
EKF ICNN   20         61         mm

The algorithms were also evaluated in pose and cone accuracy by measuring the Root Mean Square Error (RMSE), as this is often used as a standard evaluation tool for SLAM algorithms [90]. The results are presented in Table 5.3.
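For reference, the RMSE over N estimates is computed in the usual way (the notation here is chosen for illustration, not taken from the thesis):

\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \hat{x}_i - x_i^{\mathrm{gt}} \right\rVert^{2}}

where \hat{x}_i is the estimated pose or cone position and x_i^{\mathrm{gt}} is the corresponding ground truth position.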


Table 5.3: RMSE of SLAM algorithms.

RMSE       Pose   RT Pose   Cone   Unit
PGO        36     34        62     mm
RBPF       61     38        39     mm
EKF JCBB   178    37        59     mm
EKF ICNN   252    25        42     mm

5.1.2 Latency

Algorithm latency was evaluated from sensor data callback to map completion and is presented below as an average over all map creations during the test run. Included in Table 5.4 are the times for creating the map alone, as well as the complete lead time from taking a measurement to completing the map, with all steps in between included. The LiDAR cone identification is included for all algorithms, but in the EKF case the algorithm must wait for this to be completed, while the other two algorithms can run the landmark insertion in parallel.

Table 5.4: SLAM Latency.

Latency    Map avg   Map max   Total avg   Total max   Unit
PGO        42.14     95.77     42.14       95.77       ms
RBPF       70.94     172.61    70.94       172.61      ms
EKF JCBB   2.13      9.71      32.44       64.12       ms
EKF ICNN   1.59      4.64      32.00       59.05       ms

5.2 Object Recognition

The object recognition performance was evaluated with respect to two different aspects, accuracy and latency. The validation set consisted of 20% of the gathered labelled data and the remaining 80% was used for training. Another validation set was created from the ground truth lap mentioned above. The image patches were gathered using the LiDAR candidate extraction and then manually labelled. The data distribution for the ground truth lap can be seen in Table 5.5.


Table 5.5: Ground truth lap validation data set distribution.

Blue   Yellow   Orange   No cone   Total images
217    137      34       258       646

The training accuracy and cross entropy loss for the MobileNet transfer learning can be seen in Figures 5.5, 5.6, 5.7 and 5.8. The figures are labelled with a number and a percentage, where the number, e.g. 128 or 224, corresponds to the input image size and the percentage corresponds to the width multiplier α.


Figure 5.5: Training/Validation accuracy and cross entropy loss for MobileNet 128 100%.

The hyperparameter search for the CNN was done by varying the number of convolutional layers, fully connected layers and learning rates. In Table 5.6 the validation accuracy and cross entropy loss from the hyperparameter search can be seen.



Figure 5.6: Training/Validation accuracy and cross entropy loss for MobileNet 224 100%.


Figure 5.7: Training/Validation accuracy and cross entropy loss for MobileNet 128 50%.



Figure 5.8: Training/Validation accuracy and cross entropy loss for MobileNet 224 50%.

Table 5.6: Hyperparameter search results.

Learning rate   Conv. layers   FC layers   Accuracy [%]   Cross entropy loss
1E-3            2              1           31.25          1.386
1E-4            2              1           62.5           0.7805
1E-5            2              1           50             1.005
1E-3            2              2           100            0.0186
1E-4            2              2           100            8.853e-3
1E-5            2              2           93.75          0.2386
1E-3            3              1           37.5           1.386
1E-4            3              1           50             0.6964
1E-5            3              1           68.75          0.8276
1E-3            3              2           100            1.932e-4
1E-4            3              2           100            3.383e-4
1E-5            3              2           93.75          0.1446
1E-3            4              1           18.75          1.386
1E-4            4              1           31.25          1.386
1E-5            4              1           37.50          0.9626
1E-3            4              2           100            7.451e-8
1E-4            4              2           93.75          0.3662
1E-5            4              2           93.75          0.2801

In Table 5.7 the final training, validation and cone track accuracy can be seen, as well as the average classification time. The CNNs with the learning rates which reached the highest ground truth lap accuracy for each combination of convolutional and fully connected layers were included in the table. The CNNs are labelled by the learning rate, lr, the number of convolutional layers, conv, and the number of fully connected layers, fc.

Table 5.7: Accuracy and classification time for the different architectures.

Architecture            Validation      Ground truth lap   Average classification
                        accuracy [%]    accuracy [%]       time [ms]
MobileNet 128/100       98              39.9               55.9
MobileNet 224/100       96              47.2               61.5
MobileNet 128/50        98              43.4               55.4
MobileNet 224/50        100             41.1               59.5
lr1e-4, conv=2, fc=1    62.5            37.4               5.1
lr1e-4, conv=2, fc=2    100             48.6               5.2
lr1e-4, conv=3, fc=1    50              46.9               4.3
lr1e-4, conv=3, fc=2    100             75.6               5.3
lr1e-5, conv=4, fc=1    37.5            47.8               4.3
lr1e-3, conv=4, fc=2    100             87.3               4.6

A confusion matrix for the ground truth lap for the CNN with four convolutional layers and two fully connected layers can be seen in Table 5.8.

Table 5.8: Confusion matrix of the CNN with 4 convolutional layers and 2 fully connected layers.

                 Predicted class
Actual class     Blue   Yellow   Orange   No cone
Blue             194    0        9        14
Yellow           0      137      0        0
Orange           0      0        34       0
No cone          17     5        37       199


5.3 Combined Performance

The best performing algorithms from object recognition and SLAM were selected for a test of combined performance, with results visible in Figure 5.9.

Figure 5.9: Classified EKF ICNN overlaid to the ground truth.

The average real-time pose error was 19mm with a peak of 46mm, while the cones erred with an average of 51mm and a maximum of 242mm. Table 5.9 shows a comparison to the EKF without classification of objects. The error here is the combined accuracy of the SLAM algorithm and the object recognition.

Table 5.9: Real time SLAM error data with classified features.

RT Error              Pose avg   Pose max   Cone avg   Cone max   Unit
Classified EKF ICNN   19         46         51         242        mm
Regular EKF ICNN      20         61         39         86         mm

For comparison, RMSE was evaluated as well. The result can be seen in Table 5.10.


Table 5.10: RMSE of SLAM data with classified features.

RMSE                  Pose   Cone   Unit
Classified EKF ICNN   22     73     mm
Regular EKF ICNN      25     42     mm

Further, the latencies of the combined algorithms were measured and are presented with a comparison to the non-classified case in Table 5.11.

Table 5.11: EKF SLAM Latency with classified features.

Latency               Map avg   Map max   Total avg   Total max   Unit
Classified EKF ICNN   1.53      4.20      36.54       63.71       ms
Regular EKF ICNN      1.59      4.64      32.00       59.05       ms


Chapter 6

Discussion and Conclusions

This chapter contains a summary of the results followed by answers to the research questions. The results are discussed and conclusions are drawn.

6.1 SLAM

The results show that while PGO provides a very accurate map in regard to trajectory and cone position after a completed lap, the real-time pose when running EKF with ICNN is even more accurate. Further, the EKF based SLAM results in the map with the highest accuracy, despite the lowest execution times. The RBPF based solution took a middle ground together with the JCBB based EKF solution, with the highest latency seen for RBPF. RBPF was also the only algorithm slower on average than the fastest sensor. For the very specific scenario of running SLAM on an outdoor cone track, EKF SLAM with ICNN data association showed the greatest promise. To answer the question of how the choice of SLAM algorithm will affect performance in this racing scenario: a simple and lightweight solution such as the EKF will allow for higher localisation accuracy despite lower latency.

The large errors seen in the EKF pose results in Figures 5.3 and 5.4 can easily be explained by the fact that they are Online SLAM solutions, which means that they will not update previous poses at any point. When a large loop closure is completed, the pose of the robot will suddenly change to better match that of the new map, but previous poses will remain in the "previous" map. The reason for RBPF not showing this behaviour is that the pose trajectory shown in Figure 5.2 was an average of the best performing particles, eliminating the larger loop closures.

Further, on the subject of online versus full SLAM approaches: while it may be interesting to see the path the vehicle took for analytic purposes, it is not required to achieve racing performance in the Formula Student competitions. Only the map and the current position are required, hence online and full SLAM solutions can be compared in a fair manner by looking at the real time position.

For the EKF algorithm, the ICNN data association showed better results in all aspects. In most cases, JCBB will have higher accuracy than ICNN due to it checking the associations against all other landmarks, not only the one in question. However, with the sparse map of cones used in this specific scenario, together with the relatively high accuracy of the sensors, ICNN will very rarely be faced with a situation where there are several possible outcomes of an association. The uncertainty of a measurement and landmark pair is unlikely to be larger than the distance between two cones, thus allowing ICNN to perform on par with JCBB in terms of accuracy but with lower latency.

The aspect of hardware will also greatly affect the choice of algorithm. Higher end hardware of all forms will allow better performance with any algorithm, or phrased differently, allow higher driving speeds in the race car scenario. All algorithms were limited by the amount of data which could be obtained per distance moved; in other words, the results will be better the slower the vehicle moves. Further tuning can improve the allowed speed of the vehicle, and the choice of algorithm will do the same. Primarily, the sensor update frequency could be matched to the latency of the algorithm used. Other than the output frequency of the sensors, the range will also be relevant. Larger range will allow faster creation of the map and more effective loop closure.

Further, on the subject of map choice, both grid maps and feature maps perform their tasks well enough in the test environment. Objects (cones) can be easily identified in both maps, be it as data or visually. Other than suitability to the chosen SLAM algorithm, there appears to be no need to place any weight on which map is to be used for this specific testing environment. Further, PGO and RBPF, being implemented with grid maps and scan matching, rely on the existence of walls or other continuous features in the environment. EKF, on the other hand, is instead only disturbed by such features, as the feature detection algorithms may provide spurious measurements from wall segments. The Formula Student race tracks have been both in open areas without walls and in arenas where there is a wall around the entire track. Depending on the situation, either mapping style may outperform the other.

6.2 Object Recognition

The overall performance of the MobileNet was clearly worse, both in terms of accuracy at 47.2% and latency at 61.5ms, compared to the small CNN, even though the MobileNet showed low losses and good accuracy during training. This could be explained by the small amount of training data, which might not have been sufficient to provide well performing transfer learning. The features that MobileNet uses might also be too complex and not a good fit for cone detection. As can be seen in Table 5.7, the CNNs performed with radically different accuracy, and it can be concluded that a single fully connected layer generally gave poor accuracy compared to the architectures with two fully connected layers. Similarly, increased accuracy can be seen as the number of convolutional layers increases, with a total of 87.3% accuracy and a latency of 4.6ms for the CNN with four convolutional layers and two fully connected layers. To answer the question of how the choice of CNN architecture for cone recognition impacts accuracy and latency for a driverless race car: a smaller CNN with four convolutional layers and two fully connected layers allowed relatively high accuracy and low latency, while a larger network like MobileNet had unsatisfactory accuracy and higher latency.

There were generally no clear differences in image classification time between the small CNNs, but quite a large difference in computational performance can be seen between the MobileNets and the CNNs, where the CNNs performed around 10 times faster than the MobileNets. This can easily be explained by the large differences in network size and complexity.


One of the major issues with classification was the performance of the image patch classification in conjunction with the LiDAR. The transformed LiDAR positions often resulted in image patches which only contained parts of cones or, in the worst case, no cone at all, even when the projected cluster did represent a cone. Examples of this can be seen in Figure 6.1. Often this was caused by fast movements, particularly rotational movement.

Figure 6.1: Image patches from the LiDAR cone candidates which caused issues.

In the confusion matrix seen in Table 5.8, the predictions of the best performing CNN on the ground truth lap validation set are presented. The CNN performs well overall on the cone data, but on the "no cone" classification there were some misclassifications. A possible reason for this is that the data needed to classify something as "not a cone" requires far more variety than the data needed to classify something as specific as a cone. This is mainly because the "no cone" class does not have any unique features, but instead covers all the images which do not contain certain features. With the small data set at hand, this might have had a large effect on the classification outcome.

Conclusively, the results show that a CNN with four convolutional layers and two fully connected layers performed best in this evaluation, both in terms of accuracy and latency. The performance is however affected by the training data size, which potentially means a different architecture might perform better with more data. As the evaluation did not consider any CNN larger than the ones in the hyperparameter search, there is a possibility of higher performance with a larger CNN.


6.3 Combined Performance

The result for the combined performance showed an improvement in real time pose but lower precision of the finished map. This was due mainly to two things. Firstly, the lack of synchronisation between the laser scan and the camera image. This will greatly impact the performance when rotating, as the system will take the image which is closest in time to the laser scan that detects an object. This image may not be from exactly the same time, which will slightly offset the image patch that is extracted, resulting in an image which is difficult to classify as a cone. The difference between a successful and an unsuccessful image patch can be seen in Figures 6.2a and 6.2b. Secondly, the LiDAR was restricted to the camera field of view, which is much smaller than that of the LiDAR. For data association, it will be more difficult for the algorithm to do a successful loop closure with fewer points to use in a matching. This will affect the accuracy of the resulting map, which can be seen in the data.

(a) LiDAR to camera projection hit.
(b) LiDAR to camera miss during turn.

Figure 6.2: Examples of LiDAR to camera projection. The red circles correspond to the cone candidates from the euclidean clustering projected onto the camera image.

With the performance, both in terms of speed and accuracy, of the combined algorithms, it is reasonable to assume feasibility in a high speed racing environment if a higher sampling speed of the sensors is available. At higher speeds there might however be other issues, such as blurry images and inexact odometry, causing the algorithms to fail.


Chapter 7

Future work

This chapter contains suggestions for future work to be done on the subject.

7.1 SLAM

Since no single algorithm is superior in all fields, it could be interesting to run one algorithm for the first mapping process, where latency is not as crucial, and swap to a faster algorithm in pure localisation mode (no modifications made to the map) for higher speeds after the map has been completed. It would also be interesting to see how well each of the chosen algorithms performs in localisation mode when the map is already determined. Such an evaluation could improve understanding of whether it would be relevant to run two different algorithms in succession.

It would be interesting to implement a hybrid data association method for the EKF solutions, where JCBB would be used instead of ICNN in those cases where there is more than one possible outcome of the data association. This could greatly improve real-time accuracy without sacrificing much latency.

7.2 Object Recognition

One key aspect that can be concluded is the issue of the LiDAR resolution inside the camera field of view and the issue of transforming the corresponding LiDAR point onto a key frame. One way to solve this is by utilising the depth cloud from a stereo camera or RGB-D camera and masking for different depths to get the different cone candidates. This would remove any latency and transformation issues caused by the LiDAR and camera configuration; instead the depth information would directly correspond to the camera frame taken at the same point in time. Using this approach would also make it possible to use two separate pipelines for the cone detection, one using the camera and one using the LiDAR. This would create redundancy in the system, and situations such as rain, which can cause issues for the LiDAR, would not be as severe.

If one would still like to use the high accuracy of a LiDAR for the positioning of objects, a 3D LiDAR with a higher frequency should be considered. With a depth cloud from a 3D LiDAR, smaller and more precise image patches could potentially be gathered and therefore more easily classified correctly.

A higher camera position could improve the classification as well, since the probability of getting overlapping cones in the same image patch would be lower, as the perspective would be more favourable than with a camera facing the cones at the same height.

7.3 Combined Performance

The orange cones placed at the start and finish could be used to improve the reliability of the algorithms by adding a layer which would cause loop closure upon re-identifying the orange cones. This could allow algorithms to focus more on real time precision and rely almost solely on the orange cones for loop closure, to achieve an accurate map.

Safety could be improved by adding a feature that informs the trajectory planner of the fact that yellow cones should be to the left and blue to the right. This could reduce the risk of the vehicle going off the track or in the wrong direction.

Since the limited camera field of view is likely to have a significant effect on map accuracy, a solution where the identified cones are further tracked using the 360° LiDAR after leaving the camera field of view would be interesting.


Appendix A

Confusion matrix

A confusion matrix is used for evaluating the performance of a classification method on a data set. It can be used both for binary classification and for multiclass classification. As shown in Figure A.1, the matrix consists of predicted and actual classes. The predicted classes represent the classifications made and the actual classes represent the ground truth. For each row in Figure A.1, the corresponding columns show the classifications made for that particular class. E.g. row A shows that A was predicted as A 6 times, as B 1 time and as C 3 times. For 100% accurate predictions the matrix is a diagonal matrix.

              Predicted class
Actual class  A    B    C
A             6    1    3
B             1    7    2
C             0    1    9

Figure A.1: Example of a confusion matrix.
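A small sketch of how such a matrix can be computed with scikit-learn, using made-up labels chosen to reproduce the counts in Figure A.1:

from sklearn.metrics import confusion_matrix

# Lists of ground-truth and predicted class labels for 30 samples.
actual    = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
predicted = (["A"] * 6 + ["B"] * 1 + ["C"] * 3 +
             ["A"] * 1 + ["B"] * 7 + ["C"] * 2 +
             ["B"] * 1 + ["C"] * 9)
print(confusion_matrix(actual, predicted, labels=["A", "B", "C"]))
# [[6 1 3]
#  [1 7 2]
#  [0 1 9]]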


Bibliography

[1] Wired. "Mapped: The top 263 companies racing toward autonomous cars". In: (2017).

[2] Infoholic Research LLP. Autonomous Vehicle Market: Global Drivers, Restraints, Opportunities, Trends, and Forecasts to 2023. 2017.

[3] H. Durrant-Whyte and T. Bailey. "Simultaneous localization and mapping: part I". In: IEEE Robotics and Automation Magazine 13.2 (2006), pp. 99–110. DOI: 10.1109/MRA.2006.1638022.

[4] Gartner. "Top Trends in the Gartner Hype Cycle for Emerging Technologies, 2017". In: (2017).

[5] KTH Formula student. KTH Formula student. URL: http://www.kthformulastudent.se/ (visited on 01/25/2018).

[6] Formula Student Germany. Autonomous Driving at Formula Student Germany 2017. URL: https://www.formulastudent.de/pr/news/details/article/autonomous-driving-at-formula-student-germany-2017/ (visited on 01/25/2018).

[7] Formula Student Germany. FSG Competition Handbook 2018. 2018.

[8] International Organization for Standardization. Road vehicles: Functional safety. https://www.iso.org/standard/43464.html. 2011.

[9] Fabin Källström. Practical challenges when applying ISO 26262 to automated vehicles. 2018.

[10] Judith Jarvis Thomson. "The Trolley Problem". In: The Yale Law Journal 94.6 (1985), pp. 1395–1415.

[11] J. Bonnefon, A. Shariff, and I. Rahwan. "Autonomous Vehicles Need Experimental Ethics: Are We Ready for Utilitarian Cars?" In: (2015).


[12] J. Bonnefon, A. Shariff, and I. Rahwan. "The social dilemma of autonomous vehicles". In: Science 352 (2016). DOI: 10.1126/science.aaf2654.

[13] G.E. Marchant and R.A. Lindor. "The Coming Collision between Autonomous Vehicles and the Liability System". In: (2012).

[14] Jasper Hamill. "Chinese iPhone X owners claim Apple's Face ID facial recognition cannot tell them apart". In: (). (Visited on ).

[15] European Union. EUROPE 2020: A strategy for smart, sustainable and inclusive growth. http://ec.europa.eu/eurostat/web/europe-2020-indicators. 2010.

[16] European Union. Transport 2050: Commission outlines ambitious plan to increase mobility and reduce emissions. http://europa.eu/rapid/press-release_IP-11-372_en.htm. 2011.

[17] Schreurs M.A. and Steuwer S.D. Autonomous Driving—Political, Legal, Social, and Sustainability Dimensions. 2016.

[18] D.J. Fagnanta and K.M. Kockelman. "The travel and environmental implications of shared autonomous vehicles, using agent-based model scenarios". In: (2014).

[19] Todd Litman. "Autonomous Vehicle Implementation Predictions: Implications for Transport Planning". In: (2018).

[20] Bing Wang, Liang Li, Ming Yang and Chunxiang Wang. "An overview on sensor map based localization for automated driving". In: Urban Remote Sensing Event (JURSE) (2017). DOI: 10.1109/JURSE.2017.7924575.

[21] Aron J. Cooper. "A Comparison of Data Association Techniques for Simultaneous Localization and Mapping". In: (2005).

[22] Teodor Coroiu and Oscar Hinton. A Platform for Indoor Localisation, Mapping, and Data Collection using an Autonomous Vehicle. eng. Student Paper. 2017.

[23] Cyrill Stachniss. Robot Mapping Course 13/14. 2014. URL: http://ais.informatik.uni-freiburg.de/teaching/ws13/mapping/.

[24] R. E. Kalman. "A New Approach to Linear Filtering and Prediction Problems". In: Trans. ASME, Ser. D, J. Basic Eng 82.D (1960), pp. 35–45.


[25] R. E. Kalman and R. S. Bucy. “New Results in Linear Filteringand Prediction Theory”. In: Trans. ASME, Ser. D, J. Basic Eng 82.D(1961), pp. 95–108.

[26] Li Yu Guillaume Bresson Zayed Alsayed and Sébastien Glaser.“Simultaneous Localization and Mapping: A Survey of CurrentTrends in Autonomous Driving”. In: IEEE Transactions on Intelli-gent Vehicles 2.3 (2017), pp. 194–220. DOI: 10.1109/TIV.2017.2749181.

[27] S.J. Julier and J.K. Uhlmann. “A counter example to the theoryof simultaneous localization and map building”. In: Robotics andAutomation, 2001. Proceedings 2001 ICRA. IEEE International Con-ference (2001). DOI: 10.1109/ROBOT.2001.933280.

[28] G. Dissanayake S.B. Williams and H. Durrant-Whyte. “An ef-ficient approach to the simultaneous localisation and mappingproblem”. In: Robotics and Automation, 2002. Proceedings. ICRA’02. IEEE International Conference (2002). DOI: 10.1109/ROBOT.2002.1013394.

[29] Simon J. Julier and Jeffrey K. Uhlmann. New extension of the Kalmanfilter to nonlinear systems. The University of Oxford.

[30] Peter S. Maybeck. Stochastic models, estimation, and control. Vol. 141.Mathematics in Science and Engineering. 1979.

[31] Sebastian Thrun et al. “Simultaneous Localization and Mapping With Sparse Extended Information Filters”. In: 23.7-8 (2004), pp. 693–716. DOI: 10.1177/0278364904045479.

[32] A. Doucet et al. “Rao-Blackwellized particle filtering for dynamic Bayesian networks”. In: Proc. of the Conf. on Uncertainty in Artificial Intelligence (UAI) (2000).

[33] M. Montemerlo and S. Thrun. “FastSLAM: A Scalable Method for the Simultaneous Localization and Mapping Problem in Robotics”. In: (2007).

[34] Giorgio Grisetti, Cyrill Stachniss, and Wolfram Burgard. “Improving Grid-based SLAM with Rao-Blackwellized Particle Filters by Adaptive Proposals and Selective Resampling”. In: IEEE Robotics and Automation (2005).

[35] Giorgio Grisetti, Cyrill Stachniss, and Wolfram Burgard. “Improved Techniques for Grid Mapping With Rao-Blackwellized Particle Filters”. In: IEEE Transactions on Robotics 23 (2007).

[36] Giorgio Grisetti, Rainer Kümmerle, Cyrill Stachniss, and Wolfram Burgard. “A Tutorial on Graph-Based SLAM”. In: 2.4 (2010), pp. 31–43. DOI: 10.1109/MITS.2010.939925.

[37] C. Estrada, J. Neira, and J.D. Tardos. “Hierarchical SLAM: Real-time accurate mapping of large environments”. In: IEEE Transactions on Robotics 21(4) (2005), pp. 588–596.

[38] Cyrill Stachniss. Robot Mapping Course 15/16. 2016. URL: http://ais.informatik.uni-freiburg.de/teaching/ws15/mapping/.

[39] Paul J. Besl and Neil D. McKay. “A Method for Registration of 3-D Shapes”. In: IEEE Trans. Pattern Anal. Mach. Intell. 14.2 (Feb. 1992), pp. 239–256. ISSN: 0162-8828. DOI: 10.1109/34.121791. URL: http://dx.doi.org/10.1109/34.121791.

[40] J. E. Guivant and E. M. Nebot. “Optimization of the simultaneous localization and map-building algorithm for real-time implementation”. In: 17.3 (2001), pp. 242–257.

[41] Jie Xiong and Yan Li. “A Hybrid Data Association Strategy in SLAM”. In: International Conference on Intelligent Human-Machine Systems and Cybernetics 4 (2012), pp. 348–351. DOI: 10.1109.

[42] Bai-Fan Chen, Zi-Xing Cai, and Zhi-Rong Zou. “A Hybrid Data Association Approach for Mobile Robot SLAM”. In: International Conference on Control, Automation and Systems (2010), pp. 1900–1903.

[43] J. Neira and J. D. Tardos. “Data association in stochastic mapping using the joint compatibility test”. In: 17.6 (2001), pp. 890–897.

[44] Wolfgang Hess et al. “Real-Time Loop Closure in 2D LIDAR SLAM”. In: 2016 IEEE International Conference on Robotics and Automation (ICRA). 2016, pp. 1271–1278.

[45] Sameer Agarwal, Keir Mierle, et al. Ceres Solver. http://ceres-solver.org.

[46] Jose-Luis Blanco-Claraco et al. Mobile Robot Programming Toolkit.https://www.mrpt.org/.

[47] Giorgio Grisetti, Cyrill Stachniss, and Wolfram Burgard. OpenSLAM: Gmapping. https://openslam-org.github.io/gmapping.

[48] Joseph C. Watkins. Topic 15: Maximum Likelihood Estimation. http://math.arizona.edu/~jwatkins/o-mle.pdf.

[49] P. Mahalanobis. On tests and measures of group divergence I. Theoretical formulae. 1930.

[50] S. Kohlbrecher et al. “A Flexible and Scalable SLAM System with Full 3D Motion Estimation”. In: Proc. IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR). IEEE. Nov. 2011.

[51] Bruce D. Lucas and Takeo Kanade. “An Iterative Image Registration Technique with an Application to Stereo Vision (DARPA)”. In: Proceedings of the 1981 DARPA Image Understanding Workshop. Apr. 1981, pp. 121–130.

[52] MIT Racecar. MIT Racecar: A Powerful Platform for Robotics Research and Teaching. URL: https://mit-racecar.github.io/ (visited on 03/14/2018).

[53] HaoChih Lin, Chien-Liang Chu, and Eric W. Ko. Hypha-Racecar. 2017. URL: http://arxiv.org/abs/1512.02325.

[54] SLAMTEC. RPLidar A2M8 Datasheet. 2017. URL: http://bucket.download.slamtec.com/d25d26d45180b88f3913796817e5db92e81cb823/LD208_SLAMTEC_rplidar_datasheet_A2M8_v1.0_en.pdf.

[55] Lawrence Gilman Roberts. “Machine perception of three-dimensional solids”. PhD thesis. Massachusetts Institute of Technology, June 1963.

[56] Michael J. Swain and Dana H. Ballard. “Color indexing”. In: International Journal of Computer Vision 7.1 (1991), pp. 11–32.

[57] Matthew Turk and Alex Pentland. “Eigenfaces for recognition”. In: J. Cognitive Neuroscience 3.1 (1991), pp. 71–86. DOI: 10.1162/jocn.1991.3.1.71.

[58] Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton. “Restricted Boltzmann Machines for Collaborative Filtering”. In: Proceedings of the 24th International Conference on Machine Learning. ICML ’07. Corvallis, Oregon, USA: ACM, 2007, pp. 791–798. ISBN: 978-1-59593-793-3. DOI: 10.1145/1273496.1273596.

[59] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. “A Fast Learning Algorithm for Deep Belief Nets”. In: Neural Comput. 18.7 (July 2006), pp. 1527–1554. DOI: 10.1162/neco.2006.18.7.1527.

[60] Olga Russakovsky et al. “ImageNet Large Scale Visual Recognition Challenge”. In: International Journal of Computer Vision (IJCV) 115.3 (2015), pp. 211–252. DOI: 10.1007/s11263-015-0816-y.

[61] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105. URL: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.

[62] Stanford University. CS231n: Convolutional Neural Networks for Visual Recognition. URL: http://cs231n.stanford.edu/index.html (visited on 05/13/2018).

[63] Yann LeCun et al. “Gradient-based learning applied to document recognition”. In: Proceedings of the IEEE. 1998, pp. 2278–2324.

[64] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation”. In: Computer Vision and Pattern Recognition (CVPR) (2014). DOI: 10.1109/CVPR.2014.81.

[65] Ross Girshick. “Fast R-CNN”. In: Computer Vision (ICCV), 2015 IEEE International Conference (2015). DOI: 10.1109/ICCV.2015.169.

[66] M. Everingham et al. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.

[67] Joseph Redmon et al. “You Only Look Once: Unified, Real-Time Object Detection”. In: CoRR abs/1506.02640 (2015). URL: http://arxiv.org/abs/1506.02640.

[68] Joseph Redmon and Ali Farhadi. “YOLO9000: Better, Faster, Stronger”. In: CoRR abs/1612.08242 (2016).

[69] Joseph Redmon and Ali Farhadi. “YOLOv3: An Incremental Improvement”. In: CoRR abs/1804.02767 (2018).

[70] M. Everingham et al. “The Pascal Visual Object Classes Challenge: A Retrospective”. In: International Journal of Computer Vision 111.1 (Jan. 2015), pp. 98–136.

[71] Wei Liu et al. SSD: Single Shot MultiBox Detector. 2015. DOI: 10.1007/978-3-319-46448-0_2. URL: http://arxiv.org/abs/1512.02325.

[72] Andrew G. Howard et al. “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”. In: (Apr. 2017).

[73] Huang Yong and Xue Jianru. Real-time Traffic Cone Detection for Autonomous Vehicle. 2015. DOI: 10.1109/ChiCC.2015.7260215.

[74] Renaud Dubé, Nico Messikommer, Simon Schaefer, and Mark Pfeiffer. Cone Detection Using a Combination of LiDAR and Vision-Based Machine Learning. 2017.

[75] Christian Perwass. The Inversion Camera Model. In: Geometric Algebra with Applications in Engineering. Berlin, Heidelberg: Springer, 2009, pp. 277–297. ISBN: 978-3-540-89068-3.

[76] Martin Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org. 2015. URL: https://www.tensorflow.org/.

[77] Marcus D. Bloice. Augmentor. URL: https://github.com/mdbloice/Augmentor (visited on 05/08/2018).

[78] F1/10. F1/10: Autonomous Racecar Competition. URL: http://f1tenth.org/ (visited on 03/14/2018).

[79] JetsonHacks. JetsonHacks: Developing for NVIDIA Jetson. URL: http://www.jetsonhacks.com/ (visited on 03/14/2018).

[80] Steven Keeping. Controlling Sensorless, BLDC Motors via Back EMF. URL: https://www.digikey.se/en/articles/techzone/2013/jun/controlling-sensorless-bldc-motors-via-back-emf (visited on 03/14/2018).

[81] Benjamin Vedder. Benjamin’s Robotics. URL: https://vedder.se (visited on 03/14/2018).

[82] InvenSense. InvenSense MPU-9250 Datasheet. URL: https://cdn.sparkfun.com/assets/learn_tutorials/5/5/0/MPU9250REV1.0.pdf.

[83] StereoLabs. ZED Camera. URL: https://www.stereolabs.com/zed/ (visited on 03/14/2018).

[84] Slamtec. RPlidar A2. URL: https://www.slamtec.com/en/Lidar/A2 (visited on 03/14/2018).

[85] Robot Operating System. Open Source Robotics Foundation. 2018. URL: http://www.ros.org/.

[86] R. Rajkumar, M. Gagliardi, and Lui Sha. “The real-time publisher/subscriber inter-process communication model for distributed real-time systems: design and implementation”. In: Real-Time Technology and Applications Symposium, 1995. Proceedings (1995).

[87] Wim Meeussen. Coordinate Frames for Mobile Platforms. http://www.ros.org/reps/rep-0105.html.

[88] Tully Foote. “tf: The transform library”. In: Technologies for Practical Robot Applications (TePRA), 2013 IEEE International Conference on. Open-Source Software workshop. Apr. 2013, pp. 1–6. DOI: 10.1109/TePRA.2013.6556373.

[89] Institute for Systems and Robotics Lisboa. Hector SLAM Robot Setup. http://library.isr.ist.utl.pt/docs/roswiki/hector_slam(2f)Tutorials(2f)SettingUpForYourRobot.html. 2011.

[90] Rainer Kümmerle et al. On Measuring the Accuracy of SLAM Algorithms. 2009.

TRITA-ITM-EX 2018:193

www.kth.se