radar-detection based classification of moving objects ...kth.diva-portal.org › smash › get ›...


Radar-detection based classification of

moving objects using machine learning methods

VICTOR NORDENMARK

ADAM FORSGREN

Master of Science Thesis

Stockholm, Sweden 2015


Radar-detection based classification of moving objects using machine learning methods

Victor Nordenmark Adam Forsgren

Master of Science Thesis MMK 2015:77 MDA 520

KTH Industrial Engineering and Management

Machine Design SE-100 44 STOCKHOLM


Master of Science Thesis MMK 2015:77 MDA 520

Radar-detection based classification of moving objects using machine learning methods

Victor Nordenmark

Adam Forsgren

Approved

2015-06-17

Examiner

Martin Grimheden

Supervisor

De-Jiu Chen

Commissioner

Scania Södertälje AB

Contact person

Kristian Lundh

Summary

In this thesis, the possibility of classifying moving objects based on data from Doppler radar detections is investigated. The end goal is a system that uses inexpensive hardware and performs computations of low complexity. Scania, the company that commissioned this project, is interested in the potential of such a system in autonomous vehicle applications. Specifically, Scania wants to use the class information to make it easier to track moving objects, an essential skill for an autonomously driving truck.

The objects are divided into four classes: pedestrian, bicyclist, car and truck. The input to the system essentially consists of a platform of four monopulse Doppler radars operating at a wave frequency of 77 GHz.

A classification system based on the machine learning concept of support vector machines has been created. The system has been trained and validated on a dataset collected for the project, containing labeled data points. A number of supporting functions for the system have also been created and tested. In the validation step, the classifier is shown to discern well between the four classes. Simulations of the complete system on recorded logs of radar data show promising results, even in situations not represented in the training data.

To investigate the system further, it has been implemented and tested on the prototype truck Astator, and its performance has been evaluated from both a real-time perspective and in terms of classification accuracy. In general, the system shows promising results in scenarios resembling the intended end use. In more complex traffic situations, and when the truck travels at higher speeds, a higher incidence of sensor noise degrades the system's performance.


Master of Science Thesis 2015:77 MDA 520

Classification of moving objects based on radar detections using machine learning methods

Victor Nordenmark

Adam Forsgren

Approved

2015-06-17

Examiner

Martin Grimheden

Supervisor

De-Jiu Chen

Commissioner

Scania Södertälje AB

Contact person

Kristian Lundh

Abstract

In this MSc thesis, the possibility of classifying moving objects based on radar detection data is investigated. The aim is a lightweight, low-level system that relies on cheap hardware and calculations of low complexity. Scania, the company that has commissioned this project, is interested in the usage potential of such a system in autonomous vehicle applications. Specifically, the class information is desired in order to enhance the moving object tracker, a subsystem that represents a crucial skill of an autonomously driving truck.

Objects are classified as belonging to one of four classes: pedestrian, bicyclist, personal vehicle and truck. The major system input consists of sensor data from a set of four short-range monopulse Doppler radars operating at 77 GHz.

Using a set of training and validation data gathered and labeled within this project, a classification system based on the machine learning method of support vector machines is created. Several other supporting software structures are also created and evaluated. In the validation phase, the system is shown to discern well between the four classes.

System simulations performed on logged radar data also show promising performance in situations not reflected in the labeled dataset.

To further investigate the feasibility of the system, it has been implemented and tested on the prototype test vehicle Astator, and its performance has been evaluated with regard to both real-time constraints and classification accuracy. Overall, the system shows promise in the scenarios for which it was intended, with respect to both real-time and classification performance. In more complex scenarios, however, sensor noise is increasingly apparent and affects system performance negatively. The noise is especially apparent in heavy-traffic and high-velocity scenarios.


List of Figures

1. Schematic overview of relevant target system architecture
2. Approximate placement and FOV of SRR sensors on Astator
3. Development approach schematic
4. Coordinate systems
5. Projection of the radar EGO component on the range rate of a detection
6. Linear decision boundaries in two and three dimensions
7. An example of a non-linearly separable dataset
8. Illustration of the bias-variance dilemma
9. The support vector machine visualized in a two-dimensional feature space
10. Illustration of the kernel concept
11. Illustration of soft-margin SVM
12. DBSCAN clustering method
13. Development process and classification system overview
14. The eps tradeoff
15. Timing diagram
16. Biplot of features and training data projected onto the first two principal components
17. Training data and biplot projected onto the first three principal components
18. Lines of noise detections behind an object
19. Scattered noise detections behind an object
20. Clusters of noise detections with high velocity values
21. Fence detections and noise when moving at 60 km/h
22. Radar detections clustered with eps = 4 meters
23. Coarse grid search for SVM parameters
24. Fine grid search for SVM parameters
25. Typical frames from evaluation logs
26. Typical frame from highway log


List of Tables

1. Delphi SRR Midrange specifications
2. Binary classification confusion matrix
3. Multiclass confusion matrix
4. Classification performance measurements
5. Radar detection clusters gathered and labeled
6. Mean and variance of features used for object description
7. Cumulative variance explained per principal component
8. maxdR detection filter statistics
9. maxClusterVelVar filtering results
10. minClusterAmpVar filtering results
11. Offline evaluation of classification performance
12. Real-time simulation performance from the Simulink Profiler
13. Real-time performance on the target system
14. Input/output comparison of the two systems for the different classes
15. Classification system evaluation on a log with trailer
16. Fulfillment of functional requirements
17. Fulfillment of extra-functional requirements


Contents

1 Introduction
1.1 Project background
1.1.1 General project background
1.1.2 Problem description
1.1.3 Differences from other projects
1.2 Project goal
1.2.1 Project purpose
1.2.2 Target system
1.2.3 Project requirements
1.2.4 Research questions
1.3 Project development methodology and considerations
1.3.1 Development approach
1.3.2 Delimitations
1.3.3 Sustainable development considerations

2 Frame of reference
2.1 Previous work
2.1.1 Radar based vehicle perception
2.1.2 Doppler radar as input to learning systems
2.2 Doppler radar perception and integration of multiple sensors
2.2.1 Basic Doppler radar theory
2.2.2 Sensor fusion and integration
2.3 Theoretical overview of machine learning concepts and methods
2.3.1 Supervised learning, classification and overfitting
2.3.2 Support vector machines as a method for classification
2.3.3 Classification performance analysis
2.4 Extraction and analysis of object descriptions
2.4.1 Clustering of sensor data
2.4.2 Selecting and extracting features from data clusters
2.4.3 Principal component analysis for feature evaluation
2.5 Frame of reference conclusions
2.5.1 Current best practice in vehicle perception
2.5.2 Theory and methods employed

3 Methods
3.1 Method overview and system introduction
3.1.1 Stages of system development
3.1.2 Classification system overview
3.2 Gathering radar detection data
3.2.1 Test-track data gathering
3.2.2 Labeling of gathered data
3.3 Practical selection and analysis of object descriptions
3.3.1 Description of features used
3.3.2 Analysis of features and data with PCA
3.4 Pre-classification signal processing
3.4.1 Filtering of radar detections
3.4.2 Clustering of radar detections using DBSCAN
3.4.3 Feature vector calculation
3.4.4 Filtering of radar detection clusters
3.5 Classification of processed objects
3.5.1 Implementation of the support vector machine system
3.5.2 Multiclass, rejection and confidence structures
3.5.3 Evaluating classification performance on validation data
3.6 System implementation on the target platform
3.6.1 Real-time implementation goals and restrictions
3.6.2 Timings and tasks
3.6.3 Validation of the final system implementation

4 Results and discussion
4.1 Results of data gathering and labeling
4.2 Analysis of feature and data characteristics
4.2.1 Characteristics of selected features
4.2.2 Principal component analysis of features on training data
4.2.3 Feature and data analysis discussion
4.3 Signal processing and filtering performance
4.3.1 Common types of noise in the radar output
4.3.2 Results of developed filtering structures
4.3.3 DBSCAN clustering parameter evaluation
4.4 Classification-related results
4.4.1 Support vector machine model selection
4.4.2 Offline evaluation of classification performance
4.5 Complete system performance assessment
4.5.1 Real time performance
4.5.2 Classification performance

5 Conclusions and future work
5.1 Concluding discussion regarding research questions and requirements
5.1.1 Requirements
5.1.2 Research questions
5.2 Project-wide conclusions
5.3 Future work


Work division

This section describes the division of work during the thesis project and, specifically, the writing of this report.

The thesis has always been seen as a collaborative effort, and both authors have been involved in most, if not all, sections of the report.

However, focus has been placed on different areas, and for this reason one author is regarded as mainly responsible for the sections originally written by that author.

In the frame of reference chapter, Victor has been mainly responsible for sections 2.1 and 2.4, while Adam has governed sections 2.2 and 2.3.

In the method chapter, Victor has been primarily responsible for sections 3.1, 3.3, 3.4.2, 3.5.2 and 3.6, and Adam for sections 3.2, 3.4, 3.5.1 and 3.5.3.

As for the results chapter, Victor is mainly responsible for sections 4.2, 4.3.3 and 4.5, while Adam is primarily responsible for sections 4.1, 4.3.1, 4.3.2 and 4.4.

The conclusions chapter saw sections 5.2 and 5.3 written by Victor, while Adam was mainly responsible for section 5.1.

The introduction chapter was an entirely collaborative effort; all sections were written cooperatively by both authors, and no particular work division can be identified.

This has been a natural division of work, as the knowledge gained in the literature review, and in the writing of the frame of reference sections, has been vital when writing the corresponding sections in the method and results chapters.


1 Introduction

This chapter aims to present a background and to introduce the reader to the project. First, the background and motivation for the project are presented, followed by a problem description and an introduction to the target system, the prototype vehicle Astator. The requirements and research questions considered are also detailed here. Finally, the development approach used is explained together with the ethical considerations made.

1.1 Project background

In this section, the background to the project is presented together with a problem description. Brief details about the differences between this project and earlier work in similar areas are also presented.

1.1.1 General project background

In the automotive industry today, a big effort is put into the development of intelligent vehicles, advanced driver assistance systems and, ultimately, autonomously driving vehicles.

The development of more intelligent systems within vehicles has wide implications. Early warning systems and advanced vehicle perception can lead to a big decrease in injuries resulting from accidents. Additionally, vehicle operations can be better optimized for fuel consumption, thus minimizing costs and reducing environmental impact. The need for human operators can also be reduced, as in the case of platooning systems, where a single person operates an entire fleet of vehicles, or of a fully autonomous system, where the human operator is removed completely.

The advance of more complex systems in vehicles leads to higher requirements on the processing of large amounts of data, such as when analysing multiple sensor signals simultaneously. When it comes to processing large quantities of (often high-dimensional) data, the ever-increasing computational power available has enabled the use of machine learning methods to tackle new problems. Machine learning methods can provide a means to analyze large quantities of data in ways that have previously not been possible.

For a company such as Scania, research in advanced driver assistance systems is at the core of an expected future suite of services to provide to customers. A crucial goal is the ability of a vehicle to detect and track surrounding objects. This complex problem of vehicle perception demands many system layers and has many feasible solution approaches.

At Scania REPA (the unit for development of advanced driver assistance systems), within the iQMatic project, research is conducted on the development of a truck


that can autonomously carry goods to increase the efficiency of mining sites. The prototype vehicle Astator is a platform for testing new solutions developed in this project. The vehicle contains, among many other subsystems, an object tracking system. The purpose of this system is to keep track of objects surrounding the truck and to predict their future positions and movements. The tracker, together with other subsystems, builds up a perception system which allows the vehicle to sense and react to its surroundings.

1.1.2 Problem description

The object tracking system of Astator uses motion models to predict how a detected object will move. Currently, motion models are interpolated from detection history, which, if a wrong assumption is made, can lead to estimation errors and unreliable performance. This problem is especially apparent when no history is available, as the choice of initial parameters has a big effect on future predictions.

For a suitable motion model to be selected by the object tracker, some initial information about a detected object is needed. Such information would preferably include an estimate of what category, or class, a detected object belongs to. For this reason, Scania has requested an investigation into the creation of a low-level classification system.

This system should deliver class data (object type) in order to provide additional information as input to the object tracker. The system will use machine learning methods to process and classify objects based on data received from radar sensors mounted on Astator.

1.1.3 Differences from other projects

Classification of moving objects based on sensor data is not a new field. However, the classification process is usually done at a high level, combining input data gathered from several sensors such as LIDAR, stereo cameras and radars.

The use of multiple sensors and fusion of sensor data can provide very accurate classification, but also requires heavy processing power. Additionally, the use of multiple sensors leads to a higher cost for the sensor platform, and the moving parts found in LIDAR systems can reduce the hardware robustness of the system.

Other methods of object classification based on radar input rely on deeper knowledge of the radar signal characteristics in order to construct statistical models. Whereas those methods require radar data in its raw form, the radar sensors used in this project deliver heavily processed data in the form of detection points. Little previous research deals with processed radar data in this form.


The classification system concerned in this thesis project is intended to operate on a low level, based only on radar detection data, and before the tracking system in the signal chain. As such, it should not require large amounts of data as input, the calculations to be made should be of low complexity, and it should be executable in a real-time context.

1.2 Project goal

In this section, the purpose and goals of this master thesis project are described. The Astator target system platform is also introduced. Additionally, the research questions considered are detailed here together with the project requirements.

1.2.1 Project purpose

This MSc thesis project concerns researching, implementing and evaluating a classification system with the purpose of identifying moving objects based on sensor data mainly consisting of radar detections.

Objects shall be classified as belonging to one of four classes:

1. Pedestrian

2. Bicyclist

3. Personal vehicle

4. Truck
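The four-class taxonomy above can be sketched as an enumeration; the identifier names and integer values below are illustrative assumptions, not taken from the project's actual code.

```python
from enum import Enum

class ObjectClass(Enum):
    """The four object classes the system distinguishes (values assumed)."""
    PEDESTRIAN = 1
    BICYCLIST = 2
    PERSONAL_VEHICLE = 3
    TRUCK = 4
```

An enumeration like this would let downstream consumers (such as the object tracker) match on a closed set of classes instead of raw strings.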

The system will be implemented and evaluated on the prototype vehicle platform Astator.

The end purpose of the classification system is to improve accuracy in the object tracking system by providing class data as an additional input.

1.2.2 Target system

The Astator system (referred to as "Astator" or the "target system") is the prototype platform used for development within the iQMatic project. It contains numerous subsystems and modules. Below in figure 1, a schematic overview of all relevant parts of the system architecture is shown, together with the autonomous vehicle context in which this project operates.



Figure 1: Schematic overview of relevant target system architecture

The yellow shaded area spans the parts of the target system that are of direct or indirect concern within this project. The boxes in red are the specific modules (software or hardware) that the classifier system developed in this project relies on for input data. Sensor modules are connected to the main ECU (electronic control unit) through CAN (Controller Area Network). The green box is the intended product of this project. The purple box represents the receiving subsystem for which the classification system output is intended.

The blue shaded area provides some autonomous vehicle context and represents a higher abstraction level, with concepts that are of particular concern within the iQMatic project (but not within the scope of this thesis). The moving object classification system that this project concerns is intended to be part of the Object Assessment skill. This in turn is part of the vehicle situational awareness (also called vehicle perception).

Relevant target system characteristics and specifics are presented below.

Short Range Radars Astator is equipped with six short range radars (SRR): two forward facing systems and one additional radar at each corner of the vehicle. The four corner radars (SRR 1-4) are the sensors that will be of interest to this


project, and they will provide the main data input to the classification system. The forward facing sensors will not be considered further, as they are of a slightly different configuration. The corner radars are mounted to give a broad, partly overlapping field of view (FOV) of the vehicle's side and rear surroundings. The approximate positions of these radars as well as their FOV can be seen in figure 2 below.

Figure 2: Approximate placement and FOV of SRR sensors on Astator

A condensed list of the SRR system specifications can be found below in table 1:

Table 1: Delphi SRR Midrange specifications

Frequency: 77 GHz
Field of view: 150 degrees
Range: 0.5-80 m
Sample time: 50 ms
Bandwidth: 250 MHz

The radar sensors deliver data in the form of detections. These are data points representing a detected object. Each detection contains positional information as well as additional parameters.

Each of the radars delivers a data package of up to 64 detections every 50 ms over CAN. Furthermore, since four radars are used in the project, up to 256 detections can be received in any single frame. The number of detections received


depends on external factors, such as how much movement and reflection is detected, as well as on the internal structure of the radar processing unit.

A data package from the radar sensors contains the following parameters:

• Amplitude: indicates the amount of energy reflected back from the detected surface. Exact unit and calculation unknown.

• Doppler velocity: the relative velocity of the detected surface in the radar's radial direction [m/s].

• Distance: distance to the detected surface [m].

• Angle: detection ray angle relative to the normal of the radar [rad].

In addition, there is information about time, whether the delivered package contains updated data, detection ID and package size.
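The detection parameters above can be sketched as a simple record type. The field names and types below are illustrative assumptions, since the exact CAN message layout is not given here.

```python
from dataclasses import dataclass

@dataclass
class RadarDetection:
    """One detection point as delivered by an SRR sensor (field names assumed)."""
    amplitude: float          # reflected energy; exact unit unknown
    doppler_velocity: float   # relative radial velocity [m/s]
    distance: float           # range to the detected surface [m]
    angle: float              # angle relative to the radar normal [rad]
    detection_id: int
    updated: bool             # whether the package contains fresh data

# Each of the four radars delivers up to 64 detections every 50 ms,
# so a single frame holds at most 4 * 64 = 256 detections.
MAX_DETECTIONS_PER_FRAME = 4 * 64
```

A frame of input to the classification system would then be a list of up to `MAX_DETECTIONS_PER_FRAME` such records.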

IMU and GPS The IMU (inertial measurement unit) and GPS (global positioning system) sensors provide vehicle heading and velocity information, which is important for the interpretation and translation of radar data. These computations are made in the VEGO system and will not be presented in further detail.

The ECU The Astator ECU has 16 GB of RAM and a quad-core Intel Core i7 processor running Linux. Most of the autonomous framework software runs on it. The software system works on an update cycle of 10 ms.

Pre-processing In this software stage, the data on the CAN bus is converted into the LCM format used throughout the rest of the software framework.

Translate data The MEAS software step serves to translate sensor data into different coordinate systems, as well as to provide estimates of measurement certainty. In particular, we are concerned with the EGO (referring to the self vehicle) local Cartesian coordinate system, with its origin located at the center of the rear axle of Astator. The data translation is necessary to determine absolute object velocities and to integrate the different radars into one system.
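The translation from sensor polar coordinates to the EGO Cartesian frame can be sketched as follows, assuming a known planar mounting position and yaw for each radar; the actual MEAS implementation, and its handling of measurement uncertainty, is not shown here.

```python
import math

def to_ego_frame(r, theta, mount_x, mount_y, mount_yaw):
    """Transform a detection from sensor polar coordinates (range r [m],
    angle theta [rad] relative to the radar normal) into the EGO Cartesian
    frame, whose origin lies at the center of Astator's rear axle.
    The mounting position and yaw of the radar are assumed known."""
    bearing = mount_yaw + theta          # detection ray angle in the EGO frame
    x = mount_x + r * math.cos(bearing)
    y = mount_y + r * math.sin(bearing)
    return x, y
```

For example, a detection 5 m along the normal of a radar mounted at (1, 2) and yawed 90 degrees would land at (1, 7) in the EGO frame.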

Based on the calculated velocity associated with a certain radar detection, and the estimated uncertainty of the corresponding radar at that particular angle, an index of certainty that the detection belongs to a moving object is produced.

• Movement index 3: 99.7 % certainty (three standard dev.).

• Movement index 2: 95.5 % certainty (two standard dev.).

• Movement index 1: 68.3 % certainty (one standard dev.).

• Movement index 0: Object is probably not moving.
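Assuming the movement index is produced by simple sigma-level thresholding of the estimated absolute velocity, the mapping above could be sketched as:

```python
def movement_index(abs_velocity, sigma):
    """Map an estimated absolute detection velocity to a movement index,
    given the (estimated) standard deviation of the radar measurement
    at that angle. Thresholds follow the 1/2/3-sigma levels above;
    the exact decision rule used on Astator is an assumption here."""
    v = abs(abs_velocity)
    if v > 3 * sigma:
        return 3   # 99.7 % certainty the object is moving
    if v > 2 * sigma:
        return 2   # 95.5 %
    if v > sigma:
        return 1   # 68.3 %
    return 0       # object is probably not moving
```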


It should be noted that since the exact standard deviations of the radar system are unknown, this method is based on estimated values. There could be dynamic behaviours in the measurement certainty of the radars, or phenomena unaccounted for. Hence, the certainties above do not express reality, but only a very rough approximation.

A brief outline of the theory behind how radar detections can be transformed from sensor-specific polar coordinates to the local Cartesian system can be found in section 2.2.2, Sensor fusion and integration.

Moving Object Classifier This is the subsystem to be developed within this project, and the one that the rest of the report will concern in detail.

SMOT object tracker SMOT is the object tracking system of Astator. It uses combined sensor data in a Kalman filter structure to track objects and predict their future positions based on history. It is the recipient of the output from the moving object classifier.

Autonomous vehicles must consolidate many different skill sets in order to function. Within the iQmatic project, these skill sets have been abstracted into the following nomenclature:

Situational Awareness This can also be referred to as vehicle perception. It contains object assessment (of which the moving object classification system is part), sub-object assessment and situation assessment.

Artificial Intelligence This concerns the decision making of the autonomous vehicle and will not be further examined in this project.

Automatic Control This regulates the movements of the autonomous vehicle through the different actuators available. This will not be further examined in this project.

1.2.3 Project requirements

These requirements have been developed by the authors together with supervisors from both Scania and KTH, and are also influenced by the literature survey conducted at the start of the project. They should be seen as a result of investigations conducted by the authors and not as requirements imposed by Scania. These requirements will serve as the foundation for the research questions asked within the project.

Real time The system shall detect and classify objects in a real-time environment with a predictable execution time. This execution time must be fast enough that the output can be computed before the next sample arrives.

The sensors deliver data at 20 Hz, which means that the absolute maximum execution time for the classification system, if it is to classify on each sample, is 50 ms. A reduction of the execution time to below 10 ms would be beneficial, since other systems on the Scania test truck run at a frequency of 100 Hz.


Clustering The sensors deliver data in the form of detection points with a small number of parameters. It is assumed that these parameters in their raw form are not enough to reliably classify objects. Because of this, it is necessary to group the detections into objects. This clustering process will provide new information about the detected objects, enabling more features to be computed.
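As an illustration of the kind of grouping meant here, below is a minimal single-linkage sketch that clusters detections by distance, in the spirit of the simple distance thresholds used in prior work. Python is used for brevity (the project's reference language is MATLAB), and the threshold value is a placeholder, not a value chosen in this project:

```python
import math

def cluster_detections(points, max_dist=0.5):
    """Group 2D detection points so that two points share a cluster if they
    are connected by a chain of detections pairwise closer than max_dist
    (single-linkage). Illustrative sketch only; max_dist is a placeholder."""
    clusters = []
    for p in points:
        # Find all existing clusters within reach of this detection
        linked = [c for c in clusters
                  if any(math.dist(p, q) < max_dist for q in c)]
        # Merge them with the new point into one cluster
        merged = [p] + [q for c in linked for q in c]
        clusters = [c for c in clusters if c not in linked]
        clusters.append(merged)
    return clusters

dets = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)]
print(len(cluster_detections(dets)))  # -> 2 (two nearby points + one outlier)
```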

System output In order for the system to be useful to the tracker, the class output needs to have a certain degree of accuracy. A confidence output (some measure of how certain the classification is) would make the data easier to integrate into the tracker, especially if it can take on a probabilistic structure.

Programming language The reference programming language of this project is MATLAB, but in order to implement the system on the embedded system of Astator, a toolbox to generate C code will be used.

Below is a more condensed list of the requirements to be investigated within this project.

Functional requirements

• The system shall take the output of the sensors and cluster these data points into objects

• The system should filter out static objects and only be concerned with moving objects

• The system shall classify these moving objects as belonging to one or none of the classes

• The system should be able to provide a confidence output

• The system should have an execution time that never exceeds 50 ms

Extra-functional requirements and technological preferences

• The system shall use input data from four short range radars, combined with sensor data regarding EGO movements.

• The system shall classify on every sample separately, without the use of detection history or feedback loops.

• The system shall operate with the use of SVM as classification method

• The system shall be programmed in MATLAB

• The system shall be implemented on the embedded hardware of Astator


1.2.4 Research questions

To approach the task from a scientific viewpoint, the following research questions have been formulated:

1. How can existing machine learning theory be integrated into the embedded hardware of Astator with the purpose of creating a classification system?

2. How can this system be optimized for real time execution?

3. What can be done to improve the classification accuracy of this system with regards to robustness against noise and environmental factors?

4. What are the major obstacles in creating the system, and what can be done to overcome them?

The aim of this project is to answer these research questions through the development of a classification system. The overall development approach used to construct this system is presented below.

1.3 Project development methodology and considerations

In this section, the specific methodology used for development within this project is presented, together with delimitations in the project scope. Additionally, social and ethical considerations with regards to sustainable development will be discussed here.

1.3.1 Development approach

In the development of this project, an adaptation of the V-model of development is used. This is beneficial in that it provides a foundation on which to divide efforts, makes for a logical flow of work and implies certain approaches to validating results. A strict V-model protocol will not be discussed or followed, but the general development approach as adapted and understood within this project is presented below.

The development approach used here divides development into three distinct phases, containing several layers each. The output of each phase corresponds to a major project deliverable, either parts of the report or the actual classification system. In figure 3 below, a schematic image of the development approach is shown:


Figure 3: Development approach schematic

Definition First, during the definition phase, we adopt a top-down approach of first defining the requirements of the total system, then defining how the system will operate at a functional level, and finally at the technical level. In this phase, theory studies will be performed and current best practice examined.

This phase also requires the gathering of pre-existing data that can be analyzed, in order to properly define necessary functionalities and gain sufficient domain knowledge.

The outputs of the definition phase constitute what is presented in chapter 2, Frame of reference, and chapter 3, Methods.

Implementation After the definition phase, the implementation phase is performed. In this phase, code is written and implemented. We use a model-based method of first developing an offline system that operates in a simulated environment. When this offline system is deemed to perform well, a real-time implementation of the same system is developed by means of code generation for the target system. This is continually checked against the offline version, using the same testing data, to ensure that the RT functions give the same output as the offline functions.

The output of this phase is the actual classification system.

Testing The third major phase then consists of testing the implementation against the respective levels of definitions and requirements. Here, a bottom-up approach is adopted: functionality is tested at the lowest useful level first, then put together and tested in groups, and finally as a complete system against the requirements defined in the first phase. The verification is done against the requirements and definitions constructed in the definition phase.

This testing phase requires the gathering of more data, to evaluate the complete system in a realistic environment or an appropriate experimental setup.

The output of this phase constitutes chapter 4, Results and discussion.

Remarks on validation This project contains many sub-functions, the performance of which can be evaluated separately. But since the project scope is limited, it is clear that too much time cannot be spent trying to optimally assess the performance of every subsystem. In cases where there is theory or established practice available with regards to the assessment of performance, the suggested methods can and will be used. For subsystems where such methods are not clearly available, heuristic approaches will be used instead in order to reach appropriate performance.

1.3.2 Delimitations

This project will not investigate which classification method is best for any particular purpose. Instead, one method will be chosen and focus will lie on the implementation of an integrated system on Astator. Through conclusions reached in the background study, it is decided that support vector machines will be the classification method used throughout the project.

The system shall not include sensor data other than that delivered by the Doppler radars, and information about the EGO vehicle speed.

Detection history shall not be part of the system, meaning that there will be no feedback loops and that the system will perform processing and classification on each sample cycle separately.

The goal is to create and evaluate a methodology as well as to identify major obstacles. Constructing an end-user product is not within the scope of this project. As such, the performance requirements when it comes to classification are not very strict.

A mining site is generally off-limits to the public, and compared to, for example, an inner-city environment it contains few moving objects. These objects can also be held under strict supervision in such a controlled environment. This means that the scope of this project can be limited to areas sparsely populated by moving objects, without affecting its validity.


1.3.3 Sustainable development considerations

Here, the sustainable development aspects of this project are discussed. The narrow area explicitly covered in this project is of little such interest, but in a broader context there are interesting discussions to be had.

The broader area of machine autonomy is certainly subject to some controversy, with regards to ethics and how the area should be approached by legislation. Most of these discussions can also be applied to the automation of heavy vehicles.

For example, it has been said that 50 percent of Swedish jobs could be gone within 20 years [1]. It has always been the case that machines and new technology take over manual labour previously performed by humans, but perhaps the pace that is currently experienced is unprecedented.

This phenomenon has huge implications both economically and socially. For companies and particular businesses (such as Scania), it presents a huge opportunity. The human in the loop often represents a major part of costs, and if this can be minimized, great profits can be made.

On a nation-wide level, however, the economic benefits can be more diffuse. If 50 percent of current jobs disappear without being replaced at the same pace by new ones, this will obviously place some strain on society. Such a scenario can cause economic and social vulnerability for many individuals. If, however, the benefits of heavy automation can be shared by the entire community (of a nation, a continent, or world-wide), this could revolutionize the human experience.

Autonomy has the potential to eliminate hazardous and monotonous jobs (such as driving heavy vehicles for long periods), and lead to safer machine operations in general. It can also be beneficial for environmental reasons, by allowing increased optimization of resource usage.

Another important debate about autonomy concerns the dilution of responsibility. This subject has been heavily discussed, both in academia and in regular newspapers (see for example [2]). The basic question is: who is to blame if an autonomous machine causes an accident? Currently, this seems to be dealt with through extreme caution before introducing autonomous systems, but in the future things might be different.

Used with responsibility and forethought, autonomous vehicles will almost certainly provide benefits to most. As such, the broader context that this project operates in is compatible with sustainable development.


Part 2: Frame of reference

This chapter contains the theoretical framework on which the rest of the project is based. A brief overview of work done in similar fields is given. This is followed by a review of the theoretical foundations for the particular methods used in this project.

2.1 Previous work

In this section, some of the previous work in the area is discussed. This is divided into the more general case of vehicle perception, and the more specific case of radar usage within learning systems. This provides a different background perspective than what was discussed in the introduction, and serves as a foundation for the exploration of a solution space.

2.1.1 Radar based vehicle perception

There are several examples of projects that have tried to accomplish something similar to this one. Below, some of these are presented and their relevance to this project is discussed.

Vu 2010 In [3], Vu proposes a two-fold way of detecting and classifying moving objects. The work was performed within the framework of the European project PReVENT ProFusion.

The first part consists of the usage of SLAM (Simultaneous Localization And Mapping) with detection of moving objects. This approach uses odometry in conjunction with a laser scanner to sense the environment. An object is perceived as dynamic if it occupies a space that was previously unoccupied.

Vu also stores a dynamic map of the environment alongside the static one. This serves to increase the likelihood that an object is considered dynamic if it is detected in an area with a history of containing several dynamic objects.

For the purpose of clustering detections into objects, a simple distance threshold of 0.3 m is used.

In the context of moving object classification, the output of the dynamic SLAM can be seen as clustered laser detections that are hypothesized to be dynamic.

The second part of the method proposed by Vu is for classifying moving objects as well as estimating tracks, through the solution to the DATMO (detection and tracking of moving objects) problem. It operates by using a sliding-window method of finding sequences of clusters through a time series of frames. Class hypotheses are then calculated by fitting the box size and movement characteristics of four pre-defined classes to the detection sequences. Here, a data-driven Markov chain Monte Carlo (DDMCMC) method is used, and the maximum a posteriori estimate is calculated over the space of all possible hypotheses. This method can then be used to predict future positions of objects.

The pre-defined classes used are pedestrian, bike (including motorbike and bicycle), car and bus. The model for each respective class is derived from the averages of externally gathered statistics.

The work done by Vu has some points that are interesting to this project, but differs greatly on a few critical points:

The use of a laser scanner (a sensor that provides an almost complete surface map of the surrounding environment, but at a very high cost) as the main source of sensory input, and the lack of radar sensors.

The use of detection history (a static and a dynamic SLAM map) in every stage of the detection and classification to be conducted.

The application of detection and classification in later signal processing stages (after SLAM).

Since our project is not specifically concerned with tracking objects (this will occur at later stages), nor with using detection history, the DATMO nomenclature is not directly applicable (although the concepts might be similar).

Garcia 2014 In [4], a similar nomenclature is used, but more focus is placed on using several different sensors and reaching better results through sensor fusion. Here, the distinction is made between SLAM as modelling the static environment, and DATMO as covering the dynamic parts. This project was supported under the European Commission project interactIVe.

A fundamental insight for Garcia is that object class information is useful for tracking (later stages), and thus classification is better performed at detection level (earlier stages). He assumes that the SLAM problem is solved, and concentrates on the DATMO.

Garcia uses radar, LIDAR and camera as sensory inputs. Two classification approaches are proposed:

The first approach uses camera images to classify objects. This method uses HOG (Histogram of Oriented Gradients) descriptors and integral images derived from the cameras, along with machine learning methods (discrete AdaBoost) to construct a classifier.

The second method uses radar sensor input, but only to infer object velocity (relative velocity or estimated target velocity) as an input to a sensor fusion object representation.


Garcia uses a qualitative system evaluation that basically consists of showing system outputs for a few different scenarios and discussing whether they are good or not. A quantitative evaluation is done by creating truth data from several different driving scenarios, and comparing the classification results with the truth data.

The approach used by Garcia has more in common with the aims of this project, in that it uses radar sensors and machine learning methods for object classification. It also has useful methods for system evaluation. Critically, it differs in the use of several other sensors, of detection history, and in the choice of machine learning method.

Mercedes Bertha 2013-14 In [5] and [6], the team behind the Bertha project describes their use of a sensor platform consisting of several different radars in conjunction with stereoscopic cameras. They construct a light-weight object representation made from stixels and apply a mixture-of-experts machine learning method to classify objects. However, the reports concentrate heavily on the output produced by the cameras and do not specifically discuss the radar sensors' contribution. They also state a heavy reliance on pre-existing static maps and good vehicle localization.

PROUD-Car Test 2013 In [7], the team behind the Vislab PROUD Car Test 2013 demonstrates an autonomous vehicle platform. Their sensor platform consists of laser sensors and stereo cameras, and is thus of little specific interest to this project.

2.1.2 Doppler radar as input to learning systems

Vehicle perception is not the only area where Doppler radars and machine learning methods have been used in conjunction. Below, some examples of other usages are presented.

Waske and Benediktsson 2007 The use of machine learning methods as opposed to statistical models can be beneficial in areas where little knowledge exists about the data. This is because one is not constrained to a priori assumptions about how the input data is distributed. Machine learning methods also allow for weighting of different features that might be more or less representative for different classes, which is hard to do with statistical models. For this reason, [8] uses machine learning methods for classification of land coverage using multiple sensors. It is concluded that a multi-layered support vector machine is the most accurate for classifying the particular data type.

Although radars are used as one of the sensor types, beyond the use of general machine learning methods [8] gives little advice with regards to the specifics of this project.


Cho et al. 2009 In [9], the authors explore a vehicle classification scheme involving radar data, support vector machines and k-means clustering. They use the frequency domain response of the signal, and extract two features to aid classification of vehicles into the classes small vehicle and big vehicle. The authors use FMCW (frequency modulated continuous wave) radars, which differ from the mono-pulse radars used in this project. The authors also have access to the frequency response, on which they compute features and classify. This data is very different from the data used in this project (consisting of mono-pulse Doppler radar detections, not frequency responses). Hence, the methods described in [9] are not directly applicable to this project.

Others There is an abundance of studies regarding classification of Doppler radar data (such as [10], [11] and [12]), but they either concern the Doppler frequency response, do not use machine learning methods, or use machine learning methods other than support vector machines. Many other studies within similar areas have been read and discarded as irrelevant. No studies have been found that were deemed more relevant than readily accessible machine learning and clustering theory.

The above research suggests that even though much research exists within the general topic, the specifics of this project differ from previous work on several critical points.

2.2 Doppler radar perception and integration of multiple sensors

This section contains a brief outline of the basic theory of Doppler radars and how they can be used to enable perception. Also in this section is a description of how several radars can be integrated and combined with different types of sensors.

2.2.1 Basic Doppler radar theory

Radars can provide different types of information about an object, such as the angle at which the object is detected, and the distance and speed of the object relative to the radar [13]. This is done by emitting radio waves in a certain direction and studying the properties of returning waves reflected by a target.

The fact that all electromagnetic waves travel at the speed of light makes it possible to calculate the distance to an object by measuring the time delay between a transmitted wave and the return of its reflection.

In [14], the range r to an object is expressed as:

    r = c∆t / 2,    with c ≈ c0 / √εr    (1)

Here c0 ≈ 3 × 10^8 m/s is the speed of light in vacuum, while εr is the material permittivity.

Due to the Doppler effect, an object that moves towards or away from the radar will cause the frequency of its reflected waves to differ from that of the transmitted waves. In Doppler radars, this phenomenon is used to obtain information about the velocity of an object relative to the radar.

This velocity ṙ is usually called the range rate, radial speed or Doppler speed. In [14] it is expressed as follows:

    ṙ = c·fd / (2·ft)    for v ≪ c    (2)

Here fd = fr − ft is the Doppler frequency shift; the difference in frequency between the transmitted wave ft and the reflected wave fr.

In pulse-Doppler radars, the emitted waves are modulated by a pulse train, causing the radar signal to be emitted in short bursts [14].

Modern radar systems use signal processing techniques to modulate signals with different frequencies or polarization. This enables separation of waves originating from multiple different targets, and can also prevent radars from interfering with each other [13].

Doppler radars exist in a multitude of different configurations and with many types of outputs. For more detailed technical information about the radar system used in this particular project, see section 1.2.2, Target system.

The next section contains details of how multiple radar sensors can be integrated into one single system, and work together with other sensors such as the IMU and GPS.

2.2.2 Sensor fusion and integration

When using radar to enable vehicle perception, it is common to use several separate radar sensors mounted in different locations. The reason for this is that a single radar has a limited field of view, whereas a radar system may be required to see in several directions.

In the case of multiple independent radars, each separate radar may deliver data in relation to its own inherent polar coordinate system. Here r is the distance to the detection, while φ is the angle between the central normal line n of the radar and the line on which the detection is located.

In order to integrate separate radars into one single sensor system, the data from each radar must be translated into the same coordinate system.


Figure 4: Coordinate systems. (a) Radar coordinate system; (b) local coordinate system.

An illustration of a radar-specific polar coordinate system x_r,pol = [r φ]^T can be seen in figure 4a. To fuse the radars, the data from each specific radar should be translated to a common coordinate system. In the case of this project, a suitable coordinate system to work in is a Cartesian system that moves with the truck itself. An example of such a coordinate system is the local system x_local = [x y]^T, with origin located at the center of the rear axle of the truck. This local coordinate system and its relation to a radar-specific system can be seen in figure 4b.

In order to transform data from a radar-specific system to a local Cartesian system, it should first be transformed into a radar-specific Cartesian coordinate system x_radar = [x_r y_r]^T. Such a system is seen in the lower right part of figure 4b. It has the same origin as the original polar system, but with its x-axis parallel, and its y-axis perpendicular, to the radar normal axis n.

This first transform (from polar to Cartesian coordinates) can be formulated as below:

    x_r = r cos(φ)
    y_r = r sin(φ)    (3)

Once a position in the radar-specific Cartesian system x_radar is known, a transform into the local Cartesian system can be made as follows:

    x_local = [ cos(ψ)  sin(ψ) ; −sin(ψ)  cos(ψ) ] x_radar + [ x_srr,local ; y_srr,local ]    (4)

Here, x_srr,local, y_srr,local and ψ represent the position and orientation of the radar-specific coordinate system relative to the local system (details of the exact radar mounting position are required here). These are the parameters seen in red in figure 4b.
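The chain of transforms in equations (3) and (4) can be sketched in a few lines. Python is used here for brevity (the project itself works in MATLAB), and the function and parameter names are illustrative, not taken from the project code:

```python
import math

def radar_to_local(r, phi, x_mnt, y_mnt, psi):
    """Transform a detection given in radar polar coordinates (r, phi) into
    the local Cartesian system. (x_mnt, y_mnt, psi) are the radar's mounting
    position and orientation relative to the local frame. Illustrative sketch
    of equations (3) and (4)."""
    # Equation (3): polar -> radar-specific Cartesian
    xr = r * math.cos(phi)
    yr = r * math.sin(phi)
    # Equation (4): rotate and translate into the local frame
    x = math.cos(psi) * xr + math.sin(psi) * yr + x_mnt
    y = -math.sin(psi) * xr + math.cos(psi) * yr + y_mnt
    return x, y

# A detection 10 m straight ahead of a radar mounted 2 m in front of the
# origin, aligned with the local x-axis:
print(radar_to_local(10.0, 0.0, 2.0, 0.0, 0.0))  # -> (12.0, 0.0)
```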

The transformations in equations (3) and (4) can also be performed on the range rate ṙ to derive the velocity components ẋ_local = [ẋ ẏ]^T in the local system. However, it is important to differentiate between the velocity component caused by a moving object, and the one caused by movement of the EGO system.

The range rate ṙ only contains information about the velocity with which a target is approaching, or moving away, in the normal direction of the radar. This does not reveal whether it is the target that is moving, or the radar itself. To get around this, the radar system can be combined with data from other sensors.

Using data from position and acceleration sensors such as the IMU and GPS, accurate information about the vehicle's EGO velocity can be obtained. By combining this data with the known positions of each radar, the radar velocity ẋ_SRR can be derived. By projecting this derived velocity onto the Doppler speed of a detection, and subtracting it, the velocity component caused by the actual target movement is obtained.

Figure 5: Projection of radar EGO component on the range rate of a detection.

In figure 5 the concept is illustrated. An interesting case is when ẋ_doppler = ẋ_proj, which in fact means that the detection is stationary (in the normal direction of the radar).

Mathematically, the projection is done as follows:

    ẋ_proj = (ẋ_SRR · ẋ_doppler) / (ẋ_doppler · ẋ_doppler) · ẋ_doppler    (5)

The actual target velocity in the normal direction of the radar is then obtained by:

    ẋ = ẋ_doppler − ẋ_proj    (6)
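Equations (5) and (6) amount to a vector projection followed by a subtraction, which can be sketched as follows (Python for brevity; the function and variable names are illustrative, not taken from the project code):

```python
def target_velocity(v_srr, v_doppler):
    """Remove the EGO-motion component from a Doppler measurement.
    Both arguments are 2D vectors in the local frame: v_srr is the radar's
    own velocity, v_doppler the measured range-rate vector of the detection.
    Illustrative sketch of equations (5) and (6)."""
    dot = lambda a, b: a[0] * b[0] + a[1] * b[1]
    # Equation (5): project the radar velocity onto the Doppler direction
    k = dot(v_srr, v_doppler) / dot(v_doppler, v_doppler)
    v_proj = (k * v_doppler[0], k * v_doppler[1])
    # Equation (6): subtract the projection to get the target's own velocity
    return (v_doppler[0] - v_proj[0], v_doppler[1] - v_proj[1])

# If the measured Doppler vector equals the projected EGO component,
# the detection is stationary in the radar's normal direction:
print(target_velocity((5.0, 0.0), (5.0, 0.0)))  # -> (0.0, 0.0)
```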

2.3 Theoretical overview of machine learning concepts and methods

In this section, an introduction to machine learning is given together with a summary of some of the methods and problems that are of concern to this project.


A general explanation of some basic concepts of machine learning is followed by a more detailed description of the primary machine learning method used in this project: support vector machines.

The field of machine learning is usually divided into three main categories: supervised learning, unsupervised learning and reinforcement learning. Most of the methods that are of concern to this project fall into the category of supervised learning, that is, learning from examples.

2.3.1 Supervised learning, classification and overfitting

In a supervised learning problem, pre-labeled input-output pairs must be availablefrom which a system can learn [15]. The learning system uses the information inthese training examples to adapt to patterns or trends in the data. If the learningis successful, the trained system can mimic behaviour, creating its own outputwhen exposed to new input data. Supervised learning is characterized by the needfor labeled examples or training data, that the operator or teacher uses to trainthe system.

Two classical problems in the field of supervised learning are the problems ofclassification and regression. Here, the objective is to create a mapping betweenone or several inputs and their corresponding outputs. In the classification case,the output takes the form of an integer, while regression methods can yield anoutput of any value [15].

The input is given as a vector of values. These values or features can take the formof continuous or discrete numbers that describe some property of the entity that isobserved. In the classification case, the output integer expresses which class thisinput vector likely comes from.

Each training example is composed of a pair of input and output values. In the classification problem, each training example is given as a feature vector and its corresponding output class. Each input can be seen as a point in a certain feature space, where each element of the feature vector represents a position along an orthogonal axis in a Euclidean geometrical space.

Using supervised learning methods such as neural networks, support vector machines or Gaussian processes, the training examples can be used to create a complex structure that separates the feature space into areas of each class. When a new input point is given, the output class is determined depending on where in the feature space this point lies.

Binary classification In a binary classification problem, the goal is to predict, for a given data point in a considered feature space, which of two possible classes this data point belongs to. In the case of a 2D linear classifier, this will be done by finding a straight line (the decision boundary) that divides the feature space into two separate parts. All samples on one side of the line will be predicted as


belonging to the first class, while samples on the other side will be predicted to belong to the second class. In the case of a higher-dimensional feature space, the linear decision boundary will instead take the form of a plane (in 3D) or a hyperplane (in higher dimensions). [15]

Figure 6: Linear decision boundaries in two and three dimensions

A dataset is said to be linearly separable if a straight line, plane or hyperplane is enough to separate the dataset so that each point is on the correct side of the decision boundary. In many cases, this will not be possible. An example of a non-linearly separable case is given in figure 7.

Figure 7: An example of a non-linearly separable dataset

In order to handle non-linearly separable data sets, a non-linear classifier is needed. Some classification methods produce inherently non-linear decision boundaries, while others can be modified to enable non-linear separation. The usage of kernel methods in support vector machines, or using hidden layers in a neural network,


are examples of such modifications. Using the right method, a decision boundary of arbitrary shape can be created.

The decision boundary is found by presenting labeled training points to a learning algorithm that positions the boundary according to certain criteria. As can be seen in figure 6, there are an infinite number of ways to place the decision boundary and still achieve separation of the two classes.

One way to determine which decision boundary is optimal is to maximize the distance between the decision boundary and the closest data points on each side of the boundary. This distance is called the margin, and a classifier that determines its decision boundary by maximising this distance is called a maximum margin classifier.

Multiclass classification Binary classification is a very common problem to consider in supervised learning; however, not all problems are limited to just two classes.

Many classification methods, such as support vector machines, are inherently limited to binary classification only.

For the methods that cannot inherently produce multiclass classifiers, techniques exist with which to combine several binary classifiers into one multiclass model.

Studies of the most commonly used so-called ”binarization” techniques can be found in [16], [17], [18]. A brief summary is given below:

One vs All A quite straightforward method to produce a multiclass ensemble model is the ”One vs All” scheme (also called One against All, One against rest or OvA). Here a separate binary classifier is created for each class present in the original multiclass problem.

Thus an m-class classification problem is substituted by m binary classification problems. Each classifier is trained to separate one particular class from all other classes. The training of classifier c is conducted by letting the class label yc = 1 for all training points belonging to this class, while yc = −1 for the training points belonging to any other class. This process is repeated for each of the m classifiers.

To classify a new data point x, one output is obtained from each of the m classifiers, and the model output is chosen to be the class whose classifier gave the highest output score.
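The relabel-and-argmax logic described above can be sketched as follows. The centroid-based train_binary below is a hypothetical stand-in for any binary learner (such as an SVM) that returns a scoring function; it is illustrative only, not the classifier used in this project.

```python
# A minimal sketch of the One-vs-All scheme, assuming a generic binary
# learner. train_binary is a hypothetical stand-in returning a scoring
# function where higher scores mean "more likely positive".

def train_binary(X, y):
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    cpos = [sum(vals) / len(pos) for vals in zip(*pos)]
    cneg = [sum(vals) / len(neg) for vals in zip(*neg)]

    def score(x):
        # distance to the negative centroid minus distance to the positive one
        dpos = sum((a - b) ** 2 for a, b in zip(x, cpos))
        dneg = sum((a - b) ** 2 for a, b in zip(x, cneg))
        return dneg - dpos

    return score

def train_ova(X, y, classes):
    # one binary problem per class: label c becomes +1, every other class -1
    models = {}
    for c in classes:
        yc = [1 if label == c else -1 for label in y]
        models[c] = train_binary(X, yc)
    return models

def predict_ova(models, x):
    # the class whose classifier gives the highest output score wins
    return max(models, key=lambda c: models[c](x))
```

For an m-class problem this trains m classifiers, each on the full (relabeled) training set, which is exactly the training-time cost discussed below.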

The OvA method has a downside in that it is quite demanding in training time, due to the fact that m classifiers must be created, and every one of them uses the full training data set. Another problem with OvA is that each classifier will be trained on inherently unbalanced data sets since, generally, there will always be


much fewer positive training examples than negative. This can make the resulting classifiers become biased, since they could be prone to adapt more to the bigger of the training sets.

The strength of OvA lies in its simplicity, and the fact that it may be slightly faster in prediction than other techniques. This is because at prediction time, only m classifier scores need to be calculated, whereas with other techniques the number may be higher.

One vs One The ”One vs One” (also known as all pairs, OvO or one against one) method implies training one separate binary classifier for each pair of classes in the original multiclass problem. Each model is trained to differentiate between a single pair of classes, and is trained using only the subset of training points belonging to either of these two classes. For an m-class problem, this will result in m(m−1)/2 different classifiers.

In order to classify a new data point with the OvO scheme, the point is provided as input to each of the classifiers. The most common way to combine their outputs is to let each classifier vote for one of the two classes upon which it has been trained, and then choose the majority vote as the final prediction of the multiclass model.
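The pairwise training and majority voting can be sketched in the same spirit. The nearest-centroid rule in train_pair is again a hypothetical stand-in for a real binary classifier trained only on the two classes of each pair.

```python
from itertools import combinations
from collections import Counter

# A minimal sketch of the One-vs-One voting scheme; the nearest-centroid
# decision rule is illustrative only.

def _centroid(points):
    return [sum(vals) / len(points) for vals in zip(*points)]

def _dist2(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))

def train_pair(X, y, a, b):
    # trained using only the points of classes a and b
    ca = _centroid([x for x, label in zip(X, y) if label == a])
    cb = _centroid([x for x, label in zip(X, y) if label == b])

    def decide(x):
        return a if _dist2(x, ca) <= _dist2(x, cb) else b

    return decide

def train_ovo(X, y, classes):
    # m(m-1)/2 classifiers, one per unordered pair of classes
    return {pair: train_pair(X, y, *pair) for pair in combinations(classes, 2)}

def predict_ovo(models, x):
    # each pairwise classifier casts one vote; the majority wins
    votes = Counter(decide(x) for decide in models.values())
    return votes.most_common(1)[0][0]
```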

Since only a subset of the training data is used for each classifier, OvO is generally faster in training time than OvA, even though a higher number of classifiers must be trained.

In prediction time however, OvO is generally slower due to the fact that m(m−1)/2 scores must be calculated for each new input point, compared to just m in the OvA method. Another problem is that since each classifier is only trained to differentiate between two of the classes in the original problem, the classifiers will often encounter data belonging to none of the classes upon which they have been trained. When this happens, the vote of these classifiers will be worthless in the final prediction, and they are sometimes referred to as incompetent predictors.

DAG and others Although OvA and OvO are the most commonly used binarization techniques, there exists a multitude of other techniques. The directed acyclic graph (DAG) and the binary tree of classifiers (BTC) are two other commonly used methods. They are similar to the OvO method in that one classifier is trained for each pair of classes in the data. However, they have an advantage in that not all classifier scores must be evaluated to classify a data point. Instead, a binary tree structure is traversed, where each node represents one classifier and every leaf represents one class to which the data point is finally assigned.

In [19], it is argued that as long as the underlying binary classifiers are well made, the methods presented here are very similar when it comes to classification performance. For this reason, OvA could be the preferred method due to its simplicity. If time complexity proves to be a challenge, however, it may be worth looking further into the other methods available.


Overfitting When performing supervised learning, a common problem is that the learning system adapts too much to the training data.

Should this happen, the system will not generalize well, meaning that it will perform poorly when exposed to unseen data. This problem is known as overfitting [15].

The problem has a great effect on classification performance when noise is present in the training data. A model is considered robust if it captures the trend of the data without adapting to individual noise observations. A less robust model would adapt to the noise points, giving a high score when evaluating performance on the training data but showing drastically decreased performance on new data. This is a sign of overfitting.

The problem of overfitting is tightly connected to what is called the bias-variance trade-off or dilemma [15]. As a predictive model grows more complex, its ability to adapt to training data will increase, resulting in a decreased bias. However, this will also increase the risk that the model adapts to noise or temporary components in the data. This causes the variance - the dependence on what particular set of training data is used - to increase. As such, a successful model will be neither too complex nor too simple, as a simple model may have difficulty adapting to the trend of the data (underfitting). An illustration of the bias-variance dilemma can be seen below in figure 8.

Figure 8: Illustration of the bias-variance dilemma

Dividing the dataset In order to evaluate a model's capability to generalize, its prediction performance must be tested on unseen data. This can be achieved by dividing the original labeled training data into two separate parts, one for training and one for validation.

The prediction model is created using only the training part of the data, and its prediction performance is evaluated on the validation part. This method can be


hard to use if the amount of labeled data is limited, as the division into training and validation sets will further limit the amount of data available.

Another downside of the method is that the resulting model, and the performance value, may be very dependent on the initial shuffling of the data - which observations end up in the training part, and which end up in the validation part.

Cross-validation A commonly used method to reduce problems with overfitting is cross-validation. In the k-fold cross-validation method, the set of labeled data is first divided into k equal parts. Out of these k parts, k − 1 are used to train a classification model, while the last part is used for validation. By repeating this process k times, with a new validation part each time, and averaging the k different scores, the cross-validation score is obtained. [15]

By using this method, each observation in the data will be used for both validation and training purposes, making the cross-validation score less dependent on the initial shuffling of the data. Another advantage is that this method can efficiently be used even when the amount of labeled data is limited, as only a small part will be used for validation in any single iteration.
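A minimal sketch of the k-fold procedure; train_fn and score_fn are placeholders for whatever learner and performance measure are used.

```python
# k-fold cross-validation: split into k near-equal folds, train on k-1,
# validate on the remaining one, and average the k fold scores.

def kfold_score(X, y, k, train_fn, score_fn):
    n = len(X)
    # assign index i to fold (i mod k)
    folds = [[i for i in range(n) if i % k == f] for f in range(k)]
    scores = []
    for held_out in folds:
        train_idx = [i for i in range(n) if i not in held_out]
        model = train_fn([X[i] for i in train_idx],
                         [y[i] for i in train_idx])
        scores.append(score_fn(model,
                               [X[i] for i in held_out],
                               [y[i] for i in held_out]))
    # the cross-validation score is the average of the k fold scores
    return sum(scores) / k
```

With a majority-class dummy learner, for instance, the returned value is the average held-out accuracy over the k folds.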

2.3.2 Support vector machines as a method for classification

The support vector machine (SVM) is a learning method first introduced by Vladimir Vapnik in [20]. The method was further refined in [21] and [22], and has since become one of the most widely used machine learning methods.

The support vector machine is a maximum margin method that can be used for both classification and regression.

One of the benefits of SVM is that it is formulated as an optimization problem instead of an iterative method such as the neural network. This is beneficial since it provides mathematical insight into the structure. Another advantage is that the training can be performed using optimization techniques such as quadratic programming [23].

Below, a linear support vector machine is shown. The name of the method is derived from the ”support vectors” - the data points that lie closest to the decision boundary, thus lying on the margin itself.


Figure 9: The support vector machine visualized in a two-dimensional feature space

In figure 9, the concepts of support vectors and margin are visualised. More details about the mathematical formulation of SVM are presented below.

Mathematical formulation and primal problem Since a linear decision boundary takes the form of a hyperplane, it can be expressed as the set of points x that satisfies:

ω · x + b = 0    (7)

Here ω represents the normal vector of the hyperplane, while the bias b determines the offset of the hyperplane from the origin along ω.

The margin is defined by two separate hyperplanes, both parallel to the decision boundary:

ω · x + b = −1 and ω · x + b = 1    (8)

The size of the margin can be expressed as the geometrical distance between the two planes of equation (8):

Msize = 1/‖ω‖ − (−1/‖ω‖) = 2/‖ω‖    (9)

The objective of support vector training is to maximise Msize while ensuring that no data points exist between the two planes of equation (8). In fact, only data points of class 1 should exist beyond the first hyperplane, and only points of class 2 beyond the other. Mathematically, this condition is formulated as two inequality constraints.


For all data points x of the first class:

ω · x + b ≤ −1    (10)

For all data points x belonging to the second class:

ω · x + b ≥ 1    (11)

Using the class labels yi = ±1, the two constraints (10), (11) can be rewritten as a single inequality constraint:

For all N training data points, 1 ≤ i ≤ N:

yi(ω · xi + b) ≥ 1    (12)

From equation (9), it can be seen that in order to maximize the size of the margin, ‖ω‖ needs to be minimized.

Thus, the primal SVM optimization problem is stated as:

min_{ω,b} (1/2)‖ω‖²

subject to yi(ω · xi + b) ≥ 1,  i = 1, . . . , N.    (13)

Since the evaluation of ‖ω‖ requires a square root, the reformulation into (1/2)‖ω‖² is made. This has the same minimizer and results in a problem suitable to be solved using quadratic programming techniques.

Dual SVM formulation In equation (13), the primal formulation of the SVM optimization problem was given. However, when implementing support vector machines, it is more common to make use of the dual problem.

In [23], the dual formulation is written as:

min_α (1/2) α · diag(y) · G · diag(y) · α − e · α

subject to α · y = 0,
           αi ≥ 0,  i = 1, . . . , N.    (14)

This formulation of the SVM problem is in a form readily solvable using convex optimization toolboxes. Here e is a vector containing only ones, while diag(y) is a


diagonal matrix containing all class labels yi = ±1. Also used is the Gram matrix of dot products G ≡ xi · xj.

Solving the optimization problem (14) yields the vector of Lagrange multipliers α. These can in turn be used to find the optimal separating hyperplane normal vector ω:

ω = ∑_{i=1}^{N} αi yi xi

Only a small subset of the training points xi will have a corresponding αi ≠ 0. For these points, the inequality constraint in equation (12) will be an equality constraint:

yi(ω · xi + b) = 1    (15)

This means that they lie exactly on the edge of the margin; they are the support vectors [23].

Once the optimal hyperplane normal vector ω is known, equation (15) can be used to obtain the hyperplane bias b:

b = yi − ω · xi    (16)

In [23], the bias value used is a weighted average value of b over all support vectors:

b = ∑i αi(yi − ω · xi) / ∑i αi    (17)

When ω and b are known, the SVM classifier is constructed. The decision boundary is defined by equation (7), and thus any new data point xn can be classified according to

class(xn) = sgn(ω · xn + b)    (18)
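The recovery of ω and b from the multipliers, and the classification rule of equation (18), can be sketched numerically as follows; the α values below are illustrative, not the output of an actual solver.

```python
# Sketch: omega from the Lagrange multipliers (only support vectors
# contribute), the bias as the weighted average of equation (17), and
# classification by the sign of omega . x + b (equation (18)).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def svm_weights(alphas, ys, xs):
    # omega = sum_i alpha_i * y_i * x_i
    dim = len(xs[0])
    w = [0.0] * dim
    for a, yi, xi in zip(alphas, ys, xs):
        for d in range(dim):
            w[d] += a * yi * xi[d]
    return w

def svm_bias(alphas, ys, xs, w):
    # weighted average of b = y_i - omega . x_i over the support vectors
    num = sum(a * (yi - dot(w, xi)) for a, yi, xi in zip(alphas, ys, xs))
    return num / sum(alphas)

def classify(w, b, x):
    # class(x) = sgn(omega . x + b)
    return 1 if dot(w, x) + b >= 0 else -1
```

For two support vectors (1, 0) with y = +1 and (−1, 0) with y = −1 and equal multipliers, this recovers ω = (1, 0) and b = 0, so both points satisfy the margin equality (15).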

Kernels and soft margin The method on which SVM is based was first published in 1963 in [20]. However, this method, as well as the formulation given in equation (13), is unable to handle classification problems unless they are linearly separable.

In practice, very few data sets can be separated this way, and for this reason the method was not widely used. In [21], a way to enable nonlinear support vector classification using kernels was presented.

If the feature data can be transformed to a feature space of higher dimensionality, the classes may be linearly separable in that space. A linear decision boundary that separates the points in the high-dimensional space will be non-linear in the original feature space, enabling separation of non-linear data sets.


A simple illustration of the transformation and resulting decision boundaries is given in figure 10.

Figure 10: Illustration of the kernel concept

Mathematically, the transformation is described by the function φ. If x ∈ R^n, then φ(x) ∈ R^N, where N > n.

To enable advanced decision boundaries in the original space, N is usually a very high number. With the primal formulation of SVM, this can lead to computational problems since the resulting optimization problem will be extremely big. With the dual formulation, however, the size of the optimization problem will not change [23].

What makes the method even more powerful is that there is no need to compute any transform φ(x) directly, due to what is called the kernel trick. Since the dual formulation of SVM (see equation (14)) only contains data points x in the form of dot products within the Gram matrix, it is enough to compute the dot products of the points in the transformed space, φ(xi) · φ(xj).

The kernel function K is a way to calculate these dot products implicitly, without ever computing the data point coordinates in the higher-dimensional space.

Thus, in order to implement non-linear classification with SVM, the only thing needed is to replace the Gram matrix G ≡ xi · xj with the kernel function K(xi, xj) ≡ φ(xi) · φ(xj).

The function K(xi, xj) can be chosen as any function that satisfies certain properties of an inner product. When implementing support vector machines, however, it is common to use one of the following basic kernel functions [23], [24]:

• Linear kernel (the Gram matrix): K(xi, xj) = xi · xj

• Sigmoid kernel: K(xi, xj) = tanh(γ xi · xj + r)


• Polynomial kernel: K(xi, xj) = (γ xi · xj + r)^p

• Radial basis function (RBF) kernel: K(xi, xj) = e^(−γ‖xi − xj‖²)

Here γ, p and r are kernel parameters. A method to choose suitable values for these parameters is given in the section about model selection further below.
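A direct implementation of these four kernels might look like the sketch below (the sigmoid kernel is written with its standard tanh form):

```python
from math import exp, tanh

# The basic kernel functions, written out directly.
# gamma, r and p are the kernel parameters.

def linear_kernel(xi, xj):
    # the plain dot product, i.e. one entry of the Gram matrix
    return sum(a * b for a, b in zip(xi, xj))

def sigmoid_kernel(xi, xj, gamma, r):
    return tanh(gamma * linear_kernel(xi, xj) + r)

def polynomial_kernel(xi, xj, gamma, r, p):
    return (gamma * linear_kernel(xi, xj) + r) ** p

def rbf_kernel(xi, xj, gamma):
    # squared Euclidean distance, so identical points give K = 1
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
```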

It should be noted that the usage of kernels may result in a slightly extended execution time in the prediction stage, since the Gram matrix only represents a dot product, while the different kernel functions may require additional computations.

If class separation is possible using the linear kernel, it should always be chosen, since this will minimize execution time and also reduce the risk of overfitting.

Soft margin To reduce the risk of overfitting, it is common in machine learning techniques to have what is called a regularization parameter. This parameter can be seen as a tool to control the complexity of the model, and to prevent it from adapting to noise in the training data.

In the SVM formulation given in the beginning of this section, it was stated that each data point should be on the correct side of the two planes that make up the margin. Mathematically, this was formulated as the constraint in equation (12):

yi(ω · xi + b) ≥ 1

Due to this constraint, the original formulation is very sensitive to noise in the training data. With linear SVM, a single noise point can lead to a margin of drastically reduced size, or an optimization problem that is not solvable at all.

In the case of non-linear SVM (using kernels), the constraint, in combination with noise in the training data, often leads to a model of unnecessary complexity. The decision boundary of such a model will curve and adapt to every single noise point in the data, resulting in problems of overfitting.

In [22], what is now called ”soft-margin” SVM was presented. With this modification, the primal optimization problem can be expressed as follows:

min_{ω,b,ξ} (1/2)‖ω‖² + C ∑_{i=1}^{N} ξi

subject to yi(ω · xi + b) ≥ 1 − ξi,  ξi ≥ 0,  i = 1, . . . , N.    (19)

As can be seen, the constraint from (12) has been slightly modified. The soft-margin formulation of SVM introduces slack variables ξi that permit data


points to exist on the wrong side of the margin. By specifying the regularization parameter C, the user can decide how much ”slack” is allowed. The resulting SVM model is much more resistant to noise, since individual points are allowed to exist on the wrong side of the margin.

In the dual formulation of SVM, the soft-margin implementation only leads to an additional constraint, αi ≤ C.

Figure 11: Illustration of soft-margin SVM

In figure 11, the benefit of the soft margin is clear; a much wider margin is created instead of adapting the decision boundary to the individual noise point.

Model selection The predictive performance of a support vector machine model depends not only on the training data from which it is constructed, but on the SVM parameters as well.

A normal scenario is that there are two parameters to select: the soft-margin cost parameter C, and a kernel parameter in the case of kernel-SVM.

When C → ∞, the cost for slack in the model will be so high that it will in effect be a hard margin classifier. This means that each training observation must be placed on the correct side of the decision boundary. This will yield a model that always gets a 100% score on training data, but is extremely sensitive to overfitting and noise. As C gets smaller, however, the error rate of the model will increase as more and more points are placed on the wrong side of the decision boundary, and the model complexity is reduced.

The mathematical formulation of SVM, and the fact that this parameter can take any value, makes it hard to intuitively decide a suitable value for C. It is instead recommended to use some form of iterative method and choose the value which yields the best results for the dataset in question.


Grid-search A straightforward and very powerful method to choose parameters is the parameter grid search, suggested in [25], [24]. This is a brute force method, and as such it has a very high time complexity and may require a lot of processing power and time. In spite of this, the simplicity and performance of the method have led to it being very commonly used, such as in [16], [26].

In the case of linear SVM, with only one parameter to find, a region of possible values to assess is defined, and for each C value in this region, a cross-validation score of the classifier on this particular dataset is obtained and stored. The C value is then chosen as the one that yielded the highest score.

In the case of kernel SVM, there may be additional parameters to identify, such as the RBF-kernel parameter γ or the polynomial kernel parameter p. In these cases, a region of values to assess is defined for all parameters, and a grid composed of all possible parameter combinations is created. After looping through the grid and getting all the cross-validation scores, the parameter combination that yielded the highest score is chosen.
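The grid-search loop can be sketched as follows; cv_score is a placeholder for the cross-validation scoring of an SVM trained with the given parameter combination on the dataset in question.

```python
from itertools import product

# Brute-force parameter grid search: evaluate every (C, gamma) combination
# and keep the one with the highest cross-validation score.

def grid_search(c_values, gamma_values, cv_score):
    best_params, best_score = None, float("-inf")
    for C, gamma in product(c_values, gamma_values):
        score = cv_score(C, gamma)
        if score > best_score:
            best_params, best_score = (C, gamma), score
    return best_params, best_score
```

The cost is one full cross-validation per grid point, which is why the method is demanding in processing time.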

If several parameter choices yield similar cross-validation scores, it is important to select one that yields good generalization in the model. For C and p, this means choosing as low a value as possible, since this will reduce the complexity of the model. The same holds for γ: the lowest value should be chosen, since γ → 0 in effect results in a near-linear kernel and a less complex model.

A less complex model is always desirable since it will reduce the risk of overfitting and improve generalization capability. See the part about overfitting in section 2.3.1.

A less complex model will also mean that fewer support vectors are required, and thus reduce the execution time of the classification stage.

SVM outputs and Platt-scaling When using an SVM model to predict the class of a new data point xn, the point is inserted into the hyperplane equation using the known hyperplane parameters:

f(xn) = ω · xn + b (20)

The sign of f describes which side of the decision boundary the data point lies on, and thus which class it likely belongs to. The size of the output in turn gives an indication of how far from the decision boundary the point lies. In this sense, a large value of f would mean that the prediction is quite certain, while a value close to zero means that the point is close to the decision boundary and could easily belong to the other class as well.

As such, the SVM output score is quite unintuitive, as it is hard to know what value the output should take before the prediction can be considered to have a good level of certainty.


In order to facilitate post-processing, methods have been developed that calibrate SVM scores to a probabilistic output. One such method is called Platt scaling.

The method works by fitting the outputs of the original SVM to a sigmoid function. In [27], the probability that the data point belongs to the positive class is calculated as:

p(y = 1 | f) = 1 / (1 + e^(Af + B))    (21)

A large positive value of f will yield a Platt score approaching 1, while a large negative value will give a Platt score approaching zero. Besides being bounded between 0 and 1, the probabilistic score has an advantage in that it gives a more intuitive idea of the certainty of a prediction.

The idea behind Platt scaling is the assumption that the output of an SVM is proportional to the log odds of a positive example. Using this assumption, Platt finds the parameters A and B using maximum likelihood estimation on a set of training data. It should be remembered that the Platt output is only probability-like (not a true probability, but an estimate).
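Equation (21) itself is a one-liner; note that the fitted A is typically negative, which is what makes a large positive SVM output f map to a probability near 1.

```python
from math import exp

# Platt scaling, equation (21): map an SVM output f to a probability-like
# score. A and B are the sigmoid parameters fitted by maximum likelihood;
# the values used here are illustrative only.

def platt_probability(f, A, B):
    return 1.0 / (1.0 + exp(A * f + B))
```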

2.3.3 Classification performance analysis

As described in section 2.3.1, a common way to evaluate the performance of a classifier is to test the system on a set of validation data that was not used when training the system. By comparing the outputs of this test against the known class labels of each validation data point, several performance parameters can be obtained.

In a binary classification problem, the parameters below are commonly used to evaluate performance [28]:

• True positives - These are the positive examples correctly classified as positive.

• True negatives - These are the negative examples correctly classified as negative.

• False positives - These are the negative examples incorrectly classified as positive.

• False negatives - These are the positive examples incorrectly classified as negative.

To enable an easy overview, the parameters can be used to construct a confusion matrix or table of confusion:


Table 2: Binary classification confusion matrix

Actual class    Classified as 1          Classified as −1
1               nr of true positives     nr of false negatives
−1              nr of false positives    nr of true negatives

Based on the four parameters, several measurements of classification performance can be calculated. The simplest way to measure performance is the accuracy measurement [29]:

Accuracy = (ntp + ntn) / ntot

Here, ntp and ntn are the number of true positives and true negatives respectively, while ntot = ntp + ntn + nfp + nfn is the total number of validation data examples.

The accuracy measurement has a weakness in that the score does not necessarily prove good performance for unbalanced data sets. For example, a dataset containing 90% positive examples would yield a score of 90% for a classifier that can only yield positive outputs.

The error measurement is tightly connected to accuracy:

Error = (nfp + nfn) / ntot

The error measurement can be useful due to the fact that a small change in accuracy often reflects a big change in error. For example, an improvement in accuracy from 0.9 to 0.95 means a 50% reduction in error.

To provide further insight into the performance of a classifier, several other measurements can be calculated [28], [29]:

Precision = ntp / (ntp + nfp)

The precision, or positive predictive value, is a measurement of the fraction of examples classified as positive that were actually positive.

Sensitivity and specificity, also called true positive rate and true negative rate respectively, are two additional measurements:

Sensitivity = ntp / (ntp + nfn),    Specificity = ntn / (nfp + ntn)


Sensitivity is a measurement of the fraction of positive examples that were correctly classified as positive, while specificity is the fraction of negative examples that were correctly classified as negative. Another word for sensitivity is recall.

If a classifier is tuned with the purpose of getting a high precision value, the recall value has a tendency to decrease. If the aim is a high recall value, the precision may decrease.

The F-measure is a measurement that combines the two as the harmonic mean of precision and recall:

Fmeas = 2 · precision · recall / (recall + precision)
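Computed from the four counts of the confusion matrix, the binary measures above become:

```python
# The binary performance measures, computed from the confusion counts.

def binary_metrics(n_tp, n_tn, n_fp, n_fn):
    n_tot = n_tp + n_tn + n_fp + n_fn
    precision = n_tp / (n_tp + n_fp)
    recall = n_tp / (n_tp + n_fn)  # sensitivity / true positive rate
    return {
        "accuracy": (n_tp + n_tn) / n_tot,
        "error": (n_fp + n_fn) / n_tot,
        "precision": precision,
        "recall": recall,
        "specificity": n_tn / (n_fp + n_tn),  # true negative rate
        "f_measure": 2 * precision * recall / (recall + precision),
    }
```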

For multiclass classification problems, the parameters true positive, true negative, false positive and false negative have a slightly different meaning, since there are more than just positive and negative outputs. However, a class-specific variant of the measurements can be calculated by looking at each class separately.

In the multiclass case, for one specific class c, the true positives ntpc are all validation data points correctly classified as belonging to c. The true negatives ntnc are all the data points that did not belong to c and were classified as not belonging to c (regardless of which other class they were assigned to).

The false positives nfpc are all the points predicted as belonging to c when in fact they did not. The false negatives nfnc are all points that belong to c but were classified as not belonging to c.

These new definitions enable class-specific variants of the measurements mentioned earlier, such as precision and recall.

In [28] and [29], additional measurements are considered for multiclass problems:

$$\mathrm{Average}_{acc} = \frac{1}{C} \sum_{c=1}^{C} \frac{n_{tp_c} + n_{tn_c}}{n_{tot_c}}$$

This is the average of the class-specific accuracy values over all C classes.

$$\mathrm{Average}_{err} = \frac{1}{C} \sum_{c=1}^{C} \frac{n_{fp_c} + n_{fn_c}}{n_{tot_c}}$$

This is the average of the class-specific error over all C classes.

$$\mathrm{MeanF}_{meas} = \frac{1}{C} \sum_{c=1}^{C} \frac{2 \cdot \mathrm{precision}_c \cdot \mathrm{recall}_c}{\mathrm{recall}_c + \mathrm{precision}_c}$$

Mean F-measure (MFM) is the average of F-measure values over all C classes.
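A sketch of how the class-specific counts and the three averaged measurements fall out of a C × C confusion matrix. The matrix layout is an assumption on our part: rows are true classes, columns are predicted classes.

```python
def multiclass_averages(confusion):
    """Average accuracy, average error and mean F-measure from a C x C
    confusion matrix, where confusion[i][j] counts points of true class i
    predicted as class j."""
    C = len(confusion)
    total = sum(sum(row) for row in confusion)
    avg_acc = avg_err = mfm = 0.0
    for c in range(C):
        tp = confusion[c][c]                              # correctly assigned to c
        fn = sum(confusion[c]) - tp                       # belonged to c, missed
        fp = sum(confusion[i][c] for i in range(C)) - tp  # wrongly assigned to c
        tn = total - tp - fn - fp                         # everything else
        avg_acc += (tp + tn) / total
        avg_err += (fp + fn) / total
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        mfm += 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return avg_acc / C, avg_err / C, mfm / C

conf_matrix = [[50, 5, 0],
               [4, 40, 6],
               [1, 2, 42]]
print(multiclass_averages(conf_matrix))
```

Since every class shares the same total count here, the averaged accuracy and averaged error sum to one; with per-class totals they need not.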


2.4 Extraction and analysis of object descriptions

In this section, means of extracting descriptive data and analysing its relevance are discussed. As discussed in section 2.3.1, Supervised learning, classification and overfitting, a pre-condition to supervised learning is the existence of training examples. These examples, consisting of input-output mappings in the form of feature vectors with class labels, have to be constructed somehow.

Consider the specifics of this project: input data is delivered in the form of radar detections with spatial location and some other parameters. These detections should be used to describe a real-world object, and in the end determine which class this object belongs to. There are at least two steps required to accomplish this: the clustering of data, and the subsequent feature extraction from the clustered data.

In the clustering step, input data points are grouped together into clusters believed to come from the same real-world object. This is done using some heuristic or domain knowledge, such as spatial closeness. In a machine learning context, clustering can be seen as unsupervised learning, because it needs no training or labeled examples to extrapolate category information or discriminate within a set of data.

In the feature extraction step, computations are performed on the clustered subsets of data in order to extract descriptive values. Such a descriptive value can be the average or variance of a certain measurement type, like amplitude.

Closely connected to feature extraction is the concept of feature selection. Since there is a virtually limitless space of feasible computations that can be made on a set of data (implying a nearly infinite set of different feature extractions), one needs a method to determine the relevance of a certain feature, as well as some heuristic as to what constitutes a good feature in general.

While feature extraction is straightforward and only implies performing computations, feature selection covers the more delicate considerations that have to be made in order for the extraction step to perform well.

The methods for feature selection and evaluation, specifically the use of principal component analysis, are also useful for general data exploration. This exploration is naturally an integral part of a project such as this, since domain knowledge regarding the data is necessary.

In the sections below, methods available for spatially clustering data, as well as means to perform feature selection, are discussed.

2.4.1 Clustering of sensor data

In this section, some of the relevant aspects of the theory of clustering of data are presented. As stated in section 1.2.3, Project requirements, clustering is considered


a soft requirement for the success of this project. Hence, some of the available theory on clustering of data is presented below.

There is no exact definition of what constitutes a "cluster". One needs some domain knowledge and a pre-conception of what a cluster is in a particular context in order to apply an appropriate clustering method [30]. If one has little domain knowledge, and a data set consisting of densely populated regions separated by sparse areas containing noise points, a density-based clustering method can be preferable [31][32].

There are several such density-based clustering algorithms available, for example DBSCAN (Density Based Spatial Clustering of Applications with Noise) [32], which will be described in detail below, and OPTICS (an extension of DBSCAN that allows for a more automated choice of clustering parameters [33]). There are also several extensions for parallel computation of the algorithm, such as PDBSCAN [31].

Clustering of sensor data with DBSCAN The fundamental idea of density-based methods is to formalize the notion of density that comes intuitively to a human [32].

DBSCAN essentially works by defining a point as belonging to a cluster if it lies within distance Eps of another point belonging to said cluster; such a point is said to be (directly) density reachable. A cluster is formed if there are at least MinPts points that are density reachable to each other. The points of a cluster need not be directly density reachable; it is enough if they can be indirectly connected [32]. Figure 12 below illustrates these clustering requirements:

Figure 12: DBSCAN clustering method

In figure 12, the parameter Eps determines whether points are within density reachable distance from one another. Clustered points are displayed as green, points that are density reachable but not clustered due to too few points are red,


and the black point in the middle is not density connected to any other points. The green points are all considered indirectly density reachable and thus become density connected. Since the number of points that are density connected is also at least MinPts, they form a cluster.

DBSCAN can incorporate any distance function, the most common being the Euclidean distance (as is the case in figure 12 above). However, if one has domain knowledge that suggests clusters have a particular shape, other distance functions could be used [32].

When using DBSCAN, one can determine suitable clustering parameters with little domain knowledge, which is beneficial in many cases. The parameters Eps and MinPts do have to be specified, so some domain knowledge is required. However, if one has an idea of the characteristics of a typical data set, this is quite straightforward. For example, one can determine the smallest cluster that should be detectable by manually inspecting data sets. A reasonable Eps distance threshold can be determined by the same method.

Since the methods behind parameter selection are so heavily dependent on the data domain, general theory about this would be of little use. Instead, the specifics of parameter selection are discussed in the method chapter.
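One heuristic from the density-based clustering literature for choosing Eps is the sorted k-distance graph: plot each point's distance to its k:th nearest neighbour in descending order and look for a "knee", below which most points lie inside dense regions. The sketch below is our own illustration of that idea, not a method prescribed in this project:

```python
import math

def k_distance_graph(points, k):
    """Distance from each point to its k:th nearest neighbour, sorted in
    descending order. A knee in this curve is a common heuristic for
    choosing DBSCAN's Eps (with MinPts = k + 1)."""
    dists = []
    for i, p in enumerate(points):
        # Sorted distances to all other points; d[k-1] is the k:th nearest.
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dists.append(d[k - 1])
    return sorted(dists, reverse=True)

# Three nearby points and one far-away outlier: the outlier shows up
# as a sharp jump at the start of the curve.
pts = [(0, 0), (0, 1), (1, 0), (10, 10)]
print(k_distance_graph(pts, k=2))
```

The brute-force pairwise computation above is O(n²), matching the complexity discussion of un-indexed DBSCAN below; for large data sets a spatial index would be used instead.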

The time-complexity of DBSCAN on an un-indexed data set is O(n²). If one needs a faster computing time and can pre-partition the data, a time-complexity of O(n log n) can be reached [31].

There are performance optimizations to be made from several different angles compared to just using DBSCAN with a Euclidean distance function. One could use a different distance measurement that takes advantage of data domain knowledge (for example, if clusters are known to appear as ellipses with a certain orientation, a corresponding distance measurement could be used). If performance is too slow, one could pre-partition the data set before feeding it into the clustering algorithm. Finally, one could use a more advanced algorithm like OPTICS instead of DBSCAN to automatically, even dynamically, determine clustering parameters and thereby handle clusters of different sizes more easily.

Algorithm description Consider the data set D containing N points pn, n = 1, ..., N, to be clustered. The epsilon neighbourhood Nε(p) denotes the collection of points within distance ε (the Eps parameter) of a chosen point p. The minimum number of points a cluster should contain is denoted MinPts. A point is considered undecided if it has not been assigned to a cluster or labeled as noise. The algorithm goes:


Algorithm 1 DBSCAN

 1: Initialize the state of each point pn in data set D to be undecided.
 2: while there exist undecided points in D do
 3:     choose an undecided point pn and compute Nε(pn)
 4:     if |Nε(pn)| ≥ MinPts then
 5:         form a new cluster C and insert pn into C
 6:         form set C′ containing Nε(pn) − pn
 7:         while there are undecided or noise points in C′ do
 8:             for each undecided or noise point qn ∈ C′ do
 9:                 insert qn in C and compute Nε(qn)
10:                 if |Nε(qn)| ≥ MinPts then
11:                     expand C′ to contain Nε(qn)
12:                 end if
13:             end for
14:         end while
15:     else
16:         label pn as noise
17:     end if
18: end while

When the algorithm has finished, one is left with the original collection of points, with a cluster ID (or a corresponding noise label) associated with each point.
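For reference, Algorithm 1 can be sketched in Python roughly as follows. The Euclidean distance is assumed and the helper names are ours; note that a point first labeled noise can later be absorbed into a cluster as a border point, as in the original algorithm:

```python
import math

def dbscan(points, eps, min_pts):
    """Sketch of Algorithm 1: returns one label per point, a cluster id
    (0, 1, ...) or None for points left as noise."""
    UNDECIDED, NOISE, CLUSTERED = "undecided", "noise", "clustered"
    state = [UNDECIDED] * len(points)
    labels = [None] * len(points)

    def neighbourhood(i):
        # Epsilon neighbourhood of point i, including the point itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if state[i] != UNDECIDED:
            continue
        seeds = neighbourhood(i)
        if len(seeds) < min_pts:
            state[i] = NOISE            # may later become a border point
            continue
        # Grow a new cluster from the core point i.
        state[i], labels[i] = CLUSTERED, cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if state[j] == CLUSTERED:
                continue
            state[j], labels[j] = CLUSTERED, cluster_id
            reachable = neighbourhood(j)
            if len(reachable) >= min_pts:   # j is a core point: expand
                queue.extend(k for k in reachable
                             if state[k] in (UNDECIDED, NOISE))
        cluster_id += 1
    return labels
```

A quick run on two tight groups plus one distant point yields two cluster ids and one noise label, matching the behaviour described above.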

2.4.2 Selecting and extracting features from data clusters

In this section, the methods and concepts regarding the selection of features relevant to this project are presented. This includes some theory behind feature selection, as well as some theory behind principal component analysis, a common way of looking at high-dimensional data.

Feature selection The concepts of feature selection are important for many machine learning applications. If there exists a large space of features available for a particular data set, then unless computing power is not an issue, some form of feature selection has to be performed. This can be done in a formalized way, such as with a specific algorithm, or by using domain knowledge or a useful heuristic [34].

Data exploration is a closely connected theme, as the evaluation methods used for feature selection can also be seen as descriptors of the data set. Thus, the methods presented here can also be used for stating certain characteristics of the data set used.

Feature selection is done to reduce the dimensionality of the data set to be used in a particular machine learning application, with the intention of increasing


time-performance, increasing classification accuracy, or both. Feature selection is to be distinguished from feature extraction (the process of combining features from the original data set into a new set of features).

It is an area subject to much research, and there is ample theory available (see for example [35] and [36]).

According to [34], the process of feature selection (or feature evaluation) can be divided into three different approaches: the filtering, wrapper and embedded approaches. They are briefly explained below.

Filtering The filtering approach works independently of which classification method is used. It uses parameters such as quality measurements from information theory (with regards to, for example, information gain) to generally determine whether a feature is useful or not. The upside of the filter approach is its universality: it can be applied in any circumstance. The downside is that it could negatively affect performance, and also that it inherently lets through features that provide very little (but still positive) information gain.

Wrapper The wrapper approach means evaluating the performance of a subset of features on the classification results (thus "wrapping" the feature selection into the rest of the machine learning procedure). If classification accuracy is the most important performance characteristic, this approach is good. It has the downside of possibly inducing bias (in that it adapts the features used to the particulars of the rest of the machine learning methods). It can also be very computationally expensive during learning, and thus prove infeasible.

Embedded approaches Using embedded approaches means using machine learning methods that have inherent means of feature selection. Artificial neural networks that apply pruning, and decision trees, are examples with inherent mechanisms for ordering or selecting features.

The above distinctions can be useful from a theoretical perspective. However, when conducting feature selection (often in conjunction with feature extraction), the usual and widely accepted method is to have some initial proposition of features to use, evaluate their performance and optimize the feature set for some criteria. To arrive at an initial feature proposition, one can use some algorithmic methodology, but more common is the use of some heuristic or what can be referred to as expert knowledge [34].
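As a concrete illustration of the filter approach, a feature can be scored on its own, independently of any classifier. The sketch below uses the Fisher score (ratio of between-class to within-class variance); this particular score is our illustrative choice, not one prescribed by the approaches above:

```python
def fisher_scores(X, y):
    """Filter-style feature ranking: score each feature by the ratio of
    between-class variance to within-class variance (Fisher score).
    Higher means the feature separates the classes better on its own.
    X is a list of feature vectors, y the corresponding class labels."""
    n_feat = len(X[0])
    classes = sorted(set(y))
    scores = []
    for f in range(n_feat):
        col = [x[f] for x in X]
        mean_all = sum(col) / len(col)
        between = within = 0.0
        for c in classes:
            vals = [x[f] for x, lab in zip(X, y) if lab == c]
            mu = sum(vals) / len(vals)
            between += len(vals) * (mu - mean_all) ** 2   # class mean spread
            within += sum((v - mu) ** 2 for v in vals)    # in-class scatter
        scores.append(between / within if within else float("inf"))
    return scores

# Feature 0 separates the two classes cleanly; feature 1 is noise.
X = [(0.0, 5.0), (0.1, 1.0), (5.0, 5.1), (5.1, 1.2)]
y = [0, 0, 1, 1]
print(fisher_scores(X, y))
```

A filter would keep the highest-scoring features and discard the rest, before any classifier is trained, which is exactly the classifier-independence property described above.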

Below, a commonly used method of evaluating features, and of looking at high-dimensional data in general, is presented.

2.4.3 Principal component analysis for feature evaluation

Principal component analysis (PCA) is a useful tool for reducing the dimensionality of data sets that would otherwise be hard to visualise. The


main idea is to transform the data set in question into a new set of variables, the principal components (PC). These are uncorrelated with each other, and are ordered in such a way that the first variable contains most of the variance of the data set, the second contains the second most, and so on [37]. Mathematically, the first PC is defined as:

$$\alpha'_1 x = \alpha_{11} x_1 + \ldots + \alpha_{1p} x_p = \sum_{j=1}^{p} \alpha_{1j} x_j \qquad (22)$$

where α1 is a vector of p constants, x is a vector of p random variables and the linear function α′1x is the line along which x has the highest variance. The second PC is the line α′2x, which is the line uncorrelated with α′1x having maximum variance, and so on. There are p possible PCs to find, although one rarely wants to use all of them (since that would defeat the purpose of dimensionality reduction) [34].

The PCs correspond to the eigenvectors of the covariance matrix Σ (or, in the more usual case of an unknown covariance matrix, the sample covariance matrix S) of the random variables x. The first PC corresponds to the eigenvector with the largest eigenvalue, the second PC to the eigenvector with the second largest eigenvalue, and so on. They are usually found by means of Lagrange multipliers, by solving:

$$(\Sigma - \lambda_k I_p)\,\alpha_k = 0 \qquad (23)$$

so that α′kx is the k:th PC and var(α′kx) = λk, where λk is the k:th largest eigenvalue [34].

When deciding how many principal components to use, one can look at the cumulative percentage of total variation, defined as:

$$t_m = \frac{100}{p} \sum_{i=1}^{m} l_i \qquad (24)$$

where tm is the cumulative percentage of variation, p is the total number of variables, m is the number of principal components used and li is the variance (eigenvalue) of the i:th principal component; for standardized variables the eigenvalues sum to p, so tm is indeed a percentage. A sensible cutoff is between 70 and 90 percent, although an exact figure is hard to recommend [37].
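The eigendecomposition view above can be sketched in a few lines of NumPy. Here we normalise by the sum of all eigenvalues rather than by p, which is equivalent to equation (24) when the variables are standardized and works for an arbitrary covariance matrix otherwise (function names are ours):

```python
import numpy as np

def principal_components(X):
    """Sketch of PCA via the sample covariance matrix: returns the
    eigenvalues (variances along each PC, in descending order) and the
    matching eigenvectors as columns. X is (n_samples, n_features)."""
    S = np.cov(X, rowvar=False)            # sample covariance matrix S
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]      # sort by decreasing variance
    return eigvals[order], eigvecs[:, order]

def cumulative_variation(eigvals, m):
    """t_m: percentage of total variation captured by the first m PCs."""
    return 100.0 * eigvals[:m].sum() / eigvals.sum()
```

One would keep the smallest m whose cumulative_variation falls inside the 70 to 90 percent band suggested above.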

2.5 Frame of reference conclusions

In this chapter, current best practice and previous work have been examined. Existing theory behind concepts and methods of interest has been presented and discussed. In this section, we state some conclusions regarding the areas identified as having major relevance for what the final system solution looks like.


2.5.1 Current best-practice in vehicle perception

When it comes to current best practice, a few critical issues that separate this project from other projects within similar fields have been identified:

• Others use other sensors as well as radar (and often no radar at all). The employment of laser sensors is extensive, and while this sensor system provides a very good source of input, it is tremendously expensive compared to radars. Also, since it contains moving parts, it is much less robust than the fixed electronics of a radar. Cameras are also extensively employed, and while they have cost and robustness comparable to radars, they are easily obstructed and demand more computing power than radars. Also, in the cases where cameras are employed together with radars, it is often the camera data that is the main source of classification data. Radars often just assist the system, instead of constituting the majority of it.

• Others always have history-dependent solutions (like Kalman filters or Markov chain Monte Carlo based methods). In our case, this is both impossible (due to requirements) and unwanted (due to wanting a low-level system). The accuracy of systems in previous work must be considered in the light of this when being compared to our project.

• In the cases where there is great similarity to this project, previous work does not divulge the specifics of its solutions, such as which algorithms are used for clustering or what features the support vector machines look at. While there might be many reasons to avoid being specific (such as business secrets, focus being put elsewhere, disinterest or academic fuzziness), this also means that there is little theory available with regards to state-of-the-art methods.

Previous work differs to the point that the nomenclature usually employed in vehicle perception is not really applicable here. Despite heavy sensor prevalence in both industry and research, the usage of radar sensors in this particular context is not well-explored.

No real advice has been found in the more general cases either, in the articles described under 2.1.2, Doppler radar as input to learning systems. Critically, a lot of previous work in the area of Doppler radar data classification deals with raw data, and not the pre-processed radar detections that the sensors in this project deliver. It seems that the particulars of this project are not a well-explored area.

There might exist previous cases where a similar usage of radar sensors is discussed. However, none has been found. Thus, although concepts and certain methods can be reused within this project, the solution approach has to be different and has to be developed without a comprehensive foundation of previous projects to rest on.

Consequently, our approach is based on lower-level research of the different steps necessary to fulfil the requirements of this project.


2.5.2 Theory and methods employed

Here, some brief conclusions regarding the choice of methods in this project are discussed.

Two major areas of theory have been presented: the machine learning area, and the data extraction and analysis area. These two areas have been deemed necessary and sufficient in order to obtain a working system. This excludes more practical considerations such as coding, filtering, sensor data fusion and hardware implementation.

In a machine learning context, these two areas can be seen as different approaches to the same problem, namely supervised versus unsupervised learning. Supervised learning constitutes training a classifier from labeled examples. Unsupervised learning constitutes extracting data and discriminating between it in a manner that does not need labeled training examples, such as clustering and filtering.

From a theoretical viewpoint, the major methods and structures of one area could be replaced without having to completely revisit the other area. The choice of methods here should be seen as initial approach suggestions, and not as being dependent on or logically following one another.

Practically, the division into a pre-processing step (clustering and feature extraction) and a classification step (employing support vector machines) is beneficial since it allows parallelization of the work flow. The two steps are also associated with different types of results: extraction and analysis can be used if one desires increased domain knowledge for reasons other than classification, and the support vector machines can be used with a different set of input data.

Data extraction and analysis In order to extract a coherent and useful feature vector from a set of data inputs, two steps are performed. First, the data is spatially clustered using the algorithm DBSCAN. Then, feature extraction is performed. Which features to use is decided through feature selection, where a principal component analysis of the data is performed to gain domain knowledge and determine the usability of features.

Classification In order to classify a feature vector as belonging to one or none of the proposed classes, the method of choice is support vector machines. To provide a multiclass classification system from the binary nature of SVM, the multiclass ensemble method of One vs All can be employed.

To enable integration of the class output into the object tracker, Platt scaling can be used to provide a confidence value of probabilistic nature. This makes the class output more valuable, since an intuitive measurement of confidence is provided with each prediction.
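As a sketch of how One vs All and Platt scaling combine, each binary SVM produces a decision value that a fitted sigmoid maps to a probability, and the class with the highest probability wins (with rejection if none is confident). The sigmoid parameters, class names and threshold below are illustrative assumptions, not fitted values from this project:

```python
import math

def platt_probability(score, a, b):
    """Platt scaling: map a raw SVM decision value to a probability with a
    fitted sigmoid P(y=1|score) = 1 / (1 + exp(a*score + b)). The (a, b)
    pair is assumed to have been fitted on held-out data; a is typically
    negative so higher scores give higher probabilities."""
    return 1.0 / (1.0 + math.exp(a * score + b))

def one_vs_all_predict(scores, platt_params, threshold=0.5):
    """One-vs-All ensemble: 'scores' holds one binary-SVM decision value
    per class. The predicted class is the one with the highest Platt-scaled
    probability; if no class clears the threshold, the sample is rejected
    (class None). Returns (class, confidence)."""
    probs = {c: platt_probability(s, *platt_params[c])
             for c, s in scores.items()}
    best = max(probs, key=probs.get)
    return (best, probs[best]) if probs[best] >= threshold else (None, probs[best])

# Hypothetical decision values for the four classes of this project,
# with one illustrative (a, b) pair per class.
scores = {"pedestrian": 1.2, "bicyclist": -0.4, "car": -1.0, "truck": -2.1}
params = {c: (-2.0, 0.0) for c in scores}
print(one_vs_all_predict(scores, params))
```

The confidence value returned alongside the class is exactly the kind of probabilistic measurement an object tracker can weigh against its own state.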


Part 3: Methods

This chapter aims to explain the methods developed in order to achieve the project goal stated in chapter 1. The methods presented here are based upon the theoretical framework laid out in chapter 2 and will be evaluated in chapter 4.

3.1 Method overview and system introduction

In this section, a brief introduction to the methods used in the development process and an overview of the complete system solution are presented. Below in figure 13, a schematic overview of the development process and the classification system is shown:

[Figure 13 shows two flows: the development process (sensor data input → gather data → data exploration and feature selection → system development → system implementation → system output) and the classification system (sensor data input → detection filter → data clustering → feature extraction → classification → class and confidence output).]

Figure 13: Development process and classification system overview

The figure above outlines the major components of the process required in order to gain a complete system. Each segment represents a major functional part of the system or a major development process. These are all presented in more detail below. How each method and implementation is verified and validated is discussed in its respective section, and the means of validating the complete system are discussed in section 3.6.3, Validation of final system implementation.

Verifying whether the chosen system structure is optimal is not an easy task, and will not be attempted in this project. Whether the structure is functional is, however, useful to know, and this can be derived from the complete system validation.

3.1.1 Stages of system development

The green box in figure 13 above represents the system development process. This process contains the pre-requisites to the classification system and also the system


implementation process. The major development and system input at this stage is the sensor data, symbolized by the blue arrow. The intended output is a working system, in turn with a reliable system output, symbolized by the orange arrow.

Sensor data input This is the input data stream, consisting of sensor data delivered by the radar sensors. The data is pre-processed, partly in the sensors themselves (they deliver radar detections and not raw radar signal data) and partly by pre-existing software structures (see section 1.2.2, Target system).

It should be specifically noted that all data used throughout the project comes from actual logged sensor input. So when simulations are performed, they are done using real sensor data in the simulated environment.

Gather data In order to perform supervised learning and to gain data domain knowledge, the first process needed is the gathering of data. How this is done is described in detail in section 3.2.1, Test-track data gathering.

Data exploration Data exploration in this sense implies exploring the usage of different features to describe objects, as well as using statistical methods to determine characteristics of the labeled data. It can be performed after sensor data has been gathered. In this process, different feasible features that can be extracted from the training data are examined. The best ones are selected to be part of the feature extraction. This is described in more detail in section 3.3, Practical selection and analysis of object descriptions.

System development In this step, the classification system is constructed and trained. This represents an umbrella stage, containing all of the processes and systems described below belonging to the classification system.

System implementation Finally, after the system is developed, it has to be implemented in the real-time environment of the Astator platform. Here, considerations such as timing constraints and time-complexity of algorithms need to be examined. Details of the system implementation process can be found in section 3.6, System implementation on target platform.

The constituents of the classification system are presented briefly below.

3.1.2 Classification system overview

The yellow box in figure 13 above symbolizes the actual classification system being developed. The components of this system, parameter choices and the training of the classifier are developed using offline methods and continuous verification before being implemented in the real-time system (within the System implementation step).

Detection filter Here, an initial filtering step is performed, with the main purpose of removing detections belonging to stationary objects. This is discussed in more detail in section 3.4.1, Filtering of radar detections.


Clustering This is where detections are grouped into clusters, to allow for a better extraction of object characteristics. The methods used here are discussed in more detail in section 3.4.2, Clustering of radar detections using DBSCAN. Additionally, a post-clustering filter structure is implemented with the intention of removing "unwanted" clusters.

Feature extraction In this step, computations are done on clusters in order to extract a feature vector that contains characteristic data of the presumed object that the detection cluster comes from. What features are extracted, why, and how this is done is discussed in section 3.4.3, Feature vector calculation.

Classification This is where each feature vector is classified as belonging to one of the four classes. Many considerations are required here, discussed in section 3.5.1, Implementation of support vector machine system.

Class and confidence output Here, the final system output is constructed. The computations done in this step include ensemble methods for multiclass classification, discussed in section 3.5.2, Multiclass, rejection and confidence structures. A confidence measurement is also constructed here, also discussed in 3.5.2.

This section has outlined the rest of the contents of this chapter and provides an overview of much of the work performed in this project. In the sections below, the different subsystems and processes are explained in detail.

3.2 Gathering radar detection data

In order to create a classifier using supervised learning, labeled training examples are required. Gathered data is also necessary for most parts of the system development process, such as system simulation and verification, analysis of the data domain and the development of filtering processes. This section describes the process of gathering and labeling the data required in order to construct the classification system.

3.2.1 Test-track data gathering

In order to acquire labeled data from each of the four classes, several test scenarios using the Astator platform have been realized. Below, the details of these test scenarios are presented.

The tests were conducted at the Scania testing track. Several different tests were conducted in which a known object moved on the road while the Astator system continuously logged all radar input and saved it to file.

In order to have more control over the test environment, the EGO vehicle was stationary, while the observed object was the only moving object in the vicinity.

The following scenarios were constructed:


Pedestrian For the pedestrian class, it was decided that instead of moving on the road, the pedestrian should move close to Astator. This was partly because the radars have difficulty detecting pedestrians at range, and partly to avoid having the pedestrian on the road, where heavy vehicles could occasionally be driving.

The pedestrian scenarios are as follows: the pedestrian moves around Astator in a circle of approximately ten meters radius, in the first test walking slowly, and in the second and third tests jogging and running respectively. To investigate the ranges at which the radars can detect pedestrians, additional tests were performed. In the first, a pedestrian starts far from Astator, at a distance of approximately 60 meters, and moves straight towards the vehicle. In the second, the pedestrian moves towards the vehicle, but at an angle. Both of these tests were done in both a walking and a running gait.

In addition to these seven tests, a scenario where the pedestrian moves in a more chaotic manner was added. Here velocity, heading and distance to EGO are all varied throughout the test in order to get a bigger spread in any considered feature space.

All pedestrian tests were conducted in two variants: one where the pedestrian wears black clothes, and one with a reflective safety vest. This was to get additional spread in the feature space.

Bicyclist For the bicycle class, the first test scenario was set up as follows:

Astator is parked parallel to, and at a distance of approximately 30 meters from, the road. The bicyclist moves straight along the road at one velocity per test: the first a relaxed cruising velocity, the second a more active, normal bicyclist velocity and the third the maximal possible speed.

The second test scenario repeats the first three tests, but with Astator placed at a 45 degree angle with respect to the road. This was to get a bigger spread in the headings of the detected object.

Lastly, a test with random speed and direction was added, where the bicyclist moves around Astator in a snakelike pattern. As in the pedestrian case, all bicyclist tests were repeated in two variants, the first in black clothing and the second in more reflective clothes.

Vehicles To gather data for the personal vehicle and truck classes, tests similar to the bicycle case were conducted. Three velocities were chosen: 10 km/h, 30 km/h and 50 km/h. Higher speeds were avoided due to limitations on the test track, and the low speed of 10 km/h was added to ensure that the class velocities overlap in the training data.

Data for these three velocities was gathered in both the parallel and the angled scenario. An additional test with random direction and velocity was added for both personal vehicle and truck.


To increase spread in the data, two different vehicles were used for each class, the main difference being their size. For the personal vehicle tests, the first car used was the VW E-Up, a small electric car. The second was a VW Transport Crafter 35, a much bigger personal vehicle. For the truck class, a Scania truck was used. The tests were performed once with the truck standalone, and once with an added trailer.

3.2.2 Labeling of gathered data

After data is gathered, it needs to be labeled according to class in order to be useful for supervised learning. Preferably, this would be purely scripted. However, this approach is problematic due to noise being present in the sensor signals, which could lead to a lot of mislabeled samples.

Instead, the logs are gone through manually, frame by frame. Detections belonging to a known object are labeled as such, and detections that clearly do not belong to any moving object are labeled as noise. This method is very time consuming, but does provide some benefits.

With manual labeling, noise data can be stored in addition to the wanted data, but with a separate noise label. This noise data can be used to research the properties of radar noise, and possibly to develop and evaluate filter structures for handling it.

These labeled samples, each containing a feature vector and a class label, are a necessary prerequisite for supervised learning, but can also be used for many other purposes, such as analysing characteristics of the different classes.

3.3 Practical selection and analysis of object descriptions

This section contains an overview of the methods used to choose and evaluate object representations. The same methods can also be used for general exploration of data characteristics, and how this is done will also be presented below.

In order for features to be computed, the sensor data first has to be clustered into groups thought to belong to the same object (as described in section 2.4, Extraction and analysis of object descriptions). The specifics of the methods used for clustering can be found in section 3.4.2, Clustering of radar detections using DBSCAN.

Feature selection method As discussed in section 2.4.2, Feature selection, there are several approaches that can be taken to find suitable features. In this project, an applied version of the filtering strategy is used. The reason for this is mostly practical, as wrapper methods are deemed too complex, both coding-wise and time-complexity-wise, to be applied here. The use of embedded methods is also disregarded, due to the choice of support vector machines as classification method (which does not inherently contain a feature ordering mechanism).

The filtering strategy implies the usage of some external measurement to determine the usefulness of a feature.

One such measurement is the normal distribution of each feature with respect to each class. As long as the input data does not cause the particular feature to be biased, or the feature itself causes bias, a feature can be considered descriptive (or "good", or "usable") if there is a significant difference in normal distribution between the classes. A set of features can then be considered good if they fulfil the above, and also are somewhat uncorrelated with each other.

Another measurement is to do a PCA (discussed in section 2.4.3, Principal component analysis for feature evaluation). This is a more complex analysis than just comparing normal distributions, and provides a powerful tool to view correlation between features and the usefulness of a particular feature in explaining data set variance. If there is a very uneven distribution, where many features explain very little and one or two give much of the variance explanation, this is a sign of bad feature selection. It is, however, important not to over-interpret the results of a PCA when it comes to prediction, as features with a small variance can still hold good predictive value [35].

In this project, both measurements are used in order to provide both a robust and an easy-to-grasp evaluation of the features selected.

3.3.1 Description of features used

Despite the presence of good evaluation methods, an initial set of features has to be suggested, from knowledge about the data or from some heuristic, in order to have something to evaluate [34]. Below, a list of the features that are used within this project and the domain knowledge behind their usage is presented. The features below will, in the complete classification system, be calculated through a simple feature extraction process.

Each feature constitutes some function of one or several of the data types contained within each radar detection. They are meant to be simple and plentiful, in order to provide smaller pieces of valuable information from many sources rather than much information from few sources.

Number of detections This feature simply consists of the number of detections found in a specific cluster. This gives some sense of the size of the object the cluster belongs to. It is also possible that some characteristic of the radar sensors causes them to deliver more detections from certain surfaces or from certain types of movement. This could then amplify the separability of the different classes for this feature.


Minimum length This feature consists of the length of the diagonal of a rectangle drawn from the edge points of the cluster. This length is called the minimum length because, provided the cluster in question describes an actual object, the object is at least this long (but could be longer in reality). This feature is thought to distinguish classes based on average size, and will thus likely be correlated with the number of detections feature.

Area This feature is the area of the rectangle drawn around the edge points of the cluster. It is likely to be heavily correlated with especially the minimum length feature, but could provide extra information from certain shape characteristics of specific classes.

Density This feature is calculated as one divided by the average distance between the detections in a cluster. Compared to simply dividing the number of detections by the area, this method gives a somewhat uncorrelated density value. This feature is thought to hold some additional descriptive value over just the area or the number of detections features. For example, a certain class might have the average characteristics of being very large, but have such a surface or such movement characteristics that the number of detections is low.

Mean Doppler velocity This feature measures the mean Doppler velocity of the cluster. This is thought to distinguish classes well at their corresponding average speeds. However, since all classes span the same low range of possible velocities (a car could travel at very low velocity), this feature could introduce bias.

Variance of Doppler velocity This feature measures the variance of the Doppler velocities of the detections within a cluster. It measures the weighted mean of the velocity variance of the detections from each radar. This method avoids some bias (since two different radars can pick up the same object, but due to different placements register very different Doppler velocities), compared to just taking the variance of a complete cluster.

It is thought that this feature can provide useful information for separating classes with a large difference between the slowest and fastest moving parts within the same object. For example, a truck has large wheels that, at their lowest and highest points, have vastly different velocities. If a radar sensor happens to detect these two points, the Doppler velocity variance will be large.

Amplitude per distance This feature measures the average detection amplitude in a cluster and divides it by the mean cluster distance. Exactly how the amplitude data type is measured is unknown, but it is thought that it varies with distance (possibly, or even likely, with distance squared). In order not to introduce a large bias, the amplitude is divided by the mean distance. This lowers the likelihood that a class (for example, pedestrians) that is generally detected close to the EGO vehicle (due to its small size) gets a higher average amplitude score than other classes.


Variance of amplitude This feature measures the variance in the amplitude of the detections of a cluster. The amplitude is thought to vary with surface differences. For example, a detection from the side of a truck trailer is thought to have a higher average amplitude than a detection from a pedestrian jacket (soft fabric). This implies that classes with few differences in surface material are likely to have a lower variance of amplitude than classes that can have many different surface materials.
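The eight features above can be sketched as a single extraction function. This is an illustrative reconstruction from the descriptions in this section, not the thesis code; the field names and the exact formulas (e.g. using the mean pairwise distance for the density, and the cluster centroid range for the distance) are assumptions.

```python
import numpy as np

def cluster_features(xy, doppler, amplitude, radar_id):
    """Compute the eight cluster features described above (sketch).

    xy        : (n, 2) detection positions in the EGO frame [m]
    doppler   : (n,) Doppler velocities [m/s]
    amplitude : (n,) detection amplitudes
    radar_id  : (n,) index of the radar that produced each detection
    """
    n = len(xy)
    # Bounding rectangle from the edge points of the cluster.
    span = xy.max(axis=0) - xy.min(axis=0)
    min_length = float(np.hypot(*span))     # diagonal of the rectangle
    area = float(span[0] * span[1])
    # Density: one over the average distance between detections.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    mean_pair = d[np.triu_indices(n, k=1)].mean() if n > 1 else 0.0
    density = 1.0 / mean_pair if mean_pair > 0 else 0.0
    # Doppler variance: weighted mean of the per-radar variances.
    var_parts, weights = [], []
    for r in np.unique(radar_id):
        m = radar_id == r
        var_parts.append(doppler[m].var())
        weights.append(m.sum())
    doppler_var = float(np.average(var_parts, weights=weights))
    dist = np.linalg.norm(xy, axis=1)       # range from the EGO origin
    return np.array([
        n,                                  # number of detections
        min_length,
        area,
        density,
        doppler.mean(),                     # mean Doppler velocity
        doppler_var,                        # variance of Doppler velocity
        amplitude.mean() / dist.mean(),     # amplitude per distance
        amplitude.var(),                    # variance of amplitude
    ])
```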

A PCA of the usage of the above features, together with a mean and variance analysis of each feature, is presented in section 4.2.1, Characteristics of selected features, together with a discussion about possible bias problems regarding the feature selection.

3.3.2 Analysis of features and data with PCA

By performing clustering on the gathered and labeled radar data, and computing the features above for every cluster, a set of labeled feature vectors is obtained. Since 8 features are present, this dataset is 8-dimensional, making it hard to get an overview of the separability and distribution of the classes.

By projecting the data onto a lower-dimensional space consisting of the first two or three principal components, a better overview is obtained. Thus, the PCA method will be used in determining whether the final classifier performs as can be expected. If particular subsets of the data show low separability (for example, the pedestrian and the bicyclist class), then it can be expected that the classification system will have trouble distinguishing between these particular classes.
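A minimal sketch of such a projection, using the SVD rather than a PCA library (an implementation choice of this example, not the thesis):

```python
import numpy as np

def pca_project(X, k=2):
    """Project feature vectors X (n_samples x n_features) onto the
    first k principal components, and report the fraction of the
    data set variance explained by each component."""
    Xc = X - X.mean(axis=0)                  # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)      # variance explained per component
    return Xc @ Vt[:k].T, explained
```

The two- or three-dimensional scores can then be plotted coloured by class label to judge separability visually.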

3.4 Pre-classification signal processing

As with all sensors, the radar system mounted on Astator is affected by noise. To reduce the risk of classifying false objects, a methodology to reduce the impact of noise is developed.

First, the properties of noise detections are studied visually by inspecting recorded radar data, and logging parameters from detections believed to be noise. If properties can be found that separate noise detections, or detections with false parameters, from "true" detections, these properties can be used to form a filter structure which will attempt to remove "bad" detections before the classification stage.

3.4.1 Filtering of radar detections

Before getting further into the classification system signal chain, each radar detection passes through a filter structure. The purpose of this filter is to remove undesired detections as well as noise detections, in order to provide a less cluttered view of the vehicle surroundings. What the input data looks like and how it has been pre-processed is described in section 1.2.2, Target system.

The classification system is only concerned with moving objects, and for this reason the primary purpose of the detection filter is to remove stationary detections. The different detection-level filters used are described below.

Detection filter 1 The first stage in the detection filter is a threshold minMoveInd on the movement index of the detection. This is to remove all detections believed to belong to stationary objects. When choosing the threshold, it has to be considered that the highest value (3) has the highest probability of removing stationary objects. However, objects such as pedestrians move slowly and are hard to detect in that they rarely produce many detections. For this reason, a lower value may be needed to avoid missing these objects.

Detection filter 2 The second stage in the detection filter is set up to remove detections where the calculated velocity value is above a certain threshold maxdR. The reason for this is that noise detections occasionally have higher velocity values than what is reasonable for any of the objects targeted by the classification system, for example exceeding 150 km/h.

Detection filter 3 The third stage in the filter is a threshold maxRange, and any detection outside this range is discarded. The reason for this is that both the number of detections on any given object, and the accuracy of these detections, tend to decrease with increasing range. Setting this parameter makes it possible to limit the range of the classification system, while possibly increasing accuracy.
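The three stages can be sketched as one predicate per detection. The field names, the detection representation, and the default threshold values are illustrative assumptions; only the three threshold names come from the text above.

```python
def keep_detection(det, minMoveInd=2, maxdR=150 / 3.6, maxRange=80.0):
    """det is assumed to be a dict with 'move_ind' (0-3), 'dR'
    (radial velocity, m/s) and 'range' (m). Returns True if the
    detection passes all three filter stages."""
    if det['move_ind'] < minMoveInd:   # filter 1: drop stationary detections
        return False
    if abs(det['dR']) > maxdR:         # filter 2: drop implausible velocities
        return False
    if det['range'] > maxRange:        # filter 3: drop distant detections
        return False
    return True
```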

3.4.2 Clustering of radar detections using DBSCAN

This section contains an overview of the methods used when clustering radar detection points into objects.

The clustering method chosen for this project is the DBSCAN algorithm, explained in more detail in 2.4.1, Clustering of sensor data. The main reason behind using DBSCAN and not one of its more advanced cousins (such as OPTICS, also briefly mentioned in 2.4.1) is simplicity. Due to the relatively small data sets to be handled simultaneously (256 radar detections), time-complexity is not seen as an issue and thus parallel extensions are disregarded. Neither is the choice of clustering parameters seen as particularly troublesome (although several considerations need to be taken here, discussed in more detail below). The wish for simplicity weighs more heavily than the potential performance increase of dynamic parameter usage. However, one should be aware that there could exist both a performance and a functionality gain to be had with the usage of a more complex clustering method.


In addition, no effort is put into the pre-partitioning of data. Since the system is to work in an RT environment, pre-partitioning would only serve to "move" the time-complexity elsewhere (since it would still need to be executed within the same cycle).

As discussed in section 2.4.1, one can use any distance function within DBSCAN. In this project, the two-dimensional Euclidean distance will be the distance function of choice.

In the offline implementation, open-source functions from [38] were used to implement the clustering algorithm, while the online implementation was developed from the algorithm description in section 2.4.1.
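A plain re-implementation of the textbook DBSCAN algorithm over 2-D points with Euclidean distance might look as follows. This is a sketch in the spirit of the online implementation, not the thesis code or the functions from [38]; here MinPts counts the point itself among its neighbours.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over 2-D points using Euclidean distance.
    Returns one label per point, with -1 meaning noise."""
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        # Skip points already assigned, and non-core points.
        if labels[i] != -1 or len(neighbours[i]) < min_pts:
            continue
        labels[i] = cluster
        seeds = list(neighbours[i])
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbours[j]) >= min_pts:
                    seeds.extend(neighbours[j])   # expand from core points only
        cluster += 1
    return labels
```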

Clustering Parameter Selection The choice of clustering parameters will have a big effect on the entire chain of calculations within the classification system. Clustering is an essential part of discerning objects from a lump of single detections within a frame, and the choice of cluster parameters greatly affects how an object is perceived. It is therefore important to have a well-grounded method of parameter selection.

As stated in 2.4.1, there are two parameters to decide on: MinPts and Eps. MinPts decides how many points a cluster has to contain, and Eps describes the distance between clustered points. What constitutes a good parameter choice depends both on how the data stream (the output from the radar sensors) looks and on real-world considerations.

Choice of the Eps Parameter When choosing the Eps parameter, there is a tradeoff between how far apart two objects have to be to not get clustered together, and how often a single object is perceived as several separate clusters.

For example, consider the case shown below in figure 14:


Figure 14: The eps tradeoff (two panels: eps = X and eps = X′)

The figure above shows detections (crosses) belonging to two different objects (silhouetted by black borders). The sensors might deliver detections from an object at a particular distance with an average distance between detections of X meters, which is then used as Eps = X. However, there can be gaps present where there are no detections within the specified area. Thus, two clusters are formed (green) when there was only one object present. If one instead changes the parameter to Eps = X′, the green detections get clustered together, but the red detections belonging to another object are also put in the same cluster.

Within the iQmatic project, the aim is to perform autonomous driving in mining sites. This environment differs greatly from, for example, an inner-city environment where pedestrians and bicycles mix with cars at very close distances. At a mining site, safety distances to vehicles are likely to be maintained, and situations where many moving objects are present at the same time in a small area are unlikely to arise. Thus, it is deemed more important in this project to have a sufficiently large Eps that detections from a single object are clustered together.

The method of choosing the right Eps parameter can therefore be to observe data logs with large objects (for example trucks) that can provide enough detections in an area to form several clusters, and at long distances (between 60 and 80 meters), where detections are likely more spread out. Observing such data logs, one chooses the smallest Eps that allows all points that clearly belong to the same object to be clustered together.

This approach is very much a heuristic, but more thorough or optimal ways of verifying the results are deemed to be outside the scope of this project and are thus not considered.


Choice of the MinPts parameter The method of choosing the MinPts parameter is also a heuristic that needs to be suited to the data in question. In particular, since the same MinPts is used for all clusters, it has to be adapted to the smallest objects that are meant to be detected. In the current case, this means the pedestrian class provides the limiting factor. So, in order to choose a good MinPts parameter, one can observe data logs of pedestrians at the maximum distance (as far away as they are detectable), and select a value that allows detections of pedestrians to be clustered.

There are also noise points to consider. Such is almost always the case with complex sensors, and the radar sensors used within this project have from previous experience been known to produce detections of unknown origin. These noise detections are of a very low density, but can appear anywhere in a frame. If one wants to avoid sending all these noise points further into the system structure, one has to choose the MinPts parameter to be sufficiently large to filter them out.

The noise points could also appear close enough to a cluster (within the Eps distance) to be put in said cluster. This could lead to a warping of the cluster characteristics compared to if all detections came from real objects. If this proves to be a problem, the Eps parameter can be tuned to a lower value than otherwise preferable. Determining whether this is the case will also be done through heuristic approaches, as a more meticulous review is deemed outside the scope of this project.

One method of determining this could be to qualitatively assess a few difficult frames, and see whether a lower value causes problems.

3.4.3 Feature vector calculation

After the clustering step, each cluster of detections is fed to the feature extraction stage. The purpose of the feature extraction is to calculate properties of each cluster that can be used for classification and possible further filtering.

The output from the feature extraction system is a feature vector, composed of the eight parameters described in section 3.3, Practical selection and analysis of object descriptions.

In order to prevent features with inherently large values from causing bias, each parameter is scaled to a value between -1 and 1. This is done both on the training data used to create the classifier and on new data points. When classifying new objects, it is important that each parameter in the feature vector is scaled by the correct scale factor (the same one used when the system was trained).
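The scaling step above can be sketched as a min-max mapping whose factors are fitted on the training data and then reused unchanged on new data. The function names are illustrative; the thesis does not specify the exact formula, so a linear min-max map is an assumption here.

```python
import numpy as np

def fit_scaler(X_train):
    """Per-feature minima and maxima that map training data into [-1, 1]."""
    return X_train.min(axis=0), X_train.max(axis=0)

def apply_scaler(X, lo, hi):
    # The SAME lo/hi from training must be reused on new data; new
    # samples outside the training range will fall outside [-1, 1].
    return 2.0 * (X - lo) / (hi - lo) - 1.0
```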

3.4.4 Filtering of radar detection clusters

After the feature extraction step, several new properties are available that provide new information about a clustered object.


In addition to being used for pure classification purposes, these properties could possibly be used in a second filtering step. The purpose of this proposed filtering is to remove clusters that for any reason are undesirable to send further into the system.

Reasons to discard clusters could be either that they are thought to belong to none of the classes specified, or that they are believed to be clusters of noise or corrupt detections. In order to remove undesired clusters, several filtering hypotheses are put forward. The ideas for these filters were developed by watching logged radar data with known objects and studying the properties of the corrupted clusters encountered.

Using labeled data from both real objects and noise objects, the filters can be evaluated by the number of noise clusters versus the number of real clusters that are removed.

Cluster filtering hypothesis 1: In false detection clusters, a tendency is seen in which the velocity values of individual detections are more spread out than in any real moving-object cluster. For example, it may be common for false clusters to have velocity values in a greater span than detections that truly originate from a moving object. This proposed filtering step will look at the Variance of Doppler velocity feature, described in section 3.3.1, and remove all clusters where this value is above a certain threshold maxClusterVelVar.

Cluster filtering hypothesis 2: False clusters seemingly appear with a small spread in the amplitude values of individual detections, while detections belonging to an object such as a moving vehicle seem to have greater variance in this parameter.

This proposed filter will be based on the Variance of amplitude feature, removing all clusters where the variance is below a certain threshold value minClusterAmpVar.

Cluster filtering hypothesis 3: It is believed that a large part of false detections are created through phenomena such as radar interference and double reflections. Clusters of these false detections may be especially hard to discern from the real object which caused the interference.

A proposed filtering method is to look at where in the local coordinate system the closest detected cluster is positioned. From the EGO vehicle, two lines are drawn to the edges of this cluster, and then continued outwards from the vehicle. This creates a circle segment of a certain angle, and the proposed filter would remove all clusters that lie in the "shadow" of the closest detected cluster. This filter structure has validity in that it is improbable that any object could be detected straight behind another object, similar to how the human eye cannot see what is behind something else. However, it is known that some of these detections could be from real objects, for example if the radar wave passes under or over the first object, or through windows.
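The three hypotheses can be sketched as simple predicates. The threshold defaults are placeholders to be tuned on labeled noise versus real-object clusters, and the shadow test here (comparing centroid azimuths and ranges) is one possible geometric reading of hypothesis 3, not the thesis implementation.

```python
import numpy as np

def passes_variance_filters(doppler_var, amp_var,
                            maxClusterVelVar=25.0, minClusterAmpVar=1e-3):
    """Hypotheses 1 and 2 as thresholds on the two variance features."""
    return doppler_var <= maxClusterVelVar and amp_var >= minClusterAmpVar

def in_shadow(cluster_xy, closest_xy):
    """Hypothesis 3 (sketch): True if the cluster's centroid azimuth falls
    inside the angular segment spanned by the closest cluster, and the
    cluster is farther from the EGO origin than the closest cluster."""
    az = np.arctan2(cluster_xy[:, 1], cluster_xy[:, 0]).mean()
    closest_az = np.arctan2(closest_xy[:, 1], closest_xy[:, 0])
    if np.linalg.norm(cluster_xy.mean(axis=0)) <= np.linalg.norm(closest_xy.mean(axis=0)):
        return False
    return closest_az.min() <= az <= closest_az.max()
```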


3.5 Classification of processed objects

This section contains a description of the methods used in order to implement a classification stage in the system. The SVM implementation structure is described together with an outline of how parameters are chosen. The section is concluded with a description of how validation with respect to classification performance is conducted.

3.5.1 Implementation of support vector machine system

The optimization problem (14) given in section 2.3.2, Support vector machines as a method for classification, is quite easily solved using any quadratic programming toolbox, such as quadprog in MATLAB or cvxopt in Python.

When implementing an SVM, however, it can be practical to use one of the many open-source libraries available.

In this project, the open-source library LIBSVM is used, since it provides a simple and efficient interface for training support vector machines. LIBSVM [39] is a library that is suitable both for beginners and advanced users, and is currently one of the most used SVM applications.

LIBSVM has support for both classification and regression, and has built-in features like kernels, soft margin and cross-validation. In [39], the dual formulation of the SVM optimization problem is stated as follows:

min_α  (1/2) αᵀQα − eᵀα
subject to  yᵀα = 0,
            0 ≤ α_i ≤ C,  i = 1, …, N.   (25)

Here, e is a vector of length N containing only ones, while Q is an N × N positive semi-definite matrix with elements:

Q_{i,j} ≡ y_i y_j K(x_i, x_j)   (26)

It can be seen that this implementation is identical to the dual formulation (14) given in section 2.3.2, with the addition of the constraint α_i ≤ C (soft margin), and with the dot products x_i · x_j replaced by K(x_i, x_j) (the kernel function). To understand the meaning of these changes, see section 2.3.2, Kernels and soft margin.

The use of kernels leads to a slightly changed decision function:

class(x_n) = sgn(ωᵀ · φ(x_n) + b) = sgn( Σ_i y_i α_i K(x_i, x_n) + b )   (27)


LIBSVM provides scripts for both training and prediction, and stores all parameters needed for classification. This means that even if a model is trained using LIBSVM, prediction can be made manually using equation (27).
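Manual prediction per equation (27) can be sketched as follows, for a binary RBF-kernel model whose support vectors, labels, multipliers and bias have already been obtained from training. The function names are illustrative; this is not the LIBSVM API.

```python
import numpy as np

def rbf(a, b, gamma):
    """Radial basis kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def svm_predict(x, support_vectors, y, alpha, b, gamma):
    """Binary prediction via equation (27): the sign of the kernel
    expansion over the support vectors plus the bias."""
    s = sum(y_i * a_i * rbf(sv, x, gamma)
            for sv, y_i, a_i in zip(support_vectors, y, alpha))
    return 1 if s + b >= 0 else -1
```

Keeping the prediction function this small is what makes the approach attractive from an embedded-systems perspective.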

This could be worth considering in a real-time implementation, since it could provide benefits from an embedded-systems perspective to have the prediction function as light as possible.

Model selection From section 2.3.2, Model selection, it is known that there are multiple parameters to choose before a support vector machine model can be constructed. These parameters have a huge effect on the performance of an SVM classifier, and can, if chosen poorly, lead to problems with overfitting.

In this project, the grid search method explained in 2.3.2 is used to identify suitable values for each parameter. The reason that the grid search method is used is partly that it is a straightforward, simple-to-understand method, and partly that it is so commonly used and shows great performance.

The only downside of the cross-validation grid search is the processing complexity of the algorithm, which makes the process very time consuming. However, since the only timing requirements in this project are on the online classification tasks, as opposed to the offline training of the system, this is not a problem.

For the radial basis kernel SVM with soft margin, there are two parameters that need to be identified: the SVM slack parameter C and the kernel parameter γ.

In [24], a grid of γ = (2^-15, 2^-13, …, 2^3), C = (2^-5, 2^-3, …, 2^15) is suggested.

This is a very broad search, but with a large step size of k = 2 in the exponent. For this reason, it results in only 110 parameter combinations to test, and a quite simple search. It can be seen as an initial search with the purpose of getting a broad overview of the parameter space and seeing what range of values could be worth looking at in a finer search.

A second, more detailed grid search should always be conducted, based on the results of the first search. This is a finer search in the areas that showed the most potential in the first search.
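The coarse-then-fine grid construction can be sketched as below. The coarse exponent ranges follow the practical guide [24]; the fine-grid span and step (exponent ±1 in steps of 0.25) are illustrative choices, not values from the thesis. Each (C, γ) point would then be scored with k-fold cross-validation.

```python
import numpy as np

def coarse_grid():
    """Exponential coarse grid: gamma = 2^-15, 2^-13, ..., 2^3 and
    C = 2^-5, 2^-3, ..., 2^15 (exponent step 2)."""
    gammas = 2.0 ** np.arange(-15, 4, 2)
    Cs = 2.0 ** np.arange(-5, 16, 2)
    return [(C, g) for C in Cs for g in gammas]

def fine_grid(C_best, g_best, step=0.25, span=1.0):
    """Refined grid around the best coarse point, with a smaller
    exponent step; span and step are illustrative choices."""
    ce, ge = np.log2(C_best), np.log2(g_best)
    Cs = 2.0 ** np.arange(ce - span, ce + span + 1e-9, step)
    gammas = 2.0 ** np.arange(ge - span, ge + span + 1e-9, step)
    return [(C, g) for C in Cs for g in gammas]
```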

3.5.2 Multiclass, rejection and confidence structures

Aside from the choice of slack and kernel parameters, there are a number of architectural considerations to make before implementing a multiclass SVM. In the case of this particular project, the implications mostly concern how to deal with noise or borderline cases in the data, but the desired output format is also something to consider. If the SVM output scores can be scaled to a probability-like measurement, the ability for the system output to be integrated into other systems is increased.


For a multiclass problem such as this, the common ways of turning the binary nature of support vector machines into multiclass (or ensemble) solvers are either one-vs-one (OvO) or one-vs-all (OvA) classification, as discussed in section 2.3.1, Multiclass classification.

Confidence output As discussed above, it is beneficial if the SVM output prediction can be accompanied by a score that provides an intuitive measurement of confidence. In section 2.3.2, SVM outputs and Platt-scaling, a method to convert the raw SVM outputs to probability-like measurements was provided. However, this method is inherently made for binary classifiers, and may be problematic to apply in a multiclass problem.

If using the OvA ensemble scheme, one SVM score will be provided for each class. These scores could be Platt-scaled independently to obtain 4 different probability-like scores. It must be noted that these scores will not reflect true probabilities, and will also not necessarily be reasonable from a multiclass perspective. For example, the Platt scores obtained for the different classes are not constrained to sum to one.

However, the Platt-scaled outputs would still be very valuable in deciding the worth of a prediction, enabling easy integration in a system-wide perspective.
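A simplified sketch of the Platt-scaling fit for one binary SVM's scores: the sigmoid P(y=1|f) = 1 / (1 + exp(A·f + B)) is fitted to (score, label) pairs. Note this uses plain gradient descent on the cross-entropy with raw {0, 1} targets, whereas Platt's original method uses regularized targets and a Newton-type solver.

```python
import numpy as np

def fit_platt(scores, labels, lr=0.1, iters=2000):
    """Fit P(y=1|f) = 1 / (1 + exp(A*f + B)) to binary SVM scores
    by gradient descent on the cross-entropy (simplified)."""
    t = (np.asarray(labels) + 1) / 2.0        # map {-1,+1} -> {0,1}
    A, B = 0.0, 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(A * scores + B))
        A -= lr * np.mean((t - p) * scores)   # gradient step in A
        B -= lr * np.mean(t - p)              # gradient step in B
    return A, B

def platt_prob(score, A, B):
    """Probability-like confidence for a raw SVM output score."""
    return 1.0 / (1.0 + np.exp(A * score + B))
```

In the OvA setting, one such (A, B) pair would be fitted per class, giving four independent probability-like scores.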

If using the OvO ensemble scheme, more SVM outputs will be obtained than the number of classes. Additionally, the SVM outputs will not represent an actual confidence measure for any specific class, but rather how each class compares to the other class in each pair. These facts make it more complex to obtain a probabilistic output, as the original Platt-scaling method cannot be applied.

Handling of noise and borderline data Noise and borderline data (clusters of radar detections that share similarity with several classes) should preferably be treated differently. Noise should be discarded, while borderline data should be assigned the more likely class. An example of borderline data could be a large personal vehicle that borders on truck characteristics.

There are at least three main ways of dealing with these data categories: the usage of a noise class, the usage of additional filtering, and the usage of a rejection threshold. Furthermore, the performance of any of these three may be affected by the choice of multiclass ensemble method. The pros and cons of each of the three concepts are discussed below with regard to the OvO and the OvA scheme respectively.

Noise-class One method is to create a fifth class (the noise class) and train the classification system with additional labeled noise data. This could be a sensible thing to do if the noise class had any sort of homogeneity. However, due to the many types of noise, the noise class will span a much larger space than the individual classes, and thus there is a risk of introducing a bias. Because of the wide diversity of the supposed noise class, the risk is also high that real objects, especially borderline cases, are misclassified as noise.

The noise-class method would give similar results with both OvO and OvA, although with OvO the time complexity would increase more when adding an additional class.

Filters Another method is to have a cluster filter that attempts to remove bad clusters before classification. Three ideas on how clusters could be filtered were presented in section 3.4.4, Filtering of radar detection clusters.

The filtering method is cumbersome, because in order for it to be valid the different filter parameters would have to be chosen and verified in some structured way, preferably through a rigorous statistical evaluation. Additional filtering will also increase the time complexity of the system; however, it is independent of the choice of ensemble method.

Rejection threshold This method relies on the application of a rejection threshold, where data objects are labeled as noise if the classifier reports a lower output than specified. This method avoids the problems of the noise-class method, but may be hard to implement depending on the choice of ensemble method.

If using OvO, the SVM output scores do not depend on the absolute characteristics of a class, but rather on how it compares to another class. Additionally, there will be more SVM scores than classes. These properties make the rejection method problematic, as there is no natural way of choosing a rejection threshold. There may exist some method that suitably combines or scales the OvO outputs into a score that is better suited to a rejection structure.

The OvA scheme makes it easier to apply a rejection scheme, since there is only one SVM score associated with each class. However, it is unclear whether the rejection approach might introduce some bias: different classes may have different average scores, and an absolute threshold might thus cause these classes to be rejected more often.

One way of avoiding the bias of a noise class, as well as the bias introduced by an arbitrary threshold, could be to use OvA with a rejection threshold of 0. This would lead to a system that will invariably let through some noise. However, it does allow removal of the most obvious noise (the clusters that no classifier deems as belonging to its own class). It also ensures that borderline data is classified as something instead of being discarded, since a borderline point would be assigned several non-negative scores. This method also has the benefit of not needing any advanced method for choosing the rejection threshold.


Conclusions There are many different architectures and considerations to take into account when constructing a multiclass classification model. The choice of archetype may have an impact on time complexity as well as on the system's capability to handle noise and borderline cases.

Above, three methods by which these cases could be handled were discussed. However, it is hard to know which of the three methods is best, or if a combination is the suitable choice. It may very well be that the objects that are filtered away by a cluster filter, or classified as belonging to a noise class, are the same objects that would be rejected in a rejection structure.

To improve the usefulness of the classification output, a probabilistic output structure is desired. As discussed in section 2.3.2, Platt scaling is one method of getting a probability-like output. There are other feasible methods, but in this project the Platt method is used.

Furthermore, the One vs All method is used, as it is easy to combine with both Platt scaling and a rejection structure. In this project, a rejection structure is implemented with a rejection threshold of 0.

To solve the problem of combining a rejection threshold with probabilistic outputs, the rejection is performed on the raw OvA SVM scores. The conversion to probabilistic output (Platt scaling) is then performed after the rejection process, on the clusters that passed the rejection structure.
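A minimal sketch of this reject-then-scale order, assuming per-class Platt parameters have already been fitted (function and parameter names are hypothetical):

```python
import math

def classify_with_rejection(raw_scores, platt_params, threshold=0.0):
    """raw_scores: class name -> raw OvA SVM score for one cluster.
    platt_params: class name -> fitted (A, B) sigmoid parameters.
    Rejection is done on the raw scores; Platt scaling is applied only
    to clusters that pass. Returns (class, probability) or (None, None)."""
    if max(raw_scores.values()) < threshold:
        return None, None  # no classifier claims the cluster: treat as noise
    best = max(raw_scores, key=raw_scores.get)
    A, B = platt_params[best]
    prob = 1.0 / (1.0 + math.exp(A * raw_scores[best] + B))
    return best, prob
```

With a threshold of 0, a cluster is rejected only when every per-class score is negative, matching the behaviour argued for above.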

3.5.3 Evaluating classification performance on validation data

To evaluate the performance of a finished classifier, outputs are computed for a set of validation data. By comparing these outputs with the known class labels of each validation data point, the performance measurements described in section 2.3.3, Classification performance analysis, can be calculated. The scores used in this project are found below.

The first is the total accuracy score:

TotalAccuracy = (correctly classified) / (total nr of validation data)

To gain further insight into the performance of a classifier on the different classes, a multiclass confusion matrix is constructed. This is a variant of the confusion matrix described in section 2.3.3, but containing all classes. Furthermore, the class-specific values of precision and recall are calculated for each class and included in the matrix. An example can be seen below in table 3.


Table 3: Multiclass confusion matrix

            Class 1   Class 2   Class 3   Class 4   Recall
Class 1     ntp1                                    R1
Class 2               ntp2                          R2
Class 3                         ntp3                R3
Class 4                                   ntp4      R4
Precision   P1        P2        P3        P4

Here, the values on the diagonal represent the true positive predictions for each class, while the values outside the diagonal represent the different types of misclassifications.
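A sketch of how such a matrix and its per-class precision and recall could be computed (rows are true classes, columns are predicted classes, as in table 3; function names are illustrative):

```python
def confusion_matrix(true_labels, pred_labels, classes):
    """Build a multiclass confusion matrix; rows = true, cols = predicted."""
    idx = {c: i for i, c in enumerate(classes)}
    M = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        M[idx[t]][idx[p]] += 1
    return M

def precision_recall(M):
    """Per-class precision (column-wise) and recall (row-wise)."""
    n = len(M)
    recall = [M[i][i] / max(sum(M[i]), 1) for i in range(n)]
    precision = [M[i][i] / max(sum(M[r][i] for r in range(n)), 1)
                 for i in range(n)]
    return precision, recall
```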

To provide even more insight into classification performance, the class-specific and average values of accuracy, error and F-measure are shown in a table. An example of such a table is seen below in table 4.

Table 4: Classification performance measurements

          Accuracy          Error             F-measure
Class 1   Acc1              Err1              Fmeas1
Class 2   Acc2              Err2              Fmeas2
Class 3   Acc3              Err3              Fmeas3
Class 4   Acc4              Err4              Fmeas4
Average   (1/C)·Σ Acc_c     (1/C)·Σ Err_c     MFM

where the sums run over the C classes, c = 1..C.
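The per-class values in table 4 can be derived directly from the confusion matrix; a sketch:

```python
def per_class_metrics(M):
    """Per-class accuracy, error and F-measure from a multiclass
    confusion matrix M (rows = true, cols = predicted)."""
    n = len(M)
    total = sum(sum(row) for row in M)
    acc, err, fmeas = [], [], []
    for c in range(n):
        tp = M[c][c]
        fn = sum(M[c]) - tp                       # missed instances of c
        fp = sum(M[r][c] for r in range(n)) - tp  # wrongly predicted as c
        tn = total - tp - fn - fp
        acc.append((tp + tn) / total)
        err.append((fp + fn) / total)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        fmeas.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return acc, err, fmeas
```

Averaging the accuracy and error lists over the C classes, and the F-measures into a macro F-measure, gives the bottom row of table 4.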

3.6 System implementation on target platform

In this section, the methods related to the real-time system implementation on the Astator platform are detailed. As stated in section 1.3.1, Development approach, the development and implementation of the classification system is conducted in two stages. First, an offline version of each system function is created. Details of these individual subsystems are found in the respective sections above.

When the offline system is verified to operate according to specifications, a real-time implementation is constructed. Both the offline and real-time systems can be tested and simulated in a desktop environment with logged radar data, using Matlab/Simulink as simulation and verification tools.

There are several considerations to be made in order for the system implementation to be functional. Many of these are purely practical and will not be discussed further. Some considerations are more relevant, and these are discussed below.


3.6.1 Real-time implementation goals and restrictions

There are three main objectives during the real-time implementation. The first one is to have each RT function give exactly the same output as the corresponding offline system function (as presented in section 1.3.1, Development approach).

The second is for the real-time implementation to be executable within the allowed time frame. These considerations are discussed in more detail below in section 3.6.2, Timings and tasks.

The third is for the real-time implementation to perform well as a complete system implemented on the test vehicle. This is discussed in section 3.6.3, Validation of final system implementation.

The real-time implementation software is created in Simulink as a modification of the offline functions. This environment can then be used to simulate the real-time implementation. Using code-generation tools, embedded MATLAB blocks can be converted to C code, which is then transferred to the target system in order to produce a real implementation.

Frame-size considerations The frame size (meaning the length of time that is considered a single frame) is an important characteristic to consider. The radar sensors deliver data at a pace of 20 Hz. However, they do not update at the same time. Each radar delivers 64 detections every time it sends data. It flags a detection as updated if the data in that particular detection has not been sent by the radar before. The full set of detections is rarely (if ever) updated at the same time, so every radar delivers somewhere between 0 and 64 updated detections every 50 ms.

The real-time system operates on 10 ms execution cycles, and data is sent from the radar sensors every 10 ms, but new data can only arrive 50 ms after a radar last delivered updated detections.

There are at least two different ways of handling the frame size. One way is to see each unique time stamp as one frame. The benefit of this is that data is processed as quickly as possible and each detection within one frame is derived from the same "true" time.

The other way is to keep each frame at a fixed size of 50 ms (the same as the update time of the radars), and to collect all detections delivered within this time window into a single frame, which is delivered every 50 ms to the clustering algorithm. The benefit of this is that each frame will be a better representation of the environment, since it will contain data from all radars. This also gives more detections per moving object (if the object is within an overlapping field of view), which might improve classification accuracy. Another benefit is that it is easier to describe the timings of the real-time implementation, since they become more deterministic.


A drawback of this approach is that there is a possible time difference of 50 ms between the oldest and the newest data within a single frame, which will negatively affect classification speed and might have more severe implications for objects moving very fast.

In this project, frames are considered to consist of 50 ms worth of data (due to the benefits mentioned above), but one could just as validly process data as soon as it arrives.
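The fixed 50 ms frame collection could be sketched as a small accumulator driven by the 10 ms system cycle (the detection format and class name are illustrative, not the thesis implementation):

```python
class FrameGatherer:
    """Collects updated radar detections over five 10 ms cycles (one
    50 ms frame) and then emits them as a single frame, mirroring the
    fixed-frame approach described above."""

    CYCLES_PER_FRAME = 5  # 50 ms frame / 10 ms system cycle

    def __init__(self):
        self._buffer = []
        self._cycles = 0

    def step(self, updated_detections):
        """Call once per 10 ms cycle. Returns a complete frame every
        fifth call, otherwise None."""
        self._buffer.extend(updated_detections)
        self._cycles += 1
        if self._cycles == self.CYCLES_PER_FRAME:
            frame, self._buffer, self._cycles = self._buffer, [], 0
            return frame
        return None
```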

3.6.2 Timings and tasks

In this section, the timing and time-complexity aspects of the real-time implementation are discussed. A simplified collection of tasks performed within the classification system is seen in the timing diagram below in figure 15.

Figure 15: Timing diagram of the tasks system, gatherFrame, makeFrame, cluster and classify (t_system = 10 ms, t_frame = 50 ms, t_exec unknown)

System parallelism is in this case assumed to be absent (so every task has to finish in order for consecutive ones to start). Since code-generation tools are used and the hardware itself is powerful and runs on several cores, this is a simplification and some parallelism likely exists. However, for the purpose of real-time performance assessment, this view is helpful. If one would later look to significantly increase performance, applying additional parallelization could be useful.

System is the inherent cycle of the target system, with an execution time of 10 ms. This cycle time is the basic time frame to relate to. Within the system task are all of the Astator functions. Amongst these are preprocessing and sensor data fusion, as well as the functions that translate sensor data to different coordinate systems. A brief overview of the different functions inherent to the target system can be found in section 1.2.2, Target system.

gatherFrame is the task that collects the updated detections from the radar sensors; when a complete frame cycle of 50 ms has finished, it triggers the rest of the classification algorithm to execute.

makeFrame, cluster and classify are the tasks that make up the rest of the classification system. These have to be executed one after another, and also need the system task to be finished for the corresponding cycle in order to execute (since they need the data to be transformed to the EGO local coordinate system). They also wait for the gatherFrame task to finish, so that they can operate on a coherent frame of data.

The total execution time of these tasks together with gatherFrame is t_exec, and it is this time that is of interest when evaluating the RT timing performance. Simply put, this is the execution time of the complete classification system cycle.

The total execution time t_exec + t_system shall never exceed 10 ms, since this would prevent the system task from completing every 10 ms. It is of course preferred to have as low an execution time as possible, as this will increase the ability of the complete object-tracking system to be extended with further tasks.

In order to validate timing performance, the built-in Simulink tool Profiler is used. This tool checks how much time is spent within each function.

Using a model to evaluate real-time performance, on hardware that is different from the real target system, cannot directly prove adequate performance. However, the simulated environment is very likely to be slower than compiled C code on the same hardware. In addition, the target system hardware is more powerful than the processors on which the Simulink Profiler assessment is made. This means that the Profiler assessment is still valuable as an indication of RT performance.

3.6.3 Validation of final system implementation

To evaluate complete system performance, the following steps will be taken:

First, it will be ensured that the real-time system produces the same output as the simulated system given the same input. This is nontrivial because the simulated environment produces its own time stamps that do not precisely match the actual run-time sample times. Hence, it cannot be guaranteed that the simulated classification receives all sensor data at precisely the same time stamps as the real-time implementation, and this will invariably produce slightly different results.

Once it is ensured that the same input yields the same output in both the implemented system and the simulation, continued system evaluation can be made in the simulated environment.


To add to the real-time evaluation made in the simulated environment using the Simulink Profiler, an additional investigation will be made on the actual target platform, ensuring that the system cycle time is never exceeded. These two checks combined should provide confidence that the real-time requirements are met.

When it comes to the actual classification system performance validation, a variant of the approach used by Garcia [4], discussed in section 2.1.1, Radar based vehicle perception, will be used. The first step is a qualitative analysis made by looking at the system output for a set of radar data logs with known objects.

Then, a quantitative approach of looking through a log frame by frame and noting the number of correctly classified versus misclassified clusters is performed. In this process, the "truth" data will be composed of the logged objects for which the type is known.

The results and analysis of the system evaluation are found in section 4.5, Complete system performance assessment.


Part 4: Results and discussion

In this part, the results of the project and of the methods described in section 3 are presented and discussed. The chapter merges results and discussion; as such, each section contains the results of a particular method or subject, together with a discussion of said results. The chapter is concluded with the results of validation and verification of the classification system, both in the offline context and in the real-time implementation.

4.1 Results of data gathering and labeling

Through the process described in section 3.2.1, Test-track data gathering, a total of 60 different data logs have been produced. The logs contain radar data from the different test scenarios described. In addition to these controlled test scenarios, several other logs have been produced and used in the evaluation of different functions.

Two scripts for labeling individual clusters in the recorded data have also been produced, and by using these scripts a total of 13 logs have been processed and labeled. These constitute the labeled data set used throughout this chapter.

The result of these processes is 2947 separate labeled radar detection clusters belonging to either the pedestrian, bicyclist, personal vehicle or truck class.

Each cluster has a corresponding calculated feature vector; however, should the number or structure of the features used in the system be modified, new feature vectors can easily be calculated from the labeled data. The labeling has been made on each individual detection within the cluster; thus a change in clustering parameters or a new filter structure does not mean that the labeling process needs to be performed again. The labeled detections will simply pass through the new filtering, clustering and feature extraction stages in order to obtain new labeled feature vectors.
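This re-derivation could be sketched as follows, where cluster_fn and feature_fn are placeholders for the project's actual clustering and feature-extraction stages, and a re-formed cluster inherits the majority label of its detections (the majority-vote rule is an assumption for illustration):

```python
from collections import Counter

def relabel_clusters(labeled_detections, cluster_fn, feature_fn):
    """labeled_detections: list of (detection, label) pairs.
    cluster_fn groups the detections into clusters; feature_fn maps a
    cluster to a feature vector. Both stand in for the real pipeline
    stages. Labels are looked up by object identity, so the clusters
    must contain the same detection objects that were labeled."""
    dets = [d for d, _ in labeled_detections]
    label_of = {id(d): lbl for d, lbl in labeled_detections}
    out = []
    for cluster in cluster_fn(dets):
        majority = Counter(label_of[id(d)] for d in cluster).most_common(1)[0][0]
        out.append((feature_fn(cluster), majority))
    return out
```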

Table 5: Radar detection clusters gathered and labeled

Class       Pedestrian   Bicyclist   Car   Truck   Noise
Instances   479          533         734   1201    1709

Aside from the labeled class data, additional labeling has been performed on noise clusters. In table 5, the spread of labeled clusters over the different classes is shown.

Since the beginning of the project, it has been suspected that the task of acquiring labeled training data could pose a challenge. The reason for this is that data gathering and labeling are both time-consuming processes, and it was unknown whether resources such as access to the test track could be given.

Luckily, the chance to gather training data was presented, and a multitude of examples of radar signals coming from the different classes were recorded. However, with just one day of data gathering, it was inevitable that the data would be insufficient. It is a big challenge to avoid correlation in the training examples.

For example, in all data gathered within this project, only one type of weather environment exists. Also, only two different types of cars and trucks were used, which may not be enough to get good coverage in the feature space. Despite the fact that two different types of clothing were used for the pedestrian and bicyclist classes, all the tests were performed with the same person, and with the same bike. In reality, data would be needed from persons and vehicles of additional sizes and materials.

Even though it was attempted to acquire test data at many different velocities for each class, the limited time made it possible to get only three different speeds per scenario and class. This is another example of high correlation in the training data.

To add to the challenge of data gathering, the labeling of said data is just as problematic. Of the sixty separate data logs that were gathered, only 13 have been processed and labeled. This is because labeling is a time-consuming process, and it was decided that the time was better spent on other tasks.

From a single data log consisting of a one-minute recording of radar data, it is possible to get thousands of labeled clusters. However, many of them will be heavily correlated, since the radar system yields 20 samples per second and no moving object considered here will change much in such short time frames.

It is also important to note that the difference in the number of labeled clusters for the different classes could cause bias in the later stages of machine learning. When used as training examples, a more equal distribution would be preferable, since it would eliminate this source of potential errors. Since the focus of this project has not been on fine-tuning performance, this has not been addressed. However, it is something to keep in mind for future work.

The labeled training data acquired within this project is sufficient to demonstrate the feasibility of the concept and that good classification results can be obtained within the requirements stated at the beginning of the project. However, to achieve the full potential of the system, a much bigger set of training data will be needed, with greater variation in velocity, heading, distance, environments and object types.


4.2 Analysis of feature and data characteristics

In this section, results regarding the analysis of data and the selection of features to use for classification are presented. First, the selected features are presented with mean and variance measurements. Then, a PCA of the data and the chosen features is shown. Finally, a discussion regarding these results and their implications is presented.

4.2.1 Characteristics of selected features

Below, characteristics of the features from the feature selection process are presented. These results consist of a table of mean and variance measurements for the selected features with respect to the different classes. The analysis was done using Matlab on object data extracted from the recorded data logs, and features were calculated as described in section 3.3, Practical selection and analysis of object descriptions.

In order to give a real-world sense of what is being presented, the data is not scaled. This means that values for a certain feature cannot be meaningfully compared to values of another feature. Only comparisons between classes, regarding the same feature, are useful in a classification context.

              Pedestrian         Bike               Car                Truck
Feature       mean     var       mean     var       mean     var       mean     var
nrOfDets      3.20     1.84      3.32     2.29      5.82     11.14     17.69    303.88
minLength     1.73     5.29      2.26     5.43      4.62     6.43      12.24    26.33
areaVal       2.87     49.03     2.91     33.76     10.63    160.44    76.91    5982.80
densityVal    4.05     26.00     1.37     3.68      0.55     0.17      0.23     0.01
mean dR       1.23     0.75      2.53     2.42      3.75     7.52      3.02     7.15
var dR        1.10     173.10    0.50     8.88      0.93     6.90      5.24     774.95
ampPerDist    -1.68    0.65      -0.33    0.49      -0.13    0.03      0.16     0.02
varAmp        15.34    376.19    20.43    816.42    30.69    1286.18   53.63    895.74

Table 6: Mean and variance of features used for object description

The table above clearly shows that most features have mean values that distinguish between most classes. Notable exceptions are nrOfDets and areaVal when comparing pedestrians and bicyclists: the mean values for these are very close to each other. This makes sense, since the detection surfaces of both these classes mainly consist of persons.

Disregarding the variance, this implies that the chosen features carry explanatory value. It is however important not to overinterpret the results. How much, and which features are good for which distinctions, cannot be directly inferred from this particular analysis alone.
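Statistics like those in table 6 can be reproduced with a simple per-class pass over the unscaled feature vectors. A sketch using the population variance (the thesis' exact variance convention is not stated):

```python
from collections import defaultdict

def class_feature_stats(feature_vectors, labels):
    """Per-class mean and (population) variance of each feature dimension."""
    groups = defaultdict(list)
    for vec, lbl in zip(feature_vectors, labels):
        groups[lbl].append(vec)
    stats = {}
    for lbl, vecs in groups.items():
        n, dim = len(vecs), len(vecs[0])
        means = [sum(v[j] for v in vecs) / n for j in range(dim)]
        varis = [sum((v[j] - means[j]) ** 2 for v in vecs) / n
                 for j in range(dim)]
        stats[lbl] = (means, varis)
    return stats
```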


4.2.2 Principal component analysis of features on training data

In this section, the results of the principal component analysis, discussed in section 2.4.3, Principal component analysis for feature evaluation, are shown. The analysis was done on the labeled data presented in section 4.1, Results of data gathering and labeling.

Below in table 7, the result of equation 24 is shown. The table shows how much each principal component adds to the variance explanation of the data set.

Principal component   Cumulative variance
1                     0.479
2                     0.653
3                     0.818
4                     0.912
5                     0.942
6                     0.970
7                     0.992
8                     1.00

Table 7: Cumulative variance explanation per principal component
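Given the eigenvalues of the covariance matrix of the standardized feature data (which a PCA routine such as Matlab's pca returns alongside the components), the cumulative column of table 7 follows directly. A sketch:

```python
def cumulative_variance(eigenvalues):
    """Cumulative fraction of total variance explained per principal
    component, given the covariance-matrix eigenvalues."""
    total = sum(eigenvalues)
    out, running = [], 0.0
    for ev in sorted(eigenvalues, reverse=True):
        running += ev
        out.append(running / total)
    return out
```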

Below in figure 16, a biplot (16a) of the features used, together with the training data projected onto the first two principal components (16b), is shown. The biplot is a way of visualizing a principal component analysis. The features used are displayed as projections onto the 3D principal component space. The length of the vector corresponding to each feature is analogous to the amount of "explanatory value" (its portion of the total data-set variance explanation) it carries compared to other features. Additionally, similarity in direction indicates high correlation. The projection of the training data onto the first two principal components gives a visual feeling for how separable the different classes are.

Figure 16: (a) Biplot of the features (minLength, varAmp, nrOfDets, areaVal, ampPerDist, densityVal, mean dR, var dR) in the first three principal components; (b) the 2947 training samples projected onto the first two principal components


In figure 17 below, the training data is projected onto a biplot using the first three principal components. It gives a visual indication of how the features used explain the variance of the training data.

Figure 17: Training data (2947 samples) and feature biplot projected onto the first three principal components

In the figures above, the pedestrian class is plotted in blue, the bicycle class in green, the car class in yellow and the truck class in red. It is apparent that the pedestrian and bicycle classes are hard to separate in the first three principal component dimensions.

It can also be seen in the figures above that certain features, such as amplitude per distance and minimum length, carry much of the explanatory value for principal components one and two.

It can also be seen that the features minimum length, number of detections and area are closely correlated with each other (they point in similar directions). The mean dR and var dR features are also correlated.

4.2.3 Feature and data analysis discussion

Below, a discussion regarding the results of the feature selection and the general data exploration is presented. A discussion about the existence and influence of bias in the selected features is also included.


Feature usefulness Due to the nature of the PCA method and the difficulty of grasping high-dimensional data, it is hard to draw conclusions about the usability of individual features. Rather, they should be evaluated together.

A sign of good feature selection is that the variance explanation does not come from just one or two features. As seen in table 7 above, the cumulative variance is only slightly over 80 percent at three principal components. This also implies that what can be seen in the PCA plots is just part of the truth, since so much of the variance explanation occurs in higher dimensions. There is a clear contribution from each principal component except the last one. This implies that most features are useful in explaining variance in the data set. It should be stressed that, since the principal components do not correspond to any single feature, this is not the same as saying that the least useful feature adds no explanatory value.

One can determine from the figures that the features minLength, ampPerDist, varAmp and mean dR have the longest vectors, meaning they carry the most information for the first three PCs. It can also be seen in table 6 above that these particular features show a good distinction between the mean values for each class. This implies that these features are particularly useful for object classification.

It can also be seen that several features point in similar directions, but with much shorter vectors, than other features. For example, the features nrOfDets and areaVal both point in a very similar direction to minLength, but with much shorter vectors. This implies that these features are highly correlated, and that the minLength feature possibly carries most of the needed information. If it were necessary to speed up classification, it is probably possible to exclude these less useful features and maintain classification accuracy.

Data analysis and class separability As seen in figures 16b and 17 above, when projected onto the first two or three principal components, the classes are not obviously separable. In particular, the bicycle class (green) and the car class (yellow) are very entwined, which can explain classification difficulties between these particular classes. However, one should bear in mind that these plots are only low-dimensional representations of the data, and that the classes may be more easily separated in the original 8-dimensional feature space.

Feature bias Below, a discussion about the presence of bias in the selected features is presented.

In this context, bias present in features is to be understood as the tendency of a certain feature to introduce systematic errors in classification. This is separate from the beneficial distinction between classes that a feature is meant to induce, which could also be labeled bias under a different nomenclature.

Striving to use features that possess no bias is preferable in most circumstances. There is however little possibility of accomplishing a total absence of bias, since it can be present in unimagined ways and it is hard to check for bias experimentally.

The presence of bias in both the training data and the features used is a potentially large source of error. Great care has to be applied when making the feature selection, and discussions about the presence of bias need to be comprehensive. Otherwise, there is a risk of seriously compromising the validity of the end results. Particular bias considerations with regard to specific features are presented below.

Variance of Doppler velocity Since this feature is calculated as a weighted mean of the variance of velocity values from each specific radar, some bias induced by sensor placement is avoided.

There are however situations in which this feature can still induce bias. For example, consider the case of an object passing perpendicular to a radar sensor. If the object is a point, no Doppler velocity can be detected (since there is no detected point having a relative velocity in the radial direction of the sensor). But if the object is very long, there is a high probability that the sensor produces detections from points whose velocity is not strictly perpendicular to the sensor's radial direction. Points at the back end of the object will appear to be moving towards the sensor, while points at the front end will appear to be moving away. Thus, the variance in Doppler velocity of such an object is likely to be large.

The bias in this case is that the same object, at a greater distance, would produce a lower Doppler speed variance. Thus, this feature could introduce a dependency on the distance to the EGO vehicle, or other unwanted behaviour that leads to systematic errors.
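The geometric effect described above can be sketched numerically. The following snippet is illustrative only (the point layout, speed, and distances are invented, not taken from the thesis data): it projects the velocity of an extended object onto the lines of sight of a single sensor and shows that the Doppler-speed variance shrinks as the same object moves further away.

```python
import numpy as np

def doppler_speeds(points, velocity, sensor=np.zeros(2)):
    """Radial (Doppler) speed of each detection point as seen from the sensor."""
    los = points - sensor                                   # line-of-sight vectors
    unit = los / np.linalg.norm(los, axis=1, keepdims=True)
    return unit @ velocity                                  # project velocity on each line of sight

# A 10 m long object moving at 5 m/s along the x-axis, i.e. perpendicular to the
# radial direction of a sensor at the origin, observed at two different distances.
velocity = np.array([5.0, 0.0])
xs = np.linspace(-5.0, 5.0, 11)                             # detections along the object

near = np.column_stack([xs, np.full_like(xs, 20.0)])        # object 20 m away
far  = np.column_stack([xs, np.full_like(xs, 70.0)])        # same object 70 m away

var_near = np.var(doppler_speeds(near, velocity))
var_far  = np.var(doppler_speeds(far, velocity))
print(var_near > var_far)   # True: the distance dependency discussed above
```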

Amplitude per distance This feature is possibly biased by construction. Depending on what unit the amplitude is calculated in within the radar sensors, it could be that a division by distance squared or distance cubed would be more accurate. If the wrong exponent is used in the division, this can lead to a bias. Since the exact amplitude calculation is unknown, the division by distance to the power of one has been kept, but this should be changed if more exact knowledge about the calculations done within the radars can be gained.

This bias would introduce a dependency on the distance to the EGO vehicle for this feature. Clearly, the distance should not in itself be considered a feature: it is implausible that a certain class of moving objects would on average be located nearer or further away than another.
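To make the exponent issue concrete, the following sketch (hypothetical, since the actual sensor-internal amplitude calculation is unknown) assumes the reported amplitude retains a residual r² trend, and shows how only the matching exponent removes the range dependency while the default exponent of one leaves the bias in place.

```python
import numpy as np

def amplitude_per_distance(amplitude, distance, exponent=1):
    """Range-normalized amplitude feature. The thesis divides by distance to the
    power of one; the exponent is exposed here because the correct power depends
    on the (unknown) internal amplitude calculation of the sensors."""
    return amplitude / distance ** exponent

r = np.array([10.0, 20.0, 40.0])
# Hypothetical: suppose the sensor-reported amplitude keeps a residual r**2 trend.
reported = r ** 2
flat   = amplitude_per_distance(reported, r, exponent=2)  # constant over range
biased = amplitude_per_distance(reported, r, exponent=1)  # still grows with range
print(np.allclose(flat, flat[0]), np.allclose(biased, biased[0]))   # True False
```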

4.3 Signal processing and filtering performance

This section contains results related to the methods used for processing radar data, such as filtering and clustering. The section also contains some results and discussion related to common types of noise that have been detected in the radar sensors.

4.3.1 Common types of noise in the radar output

In order to create a suitable signal chain structure, an investigation has been made into the different types of noise coming from the radar system. The results of this investigation are found below.

It has been seen that some forms of noise detections from the radar sensors occur more often than others, and are seen repeatedly in almost all data logs.

Reflection lines A common scenario where noise detections appear is when a big object moves close to Astator at specific angles. The noise is seen as lines of detections behind the real object.

Examples of the phenomenon are given in figure 18, where a big object is moving close to Astator and lines of corrupt detections are seen behind the actual object.

[Figure 18, panels (a) and (b): x/y plots (−80 m to 80 m) showing Astator, raw detections, filtered detections and clustered detections]

Figure 18: Lines of noise detections behind an object

Scatter Similar to the reflection lines, this type of noise seemingly appears whenever there are objects moving close to EGO at certain angles. The difference is that instead of appearing in straight lines behind the object, the noise appears as sparse detections spread out in an arc behind the object. The detection velocity values also seem to be low in this case, as opposed to the reflection lines, where the velocity values are often amplified.

Examples of scattered noise detections are given below:


[Figure 19, panels (a) and (b): x/y plots (−80 m to 80 m) showing Astator, raw detections, filtered detections and clustered detections]

Figure 19: Scattered noise detections behind an object

In both figures 19a and 19b, the only real moving object is the cluster seen closest to the EGO vehicle (centered in the figure).

The noise described in figures 18 and 19 is believed to be caused by radar waves bouncing multiple times on an object before being detected by the sensor.

Theoretically, this would cause detections to appear with range values r_det at multiples of the true range to the object:

r_det = n × r_true, where n = 2, 3, … (28)

Also, since the wave is reflected several times on the object, several Doppler shifts would be superimposed. Theoretically, this would show in the range rates of the detected points:

ṙ_det = n × ṙ_true, where n = 2, 3, … (29)

In reality, the noise is seen as lines and arcs of false detections appearing behind a detected object. These noise detections do not seem to be restricted to ranges at multiples of the true range. This could be a sign that some other phenomenon is the cause of this noise.
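Equation 28 can be turned into a small helper that lists the ranges at which such multi-bounce ghosts would theoretically be expected (illustrative only; the function name is invented):

```python
def multipath_ghost_ranges(r_true, n_max=4):
    """Ranges at which multi-bounce ghost detections would appear according
    to equation 28: r_det = n * r_true for n = 2, 3, ..."""
    return [n * r_true for n in range(2, n_max + 1)]

# An object at 20 m could theoretically spawn ghosts at 40, 60 and 80 m
# (anything beyond the 80 m sensor range would never be reported).
print(multipath_ghost_ranges(20.0))   # [40.0, 60.0, 80.0]
```

In practice, as noted above, the observed lines and arcs are not confined to these discrete multiples, which is what casts doubt on the multi-bounce explanation.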

One theory is that the noise appears due to radar signals reflecting off an object but being detected by a different radar than the one from which they originated, causing the angle, the range, and the velocity of the detection to be corrupted. However, since the radars are frequency modulated, such interference between the radars should not happen.


The reflection lines do not appear as commonly as the scattered noise (which appears just about every time an object passes by), but they are also harder to remove using the filter structures developed in this project. The reason for this is that the lines usually appear with detections very close to each other, causing them to be clustered into an object instead of being marked as noise in the DBSCAN stage.

False clusters One type of noise detection seemingly appears independent of whether or not there is an object close to Astator. These "false" detections often appear briefly, in a single sample only, but in high numbers and with high velocity values.

[Figure 20, panels (a) and (b): x/y plots (−80 m to 80 m) showing Astator, raw detections, filtered detections, clustered detections and velocity arrows]

Figure 20: Clusters of noise detections with high velocity values

Figures 20a and 20b are examples of this type of noise. No real objects are present in either of the figures.

The false clusters are only apparent in a single frame and do not return in the same place. Also plotted in the figures are the velocity values of the detections (shown as arrows).

This type of noise is by far the most common seen in the radar data. The quantity and density of these false detections make them hard to filter using DBSCAN, since they usually end up being clustered, as is the case with the reflection lines.

One thing that can be seen is that this type of noise usually has a very high variation in velocity values among detections, which can appear to move at unrealistic speeds and in opposing directions, even though they all seem to come from the same "ghost" object. Looking at figure 20, it can be seen that the velocity values within these false clusters are very high, often above 100 km/h.


Clutter When the EGO vehicle is moving, an increased amount of corrupt detections is seen in the radar data. These detections are thought to originate from stationary objects that are interpreted as moving objects by the radar processing software.

In some cases, the amount of corrupt detections is so great that the points are clustered and sent onwards in the signal chain.

Below in figure 21, a frame from when the EGO vehicle is moving at around 60 km/h along a straight road with low road fences on both sides is shown.

[Figure 21: x/y plot in meters (−60 to 20 m by −40 to 30 m), with data cursors at (−18.28, 2.637) and (−16.2, 2.8) marking adjacent fence detections]

Figure 21: Fence detections and noise when moving at 60 km/h

In this figure, no real moving objects are present; every detection comes either from noise or from stationary objects. The detections that lie on a straight line are from fences. Despite this, the radar processing software labels these detections as having a movement index of 3, meaning the system is certain they originate from moving targets.

Conclusions regarding noise The types of noise shown above are extremely common, being present in some form throughout most logged data.

Since it is impossible to strictly guarantee a 100% visual overview of the test track, it is problematic to rule out the possibility that some moving object was actually there to cause the detections. However, great care was taken to prevent this by only recording data when no moving objects could be seen in the vicinity except the object being studied.

Assuming that there were in fact no objects other than those deliberately moving on the test track, two possibilities remain which could cause false detections to appear:

1. A stationary object is registered as moving (corrupted detections)

2. Something within the radar sensor's internal processor causes false detections to appear (pure noise)

It should be noted that even though these types of noise are very common in the radar data gathered within this project, this does not guarantee that they are representative of all situations. In a crowded traffic environment, for example, the noise situation may be entirely different from what has been shown here.

The main bulk of the data gathered within this project is from a large open part of the test track, with rarely more than one other object present at a time. This is thought to be somewhat representative of the mining context on which the iQMatic project is mainly focused, and for this reason the investigation of noise is thought to be valid.

The study of noise was done to gain more insight into the downsides of the radar system, as well as to provide a basis for developing filter structures.

4.3.2 Results of developed filtering structures

As is known from the previous section, there is a considerable amount of noise coming from the radar system. The most common problem is that moving detections sporadically appear where there were in fact no moving objects. This section contains results of the different filter structures developed within this project.

Detection filtering In order to remove unwanted detections, the three different radar detection filters described in section 3.4.1 have been implemented.

The movement index filter constitutes a crucial component of the overall system, since stationary detections contain no desired information. The weakness of this filter is that, as described above, detections coming from stationary objects can be corrupted and appear with nonzero movement indexes.

The detection velocity threshold is an attempt to reduce the amount of corrupt detections sent to the clustering stage. Since corrupt detections often appear with velocity values in a wider range than real objects, detections with unreasonably high velocities are discarded.


Below, the results of the maxdR filter on two different data logs are shown.

Table 8: maxdR detection filter statistics

Datalog          maxdR [m/s]   [km/h]   Filtered
20150311 110218  25            90       1096/44426
20150311 110218  14            50.4     2166/44426
20150311 140531  25            90       575/2924
20150311 140531  14            50.4     1102/2924

In table 8 it can be seen that out of the detections that appear as moving, some are discarded by the filter. The first log contains radar data collected as a truck is driving in a snakelike pattern close to EGO. The second log is a shorter log from a bicyclist test scenario. In both logs, the velocity of the studied object never exceeds 30 km/h. Despite this, a large amount of detections have velocity values exceeding 50 km/h.

The results show that the number of corrupt velocity detections varies highly throughout the logged data, but in both cases, the maxdR threshold is useful. The validity of this filter is easily defended: even if the removed detections belong to real objects, their velocity values are corrupt, and thus they should not be used for classification.

The maxRange threshold described in section 3.4.1 is currently set to 80 m, which is the maximum range of the radars. The performance of this particular filter step has not been evaluated, but reasonably this parameter could be useful.
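As a rough sketch, the three detection filters can be chained as follows. Field and function names are hypothetical; only the maxdR = 25 m/s (90 km/h) and maxRange = 80 m values are taken from the text.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    range_m: float       # range to the detection [m]
    range_rate: float    # Doppler range rate [m/s]
    movement_index: int  # sensor's moving-target label (0 = stationary)

MAX_DR = 25.0      # maxdR: 25 m/s = 90 km/h, as in table 8
MAX_RANGE = 80.0   # maxRange: the radars' maximum range

def passes_detection_filters(d: Detection) -> bool:
    """Chain of the three detection filters described in section 3.4.1."""
    if d.movement_index == 0:        # movement index filter: drop stationary detections
        return False
    if abs(d.range_rate) > MAX_DR:   # velocity threshold: drop corrupt, too-fast detections
        return False
    if d.range_m > MAX_RANGE:        # range threshold: drop out-of-range detections
        return False
    return True

dets = [Detection(30.0, 5.0, 3),    # plausible moving target    -> kept
        Detection(30.0, 40.0, 3),   # 144 km/h: corrupt velocity -> discarded
        Detection(30.0, 5.0, 0)]    # stationary                 -> discarded
print([passes_detection_filters(d) for d in dets])   # [True, False, False]
```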

Cluster filtering The first two solution hypotheses of the cluster filtering structure described in section 3.4.4 have been implemented and evaluated.

The results presented below were obtained by testing different threshold values for minClusterAmpVar and maxClusterVelVar respectively, and studying the amount of corrupt clusters (labeled noise clusters) that are removed versus the amount of true clusters (labeled class objects) that are removed. This evaluation is made on the full set of labeled data.

Table 9: maxClusterVelVar filtering results

Threshold   Noise clusters removed   Real clusters removed
-5          1406 (100.0 %)           2933 (100.0 %)
0           1377 (97.9 %)            2812 (95.9 %)
5           772 (54.9 %)             99 (3.4 %)
10          727 (51.7 %)             62 (2.1 %)
15          699 (49.7 %)             54 (1.8 %)


Table 10: minClusterAmpVar filtering results

Threshold   Noise clusters removed   Real clusters removed
0           0 (0.0 %)                0 (0.0 %)
1           254 (18.1 %)             180 (6.1 %)
2           335 (23.8 %)             268 (9.1 %)
3           385 (27.4 %)             357 (12.2 %)
4           421 (29.9 %)             433 (14.8 %)
5           445 (31.7 %)             494 (16.8 %)

As can be seen in tables 9 and 10, the first two cluster filtering hypotheses both show potential. The filtering of clusters with highly varying velocities clearly removes more noise clusters than real clusters, and when studying logs manually, it has been seen that a significant part of the false clusters described in section 4.3.1 can be removed using this filter.

The amplitude variance filter is not as effective, since even at a small threshold a significant amount of real clusters is discarded. However, if the right threshold is chosen, this filter could still be useful.
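The velocity-variance hypothesis can be sketched as below. The names are hypothetical; the threshold of 5 is chosen because table 9 suggests values in that region balance noise removal against the loss of real clusters.

```python
import numpy as np

# Hypothetical sketch of the maxClusterVelVar filter from section 3.4.4.
MAX_CLUSTER_VEL_VAR = 5.0

def keep_cluster(range_rates):
    """Reject clusters whose detections have wildly varying Doppler velocities,
    the signature of the false clusters described in section 4.3.1."""
    return np.var(range_rates) <= MAX_CLUSTER_VEL_VAR

real  = [4.8, 5.1, 5.0, 4.9]        # coherent object: all detections agree
ghost = [-30.0, 25.0, 10.0, -15.0]  # false cluster: unrealistic, opposing velocities
print(keep_cluster(real), keep_cluster(ghost))   # True False
```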

It should be noted that, as described at the end of section 4.3.1, the labeled noise data gathered within this project does not necessarily represent all situations. For this reason, the filtering evaluation made above can only validly be applied to the situations considered within this project and contained within the data gathered here.

Filtering conclusions If a detection caused by a stationary object, such as a ground reflection or a building, appears to have movement in the sensor, it may pass the detection filter. If the velocity value of this detection is unreasonably high, it will be discarded by the maxdR filter. However, many detections slip through the detection filter, as their velocity values are just high enough to appear as moving, but not so high as to be discarded.

If several of these noise detections are close together, they will pass the clustering stage as well, thus reaching all the way to the classification stage of the signal chain.

A stricter rejection threshold in the multiclass ensemble could reduce the impact of this, as a larger amount of noise clusters would be rejected. However, this would unavoidably lead to an increase in the amount of true clusters being rejected. For this reason, such a rejection threshold is avoided.

The cluster filter hypotheses were an investigation into the possibility of removing corrupt clusters before they reach the classification stage. However, even though the filtering showed potential, it has been deactivated in the final system. The reason for this is described below.

Since the system concerned in this project aims to operate on the same cycle time as the radars, and use the data of every sample separately, it is very sensitive to noise which appears on a sample-to-sample basis. The usage of detection history, that is, using data from earlier samples in a probabilistic filter structure, would greatly reduce the impact of noise detections that only exist in a single sample.

Even though the system developed in this project was specified to work without detection history, and thus may be sensitive to noise, the tracking system for which the class output is intended uses a probabilistic filter structure with detection history. This means that from a system-wide perspective, noise data in the class output will not have such a big impact.

This is also the reason why the cluster filter structure was disabled, and why it was chosen not to focus further on filter structures, such as the third filtering hypothesis (the object shadow filter) described in 3.4.4.

It is simply not worth the risk of real objects being filtered out, just to remove noise that would have been removed in later system stages either way.

4.3.3 DBSCAN clustering parameter evaluation

In this section, results regarding the clustering of radar detections are presented. The focus here is on the choice of the clustering parameters Eps and MinPts, through the methods described in 3.4.2, Clustering of radar detections using DBSCAN. The overall performance of the clustering step is also discussed.

The Eps Parameter The Eps parameter was decided by looking through a data log where a truck passes the EGO vehicle. In figure 22 below, a typical difficult-to-cluster frame is shown. The difficulty consists of the object being quite far away (about 70 meters, where the maximum range of the radar sensors is 80 meters), the object being large (a truck), and the detections belonging to the object being quite few and spread out (there are clear gaps between detections, and also between groups of detections).


[Figure 22: x/y plot in meters (−70 to 10 m by −30 to 20 m), annotated "# of clusters: 1"]

Figure 22: Radar Detections Clustered with Eps = 4 meters

In the figure above, the red circles represent clustered radar detections, yellow circles are detections that are not considered moving, and the black rectangle is the EGO vehicle. Here, Eps = 4 meters, which results in all the detections belonging to the truck ending up in one single cluster. A lower value for Eps results in two clusters being formed. As discussed in section 3.4.2, a small Eps value is beneficial since it reduces the risk of noise points being clustered together with real objects. For this reason, the Eps value was set to 4.

The MinPts Parameter The choice of the MinPts parameter was made by looking at the pedestrian class and noting the typical number of detections a pedestrian produces. It was discovered that pedestrians often produce as few as a single detection, even at close distances to the EGO vehicle. A minimum value of MinPts = 2, meaning at least two detections per cluster, was deemed necessary in order to avoid much of the clustering of noise (as discussed in 3.4.2, Clustering of radar detections using DBSCAN). Thus, a parameter value of MinPts = 2 was chosen.
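With these parameter choices, the clustering step can be reproduced with an off-the-shelf DBSCAN implementation, here scikit-learn's (the detection coordinates below are invented for illustration, not taken from the logged data):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Detections (x, y) in meters: a large, spread-out object at long range whose
# detections have gaps of up to ~3.5 m, plus one isolated noise detection.
detections = np.array([
    [-70.0, 10.0], [-67.0, 10.5], [-63.5, 11.0], [-60.0, 11.2],   # truck
    [20.0, -30.0],                                                # lone noise point
])

# eps = 4 m bridges the gaps within the truck; min_samples = 2 requires at
# least two detections per cluster, so the lone detection is labeled noise (-1).
labels = DBSCAN(eps=4.0, min_samples=2).fit_predict(detections)
print(labels.tolist())   # [0, 0, 0, 0, -1]
```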

Remarks and discussion about overall performance of clustering step The choice of both algorithm and clustering parameters is a delicate matter that greatly affects the performance of the entire classification system. Due to the presence of noise, having a non-dynamic set of parameters inevitably leads to issues. For example, there are problems when trying to filter out stationary objects while moving (especially at higher velocities, but also in general).

In figure 21, found in section 4.3.1 above, a frame from when the EGO vehicle is moving along a straight road with fences was shown. The frame is an example of noise caused by movement of the EGO vehicle. This type of noise is problematic as there is no apparent way to filter it. Dealing with these kinds of noise detections in the clustering step (and avoiding sending the resulting clusters to the classifier step) would therefore be beneficial. The close detections lie at about 2 meters distance from each other, meaning that with the same Eps parameter as above, many clusters would be created. If, on the other hand, the Eps parameter were lowered, many detections that really come from a single object would be put into different clusters, making the classification more unreliable.

What also happens in the situation described above is that when an actual moving object appears too close to the fence, it gets clustered together with it. Almost certainly, the system will reject this massive and strange-looking cluster as noise, and the object is therefore not classified.

This illustrates the difficulty of choosing suitable fixed parameters for all situations.

These results imply that the choice of DBSCAN as clustering method is probably not enough to reach satisfactory performance in all situations. Instead, a more flexible option such as OPTICS could be considered, which would allow for dynamic parameter usage. This would however demand extensive analysis and work.

Another thing to consider is the usage of a different distance function. Since DBSCAN works with any distance function (not just the Euclidean, which is what has been used here), a function that better fits the data could provide better performance. For example, since it is known that the radars can produce detections only from physical surfaces, moving objects should tend to look more like ellipses than circles (or more like lines than boxes). A distance function that is more strict in the normal direction of the radar could correlate better with the physical reality, and therefore increase clustering performance.
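As a sketch of this idea (the weighting scheme and its parameter value are invented here, not taken from the thesis), a distance that penalizes separation along the line of sight more heavily can be passed straight to DBSCAN:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def radial_weighted_distance(p, q, radial_weight=2.0):
    """Distance that is stricter along the radar's radial (range) direction.

    Detections come from physical surfaces, so clusters tend to be elongated
    tangentially; weighting radial separation more heavily reflects that.
    The weight value is an invented illustration."""
    mid = 0.5 * (p + q)
    r_hat = mid / (np.linalg.norm(mid) + 1e-9)   # radial unit vector at the pair midpoint
    d = q - p
    radial = d @ r_hat                           # separation along the line of sight
    tangential = np.linalg.norm(d - radial * r_hat)
    return np.hypot(radial_weight * radial, tangential)

# Two pairs, each 3 m apart: one separated tangentially, one radially.
pts = np.array([[30.0, 0.0], [30.0, 3.0], [33.0, 0.0]])
labels = DBSCAN(eps=4.0, min_samples=2, metric=radial_weighted_distance).fit_predict(pts)
print(labels.tolist())   # [0, 0, -1]: the tangential pair clusters, the radial point does not
```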

4.4 Classification-related results

In this section, the results of classification-related tasks are presented. First, the choice of SVM parameters is explained via the outputs of the parameter grid searches conducted. This is followed by a complete evaluation of the classification performance on validation data.

4.4.1 Support vector machine model selection

To determine the SVM parameters, several cross-validation grid searches have been performed. The results of two grid searches are presented below.

The first grid search was conducted on a parameter grid composed of:

γ = (2^(−4), 2^(−2), …, 2^8), C = (2^0, 2^2, …, 2^12)


and with k-fold cross-validation with k = 5.
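A minimal reproduction of such a search, here with scikit-learn's GridSearchCV on synthetic stand-in data (the thesis feature data is not available), using the coarse grid and k = 5 from the text:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the labeled feature data: two well-separated
# classes in a 3-dimensional feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 3)), rng.normal(3.0, 1.0, (40, 3))])
y = np.array([0] * 40 + [1] * 40)

# Coarse logarithmic grid mirroring the first search:
# gamma = 2^-4, 2^-2, ..., 2^8 and C = 2^0, 2^2, ..., 2^12, scored with k = 5.
param_grid = {"C": 2.0 ** np.arange(0, 13, 2),
              "gamma": 2.0 ** np.arange(-4, 9, 2)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```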

[Figure 23: heat map titled "SVM parameter grid search using cross-validation scores", slack parameter C vs. kernel parameter gamma, colored by cross-validation score (65–85 %)]

Figure 23: Coarse grid search for SVM parameters

In figure 23 it can be seen that the best cross-validation scores were achieved in the upper part of the grid. For this reason, the second grid search was extended to allow higher values of both C and γ. The step size in the exponent was also reduced from 2 to 0.25, resulting in a time-consuming but detailed grid.

This second grid search was conducted on the grid composed of:

γ = (2^(−4), 2^(−3.75), …, 2^14), C = (2^(−4), 2^(−3.75), …, 2^16)

This search was also done with k = 5.

[Figure 24: heat map titled "SVM parameter grid search using cross-validation scores", slack parameter C vs. kernel parameter gamma, colored by cross-validation score (60–85 %)]

Figure 24: Fine grid search for SVM parameters


In this grid search, a maximum cross-validation score of 89.80% was found with the parameter combination C = 2^14, γ = 2^6.9. However, as can be seen in figure 24, the central area contains a wide range of parameter combinations yielding similar results. In order to improve generalization in the model, a lower C value and a higher γ were chosen: C = 860, γ = 45. This combination resulted in a cross-validation score of 89.50%.

It should be pointed out that while it is apparent in figure 24 that the ridge of high cross-validation scores continues to the southeast, that direction also leads to increased overfitting. Since a global optimum cannot be guaranteed anyway (due to the grid search being a pure brute-force method), it is deemed unnecessary to extend the search further.

4.4.2 Offline evaluation of classification performance

Here, the results of the classification system created within this project are evaluated in an offline context, meaning a simulated MATLAB/Simulink environment.

The evaluation is made by using 50% of the labeled class data for training purposes, and verifying performance on the remaining 50%. The results are found below, in table 11.

Table 11: Offline evaluation of classification performance

Total Accuracy: 1325/1473 ≈ 90.0%

(a) Confusion matrix

             Pedestrian   Bicyclist   Car    Truck   Recall
Pedestrian   252          3           2      1       0.98
Bicyclist    14           213         36     3       0.80
Car          4            54          284    12      0.80
Truck        1            2           16     576     0.97
Precision    0.93         0.78        0.84   0.97

(b) Performance measurements

Class        Accuracy   Error   F-measure
Pedestrian   0.98       0.02    0.95
Bicyclist    0.92       0.08    0.79
Car          0.92       0.08    0.82
Truck        0.98       0.02    0.97
Average      0.95       0.05    0.88
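The per-class measures in table 11b follow directly from the confusion matrix in table 11a; the following snippet recomputes them:

```python
import numpy as np

# Confusion matrix from table 11a (rows = true class, columns = predicted class).
classes = ["Pedestrian", "Bicyclist", "Car", "Truck"]
cm = np.array([[252,   3,   2,   1],
               [ 14, 213,  36,   3],
               [  4,  54, 284,  12],
               [  1,   2,  16, 576]])

recall    = cm.diagonal() / cm.sum(axis=1)   # row-wise: fraction of each true class recovered
precision = cm.diagonal() / cm.sum(axis=0)   # column-wise: fraction of each prediction correct
f_measure = 2 * precision * recall / (precision + recall)
total_accuracy = cm.diagonal().sum() / cm.sum()

print(np.round(recall, 2))      # matches the Recall column: 0.98, 0.80, 0.80, 0.97
print(np.round(f_measure, 2))   # matches table 11b: 0.95, 0.79, 0.82, 0.97
print(total_accuracy)           # 1325/1473, i.e. about 90.0%
```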

As expected, the lowest performance scores are those achieved for the bicyclist class. This is due to the fact that these examples are easily misclassified as both pedestrians and personal vehicles. From section 4.2.2 it can be seen that these three classes overlap in the feature space, and that the bicyclist class is placed in between, and highly overlapping with, the two other classes.

Physically, this is reasonable, since bicyclists are similar in size to pedestrians, but can also move at higher speeds and have moving wheels. This brings them closer to the vehicle classes. The bicyclist class simply does not have enough aspects that set it apart from the other classes considered in this project.

Luckily, since the iQMatic project is a research project regarding autonomous driving in mines, the bicyclist class is perhaps not the most important of the classes considered, since the mine is a restricted zone with a low chance of encountering bicyclists. However, the inclusion of the bicyclist class has yielded much insight into the limits of the radar data and the capability of the classification system.

Overall, the results in table 11 show that the finished classification system is quite powerful and has the ability to separate the four classes. It should be noted, however, that these results do not necessarily prove that the system generalizes well.

If the discussion made in section 4.1 is taken into account, it can further be argued that if the labeled data has high correlation, this may also lead to a higher score when evaluating classifier performance on validation data.

In order to get a better estimate of the generalization capability of the classification system, a more diverse set of labeled data is required for validation purposes. However, the results show that there is a spread between the classes in the feature space, and that good classification performance is definitely possible given the limited input data considered in this project.

4.5 Complete system performance assessment

In this section, the performance of the complete integrated system is shown and discussed, both via results obtained from the simulated environment and from the actual run-time implementation on the target system. To assess the system, we look at both classification and real-time performance.

4.5.1 Real time performance

As discussed in section 3.6.2, Timings and tasks, the classification system's real-time performance was partly evaluated using the Simulink Profiler. These simulations were run on an Intel Core i5 CPU @ 3.20 GHz with 6 GB RAM running Windows 7. Notably, this hardware is considerably less powerful than the hardware available on Astator (see section 1.2.2, Target system). The relevant results of the profiler analysis are shown below in table 12:


Table 12: Real-time simulation performance from the Simulink Profiler

Name               Time [s]   % of total   Calls   Time/Call
Total (clas sim)   50.98      100.0        1       50.98
clas sim/meas      10.28      20.2         1001    0.0103
clas sim/clas      2.76       5.4          1001    0.0028
clas sim/vego      0.20       0.4          1001    0.0002
other              37.74      74.0         -       -

The above table shows the proportion of total simulation time taken by the three major function blocks present in the simulation of the model clas sim. The clas function performs all the tasks of the classification system (gatherFrame, makeFrame, cluster and classify, as discussed in 3.6.2, Timings and tasks). The vego and meas functions constitute the system task.

In the simulation, the complete clas function is performed every 10 ms, but only contains new data every 50 ms. Thus, the time/call for the clas function shown in the table above would have a lower average if, as is the case in the real implementation, it conducted computations every 50 ms.

It can be seen in the table above that the clas function has an average execution time of around 2.8 ms, which is well within the required maximum execution time. For comparison, the meas function that computes transforms has more than triple the average execution time per call.

Since the Simulink Profiler evaluations were run on a desktop with notably less powerful hardware than what is available on the Astator platform, and also in the simulated environment, the measured time performance is likely considerably worse than what is the case for the implemented system using compiled C code. Thus, these results strongly indicate that the classification system has adequate real-time performance.

It should be noted that the Simulink Profiler analysis was conducted using a typical data log. Such a log usually contains between 0 and 15 clusters per frame. Since it is theoretically possible (though implausible) for a single frame to contain 128 clusters (that is, every one of the 256 possible detections is present and also clustered together with precisely one other detection), the worst-case scenario is considerably more computationally expensive than what has been evaluated here. For this reason, it is possible that the real-time performance could be inadequate in extreme load situations (although such situations are very unlikely to occur).

Real-time performance on the target system To further ensure that the real-time requirements are met, the real-time properties of the system were also studied as the system was running on the actual Astator hardware.


Below in table 13, the average execution times of the main software functions (the same as discussed above in table 12) are shown:

Table 13: Real-time performance on the target system

Function  exec time min. [ms]  exec time avg. [ms]  log ID
clas      0.018                0.349868             20150603 150621
vego      0.263                0.344751             20150603 150621
meas      0.242                1.61016              20150603 150621
clas      0.018                0.317206             20150603 150521
vego      0.26                 0.341969             20150603 150521
meas      0.26                 1.3693               20150603 150521

The data above was extracted from data logs gathered in a highway scenario. Since such a scenario contains a lot more input data than the typical use case, these results can be seen as a heavy-load scenario. It is clear from the table above that the real-time performance of the classification system (the clas function) is well within the boundaries specified, averaging around 0.3 ms in execution time from start to finish.

It should also be noted that since the Astator ECU is running a real-time operating system with several threads, there is a possibility of the operating system interrupting the currently executing function. Thus, the actual function execution time may be lower than what is seen above.

4.5.2 Classification performance

In this part, an evaluation of the complete system classification performance is shown. First, an input/output comparison between the implemented and the simulated system is presented. Then, the simulated classification system performance is measured on two different logged radar data scenarios.

Input output comparison between implementation and simulation. As discussed in section 3.6.3, the first step of the system validation is to ensure that the system gives the same output when running on the target system as in the simulated environment, provided the same input.

This was done by comparing outputs of the two systems and finding clusters that had the exact same position in both environments. Since the position of a cluster is calculated as the average position of all detections contained within said cluster, clusters in the exact same position should also contain the same detections. If two clusters contain the same radar detections, they provide the exact same input to the classification step. Thus, the classification outputs of the two different systems can be compared.
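A minimal sketch of this matching idea follows; the helper names and the tolerance are our own, not taken from the thesis code:

```python
# Sketch of the comparison step: a cluster's position is the mean of its
# detections, so clusters at the same position are assumed to hold the same
# detections. Names and tolerance are illustrative, not the thesis code.

def centroid(detections):
    """Mean (x, y) position of a cluster's detections."""
    n = len(detections)
    return (sum(d[0] for d in detections) / n,
            sum(d[1] for d in detections) / n)

def match_clusters(sim_clusters, impl_clusters, tol=1e-6):
    """Pair clusters whose centroids coincide (within tol) in both runs."""
    pairs = []
    for s in sim_clusters:
        for i in impl_clusters:
            cs, ci = centroid(s), centroid(i)
            if abs(cs[0] - ci[0]) < tol and abs(cs[1] - ci[1]) < tol:
                pairs.append((s, i))
    return pairs

sim = [[(-25.0, 57.0), (-24.8, 58.4)]]                 # one cluster, two detections
impl = [[(-25.0, 57.0), (-24.8, 58.4)], [(7.0, 5.0)]]  # same cluster plus one extra
print(len(match_clusters(sim, impl)))  # 1 matched cluster
```

Matched pairs can then have their class probability outputs compared directly.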

The different feature values are not directly available in the implemented system. This is due to the code generation creating data structures that are very hard to follow logically. As a result, these values cannot be compared. Instead, we compare the probability outputs of the different implementations. If they are the same, it is very likely that all other parameters have been the same. Therefore, if the same input results in the same class probability output, then the two implementations are likely functioning in the exact same way.

In table 14 below, the inputs and outputs of objects classified as belonging to each of the four different classes are compared for the real-time implemented system and the simulated implementation:

Table 14: Input/output comparison of the two systems for the different classes

Syst  time   xpos    ypos   class  prob c1  prob c2  prob c3  prob c4
Sim   0.140  -24.90  57.70  4      0.0375   0.0065   0.0000   0.997
Impl  0.159  -24.90  57.70  4      0.0375   0.0065   0.0000   0.997
Sim   0.140  -1.35   43.09  3      0.0720   0.0000   0.9890   0.0068
Impl  0.159  -1.35   43.09  3      0.0720   0.0000   0.9890   0.0068
Sim   0.140  7.27    5.28   2      0.0650   0.8670   0.0000   0.0469
Impl  0.159  7.27    5.28   2      0.0650   0.8670   0.0000   0.0469
Sim   1.900  3.04    -6.91  1      0.7677   0.0000   0.0021   0.0047
Impl  0.200  3.04    -6.91  1      0.7677   0.0000   0.0021   0.0047
Sim   0.190  -6.491  26.34  -1     0.1135   0.0051   0.0000   0.1314
Impl  0.200  -6.491  26.34  -1     0.1135   0.0051   0.0000   0.1314

Above, the field Sim corresponds to the simulated environment and Impl to the implementation on the target system. The Impl data was taken directly from a log created while running the classification system on Astator, while the Sim data was created using the stored radar signals of said log to simulate the system output in the Simulink environment. The class c1 corresponds to pedestrians, c2 to bikes, c3 to cars and c4 to trucks. The -1 class corresponds to a cluster being rejected (not likely belonging to any class). The different probabilities correspond to the Platt-scaled outputs of the classification system.

As can be seen in the table above, given the same cluster input, the two systems respond identically. Thus, it can be concluded that the real-time system implementation very likely delivers the exact same output as the simulated system, given the same input.

It should be noted that the above results do not indicate whether the system predictions are correct or not, only that the two systems give the same prediction output provided the same input, for each of the different classes.

It can be concluded that the implementation of the classification subsystem works and performs adequately from a real-time perspective, both with regard to timing and with regard to correct computations. Thus, further evaluation can be done in a simulated environment without negatively affecting validity.


Classification performance evaluation in the simulated environment. Here, results relating to the system performance in the simulated environment are presented. Results from two different logs, one easy and one more difficult, are presented and discussed. A frame from a scenario in which the EGO vehicle is traveling on a highway is also shown and discussed in relation to the other results.

Below in table 15, a table of classification results is shown. The first log is taken from a scenario where a truck with a trailer passes the EGO vehicle slowly (around 30 km/h) while the EGO vehicle is stationary. This can be considered an easy log, partly since trailers are very big and thus farther from the other classes, and partly because the EGO vehicle is stationary, which greatly reduces the presence of noise. Also, the log is taken in a big open area with no other moving targets.

The second log is from a scenario where the EGO vehicle drives at around 20 km/h while being followed by a car. This is of medium difficulty because the EGO vehicle is moving, which causes a considerable amount of noise.

Table 15: Classification system evaluation on log with trailer

Log  Frames  Clusters  Bad tot/obj  Rej  Bike  Car  Truck
1    218     300       85 / 60      79   26    22   173
2    140     585       428 / 50     62   292   136  95

For the first log, out of a total of 300 clusters, 85 were misclassifications. This yields an overall error rate of 85/300 = 0.283. In this context, misclassifications were counted as each cluster that clearly belonged to the truck but was classified as something else, as well as the clusters not belonging to any real object that were not rejected. When looking only at the clusters coming from the truck (one per frame) and disregarding all noise clusters, 60 out of 218 clusters were misclassified. This yields an error rate of a more qualitative nature, calculated as 60/218 = 0.275.

As for the second log, the overall error rate was 428/585 = 0.732. This high value is due to the fact that so many noise clusters are present and the system does not reject many of them. If we look only at the clusters belonging to the car, the error rate is instead 50/140 = 0.357, considerably lower, but still higher than in the first log. The conclusion is that as more noise is present in the radar signals, aside from the misclassifications made on the noise itself, the system also performs worse on the real objects. One theory that explains this is that as noise increases, the clusters coming from real objects are also corrupted, as they may contain noise detections and corrupt detections as well.
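The error rates quoted above follow directly from the counts in table 15, and can be re-derived in a few lines:

```python
# Re-deriving the quoted error rates from the raw counts in table 15.
logs = {
    "log 1 (truck)": dict(clusters=300, bad_total=85, frames=218, bad_object=60),
    "log 2 (car)":   dict(clusters=585, bad_total=428, frames=140, bad_object=50),
}
for name, d in logs.items():
    overall = d["bad_total"] / d["clusters"]   # every cluster, noise included
    per_obj = d["bad_object"] / d["frames"]    # only the real object's clusters
    print(f"{name}: overall {overall:.3f}, object-only {per_obj:.3f}")
# log 1 (truck): overall 0.283, object-only 0.275
# log 2 (car): overall 0.732, object-only 0.357
```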

In figure 25 below, a typical frame from each of the two logs is shown (with the only cluster belonging to a real object marked):


Figure 25: Typical frames from evaluation logs. (a) Log 1, clusters from truck and noise, with the marked real-object cluster at x = -69.19, y = 6.331. (b) Log 2, clusters from car and noise, with the marked real-object cluster at x = -55.51, y = -0.9926. Both axes in meters, spanning -80 to 80.

As can be seen in the figures above, a considerable number of noise clusters is almost always present. The case where the EGO vehicle is stationary (the first log) shows less noise than the second log, where EGO was moving. As a further comparison, a typical frame from when the EGO vehicle is moving fast on a highway is shown below in figure 26.

Figure 26: Typical frame from highway log. Both axes in meters, spanning -80 to 80.


In the figure above, the EGO vehicle is moving at around 80 km/h, and several vehicles of unknown classes are present around it. The detections that appear in lines are from stationary fences on both sides of the lane. Detections below the southernmost fence are likely stationary objects, such as trees and signposts, perceived as moving objects by the system. These detections tend to appear whenever the truck is moving at higher speeds. The preprocessing software does not seem to be able to deal with this in a satisfactory way, so when looking at radar data frame by frame, the presence of noise is very common. The noise points in turn lead to clusters being corrupted, as detections belonging to real moving objects are grouped together with stationary things like fences or other types of noise.

It should be noted that misclassifications of clusters actually belonging to real objects can also be the result of a lack of corresponding training data (the system is not trained on any objects at these speeds), which is not reflected in the figure.

Overall, the ability of the preprocessing stage to separate out stationary detections is inadequate in certain situations (generally when the EGO vehicle is moving), which leads to low classification performance.

However, it is important to put this in the correct context. Since the end goal of the iQMatic project (which this system is part of) lies in mining applications, good highway performance might not be relevant at all. Scenarios with lots of open space and few moving targets, with the EGO vehicle moving at relatively low speeds, are likely more relevant.

When looking at the classification performance on clusters belonging to real objects, the error rate is between 28 and 36 percent in the evaluated logs. With additional parameter tuning and a wider dataset, this could probably be greatly improved. Since this project has not aimed specifically at producing the highest possible classification accuracy, but rather at providing a proof of concept, the system performance is considered adequate.


Part 5: Conclusions and future work

To conclude the thesis, we look back at the project goals stated in the beginning of the report in an effort to evaluate the work done. The report is then concluded with thoughts and observations, together with a discussion of and suggestions for work that could be done in the future.

5.1 Concluding discussion regarding research questions and requirements

In an effort to relate and connect the results of the project to the goals stated in the first chapter, this section contains a discussion regarding requirements and research questions.

5.1.1 Requirements

In section 1.2.3, a list of project requirements was presented. As stated, these requirements were developed together with Scania as well as KTH and are seen more as guidelines for the project than as something strictly imposed by Scania. Nevertheless, the requirements have been treated as real requirements in the development of the system.

Table 16: Fulfillment of functional requirements

Functional Requirement                                                  Fulfilled
Cluster individual radar detections into objects                        X
Remove stationary objects and only be concerned with moving objects     X*
Classify moving object as belonging to one or none of the four classes  X
Provide a confidence output                                             X
Never exceed the real-time execution time of 50 ms                      X

As can be seen in table 16, all of the functional requirements stated at the beginning of the project have been fully or partially fulfilled. One requirement that has been hard to strictly fulfill is the filtering of static objects. This is due to the fact that the radar output is easily corrupted (especially apparent if the EGO vehicle is moving), causing stationary objects to yield detections with nonzero velocity values. This is discussed extensively throughout the report.


Several attempts to reduce the impact of such corrupt detections have been made in the project; however, this has not been enough to strictly ensure that no static objects pass all the way through the signal chain. The radar sensors are simply not accurate enough. The usage of detection history, as briefly discussed in 4.3.2, Results of developed filtering structures, could significantly reduce the impact of such corrupt radar data.
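One complementary gate, sketched below under our own assumptions (this is not the filtering actually implemented in section 3.4.1), would compare each detection's measured Doppler speed with the value a stationary target at that bearing would produce given the EGO velocity:

```python
import math

# Illustrative ego-motion gate: a stationary object observed from a vehicle
# moving at v_ego appears with radial (Doppler) speed -v_ego * cos(bearing).
# Detections near that value can be flagged as static. The gate width is a
# made-up tuning parameter, not a value from the thesis.

def is_stationary(radial_speed, bearing_rad, v_ego, gate=0.5):
    """True if the measured Doppler speed matches a stationary target."""
    expected = -v_ego * math.cos(bearing_rad)
    return abs(radial_speed - expected) <= gate

v_ego = 80 / 3.6                              # 80 km/h in m/s
print(is_stationary(-v_ego, 0.0, v_ego))      # fence straight ahead -> True
print(is_stationary(5.0, 0.0, v_ego))         # genuinely moving -> False
```

Noise in both the Doppler and bearing measurements would smear such a gate, which is consistent with the observation that corrupt static detections become more common at higher EGO speeds.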

As for the real-time execution requirement, a significant evaluation has been made, and the system is deemed to fulfill the real-time requirement by a large margin, with an average execution time of less than 3 ms in a non-optimized environment and around 0.5 ms on the target system. However, since no worst-case scenario has been tested, there may exist cases where the execution time of the system is increased.

Table 17: Fulfillment of extra-functional requirements

Extra-functional Requirement                              Fulfilled
Use only radar and velocity information                   X
Avoid using detection history and feedback loops          X
Use SVM as classification method                          X
Develop the system in a MATLAB environment                X
Implement the system on the embedded hardware of Astator  X

As for the extra-functional requirements, shown above in table 17, they have all been followed and fulfilled.

5.1.2 Research questions

In 1.2.4, a list of four research questions was presented. The research questions were based on the requirements and the original thesis formulation, and refined together with the KTH supervisor.

The aim has been to provide answers to each research question by developing and implementing an actual system.

Below, a recap of each research question is presented together with a discussion to serve as an answer.

Research question 1: How can existing machine learning theory be integrated into the embedded hardware of Astator with the purpose of creating a classification system?


There exists a variety of ways in which machine learning theory can be used to achieve the goals stated in this project. In our system implementation, one successful way to do it has been presented.

Using support vector machines is a mathematically intuitive way to solve the problem of classification. By utilizing the basics of SVM theory (see section 2.3.2), either via manual coding or using an open-source library (both were done in this project), a classification system can be constructed with a quite simple software structure. When it comes to implementation, the usage of code generation provided a powerful tool to convert m-code to lower-level code implementable on Astator. However, should it be necessary, the SVM structure could just as well be implemented in C from the start.

In order to create the classification model, several prerequisites must first be fulfilled:

• Gather representative data from each of the classes

• Label this data according to class

• Find features that represent the data set and separate the classes

Research question 2: How can this system be optimized for real-time execution?

Inherently, the SVM structure is quite lightweight, as only a subset of the training data is kept in the actual classification model. As for the actual prediction stage, all that is required at run-time is the evaluation of a dot product. This is slightly extended if kernel methods and/or a conversion to probabilistic output are used.
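This cost argument can be made concrete for the linear case: the support vectors can be collapsed offline into a single weight vector, leaving one dot product plus a bias per prediction. The support vectors, labels and alpha coefficients below are invented for illustration; in practice they come from training.

```python
# Linear-SVM prediction as a single dot product. All values are made up.

def decision(w, b, x):
    """Decision value <w, x> + b; the sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# w = sum_i alpha_i * y_i * sv_i, precomputed once offline
support_vectors = [([1.0, 2.0], 1, 0.7), ([2.0, 0.5], -1, 0.3)]  # (sv, y, alpha)
w = [sum(a * y * sv[j] for sv, y, a in support_vectors) for j in range(2)]
b = -0.1

x = [1.5, 1.0]                             # feature vector of a new cluster
print(1 if decision(w, b, x) > 0 else -1)  # prints 1
```

With a nonlinear kernel, by contrast, one kernel evaluation per support vector is needed at run-time, which is why the linear-kernel switch below reduces prediction cost.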

Throughout the report, several ways in which the software could be further optimized have been discussed; below is a short overview:

• Switching to a linear kernel in the SVM structure will reduce the calculations required to make predictions and, provided the features are good enough, may result in a model of similar performance (see section 2.3.2, Kernels and soft margin).

• If using kernels, selecting a lower value of the SVM slack parameter C will allow more slack, leading to a model of lower complexity and a faster prediction stage (see section 2.3.2, Model selection).

• Utilizing the capability to perform parallel computations, for example by using PDBSCAN (a parallel variant of DBSCAN) in the clustering stage, as discussed in section 2.4.1, Clustering of sensor data.


• Using a complexity-efficient multiclass ensemble, such as BTC or DAG (briefly discussed in section 2.3.1, Supervised learning, classification and overfitting).

• Reducing the number of features will lead to a reduced prediction execution time. As shown in section 4.2, Analysis of feature and data characteristics, many of the features used within this project are correlated, and thus their number could probably be reduced without a big loss in performance.

Research question 3: What can be done to improve the classification accuracy of this system with regards to robustness against noise and environmental factors?

A common problem in this radar-based system is the fact that the radar sensors occasionally deliver corrupt detections (stationary detections perceived as moving) and false detections. Both these categories are seen as noise in the context of this project.

In order to improve the system's robustness against noise, several strategies have been tested. The main concepts are listed below:

• Discard detections believed to be false or corrupt - details of the methods used to do this are found in section 3.4.1, Filtering of radar detections.

• Use a density-based clustering algorithm specifically constructed to handle noise. See section 2.4.1, Clustering of sensor data.

• Discard clusters believed to be corrupt - several hypotheses on how to achieve this were put forward in section 3.4.4, Filtering of radar detection clusters.

• Reject objects if the classification output is below the rejection threshold - more information can be found in section 3.5.2, Multiclass, rejection and confidence structures.
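The last point can be sketched as follows; the Platt parameters A, B and the threshold are placeholders, not the values fitted in the thesis:

```python
import math

# Rejection sketch: Platt scaling maps an SVM margin to a probability, and a
# cluster whose best class probability falls below the threshold is rejected
# (class -1). A and B are invented here; Platt's procedure fits them per class.

def platt(margin, A=-1.5, B=0.0):
    return 1.0 / (1.0 + math.exp(A * margin + B))

def classify_with_rejection(margins, threshold=0.5):
    """margins: one SVM output per class (c1..c4)."""
    probs = [platt(m) for m in margins]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best + 1 if probs[best] >= threshold else -1

print(classify_with_rejection([2.0, -1.0, -0.5, -2.0]))   # confident -> 1
print(classify_with_rejection([-0.1, -0.2, -0.3, -0.4]))  # uncertain -> -1
```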

In combination, the strategies listed above can remove a significant amount of noise data and reduce the number of false outputs coming from the classification system. However, as the radar sensors deliver a substantial number of noise detections, it is very hard to achieve complete robustness without the usage of detection history.

As discussed in section 4.3.2, Results of developed filtering structures, the most effective way to handle noise would be to combine data from earlier samples and use this history to discard noise data that only appears sporadically in the sampled data. However, since this is already done in later system stages, one requirement of the classification system was to operate without detection history.

As for environmental factors, radar sensors have an advantage in that they are not easily affected by outer conditions such as weather. However, as was shown in section 4.3.1, Common types of noise in the radar output, the radars used within this project tend to output more false data in certain situations, such as when several objects are present or when objects are close to EGO. Another situation that leads to more noise data is when the EGO system itself is moving at high velocity.

These aspects are seen as a weakness in the radar system itself, and not in the classification system. To gain increased robustness in these conditions, one solution would be to combine the radar data with data from other sensors.

Research question 4: What are the major obstacles in creating the system, and what can be done to overcome them?

As discussed above, noise in the radar sensor output has been a big challenge and obstacle. As a standalone system, the classification structure developed in this project may be insufficient, as noise has such a big impact. However, the intention has never been to create a standalone system, but rather an integrated part of the object tracking system already present.

Aside from noise, one big obstacle has been the gathering and labeling of data. Since this is a supervised learning system, it is completely dependent on labeled training data in order to perform well. Thus, insufficient data will lead to problems in almost all stages of the project.

Since data gathering and labeling is manual work and very time-consuming, it has not been a big focus of this project. Instead, the focus has been on providing a proof of concept and demonstrating the feasibility of the system.

As such, one way to overcome this particular problem is to dedicate more resources to data gathering and labeling, as a bigger, more diverse set of data will improve all aspects of the system, including validation.

Another obstacle has been the difficulty of attaining in-depth knowledge regarding the radar hardware and software, as well as the software structures already implemented on Astator. Having such in-depth knowledge is of course beneficial. However, an upside to the methodology used in this project is that extensive sensor and platform knowledge is not needed in order to produce a working system.

5.2 Project-wide conclusions

In this project, we have developed a low-level, real-time moving object classification system based on Doppler radar detections and concepts familiar from machine learning, specifically support vector machines. The process can be defined as follows:


• Gather data from sensors. This data should strive to be unbiased and as representative as possible.

• Develop a filtering structure to reduce the impact of unwanted sensor data and noise.

• If necessary or wanted, develop an algorithm that clusters data points together, based on for example spatial closeness. This can be seen as unsupervised learning.

• Perform a feature selection and extraction process to gain a feature vector that is capable of separating the different classes.

• Train a support vector machine model using an appropriate kernel and parameters. This step is considered supervised learning.

• Use code generation tools to convert the model from a simulated environment to software that can be implemented on the target system.
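The clustering step in the process above can be illustrated with a bare-bones DBSCAN in plain Python; the project itself used a MATLAB implementation, and the eps and min_pts values below are illustrative, not the tuned parameters:

```python
# Toy DBSCAN: group detections by spatial closeness, mark sparse points as
# noise (-1). Parameters and data are illustrative only.

def dbscan(points, eps=1.0, min_pts=2):
    labels = [None] * len(points)          # None = unvisited, -1 = noise
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if (points[i][0] - q[0]) ** 2 + (points[i][1] - q[1]) ** 2 <= eps ** 2]
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1                 # provisionally noise
            continue
        labels[i] = cluster
        seeds = list(nb)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:            # reachable noise becomes a border point
                labels[j] = cluster
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:       # core point: keep expanding
                seeds.extend(nb_j)
        cluster += 1
    return labels

pts = [(0, 0), (0.5, 0), (0.4, 0.3), (10, 10)]
print(dbscan(pts))  # [0, 0, 0, -1]: three close detections cluster, (10, 10) is noise
```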

The methods used differ from projects with similar goals in being based solely on data from radars and in being located early in the total system signal chain. This implies that our solution is cheaper, more robust (less reliance on fast-moving parts such as laser detectors, which are more prone to break) and less complex than what similar projects have accomplished. The trade-off is that the system described is not as useful in itself, but only as part of a larger object tracking system. Performance is also affected in a number of ways, being lackluster especially in scenarios where the EGO vehicle is moving at a higher velocity.

The methodology presented within this project has been proven to work, and it is our opinion that it has the potential to be used within an end-user application. Significant performance improvements are thought possible through finer tuning of parameters, and through gathering and labeling more training data.

The biggest problem with the current solution is considered to be the inadequate filtering of stationary detections and pure noise. Radar detections that belong to stationary targets or noise, and were never meant to reach the classification stage, are present and become more common the more complex a scenario becomes. If the filtering stages can be improved, our methodology becomes even more relevant.

At a higher abstraction level, the methods described in this project can be seen as a way of dealing with sensors that resemble a black box, doing sensor model-fitting through the use of machine learning methods. This is an approach that could become increasingly useful as sensors become more and more complex, and thus more likely to be provided by a third-party manufacturer that might be reluctant to share explicit details of how a sensor operates. Instead of trying to reverse-engineer, one can apply the methods used here and create an accurate fit between input and output.


5.3 Future work

In order to improve the performance of the system developed in this project, several things can be done. We suggest that the following be investigated, ranging from most to least important for performance increase:

• First and foremost, working out a better way to distinguish between sensor data coming from stationary and moving objects. The system performs very well when noise is absent, so this would greatly improve performance.

• Secondly, implementing a different clustering method with the use of dynamic clustering parameters. Using the algorithm OPTICS instead of DBSCAN could accomplish this.

• Thirdly, investigating the usage of more complex features that could significantly improve performance and functionality. Somehow incorporating detection and classification history through the use of features is one such possibility.

In addition, bigger coverage in the labeled training and validation data would improve performance in almost every stage of the process. This is especially true for the scenarios that were not covered in the dataset created in this project.
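As a toy illustration of the dynamic-parameter idea in the second point above: since the cross-range spacing between detections grows roughly as range times the radar's angular resolution, the DBSCAN neighborhood radius could grow with range instead of staying fixed (OPTICS addresses the same issue differently, by ordering points over a range of radii). The linear model and constants below are our own assumptions, not sensor data:

```python
import math

# Hypothetical range-dependent neighborhood radius: detections of a distant
# object are spread further apart, so eps grows with range. eps0 and the
# 1-degree angular resolution are illustrative values only.

def dynamic_eps(r, eps0=1.0, angular_res=math.radians(1.0)):
    """Neighborhood radius [m] as a function of target range r [m]."""
    return eps0 + r * angular_res

print(round(dynamic_eps(10.0), 2))   # near target: small radius
print(round(dynamic_eps(80.0), 2))   # far target: larger radius
```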

99

Page 110: Radar-detection based classification of moving objects ...kth.diva-portal.org › smash › get › diva2:897380 › FULLTEXT01.pdf · Radar-detection based classification of moving

References

[1] “Vartannat jobb automatiseras inom 20 ar,” Stiftelsen for StrategiskForskning, 2014.

[2] S. Wong, “If an autonomous machine kills someone, who is responsible?,”The Guardian, 2009.

[3] T.-D. Vu, Vehicle Perception: Localization, Mapping with Detection,Classification and Tracking of Moving Objects. PhD thesis, Institut NationalPolytechnique de Grenoble, 2010.

[4] R. O. Chavez-Garcia, Multiple Sensor Fusion for Detection, Classification andTracking of Moving Objects in Driving Environments. PhD thesis, Universitede Grenoble, 2014.

[5] U. Franke, D. Pleiffer, C. Rabe, C. Knoeppel, M. Enzweiler, F. Stein, andR. G. Herrtwich, “Making Bertha see,” in IEEE International Conference onComputer Vision Workshop, 2013.

[6] J. Z. et al., “Making Bertha drive — an autonomous journey on ahistoric route,” Intelligent Transportation Systems Magazine, IEEE, vol. 6,pp. 101–141, 2014.

[7] VisLab, “Proud-car test 2013.” webpage vislab.it/proud/, 2013.

[8] B. Waske and J. A. Benediktsson, “Fusion of support vector machines forclassification of multisensor data,” IEEE Transactions on Geoscience andRemote Sensing, vol. 45, 2007.

[9] H. J. CHo, R. Li, H. Lee, and J. Y. Wu, “Vehicle classification usingsupport vector machines and kmeans clustering,” in Computational Methodsin Science and Engineering, Advances in Computational Science, 2009.

[10] M. Jahangir, K. Ponting, and J. O’Loghlen, “Robust doppler classificationtechnique based on hidden markov models,” IEE Pro. Radar Sonar Navig.,vol. 150, 2003.

[11] P. G. Kealey and M. Jahangir, “Advances in doppler recognition for groundmoving target indication,” in Automatic Target Recognition XVI, 2006.

[12] J. Kjellgren, S. Gadd, N.-U. Jonsson, and J. Gustavsson, “Analysis of dopplermeasurements of ground vehicles,” in IEEE International Radar Conference,2005.

[13] C. Alabaster, Pulse Doppler Radar - Principles, Technology, Applications.SciTech Publishing, 2012.

100

Page 111: Radar-detection based classification of moving objects ...kth.diva-portal.org › smash › get › diva2:897380 › FULLTEXT01.pdf · Radar-detection based classification of moving

[14] D. Kissinger, Millimeter-Wave Receiver Concepts for 77 GHz AutomotiveRadar in Silicon-Germanium Technology. Springer-Verlag New York, 1st ed.,2012. p. 9-19.

[15] P. Flach, Machine Learning: The Art and Science of Algorithms That MakeSense of Data. Cambridge University Press, 2012.

[16] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multiclasssupport vector machines,” IEEE Transactions on Neural Networks, vol. 13,pp. 101–141, 2002.

[17] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera,“An overview of ensemble methods for binary classifiers in multi-classproblems: Experimental study on one-vs-one and one-vs-all schemes,”Pattern Recognition, vol. 44, no. 8, pp. 1761 – 1776, 2011.

[18] A. Daniely, S. Sabato, and S. Shalev-Shwartz, “Multiclass learningapproaches: A theoretical comparison with implications,” Advances in NeuralInformation Processing Systems, pp. 494 – 502, 2012.

[19] R. Rifkin and A. Klautau, “In defense of one-vs-all classification,” Journal ofMachine Learning Research, vol. 5, pp. 101–141, 2004.

[20] V. Vapnik and A. Lerner, “Pattern recognition using generalized portraitmethod,” Automation and Remote Control, vol. 24, 1963.

[21] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm foroptimal margin classifiers,” in Proceedings of the Fifth Annual Workshop onComputational Learning Theory, 1992.

[22] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning,vol. 20, no. 3, pp. 273–297, 1995.

[23] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. Flannery, Numericalrecipes. Cambridge University Press, 3rd ed., 2007. p. 883-892.

[24] C.-C. C. Chih-Wei Hsu and C.-J. Lin, “A practical guide to support vectorclassification,” tech. rep., Department of Computer Science, National TaiwanUniversity, Taiwan, 2003.

[25] S. Abe, Support Vector Machines for Pattern Classification. Springer London,2nd ed., 2010. p. 93.

[26] D. Meyer, F. Leisch, and K. Hornik, “The support vector machine undertest,” Neurocomputing, vol. 55, no. 1–2, pp. 169 – 186, 2003. Support VectorMachines.

[27] J. C. Platt, “Probabilistic outputs for support vector machines andcomparisons to regularized likelihood methods,” in Advances in Large MarginClassifiers, 1999.

[28] M. Sokolova and G. Lapalme, “A systematic analysis of performance measures for classification tasks,” Information Processing & Management, vol. 45, no. 4, pp. 427–437, 2009.

[29] C. Ferri, J. Hernandez-Orallo, and R. Modroiu, “An experimental comparison of performance measures for classification,” Pattern Recognition Letters, vol. 30, no. 1, pp. 27–38, 2009.

[30] V. Estivill-Castro, “Why so many clustering algorithms: a position paper,” ACM SIGKDD Explorations Newsletter, 2002.

[31] W.-K. Loh and Y.-H. Park, Ubiquitous Information Technologies and Applications, ch. A Survey on Density-Based Clustering Algorithms. Springer Berlin Heidelberg, 2014.

[32] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, 1996.

[33] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander, “OPTICS: ordering points to identify the clustering structure,” in Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, 1999.

[34] U. Stanczyk, Feature Selection for Data and Pattern Recognition, ch. 3.Springer, 2015.

[35] H. Liu and H. Motoda, Feature Extraction, Construction and Selection: A Data Mining Perspective. Springer Science, 1998. p. 24.

[36] U. Stanczyk and L. C. Jain, Feature Selection for Data and Pattern Recognition. Springer, 2015.

[37] I. T. Jolliffe, Principal Component Analysis. Springer, 2nd ed., 2002. pp. 27–32, 138.

[38] M. Daszykowski, dbscan (software), Matlab. Dept. of Chemometrics, Institute of Chemistry, Univ. of Silesia, 2004. Available at: http://www.chemometria.us.edu.pl/download/DBSCAN.M.

[39] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
