arXiv:1807.05722v1 [cs.RO] 16 Jul 2018



Automatic generation of ground truth for the evaluation of obstacle detection and tracking techniques

Hatem Hajri∗, Emmanuel Doucet∗†, Marc Revilloud∗, Lynda Halit∗, Benoit Lusetti∗, Mohamed-Cherif Rahal∗

∗Automated Driving Research Team, Institut VEDECOM, Versailles, France
†InnoCoRe Team, Valeo, Bobigny, France

Abstract— As automated vehicles are getting closer to becoming a reality, it will become mandatory to be able to characterise the performance of their obstacle detection systems. This validation process requires large amounts of ground-truth data, which is currently generated by manual annotation. In this paper, we propose a novel methodology to generate ground-truth kinematics datasets for specific objects in real-world scenes. Our procedure requires no annotation whatsoever, human intervention being limited to sensor calibration. We present the recording platform used to acquire the reference data, together with a detailed and thorough analytical study of the propagation of errors in our procedure. This allows us to provide detailed precision metrics for each and every data item in our datasets. Finally, some visualisations of the acquired data are given.

I. INTRODUCTION

Object detection and tracking both play a crucial role in autonomous driving. They are low-level functions upon which many other, increasingly high-level, functions are built, including intention prediction, obstacle avoidance, and navigation and planning. Because so many functions depend on it, the task of obstacle detection and tracking must be performed with a high level of accuracy and be robust to varying environmental conditions. However, the generation of ground truth data to evaluate obstacle detection and tracking methods usually involves manual annotation, either of images or of LIDAR point clouds [1][2][3].
This paper showcases a method which takes advantage of the multiplication of autonomous driving platform prototypes in research institutions to generate precise and accurate obstacle ground truth data, without requiring the usual phase of painstaking manual labelling of raw data.
Firstly, the methodology applied to generate this data is described in general terms, and some specific technical topics, such as the sensor time-synchronisation method used to collect data, are presented. Then an analysis of error propagation is performed. Finally, some plots of the collected data are given together with potential applications.

II. GENERAL METHOD DESCRIPTION

The proposed method requires two or more vehicles to generate obstacle dynamics ground truth data. The first vehicle, the ego-vehicle, is equipped with a high-precision positioning system (for example a Global Navigation Satellite System (GNSS) receiver with Real Time Kinematic (RTK) corrections coupled with an inertial measurement unit (IMU)) and various environment perception sensors (for example LIDARs, cameras or RADARs). The other vehicles, the target vehicles, only need to be equipped with a high-precision positioning system similar to that of the ego-vehicle. By simultaneously recording the position and dynamics of all vehicles, it is possible to express the kinematics of all equipped vehicles present in the scene in the ego-vehicle frame of reference, and therefore to generate reference data for these vehicles. This reference data can then be used to evaluate the performance of obstacle detection methods applied to the environment sensor data collected on the ego-vehicle.

III. DATA COLLECTION SETUP

In this section, we present in detail the set of sensors which were available for our data collection, and detail the method applied to ensure the synchronicity of the recording processes taking place in the different vehicles. Three vehicles were used during this data collection campaign: the ego-vehicle was a Renault Scenic equipped to acquire environment perception data with high accuracy. The target vehicles were Renault ZOEs, modified to be used as autonomous driving research platforms, and therefore equipped with precise positioning systems.

A. Ego-vehicle perception sensors
To record perception data, our ego-vehicle is equipped with two PointGrey 23S6C-C colour cameras, a Velodyne VLP-16 3D laser scanner (16 beams, 10 Hz, 100 m range, 0.2° horizontal resolution) and a cocoon of five Ibeo LUX laser scanners (4 beams, 25 Hz, 200 m range, 0.25° horizontal resolution) covering a 360° field of view around the vehicle.
The cameras are positioned inside the car, behind the windscreen, with a baseline of approximately 50 cm. The Velodyne VLP-16 is positioned on the roof of the vehicle at a position and height that minimise the occlusion of the laser beams by the vehicle body. The Ibeo LUX are all mounted at the same height of approximately 50 cm, two on each front wing (one pointing forward, one to the side), and one at the back, pointing backwards (see Figure 2).

B. Precise localisation system
The accuracy of the data generated using our method relies entirely on the accuracy of the vehicles' positioning systems.



Fig. 1. The perception vehicle used: two Ibeo LUX, the VLP-16, the GNSS antenna and the precision odometer are visible

Fig. 2. Ibeo LUX cocoon setup

Therefore, each vehicle was equipped with state-of-the-art positioning sensors: a choke-ring GNSS antenna feeding a GNSS-RTK receiver coupled with a high-grade, fibre-optic gyroscope-based iXblue Inertial Measurement Unit. Additionally, the perception vehicle is equipped with a high-accuracy odometer mounted on the rear-left wheel, while the target vehicles are equipped with a Correvit® high-accuracy, contact-less optical odometer. The data emanating from these sensors is fused using a Kalman filter-based robust observer which jointly estimates the IMU and external sensor biases. The performance of this system can be further improved in post-processing by employing accurate ephemeris data and smoothing techniques. Table I provides an overview of the combined performance of our positioning system and of the aforementioned post-processing.

C. Time synchronisation

One of the biggest challenges when performing data collection distributed across multiple vehicles is to precisely synchronise the clocks used for time-stamping the data in each platform. This is especially true when acquiring data in high-speed scenarios. Highway scenarios, in which the absolute value of the relative velocity of vehicles may reach 70 m/s, require the synchronisation of the vehicle clocks to be accurate to at least the millisecond; any residual clock offset induces an incompressible positioning error, which adds to that of our positioning system (see Subsection VI-A).

TABLE I
POSITIONING SYSTEM PERFORMANCE

                        Heading   Roll/Pitch   Position    Position
                        (deg)     (deg)        X,Y (m)     Z (m)
Nominal GNSS signal     0.01      0.005        0.02        0.05
60 s GNSS outage        0.01      0.005        0.10        0.07
300 s GNSS outage       0.01      0.005        0.60        0.40

To achieve such a precise synchronisation, a Network Time Protocol (NTP) server fed with the pulse-per-second (PPS) signal provided by our GNSS receivers was installed in each vehicle to synchronise the on-board computers in charge of recording all the data. Prior to any recording session, the NTP servers and computer clocks were allowed 12 hours to converge to a common time.
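To make the order of magnitude concrete, the short sketch below (illustrative only, not part of the paper's toolchain) multiplies the worst-case relative speed by a residual clock offset: at 70 m/s, a 1 ms offset already corresponds to roughly 7 cm of incompressible position error.

```python
# Illustrative sketch: positioning error induced by a residual clock offset
# between two vehicle recording platforms is roughly the relative speed
# multiplied by that offset.

def clock_offset_position_error(relative_speed_mps: float, clock_offset_s: float) -> float:
    """Worst-case position error (m) caused by a clock offset between vehicles."""
    return abs(relative_speed_mps) * abs(clock_offset_s)

if __name__ == "__main__":
    # Highway scenario from the text: |relative velocity| up to 70 m/s.
    for offset_ms in (0.1, 1.0, 10.0):
        err = clock_offset_position_error(70.0, offset_ms * 1e-3)
        print(f"offset {offset_ms:5.1f} ms -> up to {err:.3f} m of position error")
```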

IV. SENSORS CALIBRATION

Generating ground truth data requires very accurate sensor calibration. Given the difficulty of calibrating LIDAR sensors relative to cameras [1], we propose the following calibration pipeline: first, the positioning system is calibrated, then the cameras are calibrated intrinsically, and finally the rigid transformations relating the cameras and LIDARs to the vehicle frame are estimated.

A. Positioning system calibration

The calibration of the positioning system consists in calculating the position and attitude of all positioning sensors (GNSS antenna, optical odometer) in the frame of reference of the IMU. After an initialisation phase during which the vehicle remains static to allow the estimation of all sensor biases and the convergence of the RTK solution, the vehicle is manoeuvred. By comparing the motion information emanating from each sensor to that of the IMU, one is then able to determine the rigid transformation between the said sensor and the IMU.

B. Cameras calibration

To estimate the intrinsic parameters of our cameras and the relative pose of our stereo pair, we sweep a checkerboard through the space in front of the cameras at three different depths, making sure to cover the whole field of view of each camera. We then use a mixture of Geiger's checkerboard pattern detector [4] and of the sub-pixel corner detector from OpenCV to extract the corners of the checkerboard. These are then fed to OpenCV's stereoCalibrate function to jointly estimate the intrinsic parameters of each camera and their relative pose. The set of parameters thus obtained typically yields re-projection errors of less than 0.3 pixel.
The position and orientation of our cameras are obtained in a semi-automatic fashion. Their position in the vehicle is precisely measured with laser tracers, and their orientations are estimated using a known ground pattern by minimising the re-projection error of the said pattern in the images.
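A minimal sketch of the intrinsic/stereo step described above, assuming OpenCV alone is used for corner detection (the paper additionally relies on Geiger's detector [4]); the file paths, board geometry and square size below are hypothetical.

```python
# Sketch of the intrinsic/stereo calibration step with OpenCV only.
import glob
import cv2
import numpy as np

BOARD = (9, 6)     # inner corners per row/column (assumption)
SQUARE = 0.10      # checkerboard square size in metres (assumption)

# 3D corner coordinates in the checkerboard frame (z = 0 plane).
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

obj_pts, left_pts, right_pts = [], [], []
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 50, 1e-4)
img_size = None

for fl, fr in zip(sorted(glob.glob("left/*.png")), sorted(glob.glob("right/*.png"))):
    gl = cv2.imread(fl, cv2.IMREAD_GRAYSCALE)
    gr = cv2.imread(fr, cv2.IMREAD_GRAYSCALE)
    img_size = gl.shape[::-1]
    okl, cl = cv2.findChessboardCorners(gl, BOARD)
    okr, cr = cv2.findChessboardCorners(gr, BOARD)
    if okl and okr:
        # Sub-pixel refinement of the detected corners.
        cl = cv2.cornerSubPix(gl, cl, (11, 11), (-1, -1), criteria)
        cr = cv2.cornerSubPix(gr, cr, (11, 11), (-1, -1), criteria)
        obj_pts.append(objp); left_pts.append(cl); right_pts.append(cr)

# Per-camera calibrations provide initial guesses, then stereoCalibrate
# refines both intrinsics and the left-to-right rigid transform jointly.
_, K1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, img_size, None, None)
_, K2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, img_size, None, None)
rms, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K1, d1, K2, d2, img_size,
    flags=cv2.CALIB_USE_INTRINSIC_GUESS, criteria=criteria)
print(f"stereo re-projection RMS: {rms:.3f} px")
```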

C. LIDARs calibration

1) Velodyne VLP-16 calibration: The objective of the Velodyne-to-IMU calibration process is to determine the rigid transformation T_{Vel→IMU} between the Velodyne reference frame and that of the IMU. Our Velodyne-to-IMU calibration process is fully automated. Using Iterative Closest Point (ICP), we match the 3D point clouds acquired during the calibration manoeuvre one to another to generate a trajectory. Then, the pose of the IMU is re-sampled to the timestamps of the Velodyne scans, and T_{Vel→IMU} is estimated through a non-linear optimisation process applied to 1000 pose samples. This rigid transformation estimate is then refined by repeating the process, using T_{Vel→IMU} and the linear and angular velocities of the vehicle to correct the motion-induced distortion of the Velodyne point clouds. This process usually converges in just one iteration.
2) Ibeo LUX: The calibration of an Ibeo LUX cocoon is slightly more complicated than that of a single Velodyne, as it involves simultaneously calibrating all the sensors. Indeed, calibrating each LUX separately will almost certainly result in a poorly consistent point cloud when aggregating clouds from all sensors. Likewise, precisely calibrating one sensor and then calibrating all sensors relative to their neighbours will also lead to such poor results. A simple way to ensure the global coherence of the cocoon calibration is to use the point cloud from a calibrated Velodyne as a reference, and to calibrate all Ibeo LUX sensors relative to this spatially coherent reference.
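The pairwise scan-matching step could look like the following sketch, assuming Open3D as the ICP implementation (the paper does not name one); the resulting LIDAR poses would then be compared with the re-sampled IMU poses to estimate T_{Vel→IMU}.

```python
# Illustrative sketch: register consecutive Velodyne scans pairwise with ICP
# and chain the relative transforms into a LIDAR trajectory.
import numpy as np
import open3d as o3d

def lidar_trajectory(scans, max_corr_dist=0.5):
    """scans: list of o3d.geometry.PointCloud; returns absolute 4x4 poses."""
    poses = [np.eye(4)]
    for src, tgt in zip(scans[1:], scans[:-1]):
        reg = o3d.pipelines.registration.registration_icp(
            src, tgt, max_corr_dist, np.eye(4),
            o3d.pipelines.registration.TransformationEstimationPointToPoint())
        # reg.transformation maps the current scan into the previous scan's frame.
        poses.append(poses[-1] @ reg.transformation)
    return poses
```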

V. GROUND-TRUTH DATA GENERATION

A. Notations

Let us call $X_i^k = (x, y, v_x, v_y, \psi)_i^k$ the state of vehicle $i$, with $(x, y)_i^k$ the position of its reference point, $(v_x, v_y)_i^k$ its velocity vector, and $\psi_i^k$ its yaw angle, all expressed in the frame of reference $k$.
In the rest of the paper, $\cdot_e$ and $\cdot_t$ respectively denote state variables of the ego and of the target vehicle, and $\cdot^{UTM}$ and $\cdot^{ego}$ respectively denote a variable expressed in the Universal Transverse Mercator frame and in the ego-vehicle frame of reference.

B. Processing

Generating a set of obstacle ground truth data from high-accuracy positioning recordings is a two-step process:
• generate the relative position and dynamics of the obstacles relative to the ego-vehicle,
• generate data carrying obstacle semantics from the previously generated data.
For each sensor recording, a whole set of ground truth data is generated, so as to provide ground truth synchronised with the sensor data. At each sensor data timestamp, the position and dynamics of each vehicle are estimated from the positioning system recording using temporal spline interpolation.
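A minimal sketch of this resampling step, assuming SciPy cubic splines stand in for the paper's unspecified temporal spline interpolation:

```python
# Resample positioning states, recorded at their own timestamps, at the
# perception-sensor timestamps.
import numpy as np
from scipy.interpolate import CubicSpline

def resample_states(t_pos, states, t_sensor):
    """Interpolate positioning states (N x M array) at sensor timestamps."""
    # Note: heading angles should be unwrapped before spline interpolation.
    spline = CubicSpline(t_pos, states, axis=0)
    return spline(t_sensor)

# Example: a 100 Hz positioning stream resampled at 10 Hz LIDAR timestamps.
t_pos = np.arange(0.0, 10.0, 0.01)
states = np.column_stack([np.sin(t_pos), np.cos(t_pos)])  # dummy x, y
t_lidar = np.arange(0.05, 9.95, 0.1)
print(resample_states(t_pos, states, t_lidar).shape)      # (99, 2)
```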

From this, simple kinematics and velocity composition formulae allow the computation of the relative position and dynamics of the target vehicles in the ego-vehicle frame of reference:

$$\begin{bmatrix} x \\ y \end{bmatrix}_t^{ego} = R(-\psi_e^{UTM}) \begin{bmatrix} x_t - x_e \\ y_t - y_e \end{bmatrix}^{UTM} \quad (1)$$

$$\begin{bmatrix} v_x \\ v_y \end{bmatrix}_t^{ego} = R(-\psi_e^{UTM}) \begin{bmatrix} v_{x_t} - v_{x_e} + \dot{\psi}_e (y_t - y_e) \\ v_{y_t} - v_{y_e} - \dot{\psi}_e (x_t - x_e) \end{bmatrix}^{UTM} \quad (2)$$

$$\psi_t^{ego} = \psi_t^{UTM} - \psi_e^{UTM} \quad (3)$$

where $R(\alpha)$ denotes the rotation of $SO(2)$ of angle $\alpha$.
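A direct transcription of equations (1)-(3) into code, assuming the ego and target states have already been interpolated at a common timestamp (the dictionary layout and the yaw_rate field are illustrative, not the paper's data format):

```python
# Express the target state, recorded in the UTM frame, in the ego-vehicle frame.
import numpy as np

def relative_state(ego, target):
    """ego/target: dicts with keys x, y, vx, vy, psi (rad); ego also yaw_rate (rad/s)."""
    c, s = np.cos(-ego["psi"]), np.sin(-ego["psi"])
    R = np.array([[c, -s], [s, c]])                     # R(-psi_e)

    dp = np.array([target["x"] - ego["x"], target["y"] - ego["y"]])
    rel_pos = R @ dp                                    # equation (1)

    dv = np.array([
        target["vx"] - ego["vx"] + ego["yaw_rate"] * dp[1],
        target["vy"] - ego["vy"] - ego["yaw_rate"] * dp[0],
    ])
    rel_vel = R @ dv                                    # equation (2)

    rel_yaw = target["psi"] - ego["psi"]                # equation (3)
    return rel_pos, rel_vel, rel_yaw
```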

C. Exploitation

The reference relative positioning data thus obtained can then be used to generate ground truth perception data. One can, for example, generate the bounding box of the target vehicle in the ego-vehicle frame of reference to evaluate the performance of LIDAR or image-based object detection and tracking algorithms. Another possibility is to use 3D models of the target vehicles and to project them into the camera images to automatically generate a partial image segmentation.
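As an illustration of the first possibility, the sketch below projects a hypothetical 3D bounding box of the target into a camera image using the relative pose from equations (1)-(3); the box dimensions, reference-point convention and the ego-to-camera transform T_cam_ego are assumptions, not values from the paper.

```python
# Project a target vehicle's 3D box into an image to obtain 2D ground truth.
import numpy as np
import cv2

def project_target_box(rel_pos, rel_yaw, T_cam_ego, K, dist, size=(4.1, 1.8, 1.6)):
    """Return the pixel-space corners of the target's 3D box (8 x 2 array)."""
    l, w, h = size
    # Box corners in the target frame, origin at the vehicle reference point
    # projected to the ground (assumption).
    corners = np.array([[sx * l / 2, sy * w / 2, sz * h]
                        for sx in (-1, 1) for sy in (-1, 1) for sz in (0, 1)], float)
    # Target-to-ego transform built from the relative pose.
    c, s = np.cos(rel_yaw), np.sin(rel_yaw)
    R_ego_tgt = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    pts_ego = corners @ R_ego_tgt.T + np.array([rel_pos[0], rel_pos[1], 0.0])
    # Ego frame to camera optical frame (T_cam_ego assumed known), then projection.
    pts_cam = pts_ego @ T_cam_ego[:3, :3].T + T_cam_ego[:3, 3]
    img_pts, _ = cv2.projectPoints(pts_cam, np.zeros(3), np.zeros(3), K, dist)
    return img_pts.reshape(-1, 2)
```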

VI. UNCERTAINTY PROPAGATION ANALYSIS

In this section, we perform a sensitivity analysis of our ground truth generation process, in order to characterise the accuracy and precision of the generated data as a function of the performance of our positioning systems and of the clock shift between the vehicles. The inputs of our generation process are the position, velocity and heading estimates provided by a GNSS-INS fusion system. These can be modelled as independent Gaussian random variables [5]. Therefore, we treat the position, velocity and yaw angle separately, so as to limit the computational burden.

A. Position

Equation (1) yields:

$$\begin{bmatrix} x \\ y \end{bmatrix}_t^{ego} = F(dx^{UTM}, dy^{UTM}, \psi_e^{UTM})$$

with:

$$F(dx, dy, \psi_e) = \begin{bmatrix} dx \cos\psi_e + dy \sin\psi_e \\ dy \cos\psi_e - dx \sin\psi_e \end{bmatrix}.$$

Lemma 1: Let $\Omega$ be a Gaussian random variable such that $\Omega \sim \mathcal{N}(m_\Omega, \sigma_\Omega^2)$. Then:

$$\mathbb{E}(\cos\Omega) = \cos(m_\Omega)\, e^{-\sigma_\Omega^2/2}, \qquad \mathbb{E}(\sin\Omega) = \sin(m_\Omega)\, e^{-\sigma_\Omega^2/2}.$$

Proof: Using the explicit expression of the characteristic function of a Gaussian variable, we get:

$$\mathbb{E}(\cos\Omega + i \sin\Omega) = \mathbb{E}(e^{i\Omega}) = e^{i m_\Omega - \sigma_\Omega^2/2}.$$

The real and imaginary parts of this expression yield the desired result.
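A quick Monte Carlo check of Lemma 1 (illustrative, not from the paper):

```python
# Verify numerically that E[cos(Omega)] = cos(m) * exp(-sigma^2 / 2)
# for a Gaussian angle Omega ~ N(m, sigma^2).
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 0.3, 0.05
omega = rng.normal(m, sigma, size=1_000_000)

print(np.cos(omega).mean())              # Monte Carlo estimate
print(np.cos(m) * np.exp(-sigma**2 / 2)) # closed form from Lemma 1
```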

Under the assumption that $\mathrm{Var}(dx) = \mathrm{Var}(dy) = \sigma_{dx}^2$, and noting $\mathrm{Var}(\psi_e) = \sigma_\psi^2$:

$$\mathrm{Cov}(F(dx, dy, \psi_e)) = \begin{bmatrix} a & c \\ c & b \end{bmatrix}$$

with:

$$a = \sigma_{dx}^2 + \mathbb{E}(dx)^2 \,\mathrm{Var}(\cos\psi_e) + \mathbb{E}(dy)^2 \,\mathrm{Var}(\sin\psi_e) - \mathbb{E}(dx)\,\mathbb{E}(dy)\, \sin(2\,\mathbb{E}(\psi_e))\, e^{-\sigma_\psi^2} (1 - e^{-\sigma_\psi^2})$$

$$b = \sigma_{dx}^2 + \mathbb{E}(dx)^2 \,\mathrm{Var}(\sin\psi_e) + \mathbb{E}(dy)^2 \,\mathrm{Var}(\cos\psi_e) + \mathbb{E}(dx)\,\mathbb{E}(dy)\, \sin(2\,\mathbb{E}(\psi_e))\, e^{-\sigma_\psi^2} (1 - e^{-\sigma_\psi^2})$$

$$c = \tfrac{1}{2} \sin(2\,\mathbb{E}(\psi_e))\, e^{-\sigma_\psi^2} (1 - e^{-\sigma_\psi^2}) \left(\mathbb{E}(dx)^2 - \mathbb{E}(dy)^2\right) - \mathbb{E}(dx)\,\mathbb{E}(dy)\, \cos(2\,\mathbb{E}(\psi_e))\, e^{-\sigma_\psi^2} (1 - e^{-\sigma_\psi^2})$$

Therefore, under the assumption that the variance of the position error is similar from one vehicle to another and along the North and East axes ($\sigma_x^2 = \sigma_y^2 = \sigma_{pos}^2 = \tfrac{1}{2}\sigma_{dx}^2$), and that the maximal distance between the ego-vehicle and an obstacle is $d_{max}$, we propose the following upper bounds for the position error covariance matrix:

$$a \le 2\sigma_{pos}^2 + 2 d_{max}^2 (1 - e^{-\sigma_\psi^2}),$$

$$b \le 2\sigma_{pos}^2 + 2 d_{max}^2 (1 - e^{-\sigma_\psi^2}),$$

$$c \le \tfrac{3}{2} d_{max}^2 (1 - e^{-\sigma_\psi^2/2}).$$

B. Velocity

Equation (2) yields:

$$\begin{bmatrix} v_x \\ v_y \end{bmatrix}_t^{ego} = G\!\left((dx, dy, dv_x, dv_y, \psi_e, \dot{\psi}_e)^{UTM}\right)$$

with $G(dx, dy, dv_x, dv_y, \psi_e, \dot{\psi}_e)$ equal to:

$$\begin{bmatrix} \cos\psi_e\,(dv_x + \dot{\psi}_e\, dy) + \sin\psi_e\,(dv_y - \dot{\psi}_e\, dx) \\ \cos\psi_e\,(dv_y - \dot{\psi}_e\, dx) - \sin\psi_e\,(dv_x + \dot{\psi}_e\, dy) \end{bmatrix}$$

Using the independence of the input variables, it is possible to express the covariance matrix of $G(dx, dy, dv_x, dv_y, \psi_e, \dot{\psi}_e)$ as a function of the first and second order moments of the input variables. Due to space limitations, we only give bounds for the elements of this matrix. Similarly to Section VI-A, these bounds tend to 0 as the variances of the input variables tend to 0.

Lemma 2: Let $X$ and $Y$ be two independent random variables with means $m_X$ and $m_Y$ and variances $\sigma_X^2$ and $\sigma_Y^2$. Let $\Omega$ be a Gaussian random variable $\mathcal{N}(m_\Omega, \sigma_\Omega^2)$ independent of $X$ and $Y$, and let $Z = \cos(\Omega) X + \sin(\Omega) Y$. Then:

$$\mathrm{Var}(Z) \le \sigma_X^2 + \sigma_Y^2 + (1 - e^{-\sigma_\Omega^2})(|m_X| + |m_Y|)^2$$

Proof: Using Lemma 1, we have the following bound:

$$\mathrm{Var}(Z) \le \sigma_X^2 + \sigma_Y^2 + m_X^2\,\mathbb{E}(\cos^2\Omega) + m_Y^2\,\mathbb{E}(\sin^2\Omega) - e^{-\sigma_\Omega^2}\left(m_X^2 \cos^2(m_\Omega) + m_Y^2 \sin^2(m_\Omega)\right) - m_X m_Y \sin(2 m_\Omega)\, e^{-\sigma_\Omega^2} (1 - e^{-\sigma_\Omega^2})$$

Using the identity $\cos^2(\Omega) = \frac{1 + \cos(2\Omega)}{2}$ and again Lemma 1, we get:

$$m_X^2\,\mathbb{E}(\cos^2\Omega) - e^{-\sigma_\Omega^2}\, m_X^2 \cos^2(m_\Omega) \le m_X^2 (1 - e^{-\sigma_\Omega^2})$$

The same identity holds with $X$ replaced by $Y$ and $\cos$ replaced by $\sin$. This gives the desired bound.

Let us denote the mean and variance of a Gaussian variable $Z$ respectively by $m_Z$ and $\sigma_Z^2$. Under the assumption that $\mathrm{Var}(dv_x) = \mathrm{Var}(dv_y) = 2\sigma_{vel}^2$ and $\mathrm{Var}(dx) = \mathrm{Var}(dy) = \sigma_{dx}^2$, we have:

$$\mathrm{Var}(dv_x + \dot{\psi}\, dy) = 2\sigma_{vel}^2 + \sigma_{dx}^2 \sigma_\psi^2 + m_{\dot\psi}^2 \sigma_{dx}^2 + m_{dy}^2 \sigma_\psi^2$$

$$\mathrm{Var}(dv_y - \dot{\psi}\, dx) = \mathrm{Var}(dv_x + \dot{\psi}\, dy)$$

Let $\begin{bmatrix} a & c \\ c & b \end{bmatrix}$ be the covariance matrix of $\begin{bmatrix} v_x \\ v_y \end{bmatrix}_t^{ego}$. Using the previous results, we have the following upper bounds for $a$, $b$ and $c$:

$$a, b \le 4\left(\sigma_{vel}^2 + \sigma_{pos}^2 \sigma_\psi^2 + \dot{\psi}_{max}^2 \sigma_{pos}^2\right) + 2 d_{max}^2 \sigma_\psi^2 + 4 (1 - e^{-\sigma_\psi^2})(v_{max} + d_{max} \dot{\psi}_{max})^2,$$

$$c \le \sqrt{a}\,\sqrt{b},$$

where $v_{max}$ is an upper bound for $dv_x$ and $dv_y$, and $\dot{\psi}_{max}$ is an upper bound for $\dot{\psi}$.

C. Yaw angle

The variance of the relative yaw estimate $\psi_t^{ego}$ is trivially derived from equation (3). Considering that the heading error variance is similar for the target and the ego-vehicle ($\sigma_{\psi_t^{UTM}}^2 = \sigma_{\psi_e^{UTM}}^2 = \sigma_\psi^2$), we get:

$$\mathrm{Var}(\psi_t^{ego}) = 2\sigma_\psi^2$$

D. Results

Using the analytical upper bounds for the covariance matrices of the generated ground truth position, speed and yaw angle data calculated in the previous sections, we can estimate the precision of the data generated using the process described in Section V. The following values will be used:

σ_pos     0.02 m
σ_vel     0.02 m/s
σ_ψ       1.75 × 10⁻³ rad
d_max     50 m
v_max     36 m/s
ψ̇_max     1 rad/s
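For reference, the sketch below plugs these values into the analytical bounds of Sections VI-A and VI-B; depending on the matrix-norm convention and rounding, the resulting figures may differ slightly from those quoted below.

```python
# Numerical evaluation of the analytical covariance upper bounds (illustrative).
import numpy as np

sigma_pos, sigma_vel, sigma_psi = 0.02, 0.02, 1.75e-3
d_max, v_max, psidot_max = 50.0, 36.0, 1.0

# Position covariance bounds (Section VI-A).
a_pos = 2 * sigma_pos**2 + 2 * d_max**2 * (1 - np.exp(-sigma_psi**2))
c_pos = 1.5 * d_max**2 * (1 - np.exp(-sigma_psi**2 / 2))

# Velocity covariance bound on the diagonal terms (Section VI-B).
a_vel = (4 * (sigma_vel**2 + sigma_pos**2 * sigma_psi**2 + psidot_max**2 * sigma_pos**2)
         + 2 * d_max**2 * sigma_psi**2
         + 4 * (1 - np.exp(-sigma_psi**2)) * (v_max + d_max * psidot_max)**2)

print(f"position: a, b <= {a_pos:.4f} m^2, c <= {c_pos:.4f} m^2, "
      f"per-axis sigma <= {np.sqrt(a_pos):.3f} m")
print(f"velocity: a, b <= {a_vel:.4f} m^2/s^2, "
      f"per-axis sigma <= {np.sqrt(a_vel):.3f} m/s")
```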


These values are representative of the performance of our positioning system, of the distance at which obstacles become truly observable, and of the maximal dynamics of the vehicles in typical use cases. When injected into the upper bounds of the position and velocity covariance matrices, they yield the following results:

$$\|\mathrm{Cov}((x, y)_t^{ego})\|_F^{1/2} \le 0.12\ \mathrm{m}$$

$$\|\mathrm{Cov}((v_x, v_y)_t^{ego})\|_F^{1/2} \le 0.30\ \mathrm{m/s}$$

These values represent the root mean square error of the position and velocity information yielded by our ground truth generation process. We stress that they are obtained by taking extreme values for all variables describing the dynamics of both vehicles, which would never actually be encountered simultaneously in real-life situations.

VII. DATA VISUALISATION

In this section, some plots of the acquired data are given. These data were collected during a tracking scenario on a track located in Versailles, France. The track has an approximate length of 3.2 km and is represented in Figure 3 together with the starting and finishing points of the vehicles.

Fig. 3. Test track (x and y coordinates in metres; the initial position, path and final position are shown)

The following figures show the temporal variations of the relative positions x, y, the relative velocities vx, vy, and the orientation ψ of the tracked vehicle reported by the Lidar (in the ego-vehicle frame), together with the corresponding ground truth plots. All these quantities are given in International System of Units.

Fig. 4. Variations of x (x-coordinate in metres versus time in seconds; ground truth and Lidar)

Fig. 5. Variations of y (y-coordinate in metres versus time in seconds; ground truth and Lidar)

Fig. 6. Variations of vx (speed in m/s versus time in seconds; ground truth and Lidar)

Fig. 7. Variations of vy (speed in m/s versus time in seconds; ground truth and Lidar)

Fig. 8. Variations of ψ (angle in rad versus time in seconds; ground truth and Lidar)

VIII. APPLICATION TO THE EVALUATION OF PERCEPTION ALGORITHMS

This section gives some potential applications of the ground truth data to the design and evaluation of perception algorithms.

A. Clustering Lidar raw data:

Robust obstacle detection using sensors such as Lidar is a key point for the development of autonomous vehicles. Lidar performance mainly depends on its low-level processing (the clustering methods used for its raw data, the estimation of the bounding boxes, the velocities of the representative points of the clusters, etc.). The methodology presented in this paper can be used to evaluate the performance of the main outputs of a Lidar raw data clustering algorithm.
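For instance, a basic position metric could be computed as in the sketch below (the data layout is hypothetical; in practice both tracks would first be resampled to common timestamps as in Section V):

```python
# Compare the position of the cluster associated with a target vehicle against
# the generated ground truth, assuming both tracks are already time-aligned.
import numpy as np

def position_rmse(gt_xy: np.ndarray, det_xy: np.ndarray) -> float:
    """Root mean square position error between ground truth and detections (N x 2 arrays)."""
    return float(np.sqrt(np.mean(np.sum((gt_xy - det_xy) ** 2, axis=1))))

# Example with dummy arrays standing in for time-aligned tracks.
gt = np.array([[20.0, 1.0], [19.5, 1.1], [19.0, 1.2]])
det = np.array([[20.2, 0.9], [19.4, 1.3], [19.1, 1.1]])
print(f"RMSE: {position_rmse(gt, det):.3f} m")
```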

B. Lidar-Radar-Camera Fusion:

Lidars, Radars and Cameras are complementary sensors. Lidars are very accurate on obstacle positions and less accurate on their velocities. Radars, on the other hand, are more precise on obstacle velocities and less precise on their positions. Cameras provide images and can be used to perform classification tasks. Fusion between these sensors aims at combining the advantages of each sensor to provide permanent and more robust data (see for example [6], [7], [8], [9], [10]). In particular, the recent work [6] proposes a Lidar-Radar fusion algorithm evaluated on the database generated in the present paper.

C. Dynamic models and prediction of obstacles motions:

Robust tracking of obstacles detected by sensors such as Lidars, Radars and Cameras is crucial for the proper functioning of autonomous cars. Kalman filters are among the most widely used methods to track obstacles. These methods are usually coupled with preset motion models such as constant velocity, constant acceleration or constant curvature. Ground truth makes it possible to learn true dynamic models of obstacles using statistics, neural networks, etc. The learnt models can then be compared with the preset ones.
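As a point of comparison for learnt models, a minimal constant-velocity Kalman filter baseline might look like the following sketch (illustrative, not the paper's tracker):

```python
# Constant-velocity Kalman filter: state [x, y, vx, vy], position-only observations.
import numpy as np

def cv_kalman(zs, dt=0.1, q=1.0, r=0.1):
    """Track a single obstacle from position measurements zs (N x 2 array)."""
    F = np.eye(4); F[0, 2] = F[1, 3] = dt          # constant-velocity model
    H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1.0  # only position is observed
    Q = q * np.eye(4) * dt                         # crude process noise
    R = r * np.eye(2)                              # measurement noise

    x = np.array([zs[0, 0], zs[0, 1], 0.0, 0.0])
    P = np.eye(4)
    out = []
    for z in zs:
        # Predict.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
        out.append(x.copy())
    return np.array(out)
```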

IX. SUMMARY AND FUTURE DEVELOPMENTS

In this paper, we have presented a method for automatically generating sets of ground truth data to support advances in the field of obstacle detection and tracking. We hope this methodology will help contributors to this area of research challenge their approaches and contribute to the development of robust and reliable algorithms. In the future, we intend to propose a complete benchmark to unify the usage of this dataset and the performance estimation of obstacle detection and tracking techniques. In addition, we plan to apply statistical and deep learning approaches to the generated ground truth in order to correct sensor measurements, learn motion models, etc.

REFERENCES

[1] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

[2] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes dataset for semantic urban scene understanding," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[3] H. Yin and C. Berger, "When to use what data set for your self-driving car algorithm: An overview of publicly available driving datasets," in 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Oct 2017, pp. 1-8.

[4] A. Geiger, F. Moosmann, O. Car, and B. Schuster, "Automatic calibration of range and camera sensors using a single shot," in International Conference on Robotics and Automation (ICRA), 2012.

[5] X. Niu, Q. Chen, Q. Zhang, H. Zhang, J. Niu, K. Chen, C. Shi, and J. Liu, "Using Allan variance to analyze the error characteristics of GNSS positioning," GPS Solutions, vol. 18, no. 2, pp. 231-242, Apr 2014. [Online]. Available: https://doi.org/10.1007/s10291-013-0324-x

[6] H. Hajri and M.-C. Rahal, "Real time Lidar and Radar high-level fusion for obstacle detection and tracking with evaluation on a ground truth," International Conference on Automation, Robotics and Applications, Lisbon, 2018.


[7] C. Blanc, B. T. Le, Y. Le, and G. R. Moreira, "Track to Track Fusion Method Applied to Road Obstacle Detection," IEEE International Conference on Information Fusion, 2004.

[8] D. Gohring, M. Wang, M. Schnurmacher, and T. Ganjineh, "Radar/Lidar sensor fusion for car-following on highways," The 5th International Conference on Automation, Robotics and Applications, pp. 407-412, 2011.

[9] R. O. C. García and O. Aycard, "Multiple sensor fusion and classification for moving object detection and tracking," IEEE Transactions on Intelligent Transportation Systems, vol. 17, pp. 525-534, 2016.

[10] A. Rangesh and M. M. Trivedi, "No Blind Spots: Full-Surround Multi-Object Tracking for Autonomous Vehicles using Cameras & Lidars," https://arxiv.org/pdf/1802.08755.pdf, 2018.