Key Technologies for Intelligent and Safer Cars – from Motion Estimation to Predictive Motion Planning

Rudolph Triebel∗, Davide Scaramuzza∗, Martin Rufli∗, Luciano Spinello∗, Francois Pomerleau∗, Roland Siegwart∗

Abstract: This paper presents approaches to three of the major challenges for (semi-)autonomous driving in urban environments: self-localization and ego-motion estimation, detection of dynamic objects such as cars and pedestrians, and path planning in dynamic environments. For each of these tasks we present a summary of the techniques we employ and results on real data. All modules have been implemented and tested on our autonomous car platform SmartTer, except for the path planning part, which was tested on an indoor robot and is currently being adapted to the SmartTer.

Key words: Driver assistance, motion estimation, object detection, predictive path planning

1 Introduction

One research area that has attracted increasing interest during the last years is the development of driver assistance systems and (semi-)autonomous cars. Such systems are designed in particular for operation in highly unstructured and dynamic environments. Especially in city centers, where many different modes of transportation are encountered (walking, cycling, driving, etc.), the requirements for an autonomous system are very high. The key prerequisites for such systems are localization and ego-motion estimation, reliable detection and tracking of dynamic objects, and a path-planning strategy that can cope with highly dynamic environments. For each of these sub-tasks of autonomous driving we present approaches in this paper. They have been, and continue to be, implemented and tested on an autonomous robotic platform based on a Smart car that is equipped with several sensors (see Fig. 1, left).

This paper is organized as follows. In Sec. 2 we present our approach to estimating the ego-motion of the vehicle using only the information obtained from the omnidirectional camera. Sec. 3 describes our algorithm to detect cars and pedestrians from camera and 2D laser data. In Sec. 4, we describe the motion planning algorithm we use in dynamic environments. Finally, Sec. 5 concludes the paper.

∗Autonomous Systems Lab, ETH Zurich, Tannenstrasse 3 (CLA), 8092 Zurich, Switzerland (e-mail: [email protected], [email protected]).

Figure 1: Left: The autonomous robot SmartTer. For the ego-motion estimation presented in this paper, we use the omnidirectional camera mounted on the roof, while the front laser, the perspective camera, and the rotating 3D scanner are used for detection and tracking of pedestrians and cars. The other static lasers are for collision avoidance, which is not covered in this paper. Right: The Ackermann steering principle constrains the ego-motion of the car. This is used to improve the vision-based motion estimation.

2 Monocular-Vision based Ego-Motion Estimation

2.1 Overview

The problem of recovering relative camera poses and 3D structure from a set of monocular images is known in computer vision as Structure From Motion (SFM) [1]. Many researchers have presented solutions for SFM using perspective and omnidirectional cameras (see e.g. [2, 3, 4, 5]). Closely related to SFM is the problem of Simultaneous Localization and Mapping (SLAM), which aims at estimating the motion of the robot while simultaneously building and updating the environment map. SLAM has most often been performed with sensors other than regular cameras; in recent years, however, successful results have been obtained using single cameras alone (see [6, 7, 8, 9]).

Although many algorithms for vision-based motion estimation are available, they have not yet been widely used on mobile robots, and even less in autonomous cars. The major reasons for this are the complexity, the insufficient robustness, and the computational cost of these algorithms, which make them unsuited for real-time applications. In recent work [5, 10, 11], we showed that this can be addressed by using a restrictive motion model which allows us to parameterize the motion with only one feature correspondence. Using a single feature correspondence for motion estimation is the lowest possible model parameterization and results in the most efficient algorithms for removing outliers.

2.2 Our Approach: Exploiting Non-holonomic Constraints

Vehicles with Ackermann steering such as cars can only be controlled with two degrees of freedom, namely forward/backward motion and turning angle, i.e. they are non-holonomic. This results in a motion equivalent to a rotation about an Instantaneous Center of Rotation (ICR) at each point in time, where pure translation corresponds to an ICR at infinity (see Fig. 1, right). This means that a camera mounted on a car can only move along circular paths. As a consequence, the epipolar geometry can be computed from only one feature correspondence, allowing motion estimation in almost featureless environments. Furthermore, outlier removal can be done very efficiently. Our motion estimation algorithm can process frames at a frequency of up to 400 Hz, which makes it, to the best of our knowledge, the most efficient existing motion estimation algorithm. A detailed description of our algorithms (1-Point RANSAC and Histogram Voting), as well as the mathematical derivation of the motion parameters using the non-holonomic constraints, can be found in [10, 11]. In the latter, we also show that we can even estimate the absolute scale from a single camera without any user input or vehicle odometry.

Figure 2: Left: Recovered 3D map and camera positions. Right: Comparison of visual odometry (red dashed) and ground truth (black solid). The entire path is 3 km long.
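
To make the one-point parameterization concrete: under planar circular motion the frame-to-frame rotation is a single yaw angle θ, the translation direction bisects it, and the essential matrix therefore depends on θ alone, so every feature correspondence yields one estimate of θ. The following sketch illustrates this together with a simple histogram vote; it is an illustration under assumed axis conventions (bearing vectors with x forward, y left, z up), not our actual implementation, and signs may need adapting for a different camera frame.

```python
import numpy as np

def theta_from_correspondence(p1, p2):
    """One-point estimate of the yaw angle theta between two frames.

    Assumes planar circular motion (Ackermann constraint): R = Rz(theta) and
    t ~ [cos(theta/2), sin(theta/2), 0]. p1 and p2 are bearing vectors of the
    same feature in the two camera frames (x forward, y left, z up). The
    epipolar constraint p2^T E(theta) p1 = 0 then determines theta directly.
    """
    x1, y1, z1 = p1
    x2, y2, z2 = p2
    return 2.0 * np.arctan2(y2 * z1 - y1 * z2, x2 * z1 + x1 * z2)

def estimate_theta(pairs, n_bins=720):
    """Robust estimate in the spirit of Histogram Voting: every correspondence
    votes for its theta, and the mode of the histogram is returned."""
    thetas = np.array([theta_from_correspondence(p1, p2) for p1, p2 in pairs])
    hist, edges = np.histogram(thetas, bins=n_bins, range=(-np.pi, np.pi))
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])
```

The same per-correspondence estimate can also serve as the hypothesis generator of a 1-point RANSAC loop, where inliers are counted against the epipolar error of the circular-motion essential matrix.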

2.3 Results

We tested our motion estimation algorithm on the SmartTer platform. We recorded a dataset in the center of Zurich during rush hour with many dynamic objects such as cars, buses, and pedestrians. A video sequence of the collected data can be obtained from http://asl.epfl.ch/~scaramuz/research/Davide_Scaramuzza.htm. The overall length of the tour was about 3 km (see Fig. 2). Motion estimation was done by triangulating feature points, tracking them, and estimating new poses from them up to a scale. The absolute scale was computed using our method presented in [11]. We also evaluated different feature detectors for our SFM algorithm: SIFT, Harris, KLT, and FAST corners. The best results in terms of agreement with the ground truth were obtained with Harris, KLT, or FAST features, mainly because of their higher density (2500–4000 features per frame as opposed to 700–1000 for SIFT). The left part of Fig. 2 shows a top view of the recovered 3D map and camera trajectory. Note that the points are aligned quite well along straight edges, which correspond to the walls of the buildings. A comparison with ground truth obtained from GPS and IMU data is shown on the right of Fig. 2. The estimated path is very close to the ground truth except for an unavoidable drift that increases with the traveled distance. For a purely incremental algorithm without bundle adjustment or global optimization, this is a very good result, given that the overall path length was on the order of 3 km. To improve the results further, loop closing and SLAM techniques are currently under development for our application [12].

Figure 3: Left/Center: Subparts, depicted as colored areas, and template masks, in white, both computed from the training set. Note that despite an unsupervised computation, the subparts exhibit some semantic interpretation. Right: Superfeatures are stable features in image and descriptor space. Shown are Shape Context descriptors at Hessian interest points (in red) for the class 'pedestrian'. The superfeatures are depicted in green.

3 Pedestrian and Car Detection

3.1 Overview

The system we employ to detect pedestrians and cars consists of three main components: an appearance-based detector that uses the information from camera images, a 2D-laser-based detector providing structural information, and a tracking module that uses the combined information from both sensors and estimates the motion vector for each tracked object. The laser-based detection applies a boosted Conditional Random Field (CRF) to geometrical and statistical features of 2D scan points. The image-based detector uses an extended version of the Implicit Shape Model (ISM) [13]. It operates on a region of interest obtained from projecting the laser detection into the image to constrain the position and scale of the objects. The tracking module applies an Extended Kalman Filter (EKF) with two motion models, fusing the information from camera and laser. A detailed description of the overall system can be found in [14]; here we only give a short summary.
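
The geometric coupling between the two detectors can be illustrated with the region-of-interest projection: a laser detection at a known range fixes where the object must appear in the image and, via the pinhole model, how large it should be. The following sketch shows this step; the transform, intrinsics, and object dimensions are illustrative assumptions, not the actual implementation from [14].

```python
import numpy as np

def laser_to_image_roi(p_laser, T_cam_laser, K, obj_height=1.8, obj_width=0.8):
    """Project a 2D laser detection into the image and derive a region of
    interest whose scale matches the expected physical object size.

    p_laser     : (x, y) detection position in the laser frame [m]
    T_cam_laser : 4x4 homogeneous transform from laser to camera frame
    K           : 3x3 camera intrinsics
    obj_height, obj_width : assumed physical extent of the object class [m]
    """
    p_cam = T_cam_laser @ np.array([p_laser[0], p_laser[1], 0.0, 1.0])
    depth = p_cam[2]
    u, v, _ = K @ (p_cam[:3] / depth)        # pixel position of the detection
    h_px = K[1, 1] * obj_height / depth      # expected object height in pixels
    w_px = K[0, 0] * obj_width / depth       # expected object width in pixels
    # the laser hits the object near the ground, so the box extends upwards
    return (u - w_px / 2, v - h_px, u + w_px / 2, v)   # (left, top, right, bottom)
```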

3.2 Our Approach: Use of Machine Learning Techniques

Our appearance-based people detector is based on scale-invariant Implicit Shape Models (ISM) [13]. In summary, an ISM consists of a set of local region descriptors, called the codebook, and a set of displacements and scale factors, usually named votes, for each descriptor. A vote points from the position of the descriptor to the center of the object as it was found in the training data. The codebook with the votes is first learned from a labeled training data set and then used for detection. In the past, we presented several improvements to the standard ISM approach [14, 15, 16], the most recent ones being sub-part detection, extraction of template masks, and the definition of superfeatures (see Fig. 3). The main idea behind all these extensions is to enrich and to refine the information extracted from the training data, leading to voters that can distinguish between object sub-parts, the inside and outside of object masks, as well as weak and strong features.
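
The ISM detection step amounts to a generalized Hough vote: each feature matched to the codebook casts its stored votes into an (x, y, scale) space, and object hypotheses are local maxima of the accumulated score. The snippet below is a deliberately minimal illustration of that voting step; the data structures are simplified and the extensions discussed above (sub-parts, template masks, superfeatures) are omitted.

```python
import numpy as np
from collections import defaultdict

def ism_vote(features, codebook, bin_xy=10.0, bin_s=0.1):
    """Minimal Implicit Shape Model voting step (illustrative only).

    features : list of (x, y, scale, descriptor) tuples from the test image
    codebook : list of (descriptor, votes), where votes is a list of relative
               (dx, dy, ds) displacements/scale factors learned from training
    Returns a sparse accumulator over binned object-center hypotheses.
    """
    accumulator = defaultdict(float)
    for fx, fy, fs, desc in features:
        # match the feature to its nearest codebook entry
        dists = [np.linalg.norm(desc - c_desc) for c_desc, _ in codebook]
        _, votes = codebook[int(np.argmin(dists))]
        # every stored vote points from the feature towards a possible object center
        for dx, dy, ds in votes:
            cx, cy, cs = fx + fs * dx, fy + fs * dy, fs * ds
            key = (round(cx / bin_xy), round(cy / bin_xy), round(cs / bin_s))
            accumulator[key] += 1.0 / len(votes)   # normalize per codebook entry
    return accumulator   # local maxima of the accumulator are object hypotheses
```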

The laser-based detector uses Conditional Random Fields (CRFs) [17], which represent the conditional probability p(y | x) using an undirected cyclic graph, where each node is associated with a label yi (pedestrian, car, background) and a feature vector xi. The edges model the conditional dependency of two neighboring data points and also have associated feature vectors. The node features include size, circularity, standard deviation, etc. We apply AdaBoost [18] to these feature vectors to account for the non-linear relation between observations and labels. For the edge features, we compute the Euclidean distance between the corresponding 2D data points and a value based on the sum of distances from the decision boundary of AdaBoost, which is only high if both points are classified equally. The intuition here is that edges between equally labeled points indicate a higher likelihood of correct classification. To train the CRF we use L-BFGS gradient-based optimization, and for inference we use max-product loopy belief propagation.

Figure 4: Cars and pedestrians detected and tracked under occlusion, clutter, and partial views. Blue boxes indicate car detections, orange boxes pedestrian detections. The colored circle in the upper left corner of each box is the track identifier. Tracks are shown in color in the second row and plotted with respect to the robot reference frame.
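
In its usual pairwise form (the exact potentials used in [14] may differ in detail), the conditional distribution over the scan labels y given the features x factorizes over the nodes and edges of the graph:

```latex
p(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})}\,
\exp\!\Bigg(\sum_{i \in \mathcal{N}} \mathbf{w}_n^{\top} f_n(y_i, \mathbf{x}_i)
\;+\; \sum_{(i,j) \in \mathcal{E}} \mathbf{w}_e^{\top} f_e(y_i, y_j, \mathbf{x}_i, \mathbf{x}_j)\Bigg)
```

Here f_n are the node features boosted by AdaBoost, f_e are the edge features described above, and Z(x) is the partition function. Training then amounts to maximizing the (log-)likelihood of the labeled scans with respect to the weights w_n and w_e, which we do with L-BFGS, while max-product loopy belief propagation recovers the most likely joint labeling at inference time.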

3.3 Results

Figure 4 shows some qualitative results of our detection and tracking algorithm. The data was collected from a tour of the SmartTer in the city of Zurich. It is particularly challenging due to occlusions, clutter, and partial views. As one can see, cars and pedestrians are detected correctly, together with a correct estimation of their motion vectors. In a quantitative evaluation, we obtained an Equal Error Rate (equal precision and recall) of 68% for pedestrians and 75.7% for cars.

4 Predictive Motion Planning

4.1 Overview

Inside cities, autonomous cars have to navigate through both structured (streets) and unstructured (parking lots, off-road tracks) dynamic environments. As vehicle and environmental constraints often require complex maneuvers, a responsive vehicle behavior can only be achieved if planning (and replanning) is performed in real-time (i.e. at 10 Hz). In order to cope with probabilistic predictions of the future motion of a dynamic object (as obtained from our tracking module), an accurate state model including throttle and steering wheel inputs becomes necessary.

For real-time planning and replanning in such a high-dimensional state space (we currently consider 2D position, heading, steering angle, velocity, and time), we employ a state lattice to reduce the search space and an adaptation of the D* algorithm to deal with time-based planning, which decreases replanning time compared to, e.g., A*. Further information can be found in [19].

4.2 Our Approach: D* Planning on a State Lattice

Until recently, real-time requirements rendered the computation of optimal and kinematically feasible trajectories (i.e. to the next waypoint, several hundred meters away) intractable. Instead, deterministic graph search algorithms such as Dijkstra [20] or A* [21] have often been applied to lower-dimensional representations of the navigation problem (such as 2D grids). Paths returned by these grid-based algorithms, although optimal on the grid, are not directly executable due to vehicle constraints. Time-consuming post-processing steps, such as path smoothing and the generation of a velocity profile along the smoothed path, are required. However, even when execution time is of no concern, path and velocity planning cannot be separated in the absence of (road) structure or in highly cluttered and dynamic environments if truly time-optimal solutions are to be obtained.

A representation that addresses these issues is the state lattice, a construct that reformulates the nonholonomic motion planning problem as graph search. The base lattice is generated offline by forward-simulating a suitable vehicle model using a (uniform) sampling of desired velocity and steering angle as model inputs to generate a large set of motion primitives, followed by a careful selection of a small subset of these primitives: the selection is performed in such a way that, through combination of motion primitives belonging to the subset, any and all original motions can be reconstructed down to a discretization limit (see [22] for an automated approach). Compliance with vehicle kinematic and dynamic constraints is thus inherently guaranteed, given the above design iteration and a high-fidelity dynamic model of the vehicle. As an example, Fig. 5 shows all successor edges of an initial state.
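
A minimal sketch of the offline primitive-generation step follows. It forward-simulates a kinematic single-track (bicycle) model, a simple stand-in for the high-fidelity vehicle model mentioned above, over a uniform sampling of velocity and steering inputs, and records the end state of each rollout on the lattice discretization; the subsequent selection of a minimal spanning subset [22] is not shown, and all parameter values are illustrative.

```python
import numpy as np
from itertools import product

def simulate_primitive(v_cmd, steer_cmd, wheelbase=1.8, dt=0.05, T=1.0):
    """Forward-simulate a kinematic bicycle model from the origin for T seconds
    under constant velocity and steering commands (illustrative vehicle model)."""
    x, y, yaw = 0.0, 0.0, 0.0
    path = [(x, y, yaw)]
    for _ in range(int(T / dt)):
        x += v_cmd * np.cos(yaw) * dt
        y += v_cmd * np.sin(yaw) * dt
        yaw += v_cmd / wheelbase * np.tan(steer_cmd) * dt
        path.append((x, y, yaw))
    return path

def generate_primitives(v_samples, steer_samples, xy_res=0.25, yaw_res=np.pi / 8):
    """Uniformly sample the model inputs and snap each rollout's end state to the
    lattice resolution; combinations of the selected primitives later reconstruct
    all feasible motions down to the discretization limit."""
    primitives = []
    for v, s in product(v_samples, steer_samples):
        path = simulate_primitive(v, s)
        xe, ye, yawe = path[-1]
        end_cell = (round(xe / xy_res), round(ye / xy_res), round(yawe / yaw_res))
        primitives.append({"inputs": (v, s), "path": path, "end_cell": end_cell})
    return primitives

# e.g. 5 velocity levels x 7 steering angles
primitives = generate_primitives(np.linspace(0.5, 2.5, 5), np.linspace(-0.4, 0.4, 7))
```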

On the state lattice, we employ an adapted version of the popular D* algorithm. The efficiency with which D* repairs previously generated solutions (in many cases one to two orders of magnitude faster than replanning from scratch [23]) renders it highly desirable for this task. D* achieves this by planning from the goal to the current robot configuration (unlike e.g. A*). Unfortunately, adding time to the state space for time-based predictions of dynamic obstacles results in time-parametrized motion, which precludes initializing the goal state with the correct arrival time, since this time is only known after computing the solution. To address this issue, we introduced the time-viable heuristic, which exploits the finite number of edges in a state lattice of bounded size: it stores all possible motion combinations emerging from every single start configuration up to a horizon tmax, linked to the fidelity of the motion predictions. This heuristic look-up table enables the planning algorithm to determine whether a given expanded state is possibly within a range of tmax from the search goal state, i.e. the current robot configuration, and if so, with which time instance(s) it needs to be initialized to reach the current robot configuration at t = 0.0 s.
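
The time-viable heuristic can be pictured as a reachability table: starting from a given configuration, enumerate all combinations of motion primitives whose accumulated duration stays below tmax and record which lattice states are reached at which times. During the backward search, an expanded state is then looked up in this table to decide whether it can connect to the current robot configuration and with which time values it should be initialized. The sketch below is only a schematic rendering of this idea, not the implementation from [19].

```python
from collections import defaultdict

def build_time_viable_table(start, successors, t_max):
    """Enumerate all primitive combinations from `start` up to horizon t_max.

    start      : discrete lattice state, e.g. (x_cell, y_cell, yaw_idx, v_idx)
    successors : function mapping a state to a list of (next_state, duration) edges
    Returns a dict mapping each reachable state to the set of (rounded) times
    at which it can be reached from `start` within t_max.
    """
    table = defaultdict(set)
    stack = [(start, 0.0)]
    while stack:
        state, t = stack.pop()
        if t in table[state]:
            continue                      # this (state, time) pair was already expanded
        table[state].add(t)
        for nxt, dt in successors(state):
            if t + dt <= t_max:
                stack.append((nxt, round(t + dt, 3)))
    return table

def viable_times(table, state):
    """Relative time offsets with which `state` may be initialized if it is to be
    connected to the table's start configuration within the horizon."""
    return sorted(table.get(state, ()))
```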

During search, every edge of the graph is assigned a cost corresponding to the local traversability of the terrain and a risk score (occupancy probability × collision severity) obtained from the predicted object motion. After planning, the robot begins executing the least-cost trajectory, and a replanning loop with updated perception inputs is initiated.
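
Written out, an edge cost of this form might look as follows; the relative weighting and the way an edge is discretized into (cell, time) samples are illustrative assumptions.

```python
def edge_cost(edge_samples, traversability, occupancy_prob, collision_severity, w_risk=10.0):
    """Cost of one lattice edge: local terrain traversability plus predicted risk.

    edge_samples       : [(cell, t), ...] cells swept by the edge and the times they are entered
    traversability     : cell -> terrain cost of crossing that cell
    occupancy_prob     : (cell, t) -> predicted probability that a dynamic object occupies cell at t
    collision_severity : cell -> severity assigned to a collision in that cell
    w_risk             : relative weight of the risk term (illustrative value)
    """
    cost = 0.0
    for cell, t in edge_samples:
        cost += traversability(cell)
        cost += w_risk * occupancy_prob(cell, t) * collision_severity(cell)
    return cost
```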

Figure 5: Left: Example of a 16-directional lattice, constructed for a diff-drive household robot. The full 4D (x, y, θ, v) view for one initial heading starting at a velocity of 0.5 m/s is shown. Center/Right: Schematics of a dynamic scene, where our robot considers the predicted future motion of dynamic objects and adapts its own trajectory accordingly.

4.3 Results

In experiments on an indoor robot, we have shown that planning on our state lattice produces smooth, directly executable solutions and allows for fast planning and replanning times (typically below 0.1 s) in large environments (for details see [24]). Furthermore, we have demonstrated that, through use of our novel time-viable heuristic, the efficiency benefit of D* over, e.g., A* is preserved when planning in dynamic environments where time is added to the state space [19].

5 Conclusion

In this paper, we gave an overview of the techniques we use for three key challenges in autonomous driving in urban environments: ego-motion estimation, dynamic object detection, and predictive path planning. We showed the usefulness of our approaches on real data, partly acquired with our robotic platform SmartTer. The integration of all individual tasks into one complete system is currently under development.

References
[1] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, ISBN: 0521540518, second edition, 2004.
[2] P. I. Corke, D. Strelow, and S. Singh, “Omnidirectional visual odometry for a planetary rover,” in IEEE Int. Conf. on Intell. Rob. and Systems (IROS), 2004.
[3] D. Nister, O. Naroditsky, and J. Bergen, “Visual odometry for ground vehicle applications,” Journal of Field Robotics, 2006.
[4] M. J. Milford and G. Wyeth, “Single camera vision-only SLAM on a suburban road network,” in IEEE Int. Conf. on Robotics and Automation (ICRA), 2008.
[5] D. Scaramuzza and R. Siegwart, “Appearance-guided monocular omnidirectional visual odometry for outdoor ground vehicles,” IEEE Transactions on Robotics, vol. 24, no. 5, October 2008.
[6] M. C. Deans, Bearing-Only Localization and Mapping, Ph.D. thesis, Carnegie Mellon Univ., 2002.
[7] A. Davison, “Real-time simultaneous localisation and mapping with a single camera,” in International Conference on Computer Vision, 2003.
[8] L. A. Clemente, A. J. Davison, I. Reid, J. Neira, and J. D. Tardos, “Mapping large loops with a single hand-held camera,” in Robotics: Science and Systems, 2007.
[9] T. Lemaire and S. Lacroix, “SLAM with panoramic vision,” Journal of Field Robotics, vol. 24, no. 1-2, pp. 91–111, 2007.
[10] D. Scaramuzza, F. Fraundorfer, and R. Siegwart, “Real-time monocular visual odometry for on-road vehicles with 1-point RANSAC,” in IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, May 2009.
[11] D. Scaramuzza, F. Fraundorfer, M. Pollefeys, and R. Siegwart, “Absolute scale in structure from motion from a single vehicle mounted camera by exploiting nonholonomic constraints,” in IEEE International Conference on Computer Vision (ICCV), Kyoto, October 2009.
[12] D. Scaramuzza, F. Fraundorfer, M. Pollefeys, and R. Siegwart, “Closing the loop in appearance-guided omnidirectional visual odometry by using vocabulary trees,” Robotics and Autonomous Systems (Elsevier), 2009, to appear.
[13] B. Leibe, E. Seemann, and B. Schiele, “Pedestrian detection in crowded scenes,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005.
[14] L. Spinello, R. Triebel, and R. Siegwart, “Multiclass multimodal detection and tracking in urban environments,” in Int. Conference on Field and Service Robotics (FSR), 2009.
[15] L. Spinello, R. Triebel, and R. Siegwart, “Multimodal detection and tracking of pedestrians in urban environments with explicit ground plane extraction,” in IEEE Int. Conf. on Intell. Rob. and Systems (IROS), 2008.
[16] L. Spinello, R. Triebel, and R. Siegwart, “Multimodal people detection and tracking in crowded scenes,” in Proc. of the AAAI Conf. on Artificial Intelligence, July 2008.
[17] J. Lafferty, A. McCallum, and F. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” in Int. Conf. on Machine Learning (ICML), 2001.
[18] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journ. of Comp. and System Sciences, vol. 55, 1997.
[19] M. Rufli and R. Siegwart, “On the application of the D* search algorithm to time-based planning on lattice graphs,” in Proc. of the European Conference on Mobile Robots, 2009, to appear.
[20] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269–271, 1959.
[21] P. E. Hart, N. J. Nilsson, and B. Raphael, “A formal basis for the heuristic determination of minimum cost paths,” IEEE Trans. on Systems Science and Cybernetics, vol. 4, no. 2, 1968.
[22] M. Pivtoraiko and A. Kelly, “Efficient constrained path planning via search in state lattices,” in Int. Symposium on Artificial Intelligence, Robotics and Automation in Space (i-SAIRAS), 2005.
[23] S. Koenig and M. Likhachev, “Improved fast replanning for robot navigation in unknown terrain,” in IEEE Int. Conf. on Robotics and Automation (ICRA), 2002.
[24] M. Rufli, D. Ferguson, and R. Siegwart, “Smooth path planning in constrained environments,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2009.