ECE 763 Spring 18 Project 03 Lane Detection using Convolutional Neural Networks
Github: https://github.ncsu.edu/aravi6/laneDetection
Members: Abhishek Ravi (aravi6), Amrutha Hakkare Arunachala (ahakkar), Siddhesh Gotad (svgotad)
Objective:
Develop a software pipeline to identify the lane boundaries (lane lines) in a video from a front-facing camera on a car.
Motivation:
The traditional pipeline for lane detection is not accurate for all kinds of roads, especially twisty roads, roads with gradients, or roads with mixed surface patches. It can be improved with neural networks, which can also be used to detect all lanes on the road rather than only the current lane in which the vehicle is driving.
We referred to the traditional image processing based lane detection from: https://github.com/sujaybabruwad/Advanced-Lane-Detection
When we ran this code on road images with nearly straight lanes (large radius of curvature), the detected lanes looked like those shown in Fig.1 and Fig.2.
Fig.1 Lane detection using handcrafted features
Fig.2 Lane detection using handcrafted features and image processing
But when we applied the same pipeline to images with shadows, strong brightness variation, occlusions, etc., the results were very poor; in many cases the detected lane lines did not even fall on the road. Some examples are shown in Fig.3, Fig.4 and Fig.5.
Fig.3 Lane detection using handcrafted features and image processing on a harder dataset.
Fig.4 Lane detection using handcrafted features and image processing on a harder dataset.
Fig.5 Lane detection using handcrafted features and image processing on a harder dataset.
This poor performance of handcrafted-feature-based lane detection motivated us to improve the lane detection pipeline with neural networks.
Additionally, as part of EcoPRT (an EEP initiative to build a fully autonomous vehicle), we chose CNN-based lane detection so that this module can later be integrated with the research work at EcoPRT.
Dataset:
The TuSimple lane dataset [1] is used for training, as it is the only large-scale dataset for training and testing deep learning methods on the lane detection task. It consists of 3626 training and 2782 testing video clips, recorded under good and medium weather conditions on highways with two, three, four, or more lanes, at different times of day. The traffic conditions vary for each image. For each clip, the dataset also provides the 19 previous frames, which are not annotated. The annotations are in .json format, giving the x-position of each lane at a number of discretized y-positions. In each image, the current lane and the left/right lanes are annotated.
Training images count: 3000
Testing images count: 355
The annotations from the dataset are visualized in Fig.6. The green circles are the lane-line points in the ground truth. The red horizontal lines are the discretized heights at which each annotation is given.
Fig.6 Annotations visualized from the dataset
Preparation of ground truth images: The first step towards training a segmentation network is to create the ground-truth masks for the training images. The Python script for this can be found at laneDetection/scripts/dataExtraction.py in the GitHub repository. It reads the .json file to extract the points belonging to each lane and plots them using the cv2.polylines function to generate the ground-truth image. Ground-truth extraction was performed for both training and testing images; the output for one example is shown in Fig.7 and Fig.8.
Fig.7 Actual image Fig.8 Ground truth image generated
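The core of this extraction can be sketched in a few lines of Python. The snippet below is illustrative rather than the repository script: it assumes the TuSimple annotation format, where each record lists one x-array per lane (with -2 marking heights where the lane is absent) and a shared "h_samples" array of discretized y-positions; the function name lane_points is ours.

```python
import json

def lane_points(label_line):
    """Parse one TuSimple annotation line into per-lane (x, y) point lists.

    Each record has "lanes" (one x-list per lane, -2 meaning the lane has
    no point at that height) and "h_samples" (discretized y-positions).
    """
    record = json.loads(label_line)
    lanes = []
    for xs in record["lanes"]:
        pts = [(x, y) for x, y in zip(xs, record["h_samples"]) if x >= 0]
        if pts:
            lanes.append(pts)
    return lanes

# Toy annotation with two lanes over three discretized heights:
line = '{"lanes": [[-2, 632, 625], [720, 718, 715]], "h_samples": [280, 290, 300]}'
for pts in lane_points(line):
    print(pts)
# Each point list can then be drawn onto a black canvas, e.g. with
# cv2.polylines(mask, [np.int32(pts)], False, 255, thickness=5)
```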
Training:
We used the VGG-16 based CNN encoder / segmentation-decoder network designed by Marvin Teichmann [2], starting from its GitHub implementation [4] for training and testing. We modified this code to fit our dataset and our application of lane-line segmentation. The whole model was trained from scratch on our data, and the post-processing pipeline described below for extracting lanes was implemented on top of this architecture.
Fig. MultiNet architecture
Final training parameters:

Parameter                   Value
Number of training images   3000
Architecture                VGG16 (KittiSeg)
Optimizer                   Adam
Batch size                  1
Learning rate               1e-5
Steps                       14000
Loss function               Cross entropy
Training time               ~20 hrs
GPU                         Nvidia GTX 1070
Pipeline:
1. Predicting lane lines using the VGG binary lane segmentation network:
We performed lane segmentation on the actual images using weights from the model that
we trained. The output of this step can be seen in Fig.9
Fig.9 Raw segmented image
2. Applying a perspective transform to rectify the binary image ("like a birds-eye view"):
A fixed perspective transform was applied using an M-matrix computed from the following point correspondences:

Source      Destination
570,320     570,320
784,320     784,320
1253,700    784,700
250,700     570,700
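For reference, the M-matrix for such a transform can be recovered from the four point correspondences above by solving a small linear system. This is a NumPy sketch of what cv2.getPerspectiveTransform computes, not the code used in our pipeline; the function names are illustrative.

```python
import numpy as np

src = [(570, 320), (784, 320), (1253, 700), (250, 700)]
dst = [(570, 320), (784, 320), (784, 700), (570, 700)]

def perspective_matrix(src, dst):
    """Solve for the 3x3 homography M mapping src -> dst (h33 fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b += [u, v]
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(M, x, y):
    """Apply M to one point via homogeneous coordinates."""
    p = M @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

M = perspective_matrix(src, dst)
# warp_point(M, *src[i]) now maps each source corner onto dst[i]
```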
This is performed in order to fit a second-order polynomial to the detected lane points. The output of the transformed images can be observed in Fig.10 and Fig.11.
Fig.10 Perspective transform on actual image
Fig.11 Perspective transform on segmented image
3. Detecting lane pixels by performing sliding window operation:
The histogram of the lower half of the transformed, segmented image was plotted as shown in Fig.12. Peaks in the histogram represent lane lines.
Fig.12 Histogram along columns for lower half of the image
Plotting the histogram gives a rough estimate of where the lane lines originate. A sliding-window operation is then performed around the local maxima, as shown in Fig.13. The coordinates of the pixels collected by the windows trace the curve of each lane line and are stored in a list.
Fig.13 Sliding window operation
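The histogram and sliding-window steps above can be sketched as follows. The function and parameter names (find_lane_pixels, n_windows, margin) are illustrative rather than from our script, and the sketch assumes both lane lines are visible in the mask.

```python
import numpy as np

def find_lane_pixels(binary, n_windows=9, margin=50):
    """Locate left/right lane pixels in a binary bird's-eye mask."""
    h, w = binary.shape
    # Column histogram of the lower half: the two peaks mark the lane bases.
    hist = binary[h // 2:, :].sum(axis=0)
    mid = w // 2
    bases = [int(np.argmax(hist[:mid])), mid + int(np.argmax(hist[mid:]))]
    win_h = h // n_windows
    ys, xs = binary.nonzero()
    lanes = []
    for base in bases:
        current, found_x, found_y = base, [], []
        for win in range(n_windows):            # slide from bottom to top
            y_lo, y_hi = h - (win + 1) * win_h, h - win * win_h
            sel = ((ys >= y_lo) & (ys < y_hi) &
                   (xs >= current - margin) & (xs < current + margin))
            if sel.any():
                current = int(xs[sel].mean())   # recenter window on pixels
                found_x.append(xs[sel])
                found_y.append(ys[sel])
        lanes.append((np.concatenate(found_x), np.concatenate(found_y)))
    return lanes

# Synthetic bird's-eye mask with two vertical lane lines:
mask = np.zeros((360, 640), dtype=np.uint8)
mask[:, 100] = 1
mask[:, 500] = 1
(lx, ly), (rx, ry) = find_lane_pixels(mask)   # bases found at columns 100 and 500
```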
4. Fitting polynomial equations along the lane boundaries:
A second-order polynomial is fit to the list of coordinates obtained from the sliding-window operation. The equation is then probed at a range of y values to obtain the final lane lines. The second-order lane lines can be observed in Fig.14.
Fig.14 Fine tuned lane lines in transformed plane
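A minimal version of this fit can use np.polyfit, fitting x as a function of y since lane lines are nearly vertical in the bird's-eye view; the function name and data below are illustrative.

```python
import numpy as np

def fit_lane(xs, ys, plot_ys):
    """Fit x = a*y^2 + b*y + c to lane pixels, then probe it at plot_ys."""
    a, b, c = np.polyfit(ys, xs, 2)      # lanes are near-vertical, so fit x(y)
    return a * plot_ys ** 2 + b * plot_ys + c

# Synthetic pixels lying on x = 0.001*y^2 + 0.2*y + 300:
ys = np.arange(0, 700, 10, dtype=float)
xs = 0.001 * ys ** 2 + 0.2 * ys + 300
plot_ys = np.linspace(0, 699, 700)
lane_xs = fit_lane(xs, ys, plot_ys)      # smooth curve to draw and reproject
```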
5. Reprojecting the detected lane boundaries back onto the original image:
A fixed perspective transform using the inverse of the M-matrix (see Step 2) was applied to reproject the lane lines onto the actual image, as shown in Fig.15.
Fig.15 Reprojected lane lines on actual image
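The reprojection amounts to applying the inverse homography to the fitted lane points. A small NumPy sketch; the toy translation matrix below is only for illustration, while the pipeline uses the inverse of the M-matrix from Step 2.

```python
import numpy as np

def reproject(M, xs, ys):
    """Map bird's-eye lane points back to image coordinates via inv(M)."""
    Minv = np.linalg.inv(M)
    pts = np.stack([xs, ys, np.ones_like(xs)])   # homogeneous coordinates
    u, v, w = Minv @ pts
    return u / w, v / w

# Toy M that translates by (10, 20); reprojection undoes the shift:
M = np.array([[1.0, 0.0, 10.0], [0.0, 1.0, 20.0], [0.0, 0.0, 1.0]])
u, v = reproject(M, np.array([10.0, 20.0]), np.array([30.0, 40.0]))
```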
Results and Evaluation:
The accuracy of the lane detection model is obtained by comparing the final output in Fig.16 with the ground truth in Fig.17.
Fig.16 Final lane lines
Fig.17 Ground truth
Fig.18 Intersection of ground truth and final lane lines
Evaluation Metric:
Metric = (number of pixels in the intersection of final lane lines and ground truth) / (number of pixels in the ground truth)
The accuracy score for Fig.18 is 86.212%
We tested our algorithm on 355 test images, obtaining a mean score (intersection over ground truth) of 62.44%.
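This metric is straightforward to compute on binary masks; a minimal sketch (the function name lane_score is our illustrative choice):

```python
import numpy as np

def lane_score(pred, gt):
    """Intersection-over-ground-truth for binary lane masks."""
    inter = np.logical_and(pred > 0, gt > 0).sum()
    return inter / (gt > 0).sum()

# Toy masks: ground truth has 4 lane pixels, prediction covers 3 of them.
gt = np.zeros((4, 4), int); gt[1, :] = 1
pred = np.zeros((4, 4), int); pred[1, :3] = 1
print(lane_score(pred, gt))  # 0.75
```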
We also ran our model on the same kinds of images with shadows and occlusions mentioned in the motivation section. Fig.19, Fig.20 and Fig.21 show experimental results from our model.
Fig.19 Segmented lane output 1
Fig.20 Segmented lane output 2
Fig.21 Segmented lane output 3
As is evident, our algorithm gives better results than the purely image-processing-based algorithm, performing better in most cases, including those with shadows and brightness changes. However, there are still cases where our algorithm fails to identify lane lines accurately; one such example is Fig.22.
Fig.22 Segmented lane output in extreme shadow condition
Challenges:
1. The system used for training crashed due to low storage space. This was fixed by clearing data and weights generated by previous failed training runs.
2. The system used for training crashed due to insufficient swap memory on Ubuntu 16.04 LTS. This was solved by increasing the swap space, which as a rule of thumb should be about twice the RAM.
3. The constant M-matrix used for projection during perspective transform causes vanishing point to shift when the vehicle is moving uphill or downhill.
4. Currently the pipeline takes 100-200 ms per frame on an Nvidia GTX 1070 and 400-600 ms on a GTX 960M, which is a potential hurdle for use in a real-time autonomous system.
Future Work:
1. The accuracy of this method can be increased if H-Net [3] is used to predict the H-matrix instead of keeping it constant. Using H-Net would eliminate the projection error when the ground plane shifts in uphill and downhill images.
2. Determine the curvature of the lane and vehicle position with respect to center.
3. Since this project is going to be used at EcoPRT, we would ideally want to train this network on the data that the vehicle will encounter during its operation.
4. Reduce the segmentation time either by improving the network or looking into other real time networks.
Individual Contributions:
This project was a group effort and has been categorized into different modules:
1. Literature survey: Amrutha, Abhishek, Siddhesh
2. Understanding the classical pipeline: Amrutha
3. Dataset and ground truth: Abhishek
4. Training and babysitting: Siddhesh
5. Neural network pipeline: Amrutha, Abhishek, Siddhesh
6. Testing and evaluation metric: Amrutha, Abhishek, Siddhesh
7. Presentation and report: Amrutha, Abhishek, Siddhesh
References:
[1] TuSimple Lane Detection Challenge. http://benchmark.tusimple.ai/#/t/1
[2] M. Teichmann, M. Weber, M. Zollner, R. Cipolla, R. Urtasun. MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving. https://arxiv.org/pdf/1612.07695.pdf
[3] D. Neven, B. De Brabandere, S. Georgoulis, M. Proesmans, L. Van Gool. Towards End-to-End Lane Detection: an Instance Segmentation Approach. https://arxiv.org/pdf/1802.05591.pdf
[4] KittiSeg: a Kitti road segmentation model implemented in TensorFlow. https://github.com/MarvinTeichmann/KittiSeg