
ECE 763 Spring 18 Project 03 Lane Detection using Convolutional Neural Networks

Github: https://github.ncsu.edu/aravi6/laneDetection
Members: Abhishek Ravi (aravi6), Amrutha Hakkare Arunachala (ahakkar), Siddhesh Gotad (svgotad)

Objective:

Develop a software pipeline to identify the lane boundaries (lane lines) in video from a front-facing camera on a car.

Motivation:

The traditional pipeline for lane detection is not accurate for all kinds of roads, especially twisty roads, roads with gradients, or roads with mixed surface patches. This traditional pipeline can be improved by using neural networks. In addition, neural networks can detect all lanes on the road, as opposed to detecting only the current lane in which the vehicle is moving.

We referred to the traditional image-processing-based lane detection from: https://github.com/sujaybabruwad/Advanced-Lane-Detection

When we ran this code on road images with nearly straight lanes (large radius of curvature), the detected lanes looked like those shown in Fig. 1 and Fig. 2.

Fig. 1 Lane detection using handcrafted features


Fig.2 Lane detection using handcrafted features and image processing

But when we applied the same pipeline to images with shadows, high brightness variation, occlusions, etc., the results degraded severely; in many cases the detected lane lines did not even fall on the road. Some examples are shown in Fig. 3, Fig. 4, and Fig. 5.

Fig. 3 Lane detection using handcrafted features and image processing on a harder dataset.


Fig. 4 Lane detection using handcrafted features and image processing on a harder dataset.

Fig. 5 Lane detection using handcrafted features and image processing on a harder dataset.

This poor performance of handcrafted-feature-based lane detection motivated us to improve the lane detection pipeline with neural networks.

Also, as members of EcoPRT (an EEP initiative to build a fully autonomous vehicle), we chose CNN-based lane detection so that this module can later be integrated with the research work at EcoPRT.


Dataset:

The TuSimple lane dataset [1] is used for training, as it is the only large-scale dataset for training and testing deep learning methods on the lane detection task. It consists of 3626 training and 2782 testing video clips, recorded under good and medium weather conditions on highways with 2, 3, 4, or more lanes, at different times of day. The traffic conditions vary for each image. For each clip, the dataset also provides the 19 previous frames, which are not annotated. The annotations are in .json format, indicating the x-positions of the lanes at a number of discretized y-positions. In each image, the current lane and the left/right lanes are annotated.

Training images count: 3000

Testing images count: 355

The annotations from the dataset are visualized in Fig. 6. The green circles are the lane line points in the ground truth. The red horizontal lines are the discretized heights for each annotation.

Fig.6 Annotations visualized from the dataset


Preparation of ground truth images: The first step towards creating a segmentation network is to create the ground-truth masks for the training images. A Python script for this (laneDetection/scripts/dataExtraction.py in the GitHub repository) reads the JSON file to extract the points belonging to each lane and plots them using the cv2.polylines function to generate the ground-truth image. Ground-truth extraction was performed for both the testing and training images; an example pair is shown in Fig. 7 and Fig. 8.

Fig. 7 Actual image
Fig. 8 Generated ground-truth image
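A minimal sketch of this extraction step is shown below, assuming the standard TuSimple label format: one JSON object per line, where "lanes" holds the x-coordinates of each lane (-2 where the lane has no point at that height), "h_samples" holds the shared discretized y-coordinates, and "raw_file" names the source image. The file and output names here are illustrative, not the exact contents of dataExtraction.py.

```python
import json

import cv2
import numpy as np

def make_mask(label, height=720, width=1280, thickness=5):
    """Rasterize one TuSimple annotation into a binary lane mask."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for xs in label["lanes"]:
        # Keep only valid (x, y) pairs; x == -2 marks a missing point.
        pts = [(x, y) for x, y in zip(xs, label["h_samples"]) if x >= 0]
        if len(pts) > 1:
            cv2.polylines(mask, [np.int32(pts)], isClosed=False,
                          color=255, thickness=thickness)
    return mask

# Each line of the TuSimple label file is a standalone JSON object.
with open("label_data.json") as f:
    for line in f:
        label = json.loads(line)
        mask = make_mask(label)
        cv2.imwrite(label["raw_file"].replace("/", "_") + "_gt.png", mask)
```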

Training:

We used a VGG-16 based CNN encoder with a segmentation decoder, as designed by Marvin Teichmann [2], and used the GitHub implementation [4] of this network for training and testing. We modified this code to fit our dataset and our application of lane line segmentation. The whole model was trained from scratch on our data, and the post-processing pipeline described below to extract lanes was implemented on top of this architecture.


Fig. MultiNet architecture [2]

Final training parameters:

Parameter                    Value
-------------------------    ----------------
Number of training images    3000
Architecture                 VGG16 (KittiSeg)
Optimizer                    Adam
Batch size                   1
Learning rate                1e-5
Steps                        14000
Loss function                Cross-entropy
Training time                ~20 hrs
GPU                          Nvidia GTX 1070
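For reference, a minimal sketch of this training objective (not the exact KittiSeg code), written in TF 1.x style as used by KittiSeg: per-pixel cross-entropy over two classes (lane vs. background), optimized with Adam at the learning rate from the table. The placeholder shapes and the single 1x1 conv standing in for the network are illustrative.

```python
import tensorflow as tf

# Input image and ground-truth mask; shapes are illustrative
# (batch size 1, 720x1280 frames).
images = tf.placeholder(tf.float32, [1, 720, 1280, 3])
labels = tf.placeholder(tf.int32, [1, 720, 1280])

# Stand-in for the VGG encoder / segmentation decoder: a single 1x1
# conv producing 2-class (background / lane) logits per pixel.
logits = tf.layers.conv2d(images, filters=2, kernel_size=1)

# Per-pixel cross-entropy, averaged over all pixels.
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                   logits=logits))

# Adam at the learning rate from the table above.
train_op = tf.train.AdamOptimizer(learning_rate=1e-5).minimize(loss)
```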


Pipeline:

1. Predicting lane lines using the VGG binary lane segmentation network:

We performed lane segmentation on the actual images using weights from the model that we trained. The output of this step can be seen in Fig. 9.

Fig.9 Raw segmented image

2. Applying a perspective transform to rectify the binary image ("like a birds-eye view"):

A fixed perspective transform was applied using an M-matrix computed from the following source/destination point correspondences:

Source         Destination
(570, 320)     (570, 320)
(784, 320)     (784, 320)
(1253, 700)    (784, 700)
(250, 700)     (570, 700)

This is performed in order to fit a second-order polynomial to the detected lane points. The output of the transformed images can be observed in Fig. 10 and Fig. 11.
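A sketch of this step with OpenCV, assuming 1280x720 frames; the file name is a placeholder:

```python
import cv2
import numpy as np

# Source / destination correspondences from the table above
# (top-left, top-right, bottom-right, bottom-left).
src = np.float32([[570, 320], [784, 320], [1253, 700], [250, 700]])
dst = np.float32([[570, 320], [784, 320], [784, 700], [570, 700]])

M = cv2.getPerspectiveTransform(src, dst)
M_inv = cv2.getPerspectiveTransform(dst, src)  # used again in step 5

# Warp the binary segmentation output into the birds-eye view.
segmented = cv2.imread("segmented.png", cv2.IMREAD_GRAYSCALE)
warped = cv2.warpPerspective(segmented, M, (1280, 720))
```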


Fig.10 Perspective transform on actual image

Fig.11 Perspective transform on segmented image

3. Detecting lane pixels by performing a sliding window operation:

The histogram of the lower half of the transformed and segmented image was plotted as shown in Fig. 12. Peaks in the histogram represent lane lines.


Fig.12 Histogram along columns for lower half of the image

Plotting the histogram gives a rough estimate of where the lane lines originate. A sliding window operation is performed about the local maxima of the image, as shown in Fig. 13. The coordinates of the local maxima trace the curve of the lane lines and are stored in a list.

Fig.13 Sliding window operation
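A sketch of this search, assuming `warped` is the birds-eye binary mask from step 2; the window count and margin are typical values, not necessarily our exact settings:

```python
import numpy as np

def sliding_window(warped, n_windows=9, margin=100, min_pix=50):
    """Collect lane pixels by sliding windows up from the histogram peaks."""
    # Column sums over the lower half; the two peaks seed the search.
    histogram = np.sum(warped[warped.shape[0] // 2:, :], axis=0)
    midpoint = histogram.shape[0] // 2
    bases = [np.argmax(histogram[:midpoint]),
             midpoint + np.argmax(histogram[midpoint:])]

    ys, xs = warped.nonzero()            # coordinates of all lane pixels
    window_h = warped.shape[0] // n_windows
    lanes = []
    for base in bases:
        x_current, idx = base, []
        for w in range(n_windows):       # walk windows bottom to top
            y_lo = warped.shape[0] - (w + 1) * window_h
            y_hi = warped.shape[0] - w * window_h
            inside = ((ys >= y_lo) & (ys < y_hi) &
                      (xs >= x_current - margin) & (xs < x_current + margin))
            idx.append(inside.nonzero()[0])
            # Re-center the next window on the pixels just found.
            if inside.sum() > min_pix:
                x_current = int(xs[inside].mean())
        idx = np.concatenate(idx)
        lanes.append((xs[idx], ys[idx]))
    return lanes                          # [(left_x, left_y), (right_x, right_y)]
```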

4. Fitting polynomial equations along the lane boundaries:

A second-order polynomial is fit to the list of coordinates obtained from the sliding window operation. The fitted equation is then evaluated over the image's y-range to obtain the final lane lines. The fitted second-order lane lines can be observed in Fig. 14.
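The fit itself is a one-liner with NumPy. Since the lanes are nearly vertical in the birds-eye view, x is modeled as a quadratic in y; the sample points below stand in for the coordinates collected by the sliding window search:

```python
import numpy as np

# Illustrative pixel coordinates for one lane line (bottom to top).
lane_y = np.array([700, 600, 500, 400, 300])
lane_x = np.array([300, 310, 325, 345, 370])

coeffs = np.polyfit(lane_y, lane_x, deg=2)   # x = a*y**2 + b*y + c
plot_y = np.arange(300, 701)                 # probe the fit along y
fit_x = np.polyval(coeffs, plot_y)           # final lane-line x positions
```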


Fig. 14 Fitted lane lines in the transformed plane

5. Reprojecting the detected lane boundaries back onto the original image:

A fixed perspective transform using the inverse of the M-matrix (see step 2) was applied to reproject the lane lines onto the actual image, as shown in Fig. 15.

Fig.15 Reprojected lane lines on actual image
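A sketch of the reprojection, reusing the M-matrix correspondences from step 2; the frame file name and the fit coefficients are placeholders:

```python
import cv2
import numpy as np

# Inverse perspective transform from the step-2 correspondences.
src = np.float32([[570, 320], [784, 320], [1253, 700], [250, 700]])
dst = np.float32([[570, 320], [784, 320], [784, 700], [570, 700]])
M_inv = cv2.getPerspectiveTransform(dst, src)

frame = cv2.imread("frame.png")                  # original camera image
plot_y = np.arange(300, 701)
fit_x = np.polyval([1e-4, -0.2, 380], plot_y)    # placeholder fit from step 4

# Draw the fitted line in the birds-eye plane, warp back, and blend.
overlay = np.zeros_like(frame)
pts = np.int32(np.column_stack([fit_x, plot_y]))
cv2.polylines(overlay, [pts], isClosed=False, color=(0, 255, 0), thickness=10)
unwarped = cv2.warpPerspective(overlay, M_inv, (frame.shape[1], frame.shape[0]))
result = cv2.addWeighted(frame, 1.0, unwarped, 0.6, 0)
```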


Results and Evaluation:

The accuracy of the lane detection model is obtained by comparing the final output in Fig. 16 with the ground truth in Fig. 17.

Fig.16 Final lane lines

Fig.17 Ground truth

Fig.18 Intersection of ground truth and final lane lines


Evaluation Metric:

$$\text{Metric} = \frac{\text{Number of pixels in the intersection of final lane lines and ground truth}}{\text{Number of pixels in the ground truth}}$$

The accuracy score for Fig.18 is 86.212%

We tested our algorithm on 355 test images, obtaining an average score (intersection over ground truth) of 62.44%.
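The metric is straightforward to compute on the binary masks; a minimal sketch (the mask names are illustrative):

```python
import numpy as np

def intersection_over_ground_truth(pred_mask, gt_mask):
    """Fraction of ground-truth lane pixels covered by the prediction."""
    pred = pred_mask > 0   # nonzero = lane pixel in the final output
    gt = gt_mask > 0       # nonzero = lane pixel in the ground truth
    return (pred & gt).sum() / gt.sum()
```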

We also ran our model on the same images with shadows and occlusions mentioned in the Motivation section. Fig. 19, Fig. 20, and Fig. 21 are experimental results of our model.

Fig.19 Segmented lane output 1


Fig.20 Segmented lane output 2

Fig.21 Segmented lane output 3


As is evident, our algorithm gives better results than the algorithm based on image processing alone, performing better in most cases, including those with shadows and brightness changes. But there are still cases where our algorithm fails to identify lane lines accurately; one such example is Fig. 22.

Fig.22 Segmented lane output in extreme shadow condition

Challenges:

1. The system used for training crashed due to low storage space. This issue was fixed by clearing up data and weights generated from previous failed training runs.

2. The system used for training crashed due to insufficient swap memory on Ubuntu 16.04 LTS. This issue was solved by increasing the swap memory, which, as a rule of thumb, should be twice the RAM size.

3. The constant M-matrix used for projection during the perspective transform causes the vanishing point to shift when the vehicle is moving uphill or downhill.

4. Currently the pipeline takes 100-200 ms per frame on an Nvidia GTX 1070 and 400-600 ms on a GTX 960M. This poses a potential hurdle for use in a real-time autonomous system.


Future Work:

1. The accuracy of this method can be increased if H-Net [3] is used to predict the H-matrix instead of keeping it constant. Using H-Net would eliminate the projection error when the ground plane shifts in uphill and downhill images.

2. Determine the curvature of the lane and vehicle position with respect to center.

3. Since this project is going to be used at EcoPRT, we would ideally want to train this network on the kind of data the vehicle will encounter during its operation.

4. Reduce the segmentation time either by improving the network or looking into other real time networks.

Individual Contributions:

This project was a group effort and has been categorized into different modules:

1. Literature survey: Amrutha, Abhishek, Siddhesh
2. Understanding the classical pipeline: Amrutha
3. Dataset and ground truth: Abhishek
4. Training and babysitting: Siddhesh
5. Neural network pipeline: Amrutha, Abhishek, Siddhesh
6. Testing and evaluation metric: Amrutha, Abhishek, Siddhesh
7. Presentation and report: Amrutha, Abhishek, Siddhesh

References:

[1] TuSimple Lane Detection Challenge. http://benchmark.tusimple.ai/#/t/1

[2] Marvin Teichmann, Michael Weber, Marius Zollner, Roberto Cipolla, and Raquel Urtasun. MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving. https://arxiv.org/pdf/1612.07695.pdf

[3] Davy Neven, Bert De Brabandere, Stamatios Georgoulis, Marc Proesmans, and Luc Van Gool. Towards End-to-End Lane Detection: an Instance Segmentation Approach. https://arxiv.org/pdf/1802.05591.pdf

[4] KittiSeg: A KITTI road segmentation model implemented in TensorFlow. https://github.com/MarvinTeichmann/KittiSeg