driverfatiguedetectionbasedonconvolutionalneural ...dec 02, 2019 · figure...

Research ArticleDriver Fatigue Detection Based on Convolutional NeuralNetworks Using EM-CNN

Zuopeng Zhao1 Nana Zhou 2 Lan Zhang2 Hualin Yan2 Yi Xu2 and Zhongxin Zhang 2

1School of Computer Science and Technology amp Mine Digitization Engineering Research Center of Ministry of Education of thePeoplersquos Republic of China China University of Mining and Technology Xuzhou 221116 China2School of Computer Science and Technology China University of Mining and Technology Xuzhou 221116 China

Correspondence should be addressed to Nana Zhou ts18170082p21cumteducn

Received 2 December 2019 Revised 20 October 2020 Accepted 28 October 2020 Published 18 November 2020

Academic Editor Anastasios D Doulamis

Copyright copy 2020 Zuopeng Zhao et alis is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

With a focus on fatigue driving detection research a fully automated driver fatigue status detection algorithm using driving imagesis proposed In the proposed algorithm the multitask cascaded convolutional network (MTCNN) architecture is employed in facedetection and feature point location and the region of interest (ROI) is extracted using feature points A convolutional neuralnetwork named EM-CNN is proposed to detect the states of the eyes and mouth from the ROI images e percentage of eyelidclosure over the pupil over time (PERCLOS) and mouth opening degree (POM) are two parameters used for fatigue detectionExperimental results demonstrate that the proposed EM-CNN can efficiently detect driver fatigue status using driving imageseproposed algorithm EM-CNN outperforms other CNN-based methods ie AlexNet VGG-16 GoogLeNet and ResNet50showing accuracy and sensitivity rates of 93623 and 93643 respectively

1 Introduction

A survey by the American Automobile Associationrsquos TrafficSafety Foundation found that 16ndash21 of traffic accidents werecaused by driver fatigue [1] According to Ammour et al theprobability of a traffic accident caused by driver fatigue is 46times that of normal driving [2] According to the ldquoSpecialSurvey and Investment Strategy Research Report of ChinarsquosTraffic Accident Scene Investigation and Rescue EquipmentIndustry in 2019ndash2025rdquo there were 203049 traffic accidents inChina in 2017 (with a death toll of 63372 and direct propertyloss of 121131300 yuan the equivalent of 1721275773 USdollars) To reduce the occurrence of such traffic accidents it isof great practical value to study an efficient and reliable al-gorithm to detect driver fatigue

ere are currently four methods of driver fatiguedetection

(1) Based on physiological indicators [3ndash6] Rohit et al[7] analyzed the characteristics of electroencepha-logram (EEG) using linear discriminant analysis and

a support vector machine to detect driver fatigue inreal time However most on-board physiologicalsensors are expensive and must be attached to hu-man skin which can cause driver discomfort andaffect driver behavior

(2) Based on the driving state of the vehicle [8 9]Ramesh et al [10] used a sensor to detect themovement state of the steering wheel in real time todetermine the degree of driver fatigue However theprimary disadvantage of this method is that detec-tion is highly dependent on the individual drivingcharacteristics and the road environment As a re-sult there is a high degree of randomness andcontingency between the driving state of the vehicleand driver fatigue which reduces the detectionaccuracy

(3) Based on machine vision [11 12] Grace measuredpupil size and position using infrared light of dif-ferent wavelengths [13] Yan et al used machinevision to extract the geometric shape of the mouth

HindawiComputational Intelligence and NeuroscienceVolume 2020 Article ID 7251280 11 pageshttpsdoiorg10115520207251280

shape [14] An advantage of this method is that thefacial features are noninvasive visual informationthat is unaffected by other external factors iedriving state of the vehicle individual drivingcharacteristics and road environment

(4) Based on information fusion Wang Fei et alcombined physiological indicators and driving stateof the vehicle to detect the driver fatigue state of thedriver by collecting the EEG signal of the subject andthe corresponding steering wheel manipulation dataHowever the robustness of the test is affected by theindividualrsquos manipulation habits and the drivingenvironment

2 Related Work

To address the above difficulties in this study we considerdeep convolutional neural networks (CNNs) [15ndash18]CNNs have developed rapidly in the field of machine vi-sion especially for face detection [19 20] Viola and Jones[21] and Yang et al [22] pioneered the use of the AdaBoostalgorithm with Haar features to train different weakclassifiers cascading into strong classifiers for detectingfaces and nonhuman faces In 2014 Facebook proposed theDeepFace facial recognition system which uses facealignment to fix facial features on certain pixels prior tonetwork training and extracts features using a CNN In2015 Google proposed FaceNet which uses the same faceto have high cohesion in different poses while differentfaces have low coupling properties In FaceNet the face ismapped to the feature vector of Euclidean space using aCNN and the ternary loss function [23] In 2018 theChinese Academy of Sciences and Baidu proposed Pyr-amidBox which is a context-assisted single-lens face de-tection algorithm for small fuzzy partially occluded facesPyramidBox improved network performance by usingsemisupervised methods low-level feature pyramids andcontext-sensitive predictive structures [24] CNN-basedface detection performance is enhanced significantly byusing powerful deep learning methods and end-to-endoptimization In this study we combine eye and mouthcharacteristics and use a CNN rather than the traditionalimage processing method to realize feature extraction andstate recognition [25 26] and the necessary threshold is setto judge fatigue

e proposed method to detect driver fatigue statuscomprises three components (Figure 1) First the driverrsquosfacial bounding box and the five feature points of the left andright eyes nose and the left and right corners of the mouthare obtained by an MTCNN [27] Second the states of theeyes and mouth are classified Here the region of interest(ROI) is extracted by the feature points and the states of theeyes and mouth are identified by EM-CNN Finally wecombine the percentage of eyelid closure over the pupil overtime (PERCLOS) and mouth opening degree (POM) toidentify the driverrsquos fatigue status

e primary contributions of this study are summarizedas follows

(1) e EM-CNN which is based on a state recognitionnetwork is proposed to classify eye and mouth states(ie open or closed) In machine vision-based fa-tigue driving detection blink frequency andyawning are important indicators for judging driverfatigue erefore this paper proposed a convolu-tional neural network that recognizes the state of theeyes and mouth to determine whether the eyes andmouths are open or closede EM-CNN can reducethe influence of factors such as changes in lightingsitting and occlusion of glasses to meet the adapt-ability to complex environments

(2) Amethod is developed to detect driver fatigue statusis method combines multiple levels of features bycascading two unique CNN structures Face detec-tion and feature point location are performed basedon MTCNN and the state of eyes and mouth isdetermined by EM-CNN

(3) Binocular images (rather than monocular images)are detected to obtain abundant eye features For adriverrsquos multipose face area detecting only mon-ocularrsquos information can easily cause misjudgmentTo obtain richer facial information a fatigue drivingrecognition method based on the combination ofbinocular and mouth facial features is proposedwhich utilizes the complementary advantages ofvarious features to improve the recognition accuracy

3 Proposed Methodology

31 Face Detection and Feature Point Location Face detec-tion is challenging in real-world scenarios due to changes indriver posture and unconstrained environmental factorssuch as illumination and occlusion By using the depth-cascading multitasking MTCNN framework face detectionand alignment can be completed simultaneously the in-ternal relationship between the two is exploited to improvethe performance and the global face features are extractedthus the positions of the face left and right eyes nose andthe left and right corners of the mouth can be obtained estructure of the MTCNN is shown in Figure 2 eMTCNNcomprises three cascaded subnetworks ie P-Net (proposalnetwork) R-Net (refined network) and O-Net (outputnetwork) which are detected face and feature point positionfrom coarse to fine

P-Net first an image pyramid is constructed to obtainimages of different sizes ese images are then input to theP-Net in sequence A fully convolutional network isemployed to determine whether a face is included in a12times12 area at each position thereby obtaining a boundingbox of the candidate face area and its regression vectoren the candidate face window is calibrated with the frameregression vector and nonpolar large value suppression isemployed to remove highly overlapping candidate face re-gions [28 29]

R-Net the candidate face area obtained by P-Net inputand the image size is adjusted 24times 24 e candidate facewindow is screened by bounding box regression and

2 Computational Intelligence and Neuroscience

nonmaximum value suppression In comparison with theP-Net the network structure adds a connection layer toobtain a more accurate face position

O-Net similar to the R-Net in the O-Net the image sizeis adjusted to 48times 48 and the candidate face window isscreened to obtain the final face position and five featurepoints

e MTCNN performs face detection via a three-layercascade network that performs face classification boundingbox regression and feature point location simultaneously Itdemonstrates good robustness and is suitable for real drivingenvironments e result is shown in Figure 3

32 State of the Eye and Mouth Recognition

321 ROI Extraction Generally most eye detectionmethods only extract one eye to identify a fatigue stateHowever when the driverrsquos head shifts using informationfrom only a single eye can easily cause misjudgmenterefore to obtain more eye information and accuratelyrecognize the eye state the proposed method extracts a two-eye image to determine whether the eyes are open or closed

e position of the driverrsquos left and right eyes is obtainedusing theMTCNN network Here the position of the left eyeis a1 (x1 y1) the position of the right eye is a2 (x2 y2) thedistance between the left and right eye is d1 and the width ofthe eye image is w1 e height is h1 according to theproportion of the face ldquothree courts and five eyesrdquo the

binocular images are intercepted and the correspondencebetween width and height is expressed as follows

d1

x1 minus x22 + y1 minus y221113969

w1 15 d1

h1 d1

(1)

A driverrsquos mouth region changes significantly whentalking and yawning Here the positions of the left and rightcorners of the mouth are obtained using the MTCNNnetwork e position of the left corner of the mouth is b1(x3 y3) and the position of the right corner of the mouth isb2 (x4 y4) e distance between the left and right cornersis d2 the width of the mouth image is w2 and the height ish2 similar to the eye region extraction e correspondencebetween width and height is expressed as follows

d2


w2 d2

h2 d2

(2)

322 EM-CNN Architecture After extracting the eyes andmouth regions it is necessary to evaluate the state of the eyesand mouth to determine whether they are open or closede proposed method employs EM-CNN for eye and mouthstate recognition e network structure is shown inFigure 4

In a real driving environment the acquired images of thedriverrsquos eyes and mouth are different in size thus the size ofthe input image is adjusted to 175times175 and a feature map of44times 44times 56 is obtained by two convolution pools Here thesize of the convolution kernel in the convolutional layer is3times 3 and the step size is 1e size of the convolution kernelin the pooled layer is 3times 3 and the step size is 2 To avoidreducing the size of the output image and causing partialinformation loss at the edges of the image a layer of pixels isfilled along the edge of the image before the convolutionoperationen the 1times 1 3times 3 5times 5 convolution layers and3times 3 pooling layer are used to increase the adaptability of thenetwork to the size e feature map of 44times 44times 256 isobtained through another pooling en after passingthrough a residual block there are three layers of convo-lution in the residual block and the layer is pooled and an11times 11times 72 feature map is output e feature map is thenconverted to a one-dimensional vector in the fully connectedlayer and the number of parameters is reduced by randominactivation to prevent network overfitting Finally theclassification result (ie eyes are open or closed and themouth is open or closed) is output by softmax

33 Fatigue State Detection When the driver enters thefatigue state there is usually a series of physiological reac-tions such as yawning and closing the eyes According to theEM-CNN multiple states of the eyes and mouth are ac-quired and the fatigue state of the driver is evaluated by

MTCNN

Vehicleplatformdataset

Facedetection

Feature pointlocation

ROIextraction

PERCLOSPOM

Driverfatigue

Wider_faceCelebA

State ofeye andmouth

recognition

EM-CNN

Figure 1 Flowchart of the proposed method to detect driverfatigue

Computational Intelligence and Neuroscience 3

Figure 3 Face detection and feature point location using the MTCNN

Faceclassification

Boundingbox

regression

Featurepoint

locationInput

12 times 12 times 3 5 times 5 times 10 3 times 3 times 16 1 times 1 times 32

1 times 1 times 10

1 times 1 times 4

1 times 1 times 2Conv 3 times 3 Conv 3 times 3 Conv 3 times 3MP 2 times 2

(a)

Faceclassification

Boundingbox

regression

Featurepoint

locationInput

24 times 24 times 3 11 times 11 times 28 4 times 4 times 48 3 times 3 times 64 128

FC2

4

10

Conv 3 times 3MP 3 times 3


Conv 2 times 2

(b)

Faceclassification

Boundingbox

regression

Featurepoint

locationInput48 times 48 times 3 23 times 23 times 32 10 times 10 times 64 4 times 4 times 64 3 times 3 times 128 256

FC2

4

10



Conv 3 times 3MP 2 times 2 MP 2 times 2

(c)

Figure 2 MTCNN architecture (a) P-Net (b) R-Net and (c) O-Net


calculating the eye closure degree PERCLOS and mouthopening degree POM

331 PERCLOS e PERCLOS parameter indicates thepercentage of eye closure time per unit time [30]

PERCLOS 1113936

Ni fi

Ntimes 100 (3)

Herefi represents the closed frame of the eye 1113936Ni fi

represents the number of closed-eye frames per unit timeand N is the total number of frames per unit time Todetermine the fatigue threshold 13 video frame sequenceswere collected to test and calculate its PERCLOS value eresult showed that when PERCLOS is greater than 025 thedriver is in the closed-eye state for a long time which can beused as an indicator of fatigue

332 POM Similar to PERCLOS POM represents thepercentage of mouth closure time per unit time

POM 1113936

Ni fi

Ntimes 100 (4)

Herefi indicates the frame with the mouth open 1113936Ni fi

indicates the number of open mouth frames per unit timeand N is the total number of frames per unit time WhenPOM is greater than 05 it can be judged that the driver is in

the opened mouth state for a long time which can also beused as an indicator of fatigue Greater values for these twoindicators suggest higher degrees of fatigue

333 Fatigue State Recognition After the neural networkpretraining is completed the fatigue state is identified basedon the fatigue threshold of PERCLOS and POM First theface and feature point positions of the driver frame image areobtained by the MTCNN and the ROI area of the eyes andmouth is extracted en the state of the eyes and mouth isevaluated by the proposed EM-CNN Here the eye closuredegree and mouth opening degree of the continuous frameimage are calculated and the driver is determined to be in afatigue state when the threshold is reached

4 Experimental Results

41 Dataset Description e driver driving images used inthis study were provided by an information technologycompany called Biteda A total of 4000 images of a realdriving environment were collected (Figure 5) is paperruns a holdout for the experimental results and divides thedatasets directly into a 37 ratio e datasets were dividedinto four categories ie open eyes (2226 images 1558training and 668 test samples) closed eyes (1774 images1242 training and 532 test samples) open mouth (1996

Input

BN

3 times

3 M

P

3 times

3 C

onv

BN

3 times

3 M

P FCFCFC

Input Convolution Batch normalization Max pooling Fully connected

4 6 128

Eye open

Eye close

Mouth open

Mouth close

175 times 175 times 3 175 times 175 times 48 88 times 88 times 48 88 times 88 times 56 44 times 44 times 56


3 times 3MP

1 times 1Conv

1 times 1Conv

1 times 1Conv

1 times 1Conv

5 times 5Conv

3 times 3Conv

3 times

3 M

P

3 times

3 M

P

3 times

3 C

onv

3 times

3 C

onv

3 times

3 C

onv

3 times

3 C

onv

Figure 4 EM-CNN structure


images 1397 training and 599 test samples) and closedmouth (2004 images 1403 training and 601 test samples)Examples of training sets are shown in Figure 6 e test setsare similar to the training sets but with different drivers

42 ImplementationDetails During the driving process theamplitude of the mouth change under normal conditions isless than that in the fatigue state which is easy to define Incontrast the state change of the eye while blinking is difficultto define thus the eye state is defined by calculating eyeclosure based on machine vision A flowchart of this methodis shown in Figure 7 To define the state of the eyes the ROIis binarized and the binary image is smoothed using ex-pansion and erosion techniques e area of the binocularimage can be represented by a black area and the number ofblack pixels and the total number of pixels of the ROI arecounted Figure 8 shows the processed eyes and mouthHere the ratio of the two is calculated as the eye closuredegree and the eye state is evaluated using a threshold valueof 015 When the ratio is greater than 015 the eye is in theopen state and when the ratio is less than 015 the eye isclosed

During training the batch size is set to 32 and theoptimizationmethod is Adam (with a learning rate of 0001)An epoch means that all training sets are trained once and100 epochs are trained using the training set

e training and testing of MTCNN and EM-CNN arebased on Python 352 and Keras 224 with Tensorflow 1120as the back end A computer with an Nvidia GTX 1080 TiGPU 64-bit Windows 7 and 4GB of memory was used asthe experimental hardware platform

43 Performance of EM-CNN To verify the efficiency of EM-CNN we implemented several other CNN methods ieAlexNet [16] VGG-16 [31] GoogLeNet [32] and ResNet50[33] e accuracy sensitivity and specificity results are shownin Table 1 e proposed EM-CNN showed an accuracy of93623 sensitivity of 93643 and specificity of 60882eproposed EM-CNN outperformed the compared networksFigure 9 shows a comparison of the accuracy obtained bydifferent networks and a comparison of cross-entropy loss forthese networks is shown in Figure 10

Figure 11 shows that the proposed EM-CNN is moresensitive and specific to the state of the mouth than the eyesbecause drivers show large changes to the mouth and lessinterference in the fatigue state Table 2 gives four classifi-cations under the receiver operating characteristic (ROC)curve e larger the area under the curve (AUC) value thebetter the classification effect e experimental resultsdemonstrate that the classification effect of the proposedEM-CNN on the mouth state is better than that for the eyes

44 Fatigue State Recognition Combined with the temporalcorrelation of eye state changes judging by the blink fre-quency fatigue driving is to mark the state of the eyes in thevideo frame sequence If the detected eye state is open it ismarked as ldquo1rdquo otherwise it is marked as ldquo0rdquo and the blinkprocess in the video frame sequence can be expressed as asequence of ldquo0rdquo and ldquo1rdquo Figure 12 reflects changes in theopen and closed states of the eyes

e frequency of yawning is used to judge the fatiguedriving and is to mark the mouth state in the video framesequence If the detected mouth state is open it is marked as

Figure 5 Sample images from the driver dataset


ldquo1rdquo otherwise it is marked as ldquo0rdquo then the beat in the videoframe sequence the yawning process can be expressed as asequence of ldquo0rdquo and ldquo1rdquo Figure 13 reflects the changes in themouth state of the driver under normal and fatigueconditions

resholds are related to different methods used indifferent studies so the existing results cannot be useddirectly resholds for PERCLOS and POM should beobtained from experiments irteen video frame se-quences of drivers in a real driving environment werecollected and experimented with the converted video

stream to frame image for recognition PERCLOS andPOM values were calculated e results are shown inFigure 14 e fatigue state can be identified according tothe fatigue threshold e test results show that when

Figure 6 Examples of training sets

Video frame

extraction

ROIextraction

Gray valuestretching

Binariza-tion

Expansioncorrosion

Black pixelratio

Figure 7 Calculation of eye closure based on machine vision

Table 1 Performance comparison of different networks

Network Accuracy () Sensitivity () Specificity ()AlexNet 89565 86065 58724VGG-16 25797 50 50GoogLeNet 91015 89906 60412ResNet-50 92899 92292 58547EM-CNN 93623 93643 60882

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Epoch

AlexNetVGG-16GoogLeNet

ResNet-50V2

0010203040506070809

1

Accu

racy

Figure 9 Comparison of accuracy of various networks

(a)

(b)

Figure 8 (a) Images showing changes in the blinking process (b) Images showing changes in the yawning process


PERCLOS reaches 025 it can be judged that the driver isin the closed-eye state for a long time When POMreaches 05 the driver is in the open mouth state for along time Note that the degree of fatigue is greater asthese indicators take greater values

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Epoch

02468

101214

Cros

s-en

tropy

loss


ResNet-50EM-CNN

Figure 10 Comparison of loss of various networks

ROC curve ofclass eye closeROC curve ofclass eye open

ROC curve ofclass mouth closeROC curve ofclass mouth open

0

01

02

03

04

05

06

07

08

09

1

Sens

itivi

ty

0 04 06 08 1021 ndash specificity

Figure 11 ROC curves for four classifications

Table 2 AUC for four classifications

State AUC ()Eye close 97913Eye open 991Mouth close 99854Mouth open 99918


5 Conclusions

e method of detecting driver fatigue based on cascadedMTCNN and EM-CNN is expected to play an importantrole in preventing car accidents caused by driving fatigueFor face detection and feature point location we use theMTCNN architecture of a hierarchical convolutionalnetwork which outputs the facial bounding box and thefive feature points of the left and right eyes nose and theleft and right mouth corners en the ROI of the driverimage is extracted using the feature points To evaluate thestate of the eyes and mouth this paper has proposed theEM-CNN-based detection method In an experimental

evaluation the proposed method demonstrated high ac-curacy and robustness to real driving environments Fi-nally driver fatigue is evaluated according to PERCLOSand POMe experimental results demonstrate that whenPERCLOS reaches 025 and POM reaches 05 the drivercan be considered in a fatigue state In the future it will benecessary to further test the actual performance and ro-bustness of the proposed method In addition we willimplement the method on a hardware device

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

is study was supported by the China Natural ScienceFoundation (no 51874300) and Xuzhou Key RampD Program(no KC18082)

References

[1] I Parra Alonso R Izquierdo Gonzalo J Alonso A GarciaMorcillo D FernandezLlorca and M A Sotelo ldquoe ex-perience of drivertive-driverless cooperative vehicle team in

0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)

Figure 12 Comparison of driverrsquos eye state changes under (a) normal and (b) fatigue conditions

0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)

Figure 13 Comparison of driverrsquos mouth state changes under (a) normal and (b) fatigue conditions

1 2 3 4 5 6 7 8 9 10 11 12 13

Video frame sequence

PERCLOSPERCLOS_threshold

POMPOM_threshold

0005

01015

02025

03035

04045

05

PERC

LOS

0010203040506070809

POM

Figure 14 reshold of PERCLOS and POM


the 2016 gcdcrdquo IEEE Transactions on Intelligent Trans-portation Systems vol 19 no 4 pp 1322ndash1334 2017

[2] N Ammour H Alhichri Y Bazi B Benjdira N Alajlan andM Zuair ldquoDeep learning approach for car detection in UAVimageryrdquo Remote Sensing vol 9 no 4 p 312 2017

[3] R P Balandong R F Ahmad M N Mohamad Saad andA S Malik ldquoA review on EEG-based automatic sleepinessdetection systems for driverrdquo Ieee Access vol 6 pp 22908ndash22919 2018

[4] X Q Ahmad et al ldquoDriving fatigue detection with fusion ofEEG and forehead EOGrdquo in Proceedings of the InternationalJoint Conference on Neural Networks pp 897ndash904 Van-couver Canada July 2016

[5] YWang X Liu Y Zhang Z Zhu D Liu and J Sun ldquoDrivingfatigue detection based on EEG signalrdquo in Proceedings of the5th International Conference on Instrumentation and Mea-surement Computer Communication and Control pp 715ndash718 Qinhuangdao China September 2015

[6] X Zhao S Xu J Rong et al ldquoDiscriminating threshold ofdriving fatigue based on the electroencephalography sampleentropy by receiver operating characteristic curve analysisrdquoJournal of Southwest Jiaotong University vol 48 no 1pp 178ndash183 2013

[7] F Rohit V Kulathumani R Kavi I Elwarfalli V Kecojevicand A Nimbarte ldquoReal-time drowsiness detection usingwearable lightweight brain sensing headbandsrdquo IET Intelli-gent Transport Systems vol 11 no 5 pp 255ndash263 2017

[8] A Kulathumani R Soua F Karray andM S Kamel ldquoRecenttrends in driver safety monitoring systems state of the art andchallengesrdquo IEEE Transactions on Vehicular Technologyvol 66 no 6 pp 4550ndash4563 2017

[9] M K Soua andMM Bundele ldquoDesign amp analysis of k meansalgorithm for cognitive fatigue detection in vehicular driverusing oximetry pulse signalrdquo in Proceedings of the IEEE In-ternational Conference on Computer Communication andControl (IC4) Indore India September 2015

[10] M V Ramesh A K Nair and A Kunnathu ldquoekkeyilintelligent steering wheel sensor network for real-timemonitoring and detection of driver drowsinessrdquo InternationalJournal of Computer Science and Security (IJCSS) vol 1 p 12011

[11] C Anitha B S Venkatesha and B S Adiga ldquoA two foldexpert system for yawning detectionrdquo Procedia ComputerScience vol 92 pp 63ndash71 2016

[12] M Karchani A Mazloumi G N Saraji et al ldquoPresenting amodel for dynamic facial expression changes in detectingdriversrsquo drowsinessrdquo Electronic Physician vol 7 no 2pp 1073ndash1077 2015

[13] L Boon-Leng L Dae-Seok and L Boon-Giin ldquoMobile-basedwearable type of driver fatigue detection by GSR and EMGrdquoin Proceedings of the TENCON 2015-2015 IEEE Region 10Conference Macau China November 2015

[14] J-J Yan H-H Kuo Y-F Lin and T-L Liao ldquoReal-timedriver drowsiness detection system based on PERCLOS andgrayscale image processingrdquo in Proceedings of the 2016 In-ternational Symposium on Computer Consumer and Control(IS3C) pp 243ndash246 Xirsquoan China July 2016

[15] C Dong C C Loy and X Tang ldquoAccelerating the super-resolution convolutional neural networkrdquo in Proceedings ofthe European Conference on Computer Vision AmsterdamNetherlands October 2016

[16] C Hentschel T P Wiradarma and H Sack ldquoFine tuningCNNS with scarce training data-adapting imagenet to artepoch classificationrdquo in Proceedings of the IEEE International

Conference on Image Processing (ICIP) Phoenix AZ USASeptember 2016

[17] A El-Fakdi and M Carreras ldquoTwo-step gradient-based re-inforcement learning for underwater robotics behaviorlearningrdquo Robotics and Autonomous Systems vol 61 no 3pp 271ndash282 2013

[18] Z Zhao C Ye Y Hu C Li and L Xiaofeng ldquoCascade andfusion of multitask convolutional neural networks for de-tection of thyroid nodules in contrast-enhanced CTrdquo Com-putational Intelligence and Neuroscience vol 2019 Article ID7401235 13 pages 2019

[19] J Gu Z Wang J Kuen et al ldquoRecent advances in con-volutional neural networksrdquo Pattern Recognition vol 77pp 354ndash377 2018

[20] F Wang Y-H Li L Huang K Chen R-H Zhang andJ-M Xu ldquoMonitoring driversrsquo sleepy status at night based onmachine visionrdquo Multimedia Tools and Applications vol 76no 13 pp 14869ndash14886 2017

[21] P Viola andM Jones ldquoRapid object detection using a boostedcascade of simple featuresrdquo in Proceedings of the Ieee Com-puter Society Conference on Computer Vision and PatternRecognition CVPR 2001 pp 511ndash518 Kauai HI USA De-cember 2001

[22] S Yang P Luo C C Loy and X Tang ldquoWider face a facedetection benchmarkrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR)pp 5525ndash5533 Las Vegas NV USA June 2016

[23] F Schroff D Kalenichenko and J Philbin ldquoFaceNet a unifiedembedding for face recognition and clusteringrdquo in Proceed-ings of the IEEE Conference on Computer Vision and PatternRecognition pp 815ndash823 Boston MA USA June 2015

[24] X Tang D K Du Z He and J Liu PyramidBox A Context-Assisted Single Shot Face Detector Springer InternationalPublishing Berlin Germany 2018

[25] R Almodfer S Xiong M Mudhsh and P Duan ldquoMulti-column deep neural network for offline Arabic handwritingrecognitionrdquo in Artificial Neural Networks and MachineLearning pp 260ndash267 Springer Berlin Germany 2017

[26] A He and X Tian ldquoMulti-organ plant identification withmulticolumn deep convolutional neural networksrdquo in Pro-ceedings of the IEEE International Conference on SystemsMan and Cybernetics (SMC) Budapest Hungary October2016

[27] K Zhang Z Zhang Z Li and Y Qiao ldquoJoint face detectionand alignment using multitask cascaded convolutional net-worksrdquo Ieee Signal Processing Letters vol 23 no 10pp 1499ndash1503 2016

[28] H Zhang R Benenson and B Schiele ldquoA convnet for non-maximum suppressionrdquo in Proceedings of the GermanConference on Pattern Recognition Hannover GermanySeptember 2016

[29] E Shelhamer J Long and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 39 no 4pp 640ndash651 2017

[30] W Shi J Li and Y Yang ldquoFace fatigue detection methodbased on MTCNN and machine visionrdquo in Advances in In-telligent Systems and Computing pp 233ndash240 SpringerVerlag Huainan China 2020

[31] Z You Y Gao J Zhang H Zhang M Zhou and C Wu ldquoAstudy on driver fatigue recognition based on SVMmethodrdquo inProceedings of the 4th International Conference on Trans-portation Information and Safety ICTIS pp 693ndash697 BanffCanada August 2017


[32] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition Boston MA USA June 2015

[33] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition Las Vegas NVUSA June 2016


shape [14] An advantage of this method is that thefacial features are noninvasive visual informationthat is unaffected by other external factors iedriving state of the vehicle individual drivingcharacteristics and road environment

(4) Based on information fusion Wang Fei et alcombined physiological indicators and driving stateof the vehicle to detect the driver fatigue state of thedriver by collecting the EEG signal of the subject andthe corresponding steering wheel manipulation dataHowever the robustness of the test is affected by theindividualrsquos manipulation habits and the drivingenvironment

2 Related Work

To address the above difficulties in this study we considerdeep convolutional neural networks (CNNs) [15ndash18]CNNs have developed rapidly in the field of machine vi-sion especially for face detection [19 20] Viola and Jones[21] and Yang et al [22] pioneered the use of the AdaBoostalgorithm with Haar features to train different weakclassifiers cascading into strong classifiers for detectingfaces and nonhuman faces In 2014 Facebook proposed theDeepFace facial recognition system which uses facealignment to fix facial features on certain pixels prior tonetwork training and extracts features using a CNN In2015 Google proposed FaceNet which uses the same faceto have high cohesion in different poses while differentfaces have low coupling properties In FaceNet the face ismapped to the feature vector of Euclidean space using aCNN and the ternary loss function [23] In 2018 theChinese Academy of Sciences and Baidu proposed Pyr-amidBox which is a context-assisted single-lens face de-tection algorithm for small fuzzy partially occluded facesPyramidBox improved network performance by usingsemisupervised methods low-level feature pyramids andcontext-sensitive predictive structures [24] CNN-basedface detection performance is enhanced significantly byusing powerful deep learning methods and end-to-endoptimization In this study we combine eye and mouthcharacteristics and use a CNN rather than the traditionalimage processing method to realize feature extraction andstate recognition [25 26] and the necessary threshold is setto judge fatigue

e proposed method to detect driver fatigue statuscomprises three components (Figure 1) First the driverrsquosfacial bounding box and the five feature points of the left andright eyes nose and the left and right corners of the mouthare obtained by an MTCNN [27] Second the states of theeyes and mouth are classified Here the region of interest(ROI) is extracted by the feature points and the states of theeyes and mouth are identified by EM-CNN Finally wecombine the percentage of eyelid closure over the pupil overtime (PERCLOS) and mouth opening degree (POM) toidentify the driverrsquos fatigue status

e primary contributions of this study are summarizedas follows

(1) e EM-CNN which is based on a state recognitionnetwork is proposed to classify eye and mouth states(ie open or closed) In machine vision-based fa-tigue driving detection blink frequency andyawning are important indicators for judging driverfatigue erefore this paper proposed a convolu-tional neural network that recognizes the state of theeyes and mouth to determine whether the eyes andmouths are open or closede EM-CNN can reducethe influence of factors such as changes in lightingsitting and occlusion of glasses to meet the adapt-ability to complex environments

(2) Amethod is developed to detect driver fatigue statusis method combines multiple levels of features bycascading two unique CNN structures Face detec-tion and feature point location are performed basedon MTCNN and the state of eyes and mouth isdetermined by EM-CNN

(3) Binocular images (rather than monocular images)are detected to obtain abundant eye features For adriverrsquos multipose face area detecting only mon-ocularrsquos information can easily cause misjudgmentTo obtain richer facial information a fatigue drivingrecognition method based on the combination ofbinocular and mouth facial features is proposedwhich utilizes the complementary advantages ofvarious features to improve the recognition accuracy

3 Proposed Methodology

31 Face Detection and Feature Point Location Face detec-tion is challenging in real-world scenarios due to changes indriver posture and unconstrained environmental factorssuch as illumination and occlusion By using the depth-cascading multitasking MTCNN framework face detectionand alignment can be completed simultaneously the in-ternal relationship between the two is exploited to improvethe performance and the global face features are extractedthus the positions of the face left and right eyes nose andthe left and right corners of the mouth can be obtained estructure of the MTCNN is shown in Figure 2 eMTCNNcomprises three cascaded subnetworks ie P-Net (proposalnetwork) R-Net (refined network) and O-Net (outputnetwork) which are detected face and feature point positionfrom coarse to fine

P-Net first an image pyramid is constructed to obtainimages of different sizes ese images are then input to theP-Net in sequence A fully convolutional network isemployed to determine whether a face is included in a12times12 area at each position thereby obtaining a boundingbox of the candidate face area and its regression vectoren the candidate face window is calibrated with the frameregression vector and nonpolar large value suppression isemployed to remove highly overlapping candidate face re-gions [28 29]

R-Net the candidate face area obtained by P-Net inputand the image size is adjusted 24times 24 e candidate facewindow is screened by bounding box regression and









d1


w1 15 d1

h1 d1

(1)


d2


w2 d2

h2 d2

(2)




MTCNN


Facedetection


ROIextraction

PERCLOSPOM

Driverfatigue

Wider_faceCelebA


recognition

EM-CNN




Faceclassification

Boundingbox

regression

Featurepoint

locationInput


1 times 1 times 10

1 times 1 times 4


(a)

Faceclassification

Boundingbox

regression

Featurepoint

locationInput


FC2

4

10



Conv 2 times 2

(b)

Faceclassification

Boundingbox

regression

Featurepoint


FC2

4

10




(c)





PERCLOS 1113936

Ni fi

Ntimes 100 (3)




POM 1113936

Ni fi

Ntimes 100 (4)







Input

BN

3 times

3 M

P

3 times

3 C

onv

BN

3 times

3 M

P FCFCFC


4 6 128

Eye open

Eye close

Mouth open

Mouth close



3 times 3MP

1 times 1Conv

1 times 1Conv

1 times 1Conv

1 times 1Conv

5 times 5Conv

3 times 3Conv

3 times

3 M

P

3 times

3 M

P

3 times

3 C

onv

3 times

3 C

onv

3 times

3 C

onv

3 times

3 C

onv

















Video frame

extraction

ROIextraction


Binariza-tion

Expansioncorrosion

Black pixelratio




1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Epoch


ResNet-50V2

0010203040506070809

1

Accu

racy


(a)

(b)




1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Epoch

02468

101214

Cros

s-en

tropy

loss


ResNet-50EM-CNN




0

01

02

03

04

05

06

07

08

09

1

Sens

itivi

ty






5 Conclusions



Data Availability




Acknowledgments


References


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


1 2 3 4 5 6 7 8 9 10 11 12 13



POMPOM_threshold

0005

01015

02025

03035

04045

05

PERC

LOS

0010203040506070809

POM














































d1


w1 15 d1

h1 d1

(1)


d2


w2 d2

h2 d2

(2)




MTCNN


Facedetection


ROIextraction

PERCLOSPOM

Driverfatigue

Wider_faceCelebA


recognition

EM-CNN




Faceclassification

Boundingbox

regression

Featurepoint

locationInput


1 times 1 times 10

1 times 1 times 4


(a)

Faceclassification

Boundingbox

regression

Featurepoint

locationInput


FC2

4

10



Conv 2 times 2

(b)

Faceclassification

Boundingbox

regression

Featurepoint


FC2

4

10




(c)





PERCLOS 1113936

Ni fi

Ntimes 100 (3)




POM 1113936

Ni fi

Ntimes 100 (4)







Input

BN

3 times

3 M

P

3 times

3 C

onv

BN

3 times

3 M

P FCFCFC


4 6 128

Eye open

Eye close

Mouth open

Mouth close



3 times 3MP

1 times 1Conv

1 times 1Conv

1 times 1Conv

1 times 1Conv

5 times 5Conv

3 times 3Conv

3 times

3 M

P

3 times

3 M

P

3 times

3 C

onv

3 times

3 C

onv

3 times

3 C

onv

3 times

3 C

onv

















Video frame

extraction

ROIextraction


Binariza-tion

Expansioncorrosion

Black pixelratio




1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Epoch


ResNet-50V2

0010203040506070809

1

Accu

racy


(a)

(b)




1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Epoch

02468

101214

Cros

s-en

tropy

loss


ResNet-50EM-CNN




0

01

02

03

04

05

06

07

08

09

1

Sens

itivi

ty






5 Conclusions



Data Availability




Acknowledgments


References


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


1 2 3 4 5 6 7 8 9 10 11 12 13



POMPOM_threshold

0005

01015

02025

03035

04045

05

PERC

LOS

0010203040506070809

POM








































Faceclassification

Boundingbox

regression

Featurepoint

locationInput


1 times 1 times 10

1 times 1 times 4


(a)

Faceclassification

Boundingbox

regression

Featurepoint

locationInput


FC2

4

10



Conv 2 times 2

(b)

Faceclassification

Boundingbox

regression

Featurepoint


FC2

4

10




(c)





PERCLOS 1113936

Ni fi

Ntimes 100 (3)




POM 1113936

Ni fi

Ntimes 100 (4)







Input

BN

3 times

3 M

P

3 times

3 C

onv

BN

3 times

3 M

P FCFCFC


4 6 128

Eye open

Eye close

Mouth open

Mouth close



3 times 3MP

1 times 1Conv

1 times 1Conv

1 times 1Conv

1 times 1Conv

5 times 5Conv

3 times 3Conv

3 times

3 M

P

3 times

3 M

P

3 times

3 C

onv

3 times

3 C

onv

3 times

3 C

onv

3 times

3 C

onv

















Video frame

extraction

ROIextraction


Binariza-tion

Expansioncorrosion

Black pixelratio




1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Epoch


ResNet-50V2

0010203040506070809

1

Accu

racy


(a)

(b)




1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Epoch

02468

101214

Cros

s-en

tropy

loss


ResNet-50EM-CNN




0

01

02

03

04

05

06

07

08

09

1

Sens

itivi

ty






5 Conclusions



Data Availability




Acknowledgments


References


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


1 2 3 4 5 6 7 8 9 10 11 12 13



POMPOM_threshold

0005

01015

02025

03035

04045

05

PERC

LOS

0010203040506070809

POM









































PERCLOS 1113936

Ni fi

Ntimes 100 (3)




POM 1113936

Ni fi

Ntimes 100 (4)







Input

BN

3 times

3 M

P

3 times

3 C

onv

BN

3 times

3 M

P FCFCFC


4 6 128

Eye open

Eye close

Mouth open

Mouth close



3 times 3MP

1 times 1Conv

1 times 1Conv

1 times 1Conv

1 times 1Conv

5 times 5Conv

3 times 3Conv

3 times

3 M

P

3 times

3 M

P

3 times

3 C

onv

3 times

3 C

onv

3 times

3 C

onv

3 times

3 C

onv

















Video frame

extraction

ROIextraction


Binariza-tion

Expansioncorrosion

Black pixelratio




1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Epoch


ResNet-50V2

0010203040506070809

1

Accu

racy


(a)

(b)




1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Epoch

02468

101214

Cros

s-en

tropy

loss


ResNet-50EM-CNN




0

01

02

03

04

05

06

07

08

09

1

Sens

itivi

ty






5 Conclusions



Data Availability




Acknowledgments


References


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


1 2 3 4 5 6 7 8 9 10 11 12 13



POMPOM_threshold

0005

01015

02025

03035

04045

05

PERC

LOS

0010203040506070809

POM





















































Video frame

extraction

ROIextraction


Binariza-tion

Expansioncorrosion

Black pixelratio




1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Epoch


ResNet-50V2

0010203040506070809

1

Accu

racy


(a)

(b)




1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Epoch

02468

101214

Cros

s-en

tropy

loss


ResNet-50EM-CNN




0

01

02

03

04

05

06

07

08

09

1

Sens

itivi

ty






5 Conclusions



Data Availability




Acknowledgments


References


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


1 2 3 4 5 6 7 8 9 10 11 12 13



POMPOM_threshold

0005

01015

02025

03035

04045

05

PERC

LOS

0010203040506070809

POM








































1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Epoch

02468

101214

Cros

s-en

tropy

loss


ResNet-50EM-CNN




0

01

02

03

04

05

06

07

08

09

1

Sens

itivi

ty






5 Conclusions



Data Availability




Acknowledgments


References


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


1 2 3 4 5 6 7 8 9 10 11 12 13



POMPOM_threshold

0005

01015

02025

03035

04045

05

PERC

LOS

0010203040506070809

POM







































5 Conclusions



Data Availability




Acknowledgments


References


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


0

1

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

(a)

0

1

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

151

161

171

181

191

201

(b)


1 2 3 4 5 6 7 8 9 10 11 12 13



POMPOM_threshold

0005

01015

02025

03035

04045

05

PERC

LOS

0010203040506070809

POM







































driverfatiguedetectionbasedonconvolutionalneural ...dec 02, 2019 · figure...

Documents