EYE GAZE MODELLING FOR ATTENTION PREDICTION
Omran Kaddah
7/3/2019
OVERVIEW
Motivation: why attention prediction is important for autonomous driving.
How the current state-of-the-art machine learning models tackle the problem.
Datasets & training
Models
Conclusion & what we propose
MOTIVATION
Driver distraction is one of the main causes of road accidents [1].
Predicting where the driver should direct his gaze enables issuing warnings.
Current eye-gaze prediction models are powerful; however, they still need improvement:
• Improve accuracy
• Decrease false-negative and false-positive rates
SUPERVISED MACHINE LEARNING SETTING
Any improvement has to do with one of:
❖ Data
❖ Model
❖ The way a model is trained
ATTENTION PREDICTION
ATTENTION DATASETS
• Datasets used by previous research papers [4][5]
➢ From 2009 and 2011 respectively.
➢ Few frames.
➢ Lab settings.
• DR(eye)VE [2]
➢ In-car setting.
➢ Collected from 74 rides by 8 drivers: 1.0 car and 0.04 pedestrians per frame, 464 braking events.
➢ Duration of 6 hours.
➢ Attention maps made by aggregating gaze over temporal frames.
➢ Drawbacks: see next slide.
• Berkeley DeepDrive Attention, a.k.a. BDD-A [3]
➢ In-lab setting.
➢ More in the next slide.
ATTENTION DATASETS
Drawbacks of DR(eye)VE: Xia et al. [3] argued that DR(eye)VE is single-focus, whereas humans can have covert attention [6]. It also includes false-positive gazes (drivers tend to look at things that are irrelevant to the driving situation [7]). More critical situations are still needed.
Proposed solution: Xia et al. [3] provided the solution with the BDD-A dataset and a way to train on it.
• A protocol that uses crowd-sourced driving videos, with 45 gaze providers who played the role of a driving instructor.
• Visual cues simultaneously demand attention (psychological studies [8][9]) → aggregate and smooth the gazes of independent observers to make attention maps.
• Driving situations collected from 1,232 rides in more crowded areas: 0.25 pedestrians and 4.4 cars per frame, in addition to 3x more braking events compared to DR(eye)VE.
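The aggregate-and-smooth step above can be sketched as a small numpy illustration. This is not the authors' code: the grid size, the Gaussian σ, and the function name are assumptions made for the example.

```python
import numpy as np

def attention_map(fixations, shape=(36, 64), sigma=2.0):
    """Aggregate gaze fixations from independent observers into one
    attention map: accumulate hits on a grid, blur with a separable
    Gaussian kernel, and normalise to a probability distribution."""
    acc = np.zeros(shape)
    for y, x in fixations:  # fixations given as (row, col) grid cells
        acc[y, x] += 1.0
    # build a 1D Gaussian kernel and blur rows, then columns
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    blurred = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, acc)
    blurred = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, blurred)
    return blurred / blurred.sum()

# gazes from three hypothetical independent observers on the same frame
obs = [(18, 30), (18, 31), (19, 30)]
amap = attention_map(obs)
```

Because the observers are independent, the smoothed sum captures several simultaneously attention-demanding cues in one map instead of a single focus.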
THE MODEL
The model proposed by Xia et al. [3] predicts the driver attention map for a video frame given the current and previous ones.
A visual feature processing module feeds each frame into an LSTM.
Dropout layers were also used.
The output is a 64x36 grid of probability distributions.
Cross entropy is used as the loss function.
Xia, Y., Zhang, D., Kim, J., Nakayama, K., Zipser, K. and Whitney, D., 2017. Predicting Driver Attention in Critical Situations. arXiv preprint arXiv:1711.06406.
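As an illustration of such a loss over a grid of probability distributions, here is a minimal numpy sketch; the function name, the softmax over cells, and the uniform target map are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def grid_cross_entropy(pred_logits, target_map, eps=1e-8):
    """Cross entropy between a predicted attention distribution and a
    ground-truth map, both defined over a 36x64 grid of cells.
    pred_logits: raw network outputs, shape (36, 64)
    target_map:  ground-truth attention map that sums to 1"""
    z = pred_logits - pred_logits.max()   # shift logits for numerical stability
    p = np.exp(z) / np.exp(z).sum()       # softmax over all grid cells
    return -np.sum(target_map * np.log(p + eps))

rng = np.random.default_rng(0)
logits = rng.normal(size=(36, 64))
target = np.full((36, 64), 1.0 / (36 * 64))  # uniform target, for illustration only
loss = grid_cross_entropy(logits, target)
```

For a uniform target, the loss is minimised (at log of the number of cells) exactly when the predicted distribution is also uniform, which is the usual cross-entropy behaviour.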
TRAINING
Problem: the most important situations during driving are rare, and the loss is the same when an error is made on these rare, important incidents.
We need to find a way to detect those situations and sample more of them. But how?
BDD-A has a higher rate of pedestrians, cars, and braking events, but even that is not enough.
(Image: https://www.flickr.com/photos/stretchybill/5723445927)
TRAINING
Proposed in [3]: compute the mean attention map, then find the Kullback–Leibler divergence between the mean attention map (a distribution) and the current frame's map. The result is the sampling weight for the given frame.
KL(F̄ ‖ F) = Σ_pixel F̄(pixel) log(F̄(pixel) / F(pixel))
F̄, F : x → [0, 1], x ∈ [0, dimX × dimY]
Sequences are then sampled with probabilities proportional to the sequence sampling weights.
RESULTS
https://github.com/pascalxia/driver_attention_prediction
[3]
EYE-GAZE MODELLING
DATASETS
❖ MSP-Gaze corpus [10]: 46 participants from different ethnicities.
❖ EYEDIAP [11]
❖ MPIIGaze [12]
Example images from the MSP-Gaze corpus: https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-gaze.html
MODEL
Jha and Busso [13] proposed a model that requires neither user calibration nor invasive equipment, inspired by the model in [14].
❖ Inputs: images of both eyes; both eyes give information about the head rotation [10].
❖ The model is a CNN; see the next slide for the architecture.
❖ Output: a 2D visual map describing the probability of the gaze direction, formulating the problem as a classification problem. Normally this is a regression: regression by classification.
❖ A Gaussian filter is applied to the ground-truth labels and predictions to address the cost-sensitivity problem.
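The Gaussian-smoothed label idea can be sketched as follows. The 9x16 grid size, the σ, and the function name are illustrative assumptions rather than the paper's actual configuration.

```python
import numpy as np

def smoothed_gaze_label(gaze_cell, shape=(9, 16), sigma=1.0):
    """Turn a single ground-truth gaze cell into a soft 2D label:
    a Gaussian centred on the true cell, so that near-miss predictions
    are penalised less than distant ones (regression by classification)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    gy, gx = gaze_cell
    d2 = (ys - gy) ** 2 + (xs - gx) ** 2     # squared distance to the true cell
    label = np.exp(-d2 / (2 * sigma ** 2))   # Gaussian bump around the target
    return label / label.sum()               # normalise to a distribution

label = smoothed_gaze_label((4, 8))
```

Training against such soft labels makes the classification cost-sensitive: cells adjacent to the true gaze direction still carry probability mass, while a plain one-hot label would treat every wrong cell as equally wrong.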
RESULTS
Jha, S. and Busso, C., 2019. Estimation of Gaze Region Using Two Dimensional Probabilistic Maps Constructed Using Convolutional Neural Networks. In ICASSP 2019, pp. 3792–3796. IEEE.
RESULTS
d_eye−* is the distance between the subject's eye-pair centre and the ground-truth/estimated gaze point; d_eye−mc is the distance between the user and the monitor centre; d_mc−estimate is the distance between the monitor centre and the predicted gaze position; and d_mc−true is the distance between the monitor centre and the ground-truth gaze position.
PROPOSAL
Training models for eye gaze and attention prediction can be done simultaneously:
• Two birds, one stone.
• One can simulate different lighting conditions at the same time for both models.
• It is easier to match the outputs of both models, since both are in car-driving settings.
Use state-of-the-art NN architectures:
• In [3] AlexNet was used; although it is simple, it has many parameters. A modern architecture such as MobileNetV2 [15] has 12x fewer parameters, the same number of operations, and better accuracy.
• Use pretrained upsampling layers of semantic-segmentation models for the model proposed in [3].
• Using an LSTM for eye gaze might bring further improvement by learning eye-movement patterns.
CONCLUSION
There is always room for improvement.
We saw how the approach proposed in [3] overcame the drawbacks of the in-car setting and the bias of the dataset.
We saw the significance of the output representation and the loss function for eye-gaze modelling [13].
REFERENCES
[1] Klauer, S. G., Guo, F., Simons-Morton, B. G., Ouimet, M. C., Lee, S. E. and Dingus, T. A., 2014. Distracted driving and risk of road crashes among novice and experienced drivers. New England Journal of Medicine, 370(1), pp. 54–59.
[2] Alletto, S., Palazzi, A., Solera, F., Calderara, S. and Cucchiara, R., 2016. DR(eye)VE: a dataset for attention-based tasks with applications to autonomous and assisted driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 54–60.
[3] Xia, Y., Zhang, D., Kim, J., Nakayama, K., Zipser, K. and Whitney, D., 2017. Predicting Driver Attention in Critical Situations. arXiv preprint arXiv:1711.06406.
[4] Simon, L., Tarel, J.-P. and Brémond, R., 2009. Alerting the drivers about road signs with poor visual saliency. In Intelligent Vehicles Symposium, 2009 IEEE, pp. 48–53.
[5] Underwood, G., Humphrey, K. and Van Loon, E., 2011. Decisions about objects in real-world scenes are influenced by visual saliency before and during their inspection. Vision Research, 51(18), pp. 2031–2038.
[6] Cavanagh, P. and Alvarez, G. A., 2005. Tracking multiple targets with multifocal attention. Trends in Cognitive Sciences, 9(7), pp. 349–354.
[7] Palazzi, A., Solera, F., Calderara, S., Alletto, S. and Cucchiara, R., 2017. Learning where to attend like a human driver. In Intelligent Vehicles Symposium (IV), 2017 IEEE, pp. 920–925.
[8] Groner, R., Walder, F. and Groner, M., 1984. Looking at faces: Local and global aspects of scanpaths. In Advances in Psychology, Vol. 22, Elsevier, pp. 523–533.
[9] Mannan, S. K., Ruddock, K. H. and Wooding, D. S., 1997. Fixation sequences made during visual examination of briefly presented 2D images. Spatial Vision, 11(2), pp. 157–178.
[10] Li, N. and Busso, C., 2018. Calibration free, user independent gaze estimation with tensor analysis. Image and Vision Computing, 74, pp. 10–20.
[11] Mora, K. A. F., Monay, F. and Odobez, J.-M., 2014. EYEDIAP: a database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras. In Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA 2014), pp. 255–258.
[12] Zhang, X., Sugano, Y., Fritz, M. and Bulling, A., 2019. MPIIGaze: Real-world dataset and deep appearance-based gaze estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(1), pp. 162–175.
[13] Jha, S. and Busso, C., 2019. Estimation of Gaze Region Using Two Dimensional Probabilistic Maps Constructed Using Convolutional Neural Networks. In ICASSP 2019, pp. 3792–3796. IEEE.
[14] Jha, S. and Busso, C., 2018. Probabilistic estimation of the gaze region of the driver using dense classification. In IEEE International Conference on Intelligent Transportation Systems (ITSC 2018), pp. 697–702.
[15] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. and Chen, L. C., 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520.