EYE GAZE MODELLING FOR ATTENTION PREDICTION
Omran Kaddah
7/3/2019
OVERVIEW
Motivation: why attention prediction is important for autonomous driving.
How the current state-of-the-art machine learning models tackle the problem.
Datasets & training
Models
Conclusion & what we propose
MOTIVATION
Driver distraction is one of the main causes of road accidents [1].
Predicting where the driver should direct his gaze enables issuing warnings.
Current eye-gaze prediction models are powerful; however, they still need improvement:
• Improve accuracy
• Decrease false-negative and false-positive rates
SUPERVISED MACHINE LEARNING SETTING
Any improvement has to do with one of:
❖ Data
❖ Model
❖ The way a model is trained
ATTENTION PREDICTION
ATTENTION DATASETS
• Datasets used by previous research papers [4][5]
➢ From 2009 and 2011 respectively.
➢ Few frames.
➢ Lab settings.
• DR(eye)VE [2]
➢ In-car setting.
➢ Collected from 74 rides by 8 drivers: 1.0 car and 0.04 pedestrians per frame, 464 braking events.
➢ Duration of 6 hours.
➢ Attention maps made by aggregating gaze over temporal frames.
➢ Drawbacks: see next slide.
• Berkeley DeepDrive Attention, a.k.a. BDD-A [3]
➢ In-lab setting.
➢ More in the next slide.
ATTENTION DATASETS
Drawbacks of DR(eye)VE: Xia et al. [3] argued that DR(eye)VE is single-focus, whereas humans can have covert attention [6]. It also includes false-positive gazes (drivers tend to look at things that are irrelevant to the driving situation [7]). More critical situations are still needed.
Proposed solution: Xia et al. [3] provided the solution with the BDD-A dataset and a way to train on it.
• A protocol that uses crowd-sourced driving videos, with 45 gaze providers who played the role of a driving instructor.
• Visual cues simultaneously demand attention (psychological studies [8][9]) → aggregate and smooth the gazes of independent observers to make attention maps.
• Driving situations collected from 1,232 rides in more crowded areas: 0.25 pedestrians and 4.4 cars per frame, in addition to 3x more braking events compared to DR(eye)VE.
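The aggregate-and-smooth step above can be sketched as a small numpy illustration. This is not the authors' code: the grid size, the Gaussian σ, and the function name are assumptions made for the example.

```python
import numpy as np

def attention_map(fixations, shape=(36, 64), sigma=2.0):
    """Aggregate gaze fixations from independent observers into one
    attention map: accumulate hits on a grid, blur with a separable
    Gaussian kernel, and normalise to a probability distribution."""
    acc = np.zeros(shape)
    for y, x in fixations:  # fixations given as (row, col) grid cells
        acc[y, x] += 1.0
    # build a 1D Gaussian kernel and blur rows, then columns
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    blurred = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, acc)
    blurred = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, blurred)
    return blurred / blurred.sum()

# gazes from three hypothetical independent observers on the same frame
obs = [(18, 30), (18, 31), (19, 30)]
amap = attention_map(obs)
```

Because the observers are independent, the smoothed sum captures several simultaneously attention-demanding cues in one map instead of a single focus.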
THE MODEL
The model proposed by Xia et al. [3] predicts the driver attention map for a video frame given the current and previous ones.
A visual feature processing module feeds each frame into an LSTM.
Dropout layers were also used.
The output is a 64x36 grid of probability distributions.
Cross entropy is used as the loss function.
Xia, Y., Zhang, D., Kim, J., Nakayama, K., Zipser, K. and Whitney, D., 2017. Predicting Driver Attention in Critical Situations. arXiv preprint arXiv:1711.06406.
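As an illustration of such a loss over a grid of probability distributions, here is a minimal numpy sketch; the function name, the softmax over cells, and the uniform target map are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def grid_cross_entropy(pred_logits, target_map, eps=1e-8):
    """Cross entropy between a predicted attention distribution and a
    ground-truth map, both defined over a 36x64 grid of cells.
    pred_logits: raw network outputs, shape (36, 64)
    target_map:  ground-truth attention map that sums to 1"""
    z = pred_logits - pred_logits.max()   # shift logits for numerical stability
    p = np.exp(z) / np.exp(z).sum()       # softmax over all grid cells
    return -np.sum(target_map * np.log(p + eps))

rng = np.random.default_rng(0)
logits = rng.normal(size=(36, 64))
target = np.full((36, 64), 1.0 / (36 * 64))  # uniform target, for illustration only
loss = grid_cross_entropy(logits, target)
```

For a uniform target, the loss is minimised (at log of the number of cells) exactly when the predicted distribution is also uniform, which is the usual cross-entropy behaviour.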
TRAINING
Problem: the most important situations during driving are rare, and the loss is the same when an error is made on these rare, important incidents.
We need to find a way to detect those situations and sample more of them. But how?
BDD-A has a higher rate of pedestrians, cars, and braking events, but even that is not enough.
(Image: https://www.flickr.com/photos/stretchybill/5723445927)
TRAINING
Proposed in [3]: compute the mean attention map, then find the Kullback–Leibler divergence between the mean attention map (a distribution) and the current frame's map. The result is the sampling weight for the given frame.
KL(F̄ ‖ F) = Σ_pixel F̄(pixel) log(F̄(pixel) / F(pixel))
F̄, F : x → [0, 1], x ∈ [0, dimX × dimY]
Sequences are then sampled with probabilities proportional to the sequence sampling weights.
RESULTS
https://github.com/pascalxia/driver_attention_prediction
[3]
EYE-GAZE MODELLING
DATASETS
❖ MSP-Gaze corpus [10]: 46 participants from different ethnicities.
❖ EYEDIAP [11]
❖ MPIIGaze [12]
Example images from the MSP-Gaze corpus: https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-gaze.html
MODEL
Jha and Busso [13] proposed a model that requires neither user calibration nor invasive equipment, inspired by the model in [14].
❖ Inputs: images of both eyes; both eyes give information about the head rotation [10].
❖ The model is a CNN; see the next slide for the architecture.
❖ Output: a 2D visual map describing the probability of the gaze direction, formulating the problem as a classification problem. Normally this is a regression: regression by classification.
❖ A Gaussian filter is applied to the ground-truth labels and predictions to address the cost-sensitivity problem.
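The Gaussian-smoothed label idea can be sketched as follows. The 9x16 grid size, the σ, and the function name are illustrative assumptions rather than the paper's actual configuration.

```python
import numpy as np

def smoothed_gaze_label(gaze_cell, shape=(9, 16), sigma=1.0):
    """Turn a single ground-truth gaze cell into a soft 2D label:
    a Gaussian centred on the true cell, so that near-miss predictions
    are penalised less than distant ones (regression by classification)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    gy, gx = gaze_cell
    d2 = (ys - gy) ** 2 + (xs - gx) ** 2     # squared distance to the true cell
    label = np.exp(-d2 / (2 * sigma ** 2))   # Gaussian bump around the target
    return label / label.sum()               # normalise to a distribution

label = smoothed_gaze_label((4, 8))
```

Training against such soft labels makes the classification cost-sensitive: cells adjacent to the true gaze direction still carry probability mass, while a plain one-hot label would treat every wrong cell as equally wrong.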
RESULTS
Jha, S. and Busso, C., 2019. Estimation of Gaze Region Using Two Dimensional Probabilistic Maps Constructed Using Convolutional Neural Networks. In ICASSP 2019, pp. 3792–3796. IEEE.
RESULTS
d_eye−* is the distance between the subject's eye-pair centre and the ground-truth/estimated gaze point; d_eye−mc is the distance between the user and the monitor centre; d_mc−estimate is the distance between the monitor centre and the predicted gaze position; and d_mc−true is the distance between the monitor centre and the ground-truth gaze position.
PROPOSAL
Training models for eye gaze and attention prediction can be done simultaneously:
• Two birds, one stone.
• One can simulate different lighting conditions at the same time for both models.
• It is easier to match the outputs of both models, since both are in car-driving settings.
Use state-of-the-art NN architectures:
• In [3] AlexNet was used; although it is simple, it has many parameters. A modern architecture such as MobileNetV2 [15] has 12x fewer parameters, the same number of operations, and better accuracy.
• Use pretrained upsampling layers of semantic-segmentation models for the model proposed in [3].
• Using an LSTM for eye gaze might bring further improvement by learning eye-movement patterns.
CONCLUSION
There is always room for improvement.
We saw how the approach proposed in [3] overcame the drawbacks of the in-car setting and the bias of the dataset.
We saw the significance of the output representation and the loss function for eye-gaze modelling [13].
REFERENCES
[1] Klauer, S. G., Guo, F., Simons-Morton, B. G., Ouimet, M. C., Lee, S. E. and Dingus, T. A., 2014. Distracted driving and risk of road crashes among novice and experienced drivers. New England Journal of Medicine, 370(1), pp. 54–59.
[2] Alletto, S., Palazzi, A., Solera, F., Calderara, S. and Cucchiara, R., 2016. DR(eye)VE: a dataset for attention-based tasks with applications to autonomous and assisted driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 54–60.
[3] Xia, Y., Zhang, D., Kim, J., Nakayama, K., Zipser, K. and Whitney, D., 2017. Predicting Driver Attention in Critical Situations. arXiv preprint arXiv:1711.06406.
[4] Simon, L., Tarel, J.-P. and Brémond, R., 2009. Alerting the drivers about road signs with poor visual saliency. In Intelligent Vehicles Symposium, 2009 IEEE, pp. 48–53.
[5] Underwood, G., Humphrey, K. and Van Loon, E., 2011. Decisions about objects in real-world scenes are influenced by visual saliency before and during their inspection. Vision Research, 51(18), pp. 2031–2038.
[6] Cavanagh, P. and Alvarez, G. A., 2005. Tracking multiple targets with multifocal attention. Trends in Cognitive Sciences, 9(7), pp. 349–354.
[7] Palazzi, A., Solera, F., Calderara, S., Alletto, S. and Cucchiara, R., 2017. Learning where to attend like a human driver. In Intelligent Vehicles Symposium (IV), 2017 IEEE, pp. 920–925.
[8] Groner, R., Walder, F. and Groner, M., 1984. Looking at faces: Local and global aspects of scanpaths. In Advances in Psychology, Vol. 22, Elsevier, pp. 523–533.
[9] Mannan, S. K., Ruddock, K. H. and Wooding, D. S., 1997. Fixation sequences made during visual examination of briefly presented 2D images. Spatial Vision, 11(2), pp. 157–178.
[10] Li, N. and Busso, C., 2018. Calibration free, user independent gaze estimation with tensor analysis. Image and Vision Computing, 74, pp. 10–20.
[11] Mora, K. A. F., Monay, F. and Odobez, J.-M., 2014. EYEDIAP: a database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras. In Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA 2014), pp. 255–258.
[12] Zhang, X., Sugano, Y., Fritz, M. and Bulling, A., 2019. MPIIGaze: Real-world dataset and deep appearance-based gaze estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(1), pp. 162–175.
[13] Jha, S. and Busso, C., 2019. Estimation of Gaze Region Using Two Dimensional Probabilistic Maps Constructed Using Convolutional Neural Networks. In ICASSP 2019, pp. 3792–3796. IEEE.
[14] Jha, S. and Busso, C., 2018. Probabilistic estimation of the gaze region of the driver using dense classification. In IEEE International Conference on Intelligent Transportation Systems (ITSC 2018), pp. 697–702.
[15] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. and Chen, L. C., 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520.