
3D Point of Gaze Estimation Using Head-Mounted RGB-D Cameras

Christopher McMurrough, Christopher Conly, Vassilis Athitsos, Fillia Makedon
Department of Computer Science and Engineering

The University of Texas at Arlington, Arlington, Texas

{mcmurrough, cconly, athitsos, makedon}@uta.edu

ABSTRACT
This paper presents a low-cost, wearable headset for 3D Point of Gaze (PoG) estimation in assistive applications. The device consists of an eye tracking camera and a forward-facing RGB-D scene camera which, together, provide an estimate of the user gaze vector and its intersection with a 3D point in space. The resulting system is able to compute the 3D PoG in real-time using inexpensive and readily available hardware components.

Categories and Subject Descriptors H.5.2 [User Interfaces]: Input devices and strategies; I.4.8 [Image Processing and Computer Vision]: Scene Analysis—object recognition, range data, sensor fusion

General Terms Design, Human Factors, Experimentation

Keywords Eyetracking, assistive environments, multimodal systems, human-computer interaction

1. INTRODUCTION
Eye gaze interaction has been shown to be highly beneficial to people with physical disabilities. In the case study presented in [3], 16 amyotrophic lateral sclerosis (ALS) patients with severe motor disabilities (loss of mobility, inability to speak, etc.) were introduced to eye tracking devices during a 1-2 week period. Several patients reported a clear positive impact on their quality of life, resulting from enhanced communication facilitated by the eye tracking devices.

While the utility of gaze interaction has been demonstrated, existing eye gaze systems suffer from some limiting constraints. In general, they are designed for interaction with fixed computer displays or 2D scene images, and the 2D PoG of these systems does not directly translate into the 3D world. An accurate estimate of the 3D user PoG within an environment is clearly useful, as it can be used to detect user attention and intention to interact [1]. For example, knowledge of the user 3D PoG could be used to identify objects of interest for manipulation by an assistive robot. An intelligent wheelchair could also utilize 3D PoG as a primary data modality for assisted navigation.

Furthermore, existing systems tend to lack mobility, and the mobile 3D PoG tracking systems that have been proposed in the literature suffer from their own limitations. The head-mounted multi-camera system presented in [4], for example, gives the 3D PoG relative to the user’s frame of reference, but does not map this point to the user’s environment. Finally, the high monetary cost and proprietary nature of commercial eye tracking equipment limit widespread use. This has led to interest in the development of low-cost solutions using off-the-shelf components.

We propose a novel head-mounted system that addresses the limitations of current solutions. First, an eye tracking camera is used to estimate the 2D PoG. An inexpensive RGB-D scene camera is then used to acquire a 3D representation of the environment structure. Finally, we provide a process by which the 2D PoG is transformed to 3D coordinates.

2. EYE TRACKING CAMERA
The system’s eye tracking capability is provided by an embedded USB camera module equipped with an infrared pass filter. The user’s eye is illuminated with a single infrared LED to provide consistent image data under varying ambient lighting conditions. The LED also produces a corneal reflection on the user’s eye, which is visible to the camera and can be exploited to enhance tracking accuracy. The LED was chosen according to the guidelines discussed in [2] to ensure that the device can be used safely for indefinite periods of time. The 640x480 pixel image resolution and 30 Hz frame rate facilitate accurate tracking of the pupil and corneal reflection using image processing techniques, which are discussed further in Section 4.

The eye camera is positioned such that the image frame is centered in front of one of the user’s eyes. The module can be moved from one side of the headset frame to the other so that either eye can be used (to take advantage of user preference or eye dominance), while fine adjustments to the camera position and orientation are possible by manipulating the flexible mounting arm.
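To make the pupil and corneal-reflection tracking concrete, the following sketch detects the pupil as the darkest blob and the corneal reflection (glint) as the brightest blob nearest to it in the infrared eye image, fitting an ellipse to the pupil contour. This is an illustrative OpenCV-based sketch only; the function names and threshold values are assumptions, not the authors' implementation.

# Illustrative sketch (not the authors' code): pupil and glint detection in an IR eye image.
import cv2
import numpy as np

def detect_pupil_and_glint(eye_gray):
    """Return (pupil_center, glint_center) in pixel coordinates, or None if detection fails."""
    blurred = cv2.GaussianBlur(eye_gray, (7, 7), 0)

    # Pupil: darkest region under IR illumination. Threshold value is an assumption.
    _, pupil_mask = cv2.threshold(blurred, 40, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(pupil_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    pupil = max(contours, key=cv2.contourArea)
    if len(pupil) < 5:                      # cv2.fitEllipse requires at least 5 points
        return None
    (px, py), _, _ = cv2.fitEllipse(pupil)  # ellipse center approximates the pupil center

    # Corneal reflection: brightest blob, chosen as the one closest to the pupil center.
    _, glint_mask = cv2.threshold(blurred, 220, 255, cv2.THRESH_BINARY)
    glints, _ = cv2.findContours(glint_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not glints:
        return None
    centers = [np.mean(g.reshape(-1, 2), axis=0) for g in glints]
    gx, gy = min(centers, key=lambda c: np.hypot(c[0] - px, c[1] - py))
    return (px, py), (gx, gy)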

3. SCENE RGB-D CAMERA
Information about the environment in front of the user is provided by a forward-facing RGB-D camera, the Asus XtionPRO Live. This device provides a 640x480 color image of the environment along with a 640x480 depth range image at a rate of 30 Hz. The two images are obtained from individual imaging sensors and registered by the device such that each color pixel value is assigned actual 3D coordinates in space. This provides a complete scanning solution for the environment in the form of 3D “point clouds”, which can be further processed in software. The completed headset is shown in Figure 1.

Figure 1: Headset with eye and scene cameras
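As a minimal sketch of how such registration lets a pixel be mapped to 3D coordinates, the code below applies the standard pinhole back-projection to a registered depth image. The intrinsic values are typical published defaults for PrimeSense-class sensors and are assumptions, not calibrated parameters of this headset.

# Minimal sketch (assumed intrinsics): back-projecting registered depth pixels to 3D points.
import numpy as np

FX, FY = 525.0, 525.0   # focal lengths in pixels (assumed defaults)
CX, CY = 319.5, 239.5   # principal point for a 640x480 image (assumed)

def pixel_to_point(u, v, depth_mm):
    """Convert pixel (u, v) with depth in millimetres to a 3D point in metres."""
    z = depth_mm / 1000.0
    return np.array([(u - CX) * z / FX, (v - CY) * z / FY, z])

def depth_to_cloud(depth_mm):
    """Convert a 640x480 depth image to an organized point cloud of shape (480, 640, 3)."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm / 1000.0
    return np.dstack(((u - CX) * z / FX, (v - CY) * z / FY, z))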

4. POINT OF GAZE ESTIMATION
An estimate of the user PoG is computed using a modified version of the Starburst algorithm presented in [6]. This algorithm creates a mapping between pupil positions and 2D scene image coordinates after a 9-point calibration procedure is performed. During the pupil detection phase of the algorithm, an ellipse is fitted to the pupil such that the ellipse center provides an accurate estimate of the pupil center. The center of the infrared corneal reflection is detected during the next phase of the algorithm and is then compared to the pupil center to obtain a difference vector. The resulting difference vector is used to interpolate the 2D PoG in the scene camera image, as shown in Figure 2. The 3D PoG can then be obtained easily from the 2D point by looking up the 3D coordinates of that pixel in the point cloud data structure provided by the RGB-D camera. Exploiting the RGB-D point cloud structure removes the need for the stereo eye tracking used for 3D PoG estimation in [4, 5].
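One common way to realize such a mapping, shown in the sketch below, is a second-order polynomial fit from the pupil-glint difference vector to scene pixel coordinates, estimated from the calibration samples and followed by a direct lookup in the organized point cloud. The polynomial form and function names are assumptions for illustration, not necessarily the authors' exact formulation.

# Illustrative sketch: polynomial gaze mapping plus 3D lookup in an organized point cloud.
import numpy as np

def poly_features(dx, dy):
    # Second-order terms of the pupil-glint difference vector (assumed model).
    return np.array([1.0, dx, dy, dx * dy, dx * dx, dy * dy])

def fit_calibration(diff_vectors, scene_pixels):
    """Fit mapping coefficients from N calibration samples (N >= 6, e.g. the 9-point grid)."""
    A = np.stack([poly_features(dx, dy) for dx, dy in diff_vectors])
    coeffs_u, *_ = np.linalg.lstsq(A, scene_pixels[:, 0], rcond=None)
    coeffs_v, *_ = np.linalg.lstsq(A, scene_pixels[:, 1], rcond=None)
    return coeffs_u, coeffs_v

def gaze_to_3d(diff_vector, coeffs_u, coeffs_v, cloud):
    """Map a difference vector to a scene pixel and return its 3D point from the
    organized point cloud (H x W x 3) registered to the scene image."""
    f = poly_features(*diff_vector)
    h, w, _ = cloud.shape
    u = int(np.clip(round(f @ coeffs_u), 0, w - 1))
    v = int(np.clip(round(f @ coeffs_v), 0, h - 1))
    return cloud[v, u]   # may be zero/NaN where the depth sensor returned no measurement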

5. CONCLUSIONS & FUTURE WORK
The resulting headset provides valuable information on user intent to designers of assistive systems. The low-cost approach will enable the inclusion of 3D PoG in a wide variety of applications. Future work will explore the use of 3D PoG for control of electric wheelchairs and service robots in assistive environments.

6. REFERENCES
[1] D. Milner and M. Goodale. The Visual Brain in Action. Oxford University Press, Oxford, UK, 2nd edition, 2006.

Figure 2: Mapping of gaze vector to scene. (a) Eye image with difference vector; (b) scene image annotated with PoG.

[2] F. Mulvey, A. Villanueva, D. Sliney, R. Lange, S. Cotmore, and M. Donegan. D5.4 Exploration of safety issues in Eyetracking. Technical report, Communication by Gaze Interaction (COGAIN), 2008.

[3] V. Pasian, F. Corno, I. Signorile, and L. Farinetti. The Impact of Gaze Controlled Technology on Quality of Life. In Gaze Interaction and Applications of Eye Tracking: Advances in Assistive Technologies, chapter 6, pages 48–54. IGI Global, 2012.

[4] F. Pirri, M. Pizzoli, and A. Rudi. A general method for the point of regard estimation in 3D space. In CVPR 2011, pages 921–928. IEEE, June 2011.

[5] K. Takemura, Y. Kohashi, T. Suenaga, J. Takamatsu, and T. Ogasawara. Estimating 3D point-of-regard and visualizing gaze trajectories under natural head movements. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications - ETRA ’10, volume 1, page 157, New York, New York, USA, 2010. ACM Press.

[6] D. Winfield and D. Parkhurst. Starburst: A hybrid algorithm for video-based eye tracking combining feature-based and model-based approaches. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) - Workshops, 3:79–79, 2005.
