
Video-Based Tracking of User’s Motion for Augmented Desk Interface

Yoichi Sato, Kenji Oka
Institute of Industrial Science, The University of Tokyo
4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan
{ysato, oka}@iis.u-tokyo.ac.jp

Hideki Koike
Graduate School of Information Systems, University of Electro-Communications
1-5-1 Chofugaoka, Chofu, Tokyo 182-8585, Japan
[email protected]

Yasuto Nakanishi
Department of Computer, Information and Communication Science, Tokyo University of Agriculture and Technology
2-24-16 Naka-cho, Koganei, Tokyo 184-8585, Japan
[email protected]

Abstract

This paper presents an overview of our project on an augmented desk interface system called the EnhancedDesk. The EnhancedDesk is equipped with an infrared camera for tracking a user's hand motions, a color video camera for recognizing objects, and projectors for displaying various kinds of digital information onto the desk. The key technical innovations of the EnhancedDesk include: fast and accurate tracking of multiple fingers for direct manipulation of both real and projected objects with bare hands, interactive object registration and recognition with hand gestures, vision-based tracking of a user's gaze, and the addition of interactive functionality overlaid on the desk. This paper presents a brief description of the components, as well as some prototype applications of our augmented desk interface system.

1. Introduction

The graphical user interface (GUI) is the standard interface on personal computers, and it lets users operate a wide range of applications efficiently. Yet, mature as the GUI is, many users find its capability rather limited when a task combines physical objects, such as paper documents on a desk, with computer applications. This limitation comes from the lack of seamless integration between two entities of different types: physical objects and computer applications.

To perform a wide variety of tasks in office and classroom environments, we interact not only with tangible objects such as paper-based materials, but also with computer simulations and electronic-media databases. This wealth of loosely connected content burdens us with the work of keeping the media synchronized. For example, the overhead of accessing a computer simulation while reading a printed textbook often shifts our focus of attention, and this tends to disrupt our train of thought.

We believe an augmented desk interface system is a promising way to cope with the inefficiency inherent in using real and digital media simultaneously. An ordinary textbook can be used more efficiently if supporting multimedia content is presented appropriately by an augmented desk interface system. This motivates us to investigate what kind of computer support users need to perform various kinds of activities on a desk.

Figure 1 shows one such example, called the Interactive Textbook. When a user opens a page of paper-based material printed with a matrix code, the system instantly recognizes the code and projects the corresponding digital content onto the desk. For instance, when a student reaches the page describing a mass-spring experiment, the system automatically projects a computer-graphics simulation of the mass-spring experiment onto the right side of the textbook. The student can manipulate the projected mass by hand and observe the dynamic behavior of the spring and the mass. Similarly, when a student opens a page describing the pendulum experiment, a computer-generated simulation is projected onto the desk and the student can drag the pendulum to see its dynamic behavior.
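The paper does not describe the simulation's internals; as a rough illustration of the kind of dynamics such projected content could run, the sketch below advances a one-dimensional mass-spring system with a semi-implicit Euler step and lets a tracked fingertip override the mass position while it is dragged. All names and constants are hypothetical.

```python
# Hypothetical sketch of a projected mass-spring simulation;
# not the Interactive Textbook's actual code.

def step_mass_spring(x, v, dt, k=20.0, m=1.0, c=0.5, rest=0.0, drag=None):
    """Advance one frame with semi-implicit Euler.

    x, v -- current position and velocity of the mass (1-D)
    drag -- fingertip position while the user holds the mass, else None
    """
    if drag is not None:
        # While dragged, the mass simply follows the tracked fingertip.
        return drag, 0.0
    f = -k * (x - rest) - c * v   # spring force plus damping
    v += (f / m) * dt
    x += v * dt
    return x, v

# Example: release the mass from x = 1.0 and let it oscillate at 30 fps.
x, v = 1.0, 0.0
for frame in range(5):
    x, v = step_mass_spring(x, v, dt=1.0 / 30.0)
    print(f"frame {frame}: x = {x:+.3f}")
```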

Figure 1. Interactive Textbook provides seamless integration of a textbook and associated multimedia contents via direct manipulation with bare hands

In this paper, we first give an overview of our augmented desk interface system, the EnhancedDesk, together with the most closely related work in the field. We then briefly explain our vision-based methods for tracking users' hand and head motions reliably, even in uncontrolled environments.

2. Augmented desk interface system

An augmented desk system treats a real desktop as a substitute for the computer screen. Various types of augmented desk systems have been proposed in the past [7, 12, 17, 1, 4, 11, 13, 16]. The pioneering work, Wellner's DigitalDesk [17], introduced the basic structure of an augmented desk system: a desk, a video camera, a computer, and a video projector. The video camera captures images of the desk, the computer processes those images and generates appropriate output images, and the video projector projects the output images onto the desktop. The advantages of using an augmented desk system to achieve smooth integration of paper and digital information can be summarized as follows (a minimal sketch of this capture-process-project loop follows the list):

- Automatic retrieval and display of digital information by recognizing real objects: When a user puts a document on the desk, the system first recognizes what that document is. Then the system automatically retrieves the corresponding digital information from the database and displays it on the desktop.

- Direct manipulation of digital information as well as real objects on the desk.
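As an illustration of the loop described above, here is a minimal capture-process-project skeleton in Python with OpenCV; `recognize_documents` and `render_overlay` are hypothetical placeholders for the recognition and rendering stages, not APIs of any cited system.

```python
# Minimal capture-process-project loop in the DigitalDesk style
# (an illustration, not code from any of the cited systems).
import cv2
import numpy as np

def recognize_documents(frame):
    """Hypothetical placeholder: detect documents/codes in the frame."""
    return []

def render_overlay(shape, documents):
    """Hypothetical placeholder: draw the digital content to project."""
    return np.zeros(shape, dtype=np.uint8)

camera = cv2.VideoCapture(0)                      # camera looking down at the desk
while True:
    ok, frame = camera.read()                     # 1. capture the desk surface
    if not ok:
        break
    documents = recognize_documents(frame)        # 2. image processing
    overlay = render_overlay(frame.shape, documents)  # 3. generate output image
    cv2.imshow("projector", overlay)              # 4. window shown via the projector
    if cv2.waitKey(1) == 27:                      # Esc quits
        break
camera.release()
```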

InteractiveDesk [1] used a one-dimensional bar code to link a real paper folder to email or Web pages related to the documents in the folder. Robinson et al. also used a 1-D bar code to link a printed Web page to the original Web page [13]. In both systems, however, interaction with the digital information still relied on the traditional mouse and keyboard. Although their use of bar codes is similar to ours, our 2-D matrix code additionally provides the paper's size and orientation. MetaDesk [4] used real objects (phicons) to manipulate digital information such as electronic maps. Rekimoto's Augmented Surfaces [11] let users smoothly exchange digital information among PCs, tables, walls, and so on. However, automatic detection of the user's hands and fingers was not explored in either system.

Inspired by the DigitalDesk, we have developed an augmented desk interface system called the EnhancedDesk [6]. While its design is similar to the DigitalDesk's, the EnhancedDesk has a distinct advantage over previously proposed systems: it allows users, with their own hands and fingers, to perform various kinds of tasks by manipulating physical objects and electronically displayed objects simultaneously. This leads to more natural and intuitive interaction between physical and synthetic objects. The setup of the EnhancedDesk is shown in Figure 2.

The EnhancedDesk is equipped with an infrared camera for tracking users' hand motions, a color video camera for recognizing objects, and two LCD projectors for both front-projection and rear-projection of various kinds of digital information onto the desk. The infrared camera deserves particular mention. In most previously proposed methods, image regions corresponding to human skin are extracted either by color segmentation or by background subtraction in order to monitor a user's activities. In our augmented desk interface system, however, it is difficult to employ these techniques, because the observed colors of human skin and of the image background change continuously due to the projection by the LCD projectors. The infrared camera helps us avoid this difficulty: by adjusting the camera to measure the range of human body temperature, image regions corresponding to human skin can be correctly identified by binarizing the input image with a proper threshold value, even against complex backgrounds under dynamically changing illumination (Figure 3).
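Since the infrared camera maps body temperature to image intensity, skin extraction reduces to fixed-range binarization. A minimal sketch follows; the intensity band and the morphological clean-up step are illustrative assumptions, not the EnhancedDesk's calibrated settings.

```python
# Sketch of skin-region extraction from an infrared image by binarization.
# The threshold band is an assumed stand-in for the calibrated range that
# the camera maps to human body temperature (roughly 30-34 degrees C).
import cv2
import numpy as np

ir = cv2.imread("ir_frame.png", cv2.IMREAD_GRAYSCALE)  # infrared frame

LO, HI = 120, 200                                      # assumed intensity band
skin = np.where((ir >= LO) & (ir <= HI), 255, 0).astype(np.uint8)

# A small morphological opening removes isolated noise pixels.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
skin = cv2.morphologyEx(skin, cv2.MORPH_OPEN, kernel)
```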

Figure 2. Overview of our augmented desk interface system called EnhancedDesk (labeled components: infrared camera, color CCD camera, plasma display, LCD projector for front projection, LCD projector for rear projection)

Figure 3. Color and infrared images of a hand

3. Real-time tracking of hands and fingers

One of the key innovations of the EnhancedDesk is accurate, real-time hand and fingertip tracking that allows a user to seamlessly integrate both real objects and associated digital information. In particular, we have developed a method for discerning fingertip locations in image frames and measuring fingertip trajectories across image frames.

Our vision-based method consists of the following steps. First, the system detects fingertips in each image frame based on their geometrical features. Then, a filtering technique based on the Kalman filter predicts the fingertip locations in successive image frames. Finally, the correspondences between the predicted locations and the detected fingertips are examined to determine the fingertip locations.
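A skeleton of this detect-predict-match cycle, using a constant-velocity Kalman filter and greedy nearest-neighbour correspondence, is sketched below. It illustrates the general technique only; the noise parameters and the matching rule are our assumptions, and the actual method is described in [14, 8, 9].

```python
# Detect-predict-match fingertip tracking with a constant-velocity
# Kalman filter; a sketch of the technique, not the code of [14, 8, 9].
import numpy as np

DT = 1.0 / 30.0                       # frame interval at 30 fps
F = np.array([[1, 0, DT, 0],          # state transition for x, y, vx, vy
              [0, 1, 0, DT],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1., 0., 0., 0.],       # we observe position only
              [0., 1., 0., 0.]])
Q = np.eye(4) * 1e-2                  # assumed process noise
R = np.eye(2) * 4.0                   # assumed measurement noise (pixels^2)

def predict(x, P):
    """Predict the fingertip state one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Correct the prediction with a matched detection z = (u, v)."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

def match(predictions, detections):
    """Greedy nearest-neighbour correspondence between predicted
    states and fingertip detections in the new frame."""
    pairs, free = [], set(range(len(detections)))
    for i, p in enumerate(predictions):
        if not free:
            break
        j = min(free, key=lambda j: np.linalg.norm(detections[j] - p[:2]))
        pairs.append((i, j))
        free.remove(j)
    return pairs
```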

Using this framework, we obtain the trajectories of multiple fingertips in real time with high accuracy. Even against a complex background under changing lighting conditions, the system tracks multiple fingertips reliably without any invasive devices or color markers on the user's hands. A more detailed explanation of our tracking method can be found in [14, 8, 9].

Figure 4. Real-time tracking of multiple fingers

To demonstrate the system's capability of tracking multiple fingertips in real time, we have developed several prototype applications on the EnhancedDesk. Figure 5 shows one such example: a two-handed drawing tool [18] that assigns a different role to each hand. After selecting from radial menus with the left hand, users draw or select the objects to be manipulated with the right hand. For example, to color an object, a user selects the color menu with the left hand and indicates the object to be colored with the right hand. The system also uses gesture recognition to let users draw objects such as circles, ellipses, triangles, and rectangles, and directly manipulate them with the right hand and fingers.

Figure 5. Two-handed drawing tool
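The division of labour in the drawing tool can be pictured as a small dispatcher keyed on which hand produced a fingertip event. The sketch below is a hypothetical illustration of that role split, not code from [18].

```python
# Hypothetical dispatcher illustrating the two-handed role split:
# the left hand picks from radial menus, the right hand draws/manipulates.

class TwoHandedTool:
    def __init__(self):
        self.mode = "draw"              # tool currently chosen from the menu

    def on_fingertip(self, hand, pos):
        """Route a tracked fingertip event by the hand that produced it."""
        if hand == "left":
            self.mode = self.pick_radial_menu(pos)   # choose mode/colour
        else:
            self.apply(self.mode, pos)               # act on touched object

    def pick_radial_menu(self, pos):
        return "color"                  # placeholder menu selection

    def apply(self, mode, pos):
        print(f"{mode} at {pos}")       # placeholder drawing action
```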

4. Real-time tracking of faces in 3D

Users' attention also plays an important role in designing human-computer interfaces that can be used effectively and intuitively in real environments. Since users' attention correlates well with their gaze points, real-time sensing of the gaze point is considered one of the key components of human-computer interfaces. This motivated us to develop a new vision-based method for estimating the 3D pose, i.e., position and orientation, of a user's face in real time.

A number of vision-based tracking techniques have been proposed in the past. Among them, techniques based on particle filtering [3] are known to handle challenging situations with clutter, occlusion, and noise. This robustness is essential for HCI applications, where a system must handle unexpected, sudden motion by a user in dynamically changing environments. Accordingly, some previously proposed HCI methods have used particle filtering to estimate the 3D pose of a user's face [15, 2].

For HCI applications, two further requirements must be met simultaneously, in addition to robustness against clutter and occlusion: dealing with abrupt motion of the user's head, and estimating the 3D pose of the head accurately while the user is staring at a point in the scene. To satisfy both, we use image inputs from multiple cameras to estimate the 3D pose of a user's face reliably in real time. Our approach consists of two parts: automatic initialization of a 3D model of the user's face with multiple feature points, and tracking of the face across consecutive image frames based on particle filtering. The key component of the method is adaptive control of the diffusion factors in the motion model of the user's head used in particle filtering. This makes the method robust against abrupt motion and, at the same time, capable of estimating the pose with high accuracy.
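One plausible reading of this adaptive control, sketched below, scales the diffusion (process noise) of a random-walk motion model with the latest innovation, so the particle cloud spreads out under abrupt motion and contracts when the head is nearly still. The scaling rule and the placeholder likelihood are our assumptions, not the authors' formulation.

```python
# Particle-filter skeleton with adaptive diffusion; a plausible sketch
# of the idea, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)
N = 500
particles = np.zeros((N, 6))         # pose: x, y, z, roll, pitch, yaw
BASE_SIGMA = np.array([2.0, 2.0, 2.0, 0.01, 0.01, 0.01])  # assumed units

def likelihood(pose, observation):
    """Placeholder: agreement of projected face features with the image."""
    return np.exp(-np.linalg.norm(pose[:2] - observation) ** 2 / 50.0)

def step(particles, observation, diffusion_scale):
    # 1. Diffuse: random-walk motion model whose spread is adaptive.
    particles = particles + rng.normal(
        0.0, BASE_SIGMA * diffusion_scale, particles.shape)
    # 2. Weight each particle by how well it explains the observation.
    w = np.array([likelihood(p, observation) for p in particles])
    w /= w.sum()
    estimate = (w[:, None] * particles).sum(axis=0)
    # 3. Resample in proportion to the weights.
    idx = rng.choice(len(particles), len(particles), p=w)
    return particles[idx], estimate

# Adaptive control (our assumption): scale diffusion with the innovation,
# i.e. how far the last estimate was from the new observation.
estimate = np.zeros(6)
for observation in [np.array([1.0, 0.5]), np.array([12.0, 9.0])]:
    innovation = np.linalg.norm(observation - estimate[:2])
    scale = 1.0 + 0.5 * innovation   # larger jumps -> wider diffusion
    particles, estimate = step(particles, observation, scale)
```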

The current implementation of our method runs at 30 frames/sec on a Linux PC with a 2.8 GHz Pentium 4, processing non-interlaced stereo images. Figure 6 shows some tracking results obtained with our method. The estimated poses of the user's face are shown with cross markers at the facial features used for pose estimation. As the images in the third row show, our method can track a user's face even in challenging situations such as partial occlusion and illumination change.

Figure 6. Real-time tracking of a user's face in stereo images

For a quantitative evaluation of tracking performance, a user's face was tracked in a pre-recorded image sequence of 10 seconds. To obtain the ground truth for the pose of the user's face, the pose was measured directly with a Polhemus FASTRAK electromagnetic sensor attached to the user's head. Figure 7 shows the tracking result. The thick solid lines labeled Est. show the pose estimated by our method, and the dotted lines labeled GT show the ground truth measured with the FASTRAK sensor; the thin solid lines labeled Const. show the pose estimated without adaptive control of the diffusion factor. As the figure shows, our method tracks the user's face even when it moves rapidly, whereas the estimates obtained without adaptive diffusion control are significantly less accurate.

Figure 7. Improved tracking performance with adaptive diffusion control. [The figure plots roll, pitch, and yaw in degrees and position in mm against frame number for the estimated pose (Est.), the ground truth (GT), and estimation with a constant diffusion factor (Const.).]
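Numerically, such a comparison amounts to aligning the estimated and ground-truth pose sequences and computing a per-axis error statistic. The sketch below uses RMS error, which is our choice of metric; the paper reports the curves directly.

```python
# Generic sketch of the quantitative comparison described above: per-axis
# RMS error of estimated pose against sensor-measured ground truth.
import numpy as np

def rms_error(estimated, ground_truth):
    """estimated, ground_truth: (frames, 6) arrays of x,y,z,roll,pitch,yaw."""
    err = estimated - ground_truth
    return np.sqrt((err ** 2).mean(axis=0))   # one RMS value per axis

# Example with synthetic data: 300 frames (10 s at 30 fps).
gt = np.zeros((300, 6))
est = gt + np.random.default_rng(1).normal(0.0, 1.0, gt.shape)
print(rms_error(est, gt))                     # roughly 1.0 per axis
```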


5. Conclusions

This paper presented an overview of our project, the EnhancedDesk. The main goal of this project is to explore an augmented desk interface system with real-time sensing of users' motion, providing seamless integration of real objects and associated digital content. We described two key components of the system: a technique for tracking multiple hands and fingers, and a technique for tracking a user's face in 3D. We have demonstrated the system's tracking capability by developing several prototype applications on the EnhancedDesk.





With the rapid advances in computer vision and the increasing computational power of ordinary PCs, we believe that vision-based tracking of users' motion offers a promising direction not only for building prototype systems but also for developing real applications with novel man-machine interfaces.

Acknowledgement

The authors thank Yoko Ishii, Chen Xinlei, and Yoshinori Kobayashi for their contributions to this project. This work was supported in part by a Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan (No. 13224051). We would also like to thank the Omron Corporation for providing the OKAO vision library used in the initialization step of our face tracking technique.

References

[1] T. Arai, K. Machi, and S. Kuzunuki, "Retrieving electronic documents with real-world objects on InteractiveDesk," Proc. ACM Symposium on User Interface Software and Technology (UIST '95), pp. 37-38, 1995.

[2] B. Braathen et al., "An approach to automatic recognition of spontaneous facial actions," Proc. FG 2002, pp. 360-365, 2002.

[3] M. Isard and A. Blake, "Condensation: conditional density propagation for visual tracking," IJCV, Vol. 29, No. 1, pp. 5-28, 1998.

[4] H. Ishii and B. Ullmer, "Tangible bits: towards seamless interfaces between people, bits and atoms," Proc. ACM Conf. Human Factors in Computing Systems (CHI '97), pp. 234-241, 1997.

[5] Y. Ishii, Y. Nakanishi, H. Koike, K. Oka, and Y. Sato, "EnhancedMovie: movie editing on an augmented desk," Proc. Fifth Int. Conf. Ubiquitous Computing (UbiComp 2003), pp. 153-154, October 2003.

[6] H. Koike, Y. Sato, and Y. Kobayashi, "Integrating paper and digital information on EnhancedDesk: a method for real-time finger tracking on an augmented desk system," ACM Trans. Computer-Human Interaction, Vol. 8, No. 4, pp. 307-322, December 2001.

[7] M. Krueger, Artificial Reality II, Addison-Wesley, Reading, MA, 1991.

[8] K. Oka, Y. Sato, and H. Koike, "Real-time tracking of multiple fingers and gesture recognition for augmented desk interface systems," Proc. IEEE Int. Conf. Automatic Face and Gesture Recognition (FG 2002), pp. 429-434, 2002.

[9] K. Oka, Y. Sato, and H. Koike, "Real-time fingertip tracking and gesture recognition," IEEE Computer Graphics and Applications, Vol. 22, No. 6, pp. 64-71, November/December 2002.

[10] K. Oka, I. Sato, Y. Nakanishi, Y. Sato, and H. Koike, "Interaction for entertainment contents based on direct manipulation with bare hands," Proc. 2002 IFIP/IPSJ Int. Workshop on Entertainment Computing (IWEC 2002), pp. 391-398, May 2002.

[11] J. Rekimoto and M. Saito, "Augmented surfaces: a spatially continuous work space for hybrid computing environments," Proc. ACM Conf. Human Factors in Computing Systems (CHI '99), pp. 378-385, 1999.

[12] W. Mackay, "Augmenting reality: adding computational dimensions to paper," Communications of the ACM, Vol. 36, No. 7, pp. 96-97, 1993.

[13] J. A. Robinson and C. Robertson, "The LivePaper system: augmenting paper on an enhanced tabletop," Computers & Graphics, Vol. 25, No. 5, pp. 731-743, 2001.

[14] Y. Sato, Y. Kobayashi, and H. Koike, "Fast tracking of hands and fingertips in infrared images for augmented desk interface," Proc. IEEE Int. Conf. Automatic Face and Gesture Recognition (FG 2000), pp. 462-467, March 2000.

[15] J. Sherrah and S. Gong, "Fusion of perceptual cues for robust tracking of head pose and position," Pattern Recognition, Vol. 34, No. 8, 2001.

[16] N. Takao, J. Shi, and S. Baker, "Tele-Graffiti: a camera-projector based remote sketching system with hand-based user interface and automatic session summarization," Int. J. Computer Vision, Vol. 53, No. 2, pp. 115-133, July 2003.

[17] P. Wellner, "Interacting with paper on the DigitalDesk," Communications of the ACM, Vol. 36, No. 7, pp. 87-96, 1993.

[18] X. Chen, H. Koike, Y. Nakanishi, K. Oka, and Y. Sato, "Two-handed drawing on augmented desk system," Proc. 2002 Int. Working Conf. Advanced Visual Interfaces (AVI 2002), pp. 219-222, May 2002.
