Sparse Feature for Hand Gesture Recognition: A Comparative Study

Simion Georgiana and Cătălin-Daniel Căleanu

2013 36th International Conference on Telecommunications and Signal Processing (TSP), Rome, Italy, 2-4 July 2013


Abstract—Any object recognition approach has a feature extraction/selection stage. Features should be carefully selected because they are used to represent the object. The purpose of this work is to find good sparse features which can be used further to detect fingers and recognize hand gestures. The proposed features are edges, lines and salient regions extracted with the Kadir and Brady detector. These features can be combined to define a large set of hand postures.

Keywords—Edge, hand gesture, Hough transform, sparse.

I. INTRODUCTION

The task of selecting relevant features in classification problems can be viewed as one of the most fundamental problems in the field of machine learning. Ideal features are not affected by occlusion and clutter; they are invariant (or covariant); they are also robust, meaning that noise, blur, discretization, compression, etc. do not have a large impact on them. From the distinctiveness point of view, individual features can be matched to a large database of objects; from the quantitative point of view, many features can be generated for even small objects, and they offer precise localization.

There is a wide variety of interest point and corner detectors in the literature. According to [1] they can be divided into three categories: contour based, intensity based and parametric model based methods.

Contour based methods have existed for a long time [2], [3]: they first extract contours, then either search for maximal curvature or inflexion points along the contour chains, or perform a polygonal approximation and then search for intersection points. Intensity based methods [4], [5], [6] compute a measure that indicates the presence of an interest point directly from the grey values. Parametric model methods fit a parametric intensity model to the signal. These methods usually provide sub-pixel accuracy, but unfortunately are limited to specific types of interest points.

Manuscript received February 21, 2013. This paper was supported by the project "Development and support of multidisciplinary postdoctoral programmes in major technical areas of national strategy of Research - Development - Innovation" 4D-POSTDOC, contract no. POSDRU/89/1.5/S/52603, project co-funded by the European Social Fund through the Sectoral Operational Programme Human Resources Development 2007-2013.

S. Georgiana is with the "Politehnica" University of Timisoara, Department of Applied Electronics, Timișoara, Romania (email: [email protected]);

C. Cătălin-Daniel is with the "Politehnica" University of Timisoara, Department of Applied Electronics, Timișoara, Romania (email: [email protected]).

The paper is organized as follows: Section II introduces related work on features used for hand gesture recognition; Section III proposes several features to be used in hand gesture recognition; Section IV presents the proposed algorithm for finger detection; Section V presents the experimental results and conclusions.

II. RELATED WORK

A multitude of features have been used for hand gesture recognition. The hand silhouette is among the simplest, yet most frequently used. Silhouettes are easily extracted from local hand and arm images in restrictive background setups. By silhouette we mean the outline of the hand provided by segmentation algorithms, or equivalently the partitioning of the input image into object and background pixels. In the case of complex backgrounds, techniques that employ color histogram analysis can be used. In order to estimate the hand pose, the position and orientation of the hand, the fingertip locations, and the finger orientations are extracted from the images. The center of gravity (or centroid) of the silhouette [7] is one choice, but it may not be very stable with respect to the silhouette shape due to its dependence on the finger positions. The point having the maximum distance to the closest boundary edge [8-10] has been argued to be more stable under changes in silhouette shape.

A frequently used feature in gesture analysis is the fingertip. Fingertip detection can be handled by correlation techniques using a circular mask [9], [11-12], which provides rotation invariance, or fingertip templates extracted from real images [13]. Another common method to detect the fingertips and the palm-finger intersections [14-16] is to use the local curvature maxima on the boundary of the silhouette. For the curvature-based methods, sensitivity to noise can be an issue in the case of noisy silhouette contours. In [8], a more reliable algorithm based on the distance of the contour points to the hand position is utilized: the local maximum of the distance between the hand position and the farthest boundary point in each direction gives the fingertip locations. The direction of the principal axis of the silhouette [17], [18] can be used to estimate the finger or 2D hand orientation. All these features can be tracked across frames to increase computation speed and robustness, using Kalman filters [8], [19] or heuristics that determine search windows in the image [15], [20] based on previous feature locations or rough planar hand models. The low computational complexity of these methods enables real-time implementations on conventional hardware, but their accuracy and robustness are arguable. These methods rely on high quality segmentation, which lowers their chances of being applied to highly cluttered backgrounds. Failures can be expected in some cases, such as two fingers touching each other or out-of-plane hand rotations.





Jennings et al. [21] used a more elaborate method, tracking the features directly in 3D using 3D models; their method employed range images, color, and edge features extracted from multiple cameras to track the index finger in a pointing gesture. Very robust tracking results were reported over cluttered and moving backgrounds.

III. FINDING GOOD FEATURES FOR A SPARSE FRAMEWORK

The choice of features used for the sparse hand posture representation is very important. The potential benefits of feature selection include, first and foremost, better accuracy of the inference engine and improved scalability. Secondary benefits include better data visualization and understanding, reduced measurement and storage requirements, and reduced training and inference time.

The first question is how the hand can be represented, in order to decide which image locations should be captured and which discarded. The main idea is that each hand posture can be described by: the V shapes between the fingers when these are apart, the curved shapes which correspond to the fingertips, and the straight lines along the finger length. Each hand pose can be defined as a combination of these shapes. Based on the number of V shapes, curves and lines, and on the relations among them, the hand pose can be recognized. The second question is how these relevant image regions can be represented.

In this paper the Canny edge detector is proposed to extract the hand contour. First the RGB image is converted into a grayscale image.

Fig. 1. Good hand contour: a) RGB image; b) Canny edges, 0.05 threshold; c) the hand contour after processing.

In order to detect all useful parts of the hand contour it is necessary to use a highly sensitive threshold. With such a threshold, however, details inside the hand are also detected. These details are not useful for the next processing steps. In order to discard them, a function which counts the number of pixels in each connected object is used. The result can be seen in Fig. 1 c).

The next step is to detect the straight lines along the finger length. Based on the hand contour obtained in the previous stage and using the Hough transform [22], we obtain the results presented in Fig. 2. The Hough transform is designed to detect lines, using the parametric representation of a line:

rho = x*cos(theta) + y*sin(theta).

The variable rho is the distance from the origin to the line along a vector perpendicular to the line; theta is the angle between the x-axis and this vector. Matlab's hough function implements the Standard Hough Transform (SHT) and generates a parameter space matrix whose rows and columns correspond to these rho and theta values, respectively. The houghpeaks function finds peak values in this space, which represent potential lines in the input image. The houghlines function finds the endpoints of the line segments corresponding to peaks in the Hough transform, and it automatically fills in small gaps.

The Kadir and Brady detector, suitable for finding circular structures, is proposed to extract the fingertips and the V shapes between the fingers when these are apart. The Kadir and Brady detector [23] searches for scale-localized features with high entropy, under the constraint that the scale is isotropic. The algorithm generates a space sparsely populated with scalar saliency values. For each pixel location and for each scale value between a minimum and a maximum, the local descriptor values are measured within a window; the PDF is then estimated from these values and the local entropy is calculated. The scale at which the entropy peaks is selected. A final clustering of the candidates in scale space then yields a set of interest points.

Fig. 2. Straight lines detected on the hand contour.



Fig. 3. Results for the Kadir and Brady detector.

This method finds regions that are salient over both location and scale.

IV. THE PROPOSED ALGORITHM

In this section the proposed algorithm for finger detection, based on edge detection and line detection, is introduced. The framework diagram can be seen in Fig. 4.

First the RGB image is converted into a grayscale image in order to extract the hand contour using the Canny edge detector. The detected edges are then filtered using a function which counts the number of pixels in each connected object; if the number of pixels is smaller than a predefined threshold, the object is discarded. Once the hand contour is processed, the Hough transform is used to detect the straight lines on the hand contour. Initially a set of segments is detected. Not all segments detected by the Hough transform are connected, even if they belong to the same line. The algorithm checks whether two segments are on the same line and, if they are, connects them.

In order to connect two segments that belong to the same line, the midpoint of each line segment is computed. The midpoints are joined and a new line segment is created.

The angle of each line segment is also computed. In the next step, the difference between the line segment angles and the length of the new line segment are evaluated. If these values are smaller than the predefined thresholds α and β, respectively, the two segments are connected (a sketch of this test is given below).

For all segments, the length is computed, and the length ratio of each pair of segments is evaluated. If this ratio is smaller than γ, the considered segments are good candidates to define a finger. In order to evaluate the candidate segments, several operations are performed.

Fig. 4. The framework diagram: RGB image → grayscale image → edge detection and processing → line detection based on the Hough transform → line processing → finger detection.

Fig. 5. A pair of line segments A(x_1, y_1)-B(x_2, y_2) and C(x_3, y_3)-D(x_4, y_4); the perpendicular bisector [MN] of the smaller segment intersects the other segment.

The perpendicular bisector of the smaller segment is drawn and its intersection point with the other segment is marked, as illustrated in Fig. 5. Eq. (1) is the equation of the line defined by points A(x_1, y_1) and B(x_2, y_2).

\frac{x - x_1}{x_2 - x_1} = \frac{y - y_1}{y_2 - y_1} \quad (1)

The slope of the line defined by points A and B is computed according to Eq. (2):

m_{AB} = \frac{y_2 - y_1}{x_2 - x_1} \quad (2)

The slope of [MN], which is perpendicular to [AB], is computed using Eq. (3):

m_{MN} = -\frac{1}{m_{AB}} = \frac{x_1 - x_2}{y_2 - y_1} \quad (3)

Eq. (4) describes the line segment [MN]; the intercept n is determined by requiring the line to pass through the midpoint M of [AB]:

y = \frac{x_1 - x_2}{y_2 - y_1}\,x + n \quad (4)


Fig. 6. Finger detection.

In order to compute the coordinates of point N, which is the intersection of the perpendicular bisector of segment [AB] with line segment [CD], the following equation system must be solved:

\begin{cases} y = \dfrac{x_1 - x_2}{y_2 - y_1}\,x + n \\ (y_4 - y_3)\,x - (x_4 - x_3)\,y + x_4 y_3 - x_3 y_4 = 0 \end{cases} \quad (5)

Once these coordinates are available, it is possible to determine the length of the line segment [MN]. Two lines define a finger if the angle between them is smaller than δ and the distance d between the two lines is in a certain range. A sketch of this decision is given below.

V. RESULTS AND CONCLUSIONS

In order to test the algorithm, 150 images selected from the Cambridge Gesture database [24] and the Massey hand gesture database were used. All these images represent hand postures in which all five fingers are shown.

The pictures were taken indoors, with additional artificial light. The picture resolutions are 640×480 pixels and 320×240 pixels. The finger detection rate was about 96.66%. The angle threshold α was set to be smaller than 15 degrees and the length threshold β smaller than 60 pixels; the ratio threshold γ was set to 2.5, the inter-line angle δ smaller than 20 degrees, and the distance d ∈ [5, 40] pixels.

In this work several sparse features were extracted and shown to be good features for hand gesture recognition in a sparse framework. The proposed features are used to detect the fingers of the hand. The hand contour is first extracted using the Canny edge detector. In the next stage, the Hough transform is applied to extract the line segments from the hand contour. These line segments are analyzed and, if it is concluded that they belong to the same line, a connection operation is performed. Pairs of line segments are then processed, and the distance and the angle between them are computed. Based on these values it is decided whether they represent a finger. The method was tested on a set of 150 images and the detection rate was 96.66%. In the future, in order to define a larger set of hand postures, information about the finger positions and the regions where the fingers intersect the palm will be used.

REFERENCES

[1] C. Schmid, R. Mohr, C. Bauckhage, "Evaluation of interest point detectors," International Journal of Computer Vision, vol. 37, no. 2, pp. 151–172, Feb. 2000.

[2] H. Asada, M. Brady, "The curvature primal sketch," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 1, pp. 2–14, 1986.

[3] F. Mokhtarian, R. Suomela, "Robust image corner detection through curvature scale space," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 12, pp. 1376–1381, 1998.

[4] C. Tomasi, T. Kanade, "Detection and tracking of point features," Technical Report CMU-CS-91-132, Carnegie Mellon University, 1991.

[5] S. M. Smith, J. M. Brady, "SUSAN—A new approach to low level image processing," International Journal of Computer Vision, vol. 23, no. 1, pp. 45–78, 1997.

[6] C. Harris, "A combined corner and edge detector," in Alvey Vision Conference, 1988, pp. 147–151.

[7] Y. Sato, M. Saito, H. Koike, "Real-time input of 3D pose and gestures of a user's hand and its applications for HCI," IEEE Virtual Reality Conference, IEEE Computer Society, 13–17 March, Yokohama, Japan, 2001.

[8] Z. Mo, J. P. Lewis, U. Neumann, "SmartCanvas: a gesture-driven intelligent drawing desk system," in 10th International Conference on Intelligent User Interfaces, ACM Press, New York, 2005, pp. 239–243.

[9] K. Oka, Y. Sato, H. Koike, "Real-time tracking of multiple fingertips and gesture recognition for augmented desk interface systems," Fifth IEEE International Conference on Automatic Face and Gesture Recognition, IEEE Computer Society, Washington, USA, 21 May 2002, pp. 429–434.

[10] K. Abe, H. Saito, S. Ozawa, "3D drawing system via hand motion recognition from two cameras," IEEE International Conference on Systems, Man, and Cybernetics, vol. 2, 8–11 October, Nashville, TN, USA, 2000, pp. 840–845.

[11] H. Koike, Y. Sato, Y. Kobayashi, "Integrating paper and digital information on EnhancedDesk: a method for realtime finger tracking on an augmented desk system," ACM Transactions on Computer-Human Interaction, vol. 8, no. 4, pp. 307–322, 2001.

[12] J. Letessier, F. Berard, "Visual tracking of bare fingers for interactive surfaces," 17th Annual ACM Symposium on User Interface Software and Technology, 24–27 October, Santa Fe, New Mexico, USA, 2004, pp. 119–122.

[13] R. O'Hagan, A. Zelinsky, "Finger track—a robust and real-time gesture interface," 10th Australian Joint Conference on Artificial Intelligence, 30 November, Perth, Australia, 1997, pp. 475–484.

[14] J. Segen, S. Kumar, "Gesture VR: Vision-based 3D hand interface for spatial interaction," Sixth ACM International Conference on Multimedia, 12–16 September, Bristol, England, 1998, pp. 455–464.

[15] R. G. O'Hagan, A. Zelinsky, S. Rougeaux, "Visual gesture interfaces for virtual environments," Interacting with Computers, vol. 14, pp. 231–250, 2002.

[16] S. Malik, J. Laszlo, "Visual touchpad: a two-handed gestural input device," 6th International Conference on Multimodal Interfaces, 13–15 October, State College, USA, 2004, pp. 289–296.

[17] V. I. Pavlovic, R. Sharma, T. S. Huang, "Invited speech: gestural interface to a visual computing environment for molecular biologists," 2nd International Conference on Automatic Face and Gesture Recognition, IEEE Computer Society, 13–16 October, Killington, Vermont, 1996.

[18] A. Utsumi, J. Ohya, "Multiple-hand-gesture tracking using multiple cameras," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 22–23 June, Fort Collins, Colorado, USA, 1999, pp. 473–478.

[19] J. Martin, V. Devin, J. L. Crowley, "Active hand tracking," 3rd International Conference on Face & Gesture Recognition, IEEE Computer Society, 14–16 April, Nara, Japan, 1998.

[20] C. Maggioni, B. Kammerer, "GestureComputer—history, design and applications," in Computer Vision for Human-Machine Interaction, Cambridge, 1998, pp. 312–325.

[21] C. Jennings, "Robust finger tracking with multiple cameras," International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, 26–27 September, Corfu, Greece, 1999, pp. 152–160.

[22] R. C. Gonzalez, R. E. Woods, Digital Image Processing. Prentice Hall, 1993, ISBN 0201508036.

[23] T. Kadir, M. Brady, "Saliency, scale and image description," International Journal of Computer Vision, vol. 42, no. 2, pp. 83–105, 2001.

[24] ftp://mi.eng.cam.ac.uk/pub/CamGesData
