
Principal Component Analysis for Gesture Recognition using SystemC

Solomon Raju Kota¹, J. L. Raheja¹, Ashutosh Gupta¹
¹Digital Systems Group, Central Electronics Engineering Research Institute (CEERI) / Council of Scientific and Industrial Research (CSIR), Pilani-333031, India
[email protected], [email protected]

Archana Rathi², Shashikant Sharma³
²GJUS&T, Hisar-125001, India
³Icfai Institute of Sci & Tech., Jaipur-32022, India
[email protected], [email protected]

Abstract—An algorithm for optimizing Principal Component Analysis (PCA) for gesture recognition is proposed; it uses the covariance between factors to reduce the dimensionality of the data. The algorithm makes otherwise manual observation objective and automatic. We present an approach for the detection and identification of human gestures, describe a working, near-real-time gesture recognition system, and then recognize the person by comparing characteristics of the gesture to those of known individuals. Our approach treats gesture recognition as a two-dimensional recognition problem, taking advantage of the fact that gestures are normally upright and may therefore be described by a small set of 2-D characteristic values. With minimal additional effort, PCA provides a roadmap for reducing a complex data set to a lower dimension, revealing the sometimes hidden, simplified structures that often underlie it. The proposed algorithm is implemented in the SystemC language, with the intention of downloading it onto an FPGA. The output of the system developed in SystemC consists of the gesture ID with the closest match, together with a value representing how close the match is (the Euclidean distance). Although PCA has mostly been used for face recognition, the technique extends naturally to gesture recognition: it is fast, relatively simple, and has been shown to work well in a somewhat constrained environment.

Keywords—Holistic, Euclidean distance, Training pattern, Automated gesture recognition, Gaussian noise, Eigenvectors, Eigenvalues, Covariance.

I. INTRODUCTION

Pattern recognition is the scientific discipline whose goal is the classification of objects into a number of categories or classes. Depending on the application, these objects can be images, signal waveforms, or any other type of measurement that needs to be classified.

Gesture recognition is an important aspect of pattern recognition and has been a very popular research topic in recent years. It covers a wide variety of commercial and law-enforcement applications, including security systems, personal identification, image and film processing, and human-computer interaction. A complete gesture recognition system comprises two stages. The first stage detects the location and size of a "gesture", which is difficult and complicated because the position, orientation, and scale of gestures in an arbitrary image are unknown. The second stage recognizes the target gestures obtained in the first stage. The features chosen for recognition therefore play a crucial role in the design of a good gesture recognition system. Because of its nonintrusive character, gesture recognition has emerged as a vivid research direction in biometrics and has recently attracted considerable research effort. Gesture recognition technology falls into two main categories: feature-based and holistic. Feature-based approaches rely on individual features, such as the eyes, nose, and mouth of a face, and the geometrical relationships among them. Holistic methods take the entire gesture into account.

II. FEATURES, FEATURE VECTORS, AND CLASSIFIERS

Let us first simulate a simplified case "mimicking" a medical image classification task. Fig. 1 shows two images, each having a distinct region inside it. The two regions are also themselves visually different. We could say that the region in Fig. 1(a) results from a benign lesion (class A) and that in Fig. 1(b) from a malignant one (cancer, class B). We will further assume that these are not the only patterns (images) available to us: we have access to an image database with a number of patterns, some of which are known to originate from class A and some from class B [1]. The first step is to identify the measurable quantities that make these two regions distinct from each other. Fig. 2 shows a plot of the mean value of the intensity in each region of interest versus the corresponding standard deviation around this mean. Each point corresponds to a different image from the available database.

Figure 1. Examples of image regions corresponding to (a) class A and (b) class B.


It turns out that class A patterns tend to spread over a different area from class B patterns. The straight line seems a good candidate for separating the two classes. Let us now assume that we are given a new image containing a region whose class we do not know. We measure the mean intensity and the standard deviation in the region of interest and plot the corresponding point, shown by the asterisk (*) in Fig. 2.

Figure 2. Plot of the mean value versus the standard deviation for a number of different images originating from class A (o) and class B (+). In this case, a straight line separates the two classes.

Then it is sensible to assume that the unknown pattern is more likely to belong to class A than to class B. The preceding artificial classification task outlines the rationale behind a large class of pattern recognition problems. The measurements used for the classification, here the mean value and the standard deviation, are known as features. In the general case, l features x_i, i = 1, 2, \ldots, l, are used, and they form the feature vector

x = [x_1, x_2, \ldots, x_l]^T,

where T denotes transposition. Each feature vector identifies a single pattern (object) uniquely. The straight line in Fig. 2 is known as the decision line; it constitutes the classifier, whose role is to divide the feature space into regions corresponding to class A or class B. If a feature vector x of an unknown pattern falls in the class A region, the pattern is classified as class A; otherwise it is classified as class B. The patterns (feature vectors) whose true class is known and which are used to design the classifier are called training patterns (training feature vectors).
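To make the decision-line idea concrete, here is a minimal C++ sketch of a linear classifier over the (mean, standard deviation) feature vector. The weight values are hypothetical stand-ins for a line fitted to the training patterns; they are not taken from the text.

```cpp
// Minimal sketch of the decision-line classifier described above.
#include <cstdio>

struct Feature { double mean; double stddev; };  // the (mu, sigma) feature vector

// Classify a pattern by the sign of the linear decision function
// g(x) = w1 * mean + w2 * stddev + w0 (the "decision line").
char classify(const Feature& x, double w1, double w2, double w0) {
    double g = w1 * x.mean + w2 * x.stddev + w0;
    return (g > 0.0) ? 'A' : 'B';
}

int main() {
    Feature unknown = {112.0, 23.5};           // the asterisk (*) in Fig. 2 (assumed values)
    double w1 = 1.0, w2 = -2.0, w0 = -40.0;    // assumed line parameters
    std::printf("classified as class %c\n", classify(unknown, w1, w2, w0));
}
```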

III. BACKGROUND AND RELATED WORK

A lot of work in gesture recognition has focused on detecting individual features such as the eyes, nose, mouth, and head outline, and on defining a gesture model by the position, size, and relationships among these features. Beginning with Bledsoe's [2] and Kanade's [3] early systems, a number of automated or semi-automated gesture recognition strategies have modeled and classified gestures based on normalized distances and ratios among feature points. This general approach has recently been continued and improved by the work of Yuille et al. [4].

Zhao et al. [5] pointed out that both global and local features are crucial for gesture recognition. Methods combining global and local information are more robust to variations in illumination, pose, and facial expression. However, detection of local features is itself challenging and error-prone. Sub-block decomposition provides one way to represent local information.

Research in human strategies of gesture recognition, moreover, has shown that individual features and their immediate relationships comprise an insufficient representation to account for the performance of adult human gesture identification. Nonetheless, this approach to gesture recognition remains the most popular one in the computer vision literature. Stonham's WISARD system [6] has been applied with some success to binary gesture images, recognizing both identity and expression. Most connectionist systems dealing with gestures treat the input images as a general 2-D pattern and make no explicit use of the configurational properties of a gesture. Only very simple systems have been explored to date, and it is unclear how they will scale to larger problems.

Recent work by Burt et al. uses a "smart sensing" approach based on multi-resolution template matching [7]. This coarse-to-fine strategy uses a special-purpose computer built to calculate multi-resolution pyramid images quickly, and it has been demonstrated identifying people in near-real-time.

IV. ALGORITHMS FOR GESTURE RECOGNITION

Several algorithms can be used for gesture recognition, including:

• Principal Component Analysis
• Linear Discriminant Analysis
• Independent Component Analysis
• Neural Networks
• Genetic Algorithms

V. PRINCIPAL COMPONENT ANALYSIS

We use the Principal Component Analysis (PCA) algorithm for gesture recognition in our project because of its robustness, parallelizability, and relative simplicity. Its disadvantages are its sensitivity to lighting and pose variations. There are several reasons to choose this algorithm for gesture recognition. First, the environment used to obtain the individual gestures is controlled, so lighting and pose variation effects can be minimized. Second, since a gesture can be subdivided into multiple regions, pattern recognition can be applied in parallel, resulting in faster gesture recognition. Finally, it allows us to quickly add individuals to the gesture database, making it better suited for real-time applications.

PCA has been called one of the most valuable results from applied linear algebra. It is a dimensionality reduction technique based on extracting the desired number of principal components of multi-dimensional data [8]. The first principal component is the linear combination of the original dimensions that has the maximum variance; the n-th principal component is the linear combination with the highest variance subject to being orthogonal to the first n-1 principal components.
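Formally, the construction just described can be stated as a constrained maximization. This is standard PCA notation, not taken from the paper: with C the covariance matrix of the centered data,

$$
\phi_1 = \arg\max_{\|\phi\|=1} \phi^{T} C\,\phi, \qquad
\phi_k = \arg\max_{\substack{\|\phi\|=1 \\ \phi \,\perp\, \phi_1,\dots,\phi_{k-1}}} \phi^{T} C\,\phi ,
$$

whose maximizers are the eigenvectors of C ordered by decreasing eigenvalue.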

The idea of PCA is illustrated in Fig. 3; the axis labeled Φ1 corresponds to the direction of maximum variance and is chosen as the first principal component. In the 2-D case, the second principal component is then determined uniquely by the orthogonality constraint; in a higher-dimensional space the selection process would continue, guided by the variances of the projections.

Figure 3. (a) The concept of PCA. Solid lines: the original basis; dashed lines: the PCA basis. The dots are selected at regularly spaced locations on a straight line rotated at 30°, and then perturbed by isotropic 2-D Gaussian noise. (b) The projection (1-D reconstruction) of the data using only the first principal component.

When measuring only two variables, such as height and weight in a dozen patients, it is easy to plot this data and to visually assess the correlation between these two factors. However, in a typical microarray experiment, the expression of thousands of genes is measured across many conditions such as treatments or time points. Therefore, it becomes impossible to make a visual inspection of the relationship between genes or conditions in such a multi-dimensional matrix. One way to make sense of this data is to reduce its dimensionality.

This method reduces data dimensionality by performing a covariance analysis between factors. As such, it is suitable for data sets in multiple dimensions, such as a large gene expression experiment or image processing. Covariance is always measured between two factors: with three factors x, y, and z, covariance is measured between x and y, between y and z, and between x and z. When more than two factors are involved, the covariance values can be arranged in a matrix. This is where PCA becomes useful.
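As a concrete illustration, the following C++ sketch (not from the paper; the factor values are invented) computes the pairwise covariances of three factors and arranges them in a 3×3 covariance matrix:

```cpp
#include <cstdio>
#include <vector>

// Sample covariance between two equally long measurement series.
double covariance(const std::vector<double>& x, const std::vector<double>& y) {
    const size_t n = x.size();
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;
    double c = 0.0;
    for (size_t i = 0; i < n; ++i) c += (x[i] - mx) * (y[i] - my);
    return c / (n - 1);
}

int main() {
    // Three hypothetical factors, each measured over five samples.
    std::vector<std::vector<double>> f = {
        {1.0, 2.1, 2.9, 4.2, 5.0},   // factor x
        {2.0, 3.9, 6.1, 8.0, 9.9},   // factor y
        {9.0, 7.2, 5.1, 2.8, 1.0}    // factor z
    };
    // Covariance matrix: entry (i, j) is cov(f[i], f[j]).
    for (size_t i = 0; i < f.size(); ++i) {
        for (size_t j = 0; j < f.size(); ++j)
            std::printf("%8.3f ", covariance(f[i], f[j]));
        std::printf("\n");
    }
}
```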

The covariance matrix is then used to find the eigenvectors and eigenvalues of the data. Eigenvectors can be thought of as "preferential directions" of a data set, or in other words, the main patterns in the data. Eigenvalues can be thought of as a quantitative assessment of how much of the data a component represents: the higher the eigenvalue of a component, the more representative it is of the data. Eigenvalues can also be interpreted as the explained variance expressed as a percentage of the total variance. The percentage of variance explained depends on how well all the components summarize the data.
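The eigenvectors of a covariance matrix can be computed in many ways; the paper does not specify its eigen-solver. As one illustrative option, a short power-iteration sketch that finds the dominant eigenvector, i.e. the first principal direction:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Power iteration: repeatedly apply C to a vector and renormalize;
// the iterate converges to the eigenvector of the largest eigenvalue
// (assuming it is unique and the start vector is not orthogonal to it).
std::vector<double> dominantEigenvector(const Matrix& C, int iters = 1000) {
    size_t n = C.size();
    std::vector<double> v(n, 1.0), w(n);
    for (int it = 0; it < iters; ++it) {
        for (size_t i = 0; i < n; ++i) {
            w[i] = 0.0;
            for (size_t j = 0; j < n; ++j) w[i] += C[i][j] * v[j];
        }
        double norm = 0.0;
        for (double x : w) norm += x * x;
        norm = std::sqrt(norm);
        for (size_t i = 0; i < n; ++i) v[i] = w[i] / norm;
    }
    return v;
}

int main() {
    Matrix C = {{4.0, 1.0}, {1.0, 3.0}};   // a small symmetric example
    std::vector<double> v = dominantEigenvector(C);
    std::printf("dominant eigenvector: (%.4f, %.4f)\n", v[0], v[1]);
}
```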

VI. USING THE TEMPLATE

Gesture recognition using the PCA algorithm involves two phases:

• Training Phase
• Recognition Phase

During the training phase, each gesture is represented as a column vector, with each entry corresponding to a gesture pixel. These gesture vectors are then normalized with respect to the average gesture. Next, the algorithm finds the eigenvectors of the covariance matrix of the normalized gestures, using a speed-up technique that reduces the number of multiplications to be performed. Lastly, the eigenvector matrix is multiplied by each of the gesture vectors to obtain their corresponding gesture-space projections.

In the recognition phase, a subject gesture is normalized with respect to the average gesture and then projected onto gesture space using the eigenvector matrix. Next, the Euclidean distance is computed between this projection and all known projections. The minimum of these distances is selected and compared with a threshold calculated during the training phase.

A. Training Phase

1. Each gesture in the database is represented as a column of a matrix A. The values in each column are the pixel values of the gesture image and range from 0 to 255 for an 8-bit grayscale image:

A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & & & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix}

where m = size of the image (the image has m pixels in total) and n = number of gesture images.

2. The average of the matrix A is calculated in order to normalize A. It is the column vector avg = [avg_1, avg_2, \ldots, avg_m]^T whose elements are the averages of the corresponding gesture pixel values:

avg_i = \frac{1}{n} \sum_{j=1}^{n} a_{i,j}, \qquad i = 1, 2, \ldots, m

3. Next, the matrix is normalized by subtracting the vector avg from each column of A:

\bar{A}_{i,j} = a_{i,j} - avg_i

4. We then want the covariance of \bar{A}. Instead of the full m × m covariance matrix \bar{A} \times \bar{A}^T, we use the n × n matrix

L = \bar{A}^T \times \bar{A}

because n ≪ m, which greatly reduces the size of the matrix whose eigenvectors must be computed.

5. The next step is to obtain the eigenvectors of the original covariance matrix. For this we first calculate the eigenvectors of the reduced matrix L; call them V, where V has the same size as L.
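The speed-up rests on a standard identity, spelled out here for completeness in the notation of the steps above: every eigenvector of L yields an eigenvector of the full covariance matrix, since

$$
\bar{A}^{T}\bar{A}\,v = \lambda v
\;\Longrightarrow\;
(\bar{A}\bar{A}^{T})(\bar{A}v) = \lambda\,(\bar{A}v),
$$

so \bar{A}v is an eigenvector of \bar{A}\bar{A}^{T} with the same eigenvalue \lambda.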

6. After V has been calculated, we obtain the eigenvectors of the original covariance matrix as U = \bar{A} \times V.

7. Each gesture is then projected into gesture space; the projections of the gestures form the rows of

\Omega = \bar{A}^T \times U
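A compact C++ sketch of the training phase under the conventions above (m pixels, n gestures) follows; it is an illustrative reading of steps 1-7, not the paper's SystemC source. The eigen-decomposition of L is delegated to a hypothetical solver eigenvectorsOf(), since the paper does not specify one.

```cpp
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical eigen-solver for a small symmetric n x n matrix, returning
// eigenvectors as columns; any standard routine (e.g. Jacobi) could be used.
Matrix eigenvectorsOf(const Matrix& L);

struct TrainedModel {
    std::vector<double> avg;  // average gesture (m values)
    Matrix U;                 // eigengestures, m x n
    Matrix Omega;             // projections, one row per database gesture
};

// A: m x n matrix, one gesture image per column (steps 1-7 above).
TrainedModel train(const Matrix& A) {
    size_t m = A.size(), n = A[0].size();
    TrainedModel model;

    // Step 2: average gesture.
    model.avg.assign(m, 0.0);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) model.avg[i] += A[i][j];
        model.avg[i] /= n;
    }

    // Step 3: normalized matrix Abar = A - avg (column-wise).
    Matrix Abar(m, std::vector<double>(n));
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) Abar[i][j] = A[i][j] - model.avg[i];

    // Step 4: reduced covariance L = Abar^T * Abar (n x n).
    Matrix L(n, std::vector<double>(n, 0.0));
    for (size_t a = 0; a < n; ++a)
        for (size_t b = 0; b < n; ++b)
            for (size_t i = 0; i < m; ++i) L[a][b] += Abar[i][a] * Abar[i][b];

    // Steps 5-6: eigenvectors V of L, then U = Abar * V (m x n).
    Matrix V = eigenvectorsOf(L);
    model.U.assign(m, std::vector<double>(n, 0.0));
    for (size_t i = 0; i < m; ++i)
        for (size_t k = 0; k < n; ++k)
            for (size_t j = 0; j < n; ++j) model.U[i][k] += Abar[i][j] * V[j][k];

    // Step 7: projections Omega = Abar^T * U (n x n), one row per gesture.
    model.Omega.assign(n, std::vector<double>(n, 0.0));
    for (size_t g = 0; g < n; ++g)
        for (size_t k = 0; k < n; ++k)
            for (size_t i = 0; i < m; ++i)
                model.Omega[g][k] += Abar[i][g] * model.U[i][k];

    return model;
}
```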

B. Recognition Phase

1. The test gesture is represented as a column vector of its m pixel values:

r = [r_1, r_2, \ldots, r_m]^T

2. The test gesture is then normalized with respect to the average gesture:

\bar{r}_i = r_i - avg_i, \qquad i = 1, 2, \ldots, m

3. Next, the test gesture is projected onto gesture space:

\omega_r = \bar{r}^T \times U

4. We then find the Euclidean distance between the test projection and each of the projections in the database:

ED_i = \sqrt{ \sum_{j=1}^{n} \left( \omega_r(j) - \Omega(i,j) \right)^2 }, \qquad i = 1, 2, \ldots, n

where n = number of gestures in the database.

5. Finally, we decide which gesture matches the test gesture by selecting the minimum Euclidean distance from the distance vector ED. (The size of the vector ED is 1 × number of gestures.)
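A matching C++ sketch of the recognition phase, reusing the TrainedModel struct from the training sketch above (again an illustrative reading of steps 1-5, not the paper's SystemC source):

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Reuses TrainedModel (avg, U, Omega) from the training sketch above.
// Returns the index of the database gesture with the minimum Euclidean
// distance, and writes that distance to minDistance.
size_t recognize(const TrainedModel& model,
                 const std::vector<double>& r,   // test gesture, m pixels
                 double& minDistance) {
    size_t m = r.size();
    size_t n = model.Omega.size();

    // Steps 1-2: normalize the test gesture against the average gesture.
    std::vector<double> rbar(m);
    for (size_t i = 0; i < m; ++i) rbar[i] = r[i] - model.avg[i];

    // Step 3: project onto gesture space: omega_r = rbar^T * U.
    std::vector<double> omega(n, 0.0);
    for (size_t k = 0; k < n; ++k)
        for (size_t i = 0; i < m; ++i) omega[k] += rbar[i] * model.U[i][k];

    // Steps 4-5: Euclidean distance to every stored projection; keep the minimum.
    size_t best = 0;
    minDistance = std::numeric_limits<double>::max();
    for (size_t g = 0; g < n; ++g) {
        double d2 = 0.0;
        for (size_t k = 0; k < n; ++k) {
            double diff = omega[k] - model.Omega[g][k];
            d2 += diff * diff;
        }
        double d = std::sqrt(d2);
        if (d < minDistance) { minDistance = d; best = g; }
    }
    return best;  // gesture ID with the closest match
}
```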

VII. RESULTS

The input database and test database gestures are shown in Fig. 4 and Fig. 5.

Figure 4. Database gestures

Figure 5. Test database gestures

The outputs obtained from the SystemC implementation of the proposed PCA algorithm are shown in Table 1 for a particular gesture (no. 6) from the database, and in Table 2 for various test gesture images.

VIII. CONCLUSION

Specifically, Table 1 illustrates a sample of the ED values when the sixth gesture from the database is entered as the test gesture for recognition. We have also observed that the next-closest distances correspond to the other gestures of the same person in the database. Table 2 illustrates a sample of the Euclidean distance calculations for the various test gestures entered.

Figure 6. Minimum distance for test image 1.

Thus the proposed PCA algorithm reveals that when a gesture which already exists in the database is entered, the projection distance calculated for that specific gesture is zero and the match is exact. When an unknown gesture is entered, the gesture whose ED turns out to be the smallest is the matching gesture. Fig. 6 and Fig. 7 show an input image and the corresponding ED calculation for that input. The results show that the proposed approach is practical and useful. Further, the system can be synthesized and tested by downloading it onto hardware.

Figure 7. Minimum distance for test image 2.

Table 1. ED calculation for gesture no. 6

Gesture Index | Distance to gesture no. 6
1 | 1.2234×10⁷
2 | 1.0272×10⁷
3 | 0.8376×10⁷
4 | 0.4410×10⁷ (close to gesture no. 6)
5 | 0.6151×10⁷
6 | 0.0000×10⁷ (exact match)
7 | 0.5574×10⁷
8 | 1.3561×10⁷

(Fig. 6: test gesture and equivalent gesture, minimum distance = 2.2880×10⁷. Fig. 7: test gesture and equivalent gesture, minimum distance = 1.9625×10⁷.)

Table 2. Observed ED values for different test gestures (test-gesture and equivalent-gesture images omitted)

Sr. No. | Euclidean Distance
1 | 2.2880×10⁷
2 | 1.9625×10⁷
3 | 2.7366×10⁷
4 | 2.02735×10⁷
5 | 2.9615×10⁷
6 | 0.0000×10⁷
7 | 1.42878×10⁷
8 | 2.2661×10⁷

REFERENCES

[1] S. Theodoridis and K. Koutroumbas, Pattern Recognition, 2nd ed., Elsevier, 2003.
[2] W. W. Bledsoe, "The Model Method in Facial Recognition," Panoramic Research Inc., Palo Alto, CA, Rep. PRI:15, August 1966.
[3] T. Kanade, "Picture Processing System by Computer Complex and Recognition of Human Faces," doctoral dissertation, Dept. of Information Technology, Kyoto University, November 1973.
[4] A. L. Yuille, D. S. Cohen, and P. W. Hallinan, "Feature Extraction from Faces using Deformable Templates," Proc. Computer Vision and Pattern Recognition (CVPR), San Diego, CA, June 1989.
[5] W. Zhao, R. Chellappa, A. Rosenfeld, and P. J. Phillips, "Face Recognition: A Literature Survey," Technical Report CAR-TR-948, University of Maryland, 2000.
[6] T. J. Stonham, "Practical Face Recognition and Verification with WISARD," in H. Ellis, M. Jeeves, F. Newcombe, and A. Young (eds.), Aspects of Face Processing, Martinus Nijhoff Publishers, Dordrecht, 1986.
[7] P. Burt, "Smart Sensing within a Pyramid Vision Machine," Proc. IEEE, vol. 76, no. 8, August 1988.
[8] G. Shakhnarovich and B. Moghaddam, "Face Recognition in Subspaces," Mitsubishi Electric Research Laboratories, TR-2004-041, May 2004.
