
Expert Systems with Applications 39 (2012) 4965–4971

Dimensionality reduction for computer facial animation

Flora S. Tsai, School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore

Keywords: Facial animation; Dimensionality reduction; Principal Components Analysis; Expectation–Maximization algorithm for PCA; Multidimensional Scaling; Locally Linear Embedding


Abstract

This paper describes the usage of dimensionality reduction techniques for computer facial animation. Techniques such as Principal Components Analysis (PCA), the Expectation–Maximization (EM) algorithm for PCA, Multidimensional Scaling (MDS), and Locally Linear Embedding (LLE) are compared for the purpose of facial animation of different emotions. The experimental results on our facial animation data demonstrate the usefulness of dimensionality reduction techniques for both space and time reduction. In particular, the EMPCA algorithm performed especially well on our dataset, with a negligible error of only 1–2%.


1. Introduction

Facial animation is one means of enabling natural human–computer interaction. Computer facial animation has applications in many fields, such as realistic virtual humans with different facial expressions in the entertainment industry. In communication applications, interactive talking faces improve the interaction between users and machines and provide a friendly interface that helps to attract users. Humanlike expression is critical among the issues concerning the realism of synthesized facial animation. Computer facial animation remains a very challenging topic for the computer graphics community, because the deformation of a moving face is complex and humans have an inherent sensitivity to the subtleties of facial motion. Furthermore, the depiction of human emotion is an extremely difficult interdisciplinary research topic studied by researchers in computer graphics, artificial intelligence, communication, psychology, and other fields. In this paper, a realistic and expressive computer facial animation system is presented, based on automated learning from Vicon Nexus facial motion capture data. Ultimately, the emotion data are mapped to a 3D animated face using Autodesk MotionBuilder.

Vicon works by tracking infrared-reflective markers using MX cameras that emit infrared light. A component called the MX Ultranet processes each camera image essentially in hardware; the multiple camera views are then combined into a list of 3D marker coordinates on the host computer. The data used for this paper are the facial marker motion data collected using Vicon Nexus. Dimensionality reduction techniques are used to process the collected data, and the emotion data are then mapped to a 3D animated face using Autodesk MotionBuilder. Because our approach uses data captured from a real speaker, more natural and lifelike facial animations can be created.

Computer facial animation is important for many applications, such as video games, virtual reporters, and other interactive human–computer interfaces. It also draws on a range of technologies, including image processing, text-to-speech synthesis, and graphics visualization. This paper focuses on developing 3D facial animation using machine learning techniques.

This paper is organized as follows. Section 2 reviews related work on facial animation and the dimensionality reduction techniques of Principal Components Analysis (PCA), the Expectation–Maximization (EM) algorithm for PCA, Multidimensional Scaling (MDS), and Locally Linear Embedding (LLE). Section 3 describes the design, methodology, and algorithms for the recording, modeling, and animation stages. Section 4 presents the results of the experiments. Finally, Section 5 concludes the paper.

2. Literature review

In this paper, facial animation techniques and dimensionality reduction techniques are studied. Dimensionality reduction techniques are implemented to process the collected facial motion data.

2.1. Facial animation

In this section, related facial animation work is reviewed. In a previous study (Deng et al., 2006), the facial animation technique was composed of four stages: recording, modeling, synthesis, and animation. In the recording stage, expressive facial motion and its accompanying audio are recorded simultaneously and preprocessed; Vicon Nexus is used for capturing facial marker motion. In the modeling stage, an approach is presented to learn speech coarticulation models from facial motion capture data, and a Phoneme-Independent Expression Eigenspace (PIEES) is constructed. In the synthesis stage, based on the learned speech coarticulation models and the PIEES from the modeling stage, the corresponding expressive facial animation is synthesized according to the given input speech/text and expression. The synthesis system has two subsystems: neutral speech motion synthesis and dynamic expression synthesis. The speech motion synthesis subsystem learns explicit but compact speech coarticulation models from recorded facial motion capture data, based on a weight-decomposition method. Given a new phoneme sequence, this subsystem synthesizes the corresponding neutral visual speech motion by concatenating the learned coarticulation models. In the dynamic expression synthesis subsystem, the PIEES is first constructed by phoneme-based time warping and subtraction, and then novel dynamic expression sequences are generated from the constructed PIEES by texture-synthesis approaches. Finally, the synthesized expression signals are weight-blended with the synthesized neutral speech motion to generate expressive facial animation. The compact size of the learned speech coarticulation models and the PIEES makes it possible for the system to be used for on-the-fly facial animation synthesis. Finally, in the animation stage, the captured motion markers are mapped to the 3D face model (Cao, Tien, Faloutsos, & Pighin, 2005).

2.2. Principal Components Analysis

Principal Components Analysis (PCA) is a useful and widely used statistical technique for finding patterns in high-dimensional data. It is useful for reducing dimensionality and finding new, more informative, uncorrelated features (Tsai, 2010). The mathematical concepts used in PCA include standard deviation, covariance, eigenvectors, and eigenvalues. PCA is a way of identifying patterns in data and highlighting their similarities and differences. When the luxury of graphical representation is not available, patterns can be hard to find in high-dimensional data, so PCA is a powerful tool for analyzing such data. The other main advantage of PCA is that once these patterns have been found, the data can be compressed, reducing the number of dimensions without much loss of information. PCA is performed on a dataset in six steps: get the data, subtract the mean, calculate the covariance matrix, calculate the eigenvectors and eigenvalues of the covariance matrix, choose components to form a feature vector, and derive the new dataset (Smith, 2002).
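As a concrete illustration, the following Matlab sketch walks through those six steps for a data matrix X with frames as rows and marker coordinates as columns; the variable names and the choice of k are ours, not taken from the paper's implementation:

```matlab
% Minimal PCA sketch: reduce an n-by-p data matrix X to k dimensions.
Xc = X - mean(X, 1);               % step 2: subtract the mean of each column
S  = cov(Xc);                      % step 3: covariance matrix
[V, D] = eig(S);                   % step 4: eigenvectors and eigenvalues
[~, order] = sort(diag(D), 'descend');
V = V(:, order);                   % order components by explained variance
k = 30;                            % step 5: choose k components (feature vector)
Y = Xc * V(:, 1:k);                % step 6: derive the new, reduced dataset
```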

2.3. EM algorithms for PCA

The Expectation–Maximization (EM) algorithm for Principal Component Analysis allows a few eigenvectors and eigenvalues to be extracted from large amounts of high-dimensional data. Missing information is accommodated naturally, resulting in high computational efficiency in both space and time. Principal Component Analysis can be viewed as a limiting case of a particular class of linear-Gaussian models (Roweis, 1998). The covariance structure of an observed p-dimensional variable y can be captured using fewer than the p(p + 1)/2 free parameters required by a full covariance matrix. Linear-Gaussian models assume that y was produced as a linear transformation of some k-dimensional latent variable x plus additive Gaussian noise (Roweis, 1998). Denoting the transformation by the p × k matrix C, and the p-dimensional noise by v (with covariance matrix R), the generative model can be expressed as

$$y = Cx + v, \qquad x \sim \mathcal{N}(0, I), \qquad v \sim \mathcal{N}(0, R) \tag{1}$$

The latent (cause) variables x are assumed to be independent and identically distributed according to a unit-variance spherical Gaussian. The noise v is likewise independent and normally distributed, and is assumed to be independent of x. The model then reduces to a single Gaussian distribution for the observations, which can be expressed as:

$$y \sim \mathcal{N}\left(0,\; CC^{T} + R\right) \tag{2}$$

The condition k < p is necessary for the model to restrict the covariance structure of the Gaussian noise v by constraining the matrix R, and also to save parameters over the direct covariance representation in p-space.
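To make the generative model concrete, here is a small Matlab sketch that draws one sample from Eq. (1); the sizes p and k and the noise level are illustrative values of ours, not taken from the paper:

```matlab
% Sample the linear-Gaussian generative model y = Cx + v of Eq. (1).
p = 6; k = 2;                       % illustrative dimensions (assumed)
C = randn(p, k);                    % some p-by-k linear transformation
R = 0.01 * eye(p);                  % noise covariance (assumed spherical here)
x = randn(k, 1);                    % latent variable x ~ N(0, I)
v = chol(R, 'lower') * randn(p, 1); % noise v ~ N(0, R)
y = C * x + v;                      % observed p-dimensional variable
```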

2.3.1. Inference and learning

When working with linear-Gaussian models, two problems arise. The first concerns state inference, or compression: given fixed model parameters C and R and an observation y, we are interested in the posterior probability P(x|y) over a single hidden state given the corresponding single observation. The computation is a linear matrix projection, and the resulting density is itself Gaussian:

$$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)} = \frac{\mathcal{N}(Cx, R)\big|_{y}\;\mathcal{N}(0, I)\big|_{x}}{\mathcal{N}(0, CC^{T} + R)\big|_{y}} \tag{3}$$

$$P(x \mid y) = \mathcal{N}(\beta y,\; I - \beta C)\big|_{x}, \qquad \beta = C^{T}\left(CC^{T} + R\right)^{-1} \tag{4}$$

Not only is the expected value βy of the unknown state obtained, but also an estimate of the uncertainty in this value, in the form of the covariance I − βC. Computing y from x (reconstruction) is also straightforward: P(y|x) = N(Cx, R)|_y. Finally, the likelihood of any data point y is computed using Eq. (2).

2.3.2. Zero noise limit

Principal Component Analysis is a limiting case of the linear-Gaussian model in which the covariance of the noise v becomes infinitesimally small and equal in all directions. Mathematically, PCA is obtained by taking the limit R = lim_{ε→0} εI. This makes the likelihood of a point y dominated solely by the squared distance between it and its reconstruction Cx. The directions of the columns of C which minimize this error are known as the principal components. Inference now reduces to simple least-squares projection:

$$P(x \mid y) = \mathcal{N}(\beta y,\; I - \beta C)\big|_{x}, \qquad \beta = \lim_{\varepsilon \to 0} C^{T}\left(CC^{T} + \varepsilon I\right)^{-1} \tag{5}$$

$$P(x \mid y) = \mathcal{N}\left((C^{T}C)^{-1}C^{T}y,\; 0\right)\big|_{x} = \delta\left(x - (C^{T}C)^{-1}C^{T}y\right) \tag{6}$$

The posterior over states collapses to a single point and the covariance becomes zero, since the noise has become infinitesimal.

2.3.3. EM algorithm

Even though the principal components can be computed explicitly, there is still an EM algorithm for learning them. It can be easily derived as the zero-noise limit of the standard algorithm by replacing the usual e-step with the projection above. The algorithm is:

$$\text{e-step:}\quad X = (C^{T}C)^{-1}C^{T}Y \tag{7}$$

$$\text{m-step:}\quad C^{\mathrm{new}} = YX^{T}(XX^{T})^{-1} \tag{8}$$

where Y is a p × n matrix of all the observed data and X is a k × n matrix of the unknown states. The columns of C will span the space of the first k principal components. The algorithm can be performed online using only a single data point at a time, so its storage requirements are only O(kp) + O(k²). The intuition behind the algorithm is as follows: guess an orientation for the principal subspace. Fix the estimated subspace and project the data y into it to give the values of the hidden states x. Finally, fix the values of the hidden states and choose the subspace orientation which minimizes the squared reconstruction errors of the data points.
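A minimal Matlab sketch of this loop, transcribing Eqs. (7) and (8) directly, might look as follows; the variable names, random initialization, and fixed iteration count are our assumptions, not the paper's code:

```matlab
% EM algorithm for PCA (zero-noise limit): fit a k-dimensional subspace
% to a mean-subtracted p-by-n data matrix Y.
k = 30;                            % target dimensionality (assumed)
[p, n] = size(Y);
C = randn(p, k);                   % guess an orientation for the subspace
for iter = 1:10                    % fixed number of iterations (assumed)
    X = (C' * C) \ (C' * Y);       % e-step, Eq. (7): project data into subspace
    C = (Y * X') / (X * X');       % m-step, Eq. (8): re-fit subspace to data
end
```

Each iteration costs O(knp), which is why the method scales to high-dimensional data where forming a full p × p covariance matrix would be expensive.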


2.4. Multidimensional Scaling

Multidimensional Scaling (MDS) is a data analysis technique that displays the structure of distance-like data as a geometrical picture (Tsai, 2011b). MDS pictures the structure of a set of objects and approximates the distances between pairs of the objects. The data can be an "objective" similarity measure or an index calculated from multivariate data, but must always represent the degree of similarity of pairs of objects. Each object is represented by a point in a multidimensional space, and the distances between pairs of points have the strongest possible relation to the similarities among the pairs of objects. For example, two similar objects are represented by two points that are close together, and two dissimilar objects are represented by two points that are far apart. The key idea of the method, approximating the original set of distances with distances corresponding to a configuration of points in a Euclidean space, can also be used for constructing a nonlinear projection method (Tsai & Chan, 2007).
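As an illustration, a classical (metric) MDS embedding can be computed from a pairwise distance matrix by double-centering followed by an eigendecomposition. The Matlab sketch below is our own, under the assumption that D is an n × n matrix of Euclidean distances:

```matlab
% Classical MDS sketch: embed n objects in k dimensions from an
% n-by-n matrix D of pairwise distances.
n = size(D, 1);
J = eye(n) - ones(n) / n;          % centering matrix
B = -0.5 * J * (D .^ 2) * J;       % double-centered Gram matrix
[V, L] = eig((B + B') / 2);        % symmetrize for numerical stability
[vals, order] = sort(diag(L), 'descend');
k = 2;                             % target dimensionality (assumed)
Y = V(:, order(1:k)) * diag(sqrt(max(vals(1:k), 0)));  % k-D coordinates
```

When D contains Euclidean distances, this construction recovers the same projection as PCA, which is consistent with the identical RMSE curves for PCA and MDS reported in Section 4.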

2.5. Locally Linear Embedding

Locally Linear Embedding (LLE) (Roweis & Saul, 2000) is an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. LLE maps inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima (Roweis & Saul, 2000). LLE is able to learn the global structure of nonlinear manifolds by exploiting the local symmetries of linear reconstructions. In LLE, each data point and its neighbors are expected to lie on or close to a locally linear patch of the manifold. The local geometry of these patches is characterized by linear coefficients that reconstruct each data point from its neighbors (Roweis & Saul, 2000):

$$\varepsilon(W) = \sum_{i}\Big|X_{i} - \sum_{j} W_{ij}X_{j}\Big|^{2} \tag{9}$$

The weights W_ij summarize the contribution of the jth data point to the ith reconstruction. To compute the weights W_ij, the cost function is minimized subject to two constraints:

(1) Each data point X_i is reconstructed only from its neighbors: W_ij = 0 if X_j does not belong to the set of neighbors of X_i.

(2) The rows of the weight matrix sum to one: $\sum_{j} W_{ij} = 1$.

The optimal weights W_ij subject to these constraints are found by solving a least-squares problem. This assumes that the data lie on or near a smooth nonlinear manifold of lower dimensionality d ≪ D. A linear mapping, consisting of a translation, rotation, and rescaling, can then map the high-dimensional coordinates of each neighborhood into global internal coordinates on the manifold (Roweis & Saul, 2000). LLE constructs a neighborhood-preserving mapping based on this idea. In the final step of the algorithm, each high-dimensional observation X_i is mapped to a low-dimensional vector Y_i representing global internal coordinates on the manifold. This is done by choosing d-dimensional coordinates Y_i to minimize the embedding cost function:

$$\Phi(Y) = \sum_{i}\Big|Y_{i} - \sum_{j} W_{ij}Y_{j}\Big|^{2} \tag{10}$$

It can be minimized by solving a sparse N × N eigenvalue problem. LLE takes a different approach, analyzing local symmetries, linear coefficients, and reconstruction errors instead of global constraints, pairwise distances, and stress functions (Roweis & Saul, 2000). Therefore, it avoids the need to solve large dynamic programming problems, and tends to accumulate very sparse matrices, whose structure can be exploited for time and space savings (Tsai, 2011a).

3. Design, methodology, and algorithms

3.1. Recording stage

The Vicon motion capture system was chosen for the recording stage in this paper. Vicon works by tracking infrared-reflective markers using MX cameras that emit infrared light. A component called the MX Ultranet processes each camera image in hardware, after which the multiple camera views are combined into a list of 3D marker coordinates on the host computer. The Vicon hardware consists of infrared-reflective markers, eight MX cameras, one host computer, and four display computers. Each MX camera contains a high-resolution (4 megapixel), high-speed (370 Hz) imaging sensor specifically tuned for infrared light detection. To illuminate the reflective markers, each MX camera carries a ring of infrared LEDs around the camera lens. Each camera is also fitted with a lens of a different angle, and the light entering the camera can be adjusted with a lens-mounted aperture. Each camera outputs raw infrared point values to the MX Ultranet system. Infrared-reflective markers are plastic balls covered with reflective tape; smaller markers are used here and affixed with double-sided tape onto the model's face. The Vicon Nexus software is used to capture the facial motion markers in this paper. A real speaker is used for the recording stage, with infrared-reflective markers placed over the model's entire face so that facial marker motion can be recorded. Experiments using 17, 100, 102, and 103 markers were conducted to determine the appropriate number of markers. In addition, a microphone was placed inside the lab for audio recording.

3.2. Modeling stage: dimensionality reduction

In the modeling stage, an approach is presented to learn speech coarticulation models from facial motion capture data, and a Phoneme-Independent Expression Eigenspace (PIEES) is constructed. Before the construction of the PIEES, the facial motion data must first be processed to reduce its dimensionality. The facial motion data form a matrix with frames as rows and marker-point coordinates as columns. The Matlab Toolbox for Dimensionality Reduction (v0.7b) (van der Maaten, 2007) is used to perform dimensionality reduction on the facial motion data, reducing the data from 1000+ dimensions to several chosen dimensionalities. The following dimensionality reduction techniques were chosen to process the facial motion data (see the sketch after the list):

(1) Principal Components Analysis (PCA): PCA finds patterns in high-dimensional data, highlights their similarities and differences, and compresses the data, reducing the number of dimensions without much loss of information.

(2) EM Algorithm for PCA (EMPCA): EMPCA allows a few eigenvectors and eigenvalues to be extracted from a large amount of high-dimensional data and accommodates missing information naturally.

(3) Multidimensional Scaling (MDS): MDS is a popular data analysis technique which displays the structure of distance-like data as a geometrical picture.

(4) Locally Linear Embedding (LLE): LLE is an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs.
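As an illustration of this step, the toolbox exposes a single entry point, compute_mapping. The sketch below shows how such calls might look; the exact method strings and parameter lists for this toolbox version are an assumption on our part and may differ between releases:

```matlab
% Hypothetical use of the Matlab Toolbox for Dimensionality Reduction
% on the marker data matrix (frames x coordinates).
no_dims = 30;                                        % target dimensionality
Y_pca = compute_mapping(data, 'PCA', no_dims);       % linear PCA
Y_mds = compute_mapping(data, 'MDS', no_dims);       % metric MDS
Y_lle = compute_mapping(data, 'LLE', no_dims, 13);   % LLE with k = 13 neighbors
```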

Fig. 1. Resulting graph of EMPCA with different number of iterations.

The results after dimensionality reduction are inspected using Matlab programs that render the data as a video, allowing the quality of the reduced data to be evaluated visually. Furthermore, the Root Mean Square Error (RMSE) is introduced to compute the total error of the dimensionality-reduced data, so that the results can also be evaluated quantitatively. The 3D coordinate points of the reduced-dimension data are compared with the original data. Because the original data has a greater number of dimensions than the reduced data, the original data is sampled by taking a data point at every (n/k) interval, where n is the number of dimensions of the original data and k is the number of dimensions of the reduced data. Once the dimensions of both datasets are matched, the RMSE along each axis can be computed using the equations:

$$\mathrm{RMSE}_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - x_{2i}\right)^{2}} \tag{11}$$

$$\mathrm{RMSE}_y = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_{2i}\right)^{2}} \tag{12}$$

$$\mathrm{RMSE}_z = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(z_i - z_{2i}\right)^{2}} \tag{13}$$

where (x_i, y_i, z_i) are the original marker coordinates and (x_{2i}, y_{2i}, z_{2i}) the corresponding coordinates of the reduced-dimension data. The RMSE values of the three axes are then combined using the equation:

$$\mathrm{RMSE}_{\mathrm{total}} = \sqrt{\mathrm{RMSE}_x^{2} + \mathrm{RMSE}_y^{2} + \mathrm{RMSE}_z^{2}} \tag{14}$$

Finally, the RMSE over all dimensions of the data is accumulated. Experiments are conducted with different target dimensionalities and with different parameters of the dimensionality reduction techniques, such as the number of iterations i in EMPCA and the number of nearest neighbours k in LLE. The dimensionality reduction techniques that best preserve the originality of the data are then chosen to process the remaining facial motion data.
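A direct Matlab transcription of Eqs. (11)–(14) is shown below, assuming x1, y1, z1 hold the original coordinates and x2, y2, z2 the corresponding coordinates recovered from the reduced data; the variable names are ours:

```matlab
% RMSE between original and dimension-reduced marker coordinates,
% per Eqs. (11)-(14); all inputs are length-n column vectors.
rmse_x = sqrt(mean((x1 - x2) .^ 2));                  % Eq. (11)
rmse_y = sqrt(mean((y1 - y2) .^ 2));                  % Eq. (12)
rmse_z = sqrt(mean((z1 - z2) .^ 2));                  % Eq. (13)
rmse_total = sqrt(rmse_x^2 + rmse_y^2 + rmse_z^2);    % Eq. (14)
```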

3.3. Animation stage: data mapping

Although other software exists for the creation of 3D animated faces and data mapping, Autodesk MotionBuilder 2009 was chosen because it allows direct data input from the Vicon motion capture system in .C3D format. The resulting data from Matlab, in .MAT format, is first converted to .C3D format so that it can be loaded into Autodesk MotionBuilder 2009. After the datasets are loaded, they are mapped to the 3D animated face created in Autodesk MotionBuilder 2009.

4. Results and discussion

4.1. Recording stage

We used 103 markers for the formal recording, covering four different emotions (anger, sadness, happiness, and surprise) so that the facial expressions can be more clearly reflected. To ensure high-quality data, a professional model was hired to perform during a series of motion capture sessions. The model was directed to speak a custom phoneme-balanced corpus, and spoke the same corpus for all the different expressions. In total, 34,000 sentences with different emotions were recorded.

4.2. Modeling stage: dimensionality reduction

After the recording stage, the data is processed by the selected dimensionality reduction techniques: Principal Components Analysis, the EM algorithm for PCA, Multidimensional Scaling, and Locally Linear Embedding.

4.2.1. Experiments

Before starting the experiments, good facial motion data with no gaps between frames are chosen from the database so that the dimensionality reduction techniques can be performed. A Matlab program is then written to read the data (an Excel file) and save it as a MAT-file so that it can be read by the Matlab Toolbox for Dimensionality Reduction (v0.7b), which is used to perform the dimensionality reduction techniques on the tested data. The GUI of the toolbox is invoked and the selected data is read. The experiment then continues by choosing the dimensionality reduction technique and the required dimensions; in the experiments, the data is reduced from 1000+ dimensions to several chosen dimensionalities. The processed data is saved as a new MAT-file. Another Matlab program is written to read the data and visualize it as a video so that the quality of the processed data can be observed. X, Y, and Z are the parameters used in the video, indicating the marker coordinates of the selected data.
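The preprocessing step described here might look like the following sketch; the file names are hypothetical placeholders, not those used in the study:

```matlab
% Hypothetical preprocessing: read marker data from an Excel file and
% save it as a MAT-file for the dimensionality reduction toolbox.
data = xlsread('facial_motion.xls');   % frames x (3 * number of markers)
save('facial_motion.mat', 'data');     % MAT-file consumed by the toolbox
```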

4.2.2. Computation of Root Mean Square Error

A Matlab program is written to compute the Root Mean Square Error on the dimension-reduced data. Some parameters of the dimensionality reduction techniques affect the quality of the data; therefore, the number of iterations i in EMPCA and the number of nearest neighbours k in LLE are varied in the experiments and the effects of these parameters are recorded.

4.2.3. Number of iterations, i in EMPCA

To determine the effect of the number of iterations i in EMPCA, i is varied from 3 to 20 during the experiments. The target dimensionalities are 30 and 50, and the corresponding RMSE (in percent) is recorded. The resulting graph is shown in Fig. 1.

The number of iterations i = 10 yields the lowest RMSE values, 1.10% and 1.31%, for dimensions 30 and 50 respectively. The standard convergence proofs for EM apply to EMPCA as well, so it can be concluded that it will always reach a local maximum likelihood. Furthermore, it has been shown that the only stable local extremum is the global maximum, at which the true principal subspace is found (Roweis, 1998). Therefore, EMPCA has found its true principal subspace at i = 10, converging to the correct value where its RMSE is minimal. Further increasing the number of iterations only leads to higher errors, because the true principal subspace has already been found. Therefore, i = 10 is chosen when EMPCA is performed on the data.

4.2.4. Nearest neighbours, k in LLE

To determine the effect of the number of nearest neighbours k in LLE, k is varied from 13 to 15 in the experiments. The target dimensionalities are 30 and 50, and the corresponding RMSE (in percent) is recorded. The resulting graph is shown in Fig. 2.

The number of nearest neighbours k = 13 yields the lowest RMSE values, 13.84% and 8.30%, for dimensions 30 and 50 respectively. From the visualization of the original data in Figs. 3–5, we observe that the number of nearest neighbours of a marker is around 13–15. If the number of nearest neighbours k is less than 13, the resulting data will contain errors, with information from fewer than 103 markers preserved. Therefore, k = 13 is chosen when performing LLE on the data.

4.2.5. RMSE computation for all the dimensionality reduction techniques

The target numbers of dimensions in the experiments are 10, 20, 30, 40, 50, 75, 100, and 150. The number of iterations i in EMPCA and the number of nearest neighbours k in LLE are set as determined above. The resulting graphs of PCA, MDS, LLE, and EMPCA are shown in Figs. 3–5.

Fig. 2. Resulting graph of LLE with different number of nearest neighbours.

Fig. 3. Resulting graph of PCA with different number of dimensions.

It can be concluded that EMPCA performs better than the other dimensionality reduction techniques. Moreover, data processed using EMPCA with 30 dimensions and 10 iterations performed best, with the lowest Root Mean Square Error of 1.10%. The RMSE of LLE decreases with increasing dimensions; however, dimensions larger than 150 are not considered, because the motion data should be small for use in the construction of the Phoneme-Independent Expression Eigenspace (PIEES) in the modeling stage. Furthermore, MDS and PCA always produce the same results as far as Euclidean distance is concerned (Tsai & Chan, 2007); therefore, the RMSE values of PCA and MDS at different dimensions are the same, as shown in Figs. 3 and 4.

EMPCA performed better than the other techniques because it is more effective when applied to high-dimensional data. EMPCA solves PCA's difficulty in finding principal component directions for high-dimensional data. Furthermore, EMPCA also performs better when dealing with missing data: the other methods cannot accommodate missing values, while EMPCA estimates the maximum likelihood values for the missing information directly at each iteration. EMPCA also uses simple and efficient computations of the eigenvectors and eigenvalues when dealing with high-dimensional data.

Fig. 4. Resulting graph of MDS with different number of dimensions.

Fig. 5. Resulting graph of EMPCA and LLE with different number of dimensions.

4.3. Animation stage

The animation stage focuses on developing 3D animated facial animation using the resulting data from the previous stages.

Fig. 7. Results of happy emotion processed using EMPCA with dimensions 30, mapped into MotionBuilder 2009.

4.3.1. Animation stage: data conversion

The data is mapped to a 3D animated face using Autodesk MotionBuilder 2009. Only .C3D files can be read by Autodesk MotionBuilder 2009; therefore, the resulting data, which is in .MAT format, is converted to .C3D format using the software C3Deditor. The dimensionality-reduced data are entered into the .C3D file using C3Deditor, and the data of the different emotions are then loaded into Autodesk MotionBuilder 2009.

4.3.2. Space improvement

After the conversion of the resulting data to .C3D format, the size of the resulting C3D file can be compared with the size of the original C3D data. The resulting C3D file is 60 KB while the original C3D data is 2.5 MB (2500 KB), a space improvement of 1 − 60/2500 = 97.6%.

Fig. 8. Results of surprise emotion processed using EMPCA with dimensions 30, mapped into MotionBuilder 2009.

4.3.3. Animation stage: data mapping

Data of the different emotions are loaded into Autodesk MotionBuilder 2009, and the markers loaded from the C3D file are resized. The actor face is then created. Three markers (from the nose and forehead) are chosen as reference points and matched to the actor face by dragging them to the Object section; the other markers are dragged to their own corresponding sections on the actor face. A character is created by dragging the character file from the asset browser, and the generic expressions of the character face are defined and matched to the actor's face. Finally, a 3D animated face is created. The resulting data for the emotions angry, happy, surprise, and sad (30 dimensions, processed using EMPCA) mapped to the 3D animated faces are shown in Figs. 6–9.

4.3.4. Discussion and evaluation of the results

Compared to previous facial animation techniques, our technique has several advantages.

Fig. 6. Results of angry emotion processed using EMPCA with dimensions 30, mapped into MotionBuilder 2009.

Fig. 9. Results of sad emotion processed using EMPCA with dimensions 30, mapped into MotionBuilder 2009.

Accuracy and robustness. The approach presented in this paper uses 103 markers so that movement of any part of the face can be recorded accurately, ensuring expressive facial emotion. To construct the Phoneme-Independent Expression Eigenspace (PIEES) in the modeling part, dimensionality reduction is performed on the original data to reduce its size. In our approach, the EM algorithm for PCA was chosen to reduce the dimensions of the original data. After the reduction, EMPCA induced only 1–2% deviation from the original data, which is considered negligible error. Accuracy and robustness are thus ensured in our approach.

Simplicity and improvement of efficiency. In our approach, the use of Autodesk MotionBuilder 2009 has simplified the animation stage. Autodesk MotionBuilder 2009 was chosen for the animation stage because it is developed to be compatible with the Vicon motion capture system used in the recording stage. It allows direct input of .C3D data, and the creation of the actor face has increased the efficiency of data mapping. As a result, 3D animated faces can be generated by Autodesk MotionBuilder 2009 in a competitive time compared with other facial animation techniques.

Perfect synchronization. Multiple-camera approaches always face the critical problem of camera synchronization: accurate data capture by multiple cameras requires special synchronization devices. In our approach, the eight cameras simultaneously capture eight images from different viewpoints, so perfect synchronization among the multiple views is inherent.

5. Conclusion

This paper presented a realistic and expressive computer facial animation system based on automated learning from Vicon Nexus facial motion capture data. Our approach with 103 markers performed better for expressive facial animation than another approach using only 50 markers. Facial motion data of different emotions collected using Vicon Nexus were processed using the dimensionality reduction techniques of Principal Components Analysis (PCA), the EM algorithm for PCA (EMPCA), Locally Linear Embedding (LLE), and Multidimensional Scaling (MDS). EMPCA with 30 dimensions and 10 iterations best preserved the originality of the data compared with the other techniques. Reducing the dimensions of the original data resulted in a space improvement of 97.6% with a negligible error of only 1–2%. The data for the different emotions were mapped to a 3D animated face using Autodesk MotionBuilder 2009, producing reasonable results. The approach presented in this paper used data captured from a real speaker, which makes the mapped facial animation more natural and lifelike. The mapping results are considered good, as the motion of the eyes, eyebrows, and lips is clearly reflected by the 3D animated face. In this paper, 103 markers were used to capture facial motion, compared to the 30–50 markers typically used. The approach presented can be used for various applications and can serve as a prototyping tool to generate natural, expressive, and realistic facial animation.

Acknowledgements

The author acknowledges Chan Choon Seng for his contributions.

References

Cao, Y., Tien, W. C., Faloutsos, P., & Pighin, F. (2005). Expressive speech-driven facial animation. ACM Transactions on Graphics, 24(4), 1283–1302.

Deng, Z., Neumann, U., Lewis, J., Kim, T.-Y., Bulut, M., & Narayanan, S. (2006). Expressive facial animation synthesis by learning speech coarticulation and expression spaces. IEEE Transactions on Visualization and Computer Graphics, 12(6), 1523–1534.

Roweis, S. (1998). EM algorithms for PCA and SPCA. In NIPS '97: Proceedings of the 1997 conference on advances in neural information processing systems 10 (pp. 626–632). Cambridge, MA, USA: MIT Press.

Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323–2326.

Smith, L. I. (2002). A tutorial on principal components analysis. URL: <www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf>.

Tsai, F. S. (2010). Comparative study of dimensionality reduction techniques for data visualization. Journal of Artificial Intelligence, 3(3), 119–134.

Tsai, F. S. (2011a). Dimensionality reduction framework for blog mining and visualization. International Journal of Data Mining, Modelling and Management.

Tsai, F. S. (2011b). Dimensionality reduction techniques for blog visualization. Expert Systems with Applications, 38(3), 2766–2773.

Tsai, F. S., & Chan, K. L. (2007). Dimensionality reduction techniques for data exploration. In 2007 6th international conference on information, communications and signal processing (ICICS) (pp. 1568–1572).

van der Maaten, L. (2007). An introduction to dimensionality reduction using Matlab. Tech. rep., Maastricht University, Maastricht, The Netherlands.