
July 2004, Vol.19, No.4, pp.501-509 J. Comput. Sci. & Technol.

New Algorithm for 3D Facial Model Reconstruction and Its Application in Virtual Reality

Rong-Hua Liang 1,2, Zhi-Geng Pan 3, and Chun Chen 1

1 College of Computer Science, Zhejiang University, Hangzhou 310027, P.R. China

2 College of Information and Engineering, Zhejiang University of Technology, Hangzhou 310014, P.R. China

3 State Key Laboratory of CAD and CG, Zhejiang University, Hangzhou 310027, P.R. China

E-mail: [email protected]; [email protected]; [email protected]

Received February 17, 2003; revised April 29, 2003.

Abstract    3D human face model reconstruction is essential to the generation of facial animations, which are widely used in the field of virtual reality (VR). The main issues of image-based 3D facial model reconstruction by vision technologies are twofold: one is to select and match the corresponding features of the face from two images with minimal interaction, and the other is to generate a realistic-looking human face model. In this paper, a new algorithm for realistic-looking face reconstruction is presented based on stereo vision. Firstly, a pattern is printed and attached to a planar surface for camera calibration; corner generation and corner matching between two images are performed by integrating a modified image pyramid Lucas-Kanade (PLK) algorithm and a local adjustment algorithm, and the 3D coordinates of the corners are then obtained by 3D reconstruction. The individual face model is generated by deformation of a general 3D model and interpolation of the features. Finally, a realistic-looking human face model is obtained after texture mapping and eyes modeling. In addition, some application examples in the field of VR are given. Experimental results show that the proposed algorithm is robust and the 3D model is photo-realistic.

Keywords stereo vision, image PLK algorithm, 3D reconstruction, texture mapping, virtual reality

1 Introduction

In the fields of computer vision and computer graphics, generating realistic-looking human face models that preserve slight individual differences when animated has always been one of the most attractive but difficult problems. Animated face models are essential to video conferencing, games, virtual "showmen", online chatting, etc. Since an animated human face can be acquired by deforming the degrees of freedom of a general parameterized model, vivid individual facial model reconstruction is essential.

One simple approach to obtaining a 3D human face model is to use laser scanners; facial animations can then be generated after computing the textured model and its geometrical deformation[1, 2]. Not only are these scanners expensive, but the data they produce are usually quite noisy, requiring manual adjustment and alignment of deformation parameters before the model can be animated.

Reconstructing a 3D human face from images with computer vision technologies has been a challenging but attractive problem, for it requires only an ordinary camera and a PC. There has been a large amount of research on 3D face reconstruction and animation[3, 4], and these approaches can be roughly classified into two groups.

1) Recovering face geometry from image properties. These approaches rely on stereo[5], shading[6], structured light[7], silhouettes[8], and features for individual face reconstruction[9], or on images viewed with a predefined structure such as orthogonal views[10, 11], to obtain the model from image contours.

2) Recovering face geometry and animating the face model with a general 3D face model and a facial deformation control model. Recovering the head as a simple triangular mesh is not sufficient. To animate the face, these approaches usually employ a general 3D model, since most people have resembling facial features that can constrain the geometrical search space for 3D face model generation; an animated human face can then be acquired through the deformation parameters of the general model, such as the muscular model[12], FAP in MPEG-4[13], and so on.

Up till now, the second approach has been popular, but it is difficult to select and match the corners of two images, so one main goal of 3D face reconstruction is to investigate how to deform a facial surface spatially with minimal user intervention. In this research area, Fua[14, 15] has done impressive work on face reconstruction from two images, based on robust least-squares adjustment over a set of progressively finer control triangulations, which takes advantage of three complementary sources of information: stereo data, silhouette edges and 2D feature points. A head model is then generated by a model fitting algorithm and texture mapping, but this algorithm is time-consuming. Blanz and Vetter[16] presented linear classes of face geometry that constrain the geometrical search space and are very powerful in generating convincing 3D human face models from images. Liu et al.[17] developed a system that can construct textured 3D face models from video sequences with minimal user intervention. After five manual clicks on any two images captured by an ordinary video camera to tell the system where the eye corners, nose tip and mouth corners are, the system automatically generates a realistic-looking 3D human head model, and the constructed model can be animated immediately. Customers can use their system to generate their own face models with a PC and an ordinary camera in a few minutes. However, their system sometimes fails: for young females with very smooth skin, the corner point matching may fail.

*Regular Paper. This research is supported by the National Grand Fundamental Research 973 Program of China under Grant No.2002CB312100, the Excellent Youth NSF of Zhejiang Province under Grant No.RC40008, and the Zhejiang Provincial Natural Science Foundation under Grant Nos.602134 and Z603262.

The goal of our algorithm is to allow an untrained user with a PC and an ordinary camera to create and instantly animate his/her face model in a few minutes with minimal user intervention, with a reconstructed model photo-realistic enough to be exploited in the field of virtual reality (VR). In this paper, an algorithm for realistic-looking face reconstruction is presented in terms of vision techniques. Firstly, a pattern is printed and attached to a planar surface for camera calibration, which obtains more accurate intrinsic parameters than auto-calibration approaches. Corner generation and corner matching between two images are performed by integrating a modified image PLK algorithm and a local adjustment algorithm, and the 3D coordinates of the corners are then obtained by 3D reconstruction. The individual face model is generated by deformation of a general 3D model and interpolation of the features. Finally, a realistic-looking human face model is obtained by texture mapping. Compared with previous approaches, our method adopts different strategies for corner tracking on the face, disparity detection and texture mapping.

2 New Algorithm for Human Face Model Reconstruction

2.1 Algorithm Outline

Following the structure-from-motion paradigm of computer vision, 2D corresponding corners are first tracked between two images, and the 3D coordinates of the tracked points are then calculated based on stereo vision. In terms of the linear classes of human face geometries, a general facial model is fitted to the reconstructed 3D points by deformation of the general 3D model and interpolation of the features, instead of reconstructing the 3D surface from the 3D features only. The main steps of our algorithm are as follows.

Step 1. Capture a human head image sequence I0...Iv with a still camera. The head motion is required to be relatively small, with the head turning to each side; alternatively, the user can simply turn his/her head from the left all the way to the right, or vice versa.

Step 2. Calculate the intrinsic parameters of the camera by a calibration panel.

Step 3. Select two approximately frontal views I0 and I1 with a head motion of 10-15 degrees between them, select the corners and match the corresponding corners between the images, then adjust the positions in the neighborhood to get accurate matches.

Step 4. Calculate the extrinsic parameters of the camera, and obtain the structure of the human face by deformation of the general 3D model and interpolation of the features. Repeat Steps 3 and 4 to refine the structure.

Step 5. Generate a human face model with texture from I0...Iv.

2.2 C a m e r a C a l i b r a t i o n

Let $P(X_w, Y_w, Z_w)$ be a 3D point in the world coordinate system, with coordinates $(X, Y, Z)$ in the camera coordinate system and retinal image coordinates $m(u, v)$ in the image coordinate system. The camera is described by the widely used pinhole model, and the relation between $P$ and $m$ can be denoted by:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = A\,[R \;\; t] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = M \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}. \tag{1}$$


The matrix of camera intrinsic parameters is denoted by

$$A = \begin{bmatrix} a_x & 0 & u_0 \\ 0 & a_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

where $a_x = f/d_x$, $a_y = f/d_y$, $f$ is the focal length, and $d_x$ and $d_y$ are the pixel sizes in the X and Y directions, respectively. $R$ is an extrinsic parameter, and so is $t$. If $(\breve{x}, \breve{y})$ is the distorted counterpart of the normalized image coordinate $(x, y)$ (see (5)), the non-linear model accounts for image distortion as follows:

$$\breve{x} = x + x[k_1(x^2 + y^2) + k_2(x^2 + y^2)^2]$$
$$\breve{y} = y + y[k_1(x^2 + y^2) + k_2(x^2 + y^2)^2]$$

A pattern is printed with black squares and white squares. Compared with the pattern in [18], with the same size of black squares and the same distance, our approach can more easily find the corners of the black squares. The main steps of the camera calibration are as follows.

Step 1. Print a pattern and attach it to a planar surface (see Fig.1(a)).

Step 2. Take a few images of the model plane under different orientations.

Step 3. Detect the feature points in the images.

Step 4. Estimate the four intrinsic parameters and all the extrinsic parameters using the closed-form solution.

Step 5. Estimate the coefficients of the radial distortion by solving the linear least-squares problem.

Step 6. Refine all parameters by minimizing the non-linear model.

A 15-second video clip with a resolution of 352 x 288, captured under different orientations by a still camera, is input as the calibration footage. Fig.1(b) shows the procedure to obtain the extrinsic parameters. Due to the white noise of the input images, a Gaussian function is used to estimate the noise; the pixel error is given in Section 3.

Fig.1. (a) Calibration panel; the size of each square is 2.55cm x 2.55cm. (b) Sketch map of extrinsic parameters for camera calibration.
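For illustration, Steps 1-6 correspond closely to the plane-based method of [18], which modern libraries implement directly. The following is a minimal sketch using OpenCV, not the authors' implementation; the board geometry (9x6 inner corners) and the API choices are assumptions of the example, while the 2.55cm square size comes from Fig.1(a).

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Sketch of plane-based calibration in the style of [18] (illustrative only).
cv::Mat calibrateFromViews(const std::vector<cv::Mat>& grayViews) {
    const cv::Size patternSize(9, 6);   // assumed inner-corner grid of Fig.1(a)
    const float square = 2.55f;         // cm, from the Fig.1(a) caption
    std::vector<cv::Point3f> board;
    for (int r = 0; r < patternSize.height; ++r)
        for (int c = 0; c < patternSize.width; ++c)
            board.emplace_back(c * square, r * square, 0.0f);  // planar target: Z = 0

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;
    for (const cv::Mat& view : grayViews) {                    // Steps 2-3
        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(view, patternSize, corners)) continue;
        cv::cornerSubPix(view, corners, cv::Size(5, 5), cv::Size(-1, -1),
                         cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT,
                                          30, 0.01));
        objectPoints.push_back(board);
        imagePoints.push_back(corners);
    }
    cv::Mat K, dist;                      // Steps 4-6: closed-form estimate of the
    std::vector<cv::Mat> rvecs, tvecs;    // intrinsics plus non-linear refinement
    cv::calibrateCamera(objectPoints, imagePoints, grayViews[0].size(),
                        K, dist, rvecs, tvecs);
    return K;  // intrinsic matrix A of (1); dist holds the k1, k2 of the distortion model
}
```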

2.3 Corner Matching

The key to reconstructing a 3D human facial model is to obtain the corresponding corner matches between the two images, but no general solution is available up to now. We present a new corner matching method based on the image PLK algorithm[19, 20].

Let $u(x, y)$ be a pixel in an image $I(x, y)$, and denote its neighbor field by $\delta_u(v, w_x, w_y) = \{v(p, q) \mid |p - x| \leqslant w_x,\ |q - y| \leqslant w_y\}$. The gradient matrix of the neighbor field of the pixel $v(p_x, p_y)$ is

$$G = \sum_{p = p_x - w_x}^{p_x + w_x}\ \sum_{q = p_y - w_y}^{p_y + w_y} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \tag{2}$$

where $I_x$ and $I_y$ are the image derivatives in the X and Y directions, respectively.

Let $I(x, y)$ and $J(x, y)$ be the two corresponding images (see Fig.2(a)). The algorithm searches for a pixel $u$ in $I(x, y)$ and its corresponding pixel $v$ in $J(x, y)$. We denote the disparity vector by $d = v - u$, which is the same over the neighbor field $\delta_u(v, w_x, w_y)$. The aim of corner matching is to search for the $d$ that minimizes

$$\varepsilon(d) = \sum_{x = u_x - w_x}^{u_x + w_x}\ \sum_{y = u_y - w_y}^{u_y + w_y} \big(I(x, y) - J(x + d_x,\, y + d_y)\big)^2. \tag{3}$$

We define the pyramid representation of a generic image $I$ as $I^1, \ldots, I^L$, with $I^0 = I(x, y)$ the 0-th level image; then (3) can be modified as:

$$\varepsilon^L(d^L) = \sum_{x = u_x^L - w_x}^{u_x^L + w_x}\ \sum_{y = u_y^L - w_y}^{u_y^L + w_y} \big(I^L(x, y) - J^L(x + g_x^L + d_x^L,\; y + g_y^L + d_y^L)\big)^2 \tag{4}$$

where $g^L = [g_x^L\;\; g_y^L]^T$ and $d^L = [d_x^L\;\; d_y^L]^T$ are the $L$-th level pyramid optical flow and motion vector, and their relation is denoted by $g^{L-1} = 2(g^L + d^L)$.

Fig.2. Two images and some results of corner matching and error disparity vector correction. (a) Two source images I(x, y) and J(x, y). (b) Results of corner matching; the serial numbers denote the corresponding points and the lines denote the disparity vectors of head motion. (c) Results of error disparity vector correction for the images shown in (b).

The algorithm can be described as the following steps.

Step 1. Build pyramid representations of I and J: I^X and J^X, where X = 0, ..., L.

Step 2. Initialize the pyramidal guess: g^L = [0 0]^T.

Step 3. Compute the gradient matrix G for each pixel in image I(x, y), keeping only the useful pixels.

Step 4. Find an initial pixel u among the useful pixels and compute its gradient matrix at each pyramid level.

Step 5. Obtain the value of d^L iteratively, using it as the initial value for the next level of the pyramid down to I(x, y); then compute the disparity vector d and get the corresponding points in J(x, y).

Step 6. Decide the useful pixels.


Step 7. Repeat Steps 4, 5, and 6 to generate all matches in I(x, y) and J(x, y).

Results are shown in Fig.2(b).
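Modern libraries provide this pyramidal tracking scheme directly. The sketch below, a minimal example using OpenCV rather than the authors' code, performs the same pyramid-based corner tracking between the two views; the 7x7 window reflects the size found most accurate in Section 3, while the pyramid depth and corner counts are assumptions of the example.

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

// Sketch: pyramidal Lucas-Kanade corner tracking between two gray images
// I and J, in the spirit of Subsection 2.3 (OpenCV used for illustration).
void trackCorners(const cv::Mat& I, const cv::Mat& J,
                  std::vector<cv::Point2f>& cornersI,
                  std::vector<cv::Point2f>& cornersJ,
                  std::vector<uchar>& status) {
    // Steps 3-4: keep only "useful" pixels, i.e., corners whose gradient
    // matrix G of (2) is well-conditioned (the Shi-Tomasi criterion [20]).
    cv::goodFeaturesToTrack(I, cornersI, 200, 0.01, 5.0);
    std::vector<float> err;
    // Steps 1, 2, 5: build both pyramids and iterate d^L from the coarsest
    // level down to level 0, per (4); a 7x7 window worked best in Section 3.
    cv::calcOpticalFlowPyrLK(I, J, cornersI, cornersJ, status, err,
                             cv::Size(7, 7), /*maxLevel=*/3);
}
```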

2.4 Disparity Correction

We can see that some erroneous matches exist in Fig.2(b), and they need to be corrected. A general algorithm named "local adjustment" is employed[21].

If the X coordinates of pixels u and v satisfy X(u) > X(v), and their corresponding pixels u' and v' in J(x, y) keep or reverse this order, we consider u and v to form sequential matches or cross matches, respectively. The key to adjusting the positions is to minimize the number of cross matches. The algorithm can be summarized as follows:

Step 1. Find the pixel u(k, i) that has the maximum cross-match number in the k-th line of image I(x, y).

Step 2. Determine the scope of the new match for u(k, i).

Step 3. Compute a new match u' for u(k, i) that reduces the cross number, then obtain the new cross number for u(k, i). Repeat until the cross number equals 0.

The results of corner correction are shown in Fig.2(c).
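The crossing test at the heart of this local adjustment can be made concrete as follows. This is a minimal sketch of Step 1 only (the re-matching of Steps 2-3 depends on the matching score and is omitted), and the Match structure is illustrative rather than the authors' data type.

```cpp
#include <vector>

// Sketch of the cross-match test of Subsection 2.4 (following [21]): within
// one scanline, two matches (u,u') and (v,v') "cross" when their left-image
// order and right-image order disagree.
struct Match { float xLeft, xRight; };  // x-coordinates in I and J on the same line

int crossCount(const std::vector<Match>& line, int i) {
    int crossings = 0;
    for (int j = 0; j < static_cast<int>(line.size()); ++j) {
        if (j == i) continue;
        bool leftOrder  = line[i].xLeft  < line[j].xLeft;
        bool rightOrder = line[i].xRight < line[j].xRight;
        if (leftOrder != rightOrder) ++crossings;  // ordering violated: a cross match
    }
    return crossings;
}

// Step 1: index of the match with the maximum number of crossings in this
// line, or -1 if the line is already cross-free (termination condition).
int worstMatch(const std::vector<Match>& line) {
    int worst = -1, maxCross = 0;
    for (int i = 0; i < static_cast<int>(line.size()); ++i) {
        int c = crossCount(line, i);
        if (c > maxCross) { maxCross = c; worst = i; }
    }
    return worst;
}
```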

2.5 Motion Analysis and Feature Point 3D Reconstruction

Define $(x, y)$ as the normalized coordinate of pixel $m(u, v)$ in $I(x, y)$:

$$(x, y, 1)^T = A^{-1}(u, v, 1)^T \tag{5}$$

where $A$ is the intrinsic matrix of the camera. Let $(x, y)$ and $(x', y')$ be the normalized coordinates of corresponding matches; then the epipolar constraint is

$$(x', y', 1)\, E\, (x, y, 1)^T = 0 \tag{6}$$

where

$$E = [t]_\times R, \qquad [t]_\times = \begin{bmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{bmatrix} \tag{7}$$

and $(R, t)$ is the 3D rigid transformation (rotation $R$ and translation $t$).

The motion of the camera is assumed to be rigid, with translation occurring after rotation from the origin; the motion is then considered unique. The structure-from-motion problem is resolved based on Zhang's work[22] as follows.

Step 1. Estimate the essential matrix E with the 8-point algorithm (see (5) and (6)).

Step 2. Compute R and t according to (7).

Step 3. Refine E, R and t by minimizing the sum of squared distances between points and their epipolar lines.

Step 4. Reconstruct the corresponding 3D points for the corner points in the images by using (1).
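These four steps map onto standard library routines. The following sketch (OpenCV for illustration, not the authors' implementation) estimates E robustly, recovers (R, t), and triangulates the matched corners; the RANSAC parameters are assumptions of the example, and K is the intrinsic matrix A of (1) in CV_64F form.

```cpp
#include <opencv2/calib3d.hpp>
#include <vector>

// Sketch of Steps 1-4 of Subsection 2.5: essential matrix from the matched
// corners, decomposition into (R, t), and linear triangulation.
std::vector<cv::Point3f> reconstruct(const std::vector<cv::Point2f>& pts0,
                                     const std::vector<cv::Point2f>& pts1,
                                     const cv::Mat& K) {
    // Steps 1-3: E from the epipolar constraint (6)-(7), estimated robustly,
    // then R and t recovered (t only up to scale).
    cv::Mat E = cv::findEssentialMat(pts0, pts1, K, cv::RANSAC, 0.999, 1.0);
    cv::Mat R, t;
    cv::recoverPose(E, pts0, pts1, K, R, t);

    // Step 4: triangulate with projection matrices K[I|0] and K[R|t], cf. (1).
    cv::Mat P0 = K * cv::Mat::eye(3, 4, CV_64F);
    cv::Mat Rt, X;
    cv::hconcat(R, t, Rt);
    cv::Mat P1 = K * Rt;
    cv::triangulatePoints(P0, P1, pts0, pts1, X);  // 4xN homogeneous points
    X.convertTo(X, CV_64F);

    std::vector<cv::Point3f> out;
    for (int i = 0; i < X.cols; ++i) {
        cv::Mat col = X.col(i) / X.at<double>(3, i);  // dehomogenize
        out.emplace_back(col.at<double>(0), col.at<double>(1), col.at<double>(2));
    }
    return out;
}
```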

2.6 Creation of 3D Individual Model

According to the results in the previous section, let $u^i$ represent the transformation of feature point $p^i$. The individual model can be obtained by deformation of the general 3D model and interpolation of the features. The following equation is solved:

$$f(p) = \sum_i c^i \Phi(\|p - p^i\|) + Mp + t. \tag{8}$$

Here $\Phi(r) = e^{-r/\sigma}$, and $M$ and $t$ represent a $3 \times 3$ projective matrix and a $3 \times 1$ vector, respectively. The coefficients $c^i$, $M$ and $t$ are calculated by solving (8) subject to the constraints

$$u^i = f(p^i), \qquad \sum_i c^i = 0, \qquad \sum_i c^i \cdot (p^i)^T = 0 \tag{9}$$

which together form the linear system

$$\begin{bmatrix}
\Phi_{0,0} & \cdots & \Phi_{0,N-1} & (p^0)^T & 1 \\
\vdots & \ddots & \vdots & \vdots & \vdots \\
\Phi_{N-1,0} & \cdots & \Phi_{N-1,N-1} & (p^{N-1})^T & 1 \\
p^0 & \cdots & p^{N-1} & 0 & 0 \\
1 & \cdots & 1 & 0 & 0
\end{bmatrix}
\begin{bmatrix} c^0 \\ \vdots \\ c^{N-1} \\ M^T \\ t^T \end{bmatrix}
=
\begin{bmatrix} u^0 \\ \vdots \\ u^{N-1} \\ 0 \\ 0 \end{bmatrix} \tag{10}$$

with $\Phi_{i,j} = \Phi(\|p^i - p^j\|)$. The 3D individual model is then assured to have a minimal energy value, and the efficiency is improved once the linear system is solved.

The model can be refined by more images using the same method (see Figs.3(a) and 3(b)).
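Because (10) is a small dense system (N + 4 unknowns per output coordinate), it can be assembled and solved directly. The sketch below uses the Eigen library to solve it for one coordinate of u; the kernel $\Phi(r) = e^{-r/\sigma}$ with a user-chosen $\sigma$, and all names, are assumptions of the example rather than the authors' code.

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <vector>

// Illustrative solver for the RBF system (10), one output coordinate at a
// time: unknowns are [c_0..c_{n-1}, one row of M, one component of t].
Eigen::VectorXd solveRbf(const std::vector<Eigen::Vector3d>& p,
                         const std::vector<double>& u,  // one coordinate of u^i
                         double sigma) {
    const int n = static_cast<int>(p.size());
    Eigen::MatrixXd A = Eigen::MatrixXd::Zero(n + 4, n + 4);
    Eigen::VectorXd b = Eigen::VectorXd::Zero(n + 4);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j)
            A(i, j) = std::exp(-(p[i] - p[j]).norm() / sigma);  // Phi(||p^i - p^j||)
        A.block<1, 3>(i, n) = p[i].transpose();  // affine part (row of M)
        A(i, n + 3) = 1.0;                       // translation component
        b(i) = u[i];                             // interpolation rows: u^i = f(p^i)
    }
    // Side constraints of (9): sum_i c^i = 0 and sum_i c^i (p^i)^T = 0.
    for (int j = 0; j < n; ++j) {
        A.block<3, 1>(n, j) = p[j];
        A(n + 3, j) = 1.0;
    }
    return A.colPivHouseholderQr().solve(b);
}
```

The same routine is called three times, once each for the x, y and z components of the displacements $u^i$.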

Fig.3. Results of 3D textured model. (a) Wireframe view. (b) Shaded view. (c) Textured view (frontal view). (d) View after rotation.

2.7 Texture Mapping

Let $I_0 \ldots I_v$ be the video sequence, and denote the transformation between $I_{i-1}$ and $I_i$ by the $3 \times 3$ rotation matrix $R_i$ and the $3 \times 1$ translation vector $t_i$. The generated model is represented by $M_0$, with triangulation $T_0 \ldots T_n$. The projections of $T_i$ onto $I_0 \ldots I_v$ are denoted by $Tp_{i0} \ldots Tp_{iv}$ (using (1)). First we compute the normal vector $n_i$ of $R(T_i) + t_i$. If $x$ equals $\arg\max_i (n_i \cdot (0, 0, 1))$, we take the texture image from $Tp_{ix}$. Repeating the above steps, all the triangles of $M_0$ are mapped. Results are shown in Fig.3.
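The arg max rule above can be made concrete as follows; this is a minimal sketch with illustrative types, assuming the per-frame rotations $R_i$ have already been accumulated from the motion analysis of Subsection 2.5.

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Sketch of the view-selection rule of Subsection 2.7: each triangle is
// textured from the frame in which its rotated normal is best aligned
// with the viewing direction (0, 0, 1).
int bestViewForTriangle(const cv::Vec3d& normal,            // normal of T_i in model space
                        const std::vector<cv::Matx33d>& R)  // accumulated rotation per frame
{
    int best = 0;
    double bestDot = -1.0;
    for (int i = 0; i < static_cast<int>(R.size()); ++i) {
        cv::Vec3d n = R[i] * normal;                 // n_i rotated into frame i
        double d = n.dot(cv::Vec3d(0.0, 0.0, 1.0));  // alignment with the optical axis
        if (d > bestDot) { bestDot = d; best = i; }  // x = arg max_i (n_i . (0,0,1))
    }
    return best;  // texture this triangle from its projection Tp_i,best
}
```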

2.8 Eyes Modeling

Eyes modeling is needed to generate a realistic-looking 3D face model. The 3D model of each of the two eyes is represented as two circles plus the triangles surrounding them. The eye model is denoted by EM = {R1, R2, Tr}, where R1 and R2 are circles of radii r1 and r2, with r1 < r2. R1 and R2 consist of 12 triangles each, and Tr consists of 36 triangles that surround R2 and connect to the eye sockets.

Experimental results with and without eyes modeling are shown in Fig.4.
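For illustration, each 12-triangle disc (R1 or R2) can be generated as a triangle fan around its center. The sketch below uses self-contained types of its own invention; the 36 connecting triangles of Tr depend on the eye-socket vertices of the face mesh and are omitted.

```cpp
#include <cmath>
#include <vector>

// Illustrative sketch of one eye disc of Subsection 2.8: a circle of
// radius r tessellated into `slices` fan triangles (12 in the paper).
struct Vec3 { float x, y, z; };
struct Tri  { Vec3 a, b, c; };

std::vector<Tri> makeEyeDisc(const Vec3& center, float r, int slices = 12) {
    std::vector<Tri> tris;
    const float step = 2.0f * 3.14159265f / static_cast<float>(slices);
    for (int i = 0; i < slices; ++i) {
        float a0 = i * step, a1 = (i + 1) * step;
        Vec3 p0{center.x + r * std::cos(a0), center.y + r * std::sin(a0), center.z};
        Vec3 p1{center.x + r * std::cos(a1), center.y + r * std::sin(a1), center.z};
        tris.push_back({center, p0, p1});  // one of the fan triangles
    }
    return tris;  // call twice, with r1 and r2 (r1 < r2), for R1 and R2
}
```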

Fig.4. Comparison of 3D reconstruction results: the left picture is the 3D model without eyes modeling, and the right picture is the model with eyes modeling.

3 Experimental Results and Algorithm Analysis

The images in our experiment are captured using a still camera at a resolution of 352 x 288. It is required that the head motion be relatively small, with the head turning to each side; alternatively, the user can simply turn his/her head from the left all the way to the right, or vice versa. Without loss of generality, we take the first camera coordinate system as the world coordinate system; the other camera coordinate system can then be obtained by a rigid transformation (first rotation, then translation). The experimental system is implemented in Visual C++ 6.0 under Windows 2000. No special lighting equipment or background was used. According to the algorithm described above, the camera intrinsic matrix is

$$A = \begin{bmatrix} 594.8074951 & 0.0 & 191.8550568 \\ 0.0 & 633.9547119 & 142.2712708 \\ 0.0 & 0.0 & 1.0 \end{bmatrix}.$$

The pixel error of the input images between the ideal pixel and the actual pixel, which applies to the head images, is calculated as

$$err = [0.31607 \;\; 0.37983].$$

The essential matrix of the camera is

$$E = \begin{bmatrix} 0.086313 & -0.997752 & -0.008331 \\ 0.987707 & 0.100840 & -0.0887908 \\ -0.009496 & 0.057131 & 0.000949 \end{bmatrix}$$

and the rotation matrix and translation vector between the images shown in Fig.2 are:

$$R = \begin{bmatrix} 0.995115 & 0.093532 & -0.0316021 \\ -0.093144 & 0.995561 & 0.013533 \\ 0.032728 & -0.010524 & 0.999409 \end{bmatrix}, \qquad t = [0.057626 \;\; 0.004568 \;\; 0.998328]^T.$$

We input 120 images of one human head with slight displacement between them, and the statistical results in Table 1 and Table 2 are obtained with the algorithms described in Subsections 2.3 and 2.4.

Table 1. Accuracy of Corner Matching of the Algorithm in Subsection 2.3 (wx x wy is the size of the neighbor-field window)

Rotation angle   3x3   5x5   7x7   9x9
5°~7°            48%   75%   62%   40%
10°~12°          42%   60%   78%   55%
15°~17°          37%   71%   70%   45%
20°~23°          31%   44%   60%   52%

Table 2. Accuracy of Corner Matching of the Error Correction Algorithm in Subsection 2.4

Rotation angle   3x3   5x5   7x7   9x9
5°~7°            78%   95%   92%   70%
10°~12°          62%   90%   98%   75%
15°~17°          67%   91%   90%   85%
20°~23°          51%   74%   80%   72%

The highest accuracy is obtained when the window size is 7 x 7 and the rotation angle is about 10°~12°. The reasons may be the following two aspects:

1) When the rotation angle is about 10°~12°, most of the corners are visible and easily tracked. If the rotation angle is too large, the occlusion problem is difficult to resolve in computer vision; if it is too small, the slight displacement between the two images is difficult to calculate.

2) According to the algorithms in Subsections 2.3 and 2.4, a window is used to match the corners. Given the number of corners (66 in our system), a 7 x 7 window obtains the approximate matching.

Therefore, the algorithm can be used in other fields (e.g., body motion tracking and recognition).

Due to the deformation of the general 3D model and interpolation of the features, our approach can acquire a more photo-realistic model than those generated by previous methods. Fig.5 shows the result of interpolation of the features only, and it also demonstrates the high vividness of our algorithm.

Fig.5. Result of 3D reconstruction by interpolation of the features only: the left picture is the rotated shaded model and the right is the frontal view.

We have used our algorithm to construct face models for various people. Fig.6 shows side-by-side comparisons of two reconstructed models with the real images. In all these examples, the video sequences were taken with an ordinary video camera in an office room. No special lighting equipment or background was used. After data capturing and corner selection, the computations take 1 to 2 minutes to generate the synthetic textured head; most of the time is spent on tracking the video sequences. Experimental results also show the high robustness of our algorithm.

Fig.6. Comparison of the original images with the reconstructed models of two persons.

In conclusion, compared with previous work, the merits and limitations of our algorithm are as follows.

1) Our method adopts different strategies for corner tracking on the face, disparity detection and texture mapping.

2) Our algorithm has high robustness and high vividness, reconstructing a human face model in one to two minutes with a camera and a PC, and requires little interaction.

3) The limitation of our approach is that the acquired 3D model is not the whole human head, for the face geometry is currently determined from only two views.

4 Applications in VR

The good experimental results obtained with our algorithm encourage us to apply it to the field of VR in the following aspects.

Virtual "showman". A virtual "showman" is an animated 3D character that "speaks" on the Internet. With our algorithm, animated 3D models are created by deforming the parameterized model after reconstructing a person's facial model from his/her images (see Fig.7).

Fig.7. Animation demos. (a) Happiness (Fig.4). (b) Surprise (Fig.4). (c) Smile (lower row in Fig.6). (d) Fear (lower row in Fig.6).

The architecture of the virtual "showman" is given in Fig.8. First, language streams are transferred to the client and parsed into words; "speaking" is then driven by the words, and the virtual facial expression is also obtained. The expression of the model is realized by deforming the model according to what the virtual "showman" is talking about.

Video conferencing. Unlike the virtual "showman", video conferencing usually requires more vividness, and the animation is driven not by voice but by real-time images. Currently, video conferencing is mainly implemented by performing image compression, file transmission and image extraction in real time to ensure vividness. By using two cameras at the same time to track the corners as in Subsections 2.3 and 2.4 and to reconstruct the facial model for face animation, video conferencing can be implemented more easily and efficiently.

Fig.8. Architecture of the virtual "showman". On the server side, a "speak" server sends language streams; on the client side, a characters receiver feeds a words parser, which drives the deformation of the model and the animation model rendering.

Online chatting. Normally, online chatting only includes voice. Using our modeling and animation method, online chatting can contain both audio and video information, so that the chat will be more attractive to people who long for "real" chat rather than the out-of-date "typing" chat.

Online virtual games. Virtual games, close to the real world, support multiple players from all over the world through the Internet. The difficulty of transferring huge image files has been a headache for game designers. With the help of our system, much less information needs to be transferred to create the virtual world for the game, including the virtual characters. It is no longer a dream for a player to create a character in the game that not only follows the player's instructions but also looks like himself/herself.

5 Conclusions and Future Work

In this paper, a robust algorithm for realistic-looking human face model reconstruction is presented based on stereo vision technology. Accurate feature matches are obtained with the modified image PLK algorithm and the local adjustment algorithm, and a textured face model is generated after camera calibration, motion analysis and texture mapping. In addition, the corner matching and error correction algorithms can be applied to other fields such as body motion tracking and face recognition.

Experimental results show that our algorithm is able to generate face models for people of different races, of different ages, and with different skin colors. Compared with previous methods for 3D facial model reconstruction, our algorithm has high robustness and high vividness, reconstructing a human face model in one to two minutes with a camera and a PC while requiring little interaction. Such an algorithm can potentially be used by ordinary users at home to make their own face models. Some application examples in the field of VR are given, such as the virtual "showman", online chatting, and video conferencing.

Future work includes the following aspects:

1) face geometry determined from multiple views, instead of the current two views, will be studied to generate the whole head;

2) more photo-realistic face models, including automatic generation of eyeballs and eye texture maps, as well as accurate incorporation of hair, teeth, and tongue, will be developed;

3) the current face mesh is very sparse; we are investigating techniques to increase the mesh resolution by using higher resolution face metrics or prototypes. Another possibility is to compute a displacement map for each triangle using color information.

References

[1] Lee Y C, Terzopoulos D, Waters K. Realistic modeling for facial animation. In Proc. ACM SIGGRAPH 95, Los Angeles, USA, 1995, pp.56-62.

[2] DeCarlo D, Metaxas D, Stone M. An anthropometric face model using variational techniques. In Proc. ACM SIGGRAPH 98, Orlando, USA, 1998, pp.67-74.

[3] Parke F I, Waters K. Computer Facial Animation. AK Peters, 1996.

[4] Hong P. An integrated framework for face modeling, facial motion analysis and synthesis [Dissertation]. University of Illinois at Urbana-Champaign, 2001.

[5] Fua P, Leclerc Y G. Object-centered surface reconstruction: Combining multi-image stereo and shading. Int. J. Computer Vision, 1995, 16(1): 35-56.

[6] Samaras D, Metaxas D. Incorporating illumination constraints in deformable models. In Proc. the IEEE 1998 Conf. Computer Vision and Pattern Recognition (CVPR), Santa Barbara, USA, 1998, pp.322-329.

[7] Tang L, Huang T S. Analysis-based facial expression synthesis. In Proc. 1996 IEEE Int. Conf. Image Processing (ICIP), Lausanne, Switzerland, 1996, ICIP-III: 98-102.

[8] Wong K, Cipolla R. Structure and motion from silhouettes. In Proc. 8th IEEE Int. Conf. Computer Vision (ICCV), Vancouver, Canada, 2001, pp.217-222.

[9] Pighin F, Hecker J, Lischinski D et al. Synthesizing realistic facial expressions from photographs. In Computer Graphics, Annual Conference Series, SIGGRAPH, 1998, pp.75-84.

[10] Horace H S, Yin L. Constructing a 3D individual head model from two orthogonal views. The Visual Computer, 1996, 12(5): 254-266.


[11] Lee W S, Thalmann N M. Head modeling from pictures and morphing in 3D with image metamorphosis based on triangulation. In Proc. the Int. Workshop on Modelling and Motion Capture Techniques for Virtual Environments (CAPTECH), Geneva, Switzerland, LNAI 1537, Springer-Verlag, 1998, pp.254-267.

[12] Waters K. A muscle model for animating three-dimensional facial expressions. Computer Graphics, 1987, 21(4): 17-24.

[13] Lavagetto F, Pockaj R. The facial animation engine: Toward a high-level interface for the design of MPEG-4 compliant animated faces. IEEE Trans. Circuits Syst. Video Technol., 1999, 9(2): 277-289.

[14] Fua P. Face models from uncalibrated video sequences. In Proc. the Int. Workshop on Modelling and Motion Capture Techniques for Virtual Environments (CAPTECH), Geneva, Switzerland, LNAI 1537, Springer-Verlag, 1998, pp.214-222.

[15] Fua P. Using model-driven bundle-adjustment to model heads from raw video sequences. In Proc. 7th IEEE Int. Conf. Computer Vision (ICCV), Kerkyra, Corfu, Greece, 1999, pp.46-53.

[16] Blanz V, Vetter T. A morphable model for the synthesis of 3D faces. In Computer Graphics, Annual Conference Series, SIGGRAPH, 1999, pp.187-194.

[17] Liu Z, Zhang Z, Jacobs C et al. Rapid modeling of animated faces from video. 2000. http://research.microsoft.com/research/pubs/trpub.aspx.

[18] Zhang Z. Flexible camera calibration by viewing a plane from unknown orientations. In Proc. 7th IEEE Int. Conf. Computer Vision (ICCV), Kerkyra, Corfu, Greece, 1999, pp.666-673.

[19] Lucas B D, Kanade T. An iterative image registration technique with an application to stereo vision. In Proc. the 7th Int. Joint Conference on Artificial Intelligence (IJCAI), Vancouver, Canada, 1981, pp.674-679.

[20] Shi J, Tomasi C. Good features to track. In Proc. the IEEE 1994 Conf. Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 1994, pp.593-600.

[21] Jia B, Zhang Y, Lin X. General and fast algorithm for disparity error detection and correction. J. Tsinghua University, 2000, 40(1): 28-31. (in Chinese)

[22] Zhang Z, Deriche R, Faugeras O et al. A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence, 1995, 78(1-2): 87-119.

Rong-Hua Liang received his Ph.D. degree from Zhejiang University in 2003. He is now a faculty member in the College of Information and Engineering, Zhejiang University of Technology. His current research interests include computer vision, computer graphics, and artificial intelligence.

Zhi-Geng Pan received his Ph.D. degree from Zhejiang University in 1993. He is now a professor at the State Key Lab of CAD&CG, Zhejiang University, and a Ph.D. supervisor at the College of Computer Science, Zhejiang University. His current research interests include computer vision, computer graphics, and VR.

Chun Chen received his Ph.D. degree from Zhejiang University in 1990. He is now the dean and a professor of the College of Computer Science, Zhejiang University, where he is also a Ph.D. supervisor. His current research interests include computer vision, image processing, and CSCW.