facial animation by: shahzad malik csc2529 presentation march 5, 2003

Facial Animation

By: Shahzad Malik

CSC2529 Presentation

March 5, 2003

Motivation

Realistic human facial animation is a challenging problem (DOF, deformations)

Would like expressive and plausible animations of a photorealistic 3D face

Useful for virtual characters in film, video games

Papers

Three major areas are Lip Syncing, Face Modeling, and Expression Synthesis

  Video Rewrite (Bregler – SIGGRAPH 1997)

  Making Faces (Guenter – SIGGRAPH 1998)

  Expression Cloning (Noh – SIGGRAPH 2001)

Video Rewrite

Generate a new video of an actor mouthing a new utterance by piecing together old footage

Two stages:

  Analysis

  Synthesis

Analysis Stage

Given footage of the subject speaking, extract mouth position and lip shape

Hand label 26 training images:

  34 points on mouth (20 outer boundary, 12 inner boundary, 1 at bottom of upper teeth, 1 at top of lower teeth)

  20 points on chin and jaw line

Morph training set to get to 351 images

EigenPoints

Create EigenPoints models using this set

Use derived EigenPoints model to label features in all frames of the training video

EigenPoints (continued)

Problem: EigenPoints assumes features are undergoing pure translation

Face Warping

Before EigenPoints labeling, warp each image into a reference plane

Use a minimization algorithm to register images

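$$E(M) = \sum_i \left[ I_T(\mathbf{x}'_i) - I_F(\mathbf{x}_i) \right]^2$$

$$\mathbf{x}' = M\mathbf{x}, \quad \mathbf{x} = \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \quad M = \begin{bmatrix} m_0 & m_1 & m_2 \\ m_3 & m_4 & m_5 \\ m_6 & m_7 & m_8 \end{bmatrix}$$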

Face Warping (continued)

Use rigid parts of face to estimate warp M

Warp face by M^-1

Perform eigenpoint analysis

Back-project features by M onto face

Audio Analysis

Want to capture visual dynamics of speech

Phonemes are not enough

Consider coarticulation

  Lip shapes for many phonemes are modified based on phoneme’s context (e.g. /T/ in “beet” vs. /T/ in “boot”)

Audio Analysis (continued)

Segment speech into triphones

  e.g. “teapot” becomes /SIL-T-IY/, /T-IY-P/, /IY-P-AA/, /P-AA-T/, and /AA-T-SIL/

Emphasize middle of each triphone

Effectively captures forward and backward coarticulation
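The triphone split can be sketched in a few lines of Python. This is a minimal illustration rather than the Video Rewrite implementation: the function name `to_triphones` and the list-of-strings phoneme representation are assumptions made here.

```python
from typing import List, Tuple

Triphone = Tuple[str, str, str]


def to_triphones(phonemes: List[str], silence: str = "SIL") -> List[Triphone]:
    """Return one (left, center, right) triphone per phoneme.

    The utterance is padded with silence on both ends so that the first
    and last phonemes also get a left and right context.
    """
    padded = [silence] + list(phonemes) + [silence]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


# "teapot" as the phoneme sequence used on the slide
for triphone in to_triphones(["T", "IY", "P", "AA", "T"]):
    print("/" + "-".join(triphone) + "/")
```

Each phoneme appears once as a center, which is the part of the triphone the slide says to emphasize.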

Audio Analysis (continued)

Training footage audio is labeled with phonemes and associated timing

Use gender-specific HMMs for segmentation

Convert transcript into triphones

Synthesis Stage

Given some new speech utterance

  Mark it with phoneme labels

  Determine triphones

  Find a video example with the desired transition in database

Compute a matching distance to each triphone:

error = αDp + (1 - α)Ds

Viseme Classes

Cluster phonemes into viseme classes

Use 26 viseme classes (10 consonant, 15 vowel, 1 silence):

  (1) /CH/, /JH/, /SH/, /ZH/

  (2) /K/, /G/, /N/, /L/

  …

  (25) /IH/, /AE/, /AH/

  (26) /SIL/

Phoneme Context Distance

Dp is phoneme context distance

  Distance is 0 if phonemic categories are the same (e.g. /P/ and /P/)

  Distance is 1 if viseme classes are different (e.g. /P/ and /IY/)

  Distance is between 0 and 1 if different phonemic classes but same viseme class (e.g. /P/ and /B/)

Compute for the entire triphone

Weight the center phoneme most
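A hedged sketch of how Dp might be computed. Only the three rules above come from the slide. The partial viseme table (the class ids for /P/, /B/ and /IY/), the in-between value of 0.5, and the triphone weights are all assumptions chosen for illustration.

```python
from typing import Dict, Sequence

# Partial, illustrative viseme table. Classes 1, 2 and 26 follow the slide;
# the ids given to /P/, /B/ and /IY/ are placeholders, not the real table.
VISEME_CLASS: Dict[str, int] = {
    "CH": 1, "JH": 1, "SH": 1, "ZH": 1,
    "K": 2, "G": 2, "N": 2, "L": 2,
    "P": 3, "B": 3,   # assumed id; the slide only says they share a class
    "IY": 11,         # assumed id
    "SIL": 26,
}

# Assumed value; the slide only says "between 0 and 1".
SAME_VISEME_DISTANCE = 0.5


def phoneme_distance(a: str, b: str) -> float:
    """Distance between two phonemes following the three rules on the slide."""
    if a == b:
        return 0.0
    if VISEME_CLASS[a] != VISEME_CLASS[b]:
        return 1.0
    return SAME_VISEME_DISTANCE


def context_distance(
    target: Sequence[str],
    candidate: Sequence[str],
    weights: Sequence[float] = (0.25, 0.5, 0.25),  # assumed; center weighted most
) -> float:
    """Weighted sum of per-phoneme distances over a whole triphone."""
    return sum(
        w * phoneme_distance(t, c)
        for w, t, c in zip(weights, target, candidate)
    )
```

With these weights a mismatch in the center phoneme costs twice as much as a mismatch in either context phoneme.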

Lip Shape Distance

Ds is distance between lip shapes in overlapping triphones

  E.g. for “teapot”, contours for /IY/ and /P/ should match between /T-IY-P/ and /IY-P-AA/

Compute Euclidean distance between 4-element vectors (lip width, lip height, inner lip height, height of visible teeth)

Solution depends on neighbors in both directions (use dynamic programming)
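The lip shape term and the combined matching error from the Synthesis Stage slide could look like the sketch below. The function names, the default `alpha` of 0.5, and the absence of any per-feature normalization are assumptions; the search over candidate sequences is not shown.

```python
import math
from typing import Sequence


def lip_shape_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two 4-element lip shape vectors:
    (lip width, lip height, inner lip height, height of visible teeth)."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("lip shape vectors must have 4 elements")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matching_error(dp: float, ds: float, alpha: float = 0.5) -> float:
    """error = alpha * Dp + (1 - alpha) * Ds, with alpha assumed in [0, 1]."""
    return alpha * dp + (1.0 - alpha) * ds
```

Because Ds for one triphone depends on which candidates are chosen for its neighbors, the per-triphone errors cannot be minimized independently, which is why the slide calls for a search over the whole sequence.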

Time Alignment of Triphone Videos

Need to combine triphone videos

Choose portion of overlapping triphones where lip shapes are as close as possible

Already done when computing Ds

Time Alignment to Utterance

Still need to time align with target audio

  Compare corresponding phoneme transcripts

  Start time of center phoneme in triphone is aligned with label in target transcript

  Video is then stretched/compressed to fit time needed between target phoneme boundaries
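One simple way to picture the stretch/compress step is nearest-frame resampling, sketched below. This is an assumption for illustration only; the slides do not say how frames are resampled, and `stretch_frame_indices` is a hypothetical helper.

```python
from typing import List


def stretch_frame_indices(num_source_frames: int, num_target_frames: int) -> List[int]:
    """Pick, for each output frame, the index of the nearest source frame
    so that a clip is stretched or compressed to the target length."""
    if num_source_frames < 1 or num_target_frames < 1:
        raise ValueError("frame counts must be positive")
    if num_target_frames == 1:
        return [0]
    scale = (num_source_frames - 1) / (num_target_frames - 1)
    # int(x + 0.5) rounds halves up, avoiding round()'s half-to-even rule.
    return [int(i * scale + 0.5) for i in range(num_target_frames)]


print(stretch_frame_indices(3, 5))  # stretching repeats source frames
print(stretch_frame_indices(5, 3))  # compressing skips source frames
```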

Combining Lips and Background

Need to stitch new mouth movie into background original face sequence

Compute transform M as before

Warping replacement mask defines mouth and background portions in final video

[Figure: mouth mask; background mask]

Combining Lips and Background

Mouth shape comes from triphone image, and is warped using M

Jaw shape is combination of background jaw and triphone jaw lines

Near ears, jaw depends on background; near chin, jaw depends on mouth

Illumination matching is used to avoid seams between mouth and background

Video Rewrite Results

Video: 8 minutes of video, 109 sentences

Training Data: front-facing segments of video, around 1700 triphones

“Emily” sequences

Video Rewrite Results

2 minutes of video, 1157 triphones

JFK sequences

Video Rewrite

Image-based facial animation system

Driven by audio

Output sequence created from real video

Allows natural facial movements (eye blinks, head motions)

Making Faces

Allows capturing facial expressions in 3D from a video sequence

Provides a 3D model and texture that can be rendered on 3D hardware

Data Capture

Actor’s face digitized using a Cyberware scanner to get a base 3D mesh

Six calibrated video cameras capture actor’s expressions

[Figure: six camera views]

Data Capture

182 dots are glued to actor’s face

Each dot is one of six colors with fluorescent pigment

Dots of same color are placed as far apart as possible

Dots follow the contours of the face (eyes, lips, nasolabial furrows, etc.)

Dot Labeling

Each dot needs a unique label

Dots will be used to warp the 3D mesh

Also used later for texture generation from the six views

For each frame in each camera:

  Classify each pixel as belonging to one of six categories

  Find connected components

  Compute the centroid

Dot Labeling (continued)

Need to compute dot correspondences between camera views

Must handle occlusions, false matches

Compute all point correspondences between k cameras and n 2D dots

$$\binom{k}{2} n^2 \text{ point correspondences}$$


For each correspondence

  Triangulate a 3D point based on closest intersection of rays cast through 2D dots

  Check if back-projection error is above some threshold

  All 3D candidates below threshold are stored
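The "closest intersection" of two rays can be read as the midpoint of the shortest segment joining them. The sketch below follows that reading. It treats the rays as infinite lines and returns the gap between them as a rough quality measure, which is an assumption standing in for the back-projection check (that check needs the camera model, not shown here).

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def triangulate(
    o1: Sequence[float], d1: Sequence[float],
    o2: Sequence[float], d2: Sequence[float],
) -> Optional[Tuple[Vec3, float]]:
    """Return (midpoint, gap) for the closest approach of two lines
    o1 + s*d1 and o2 + t*d2, or None if they are (nearly) parallel."""
    w = tuple(p - q for p, q in zip(o1, o2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    midpoint = tuple((u + v) / 2.0 for u, v in zip(p1, p2))
    return midpoint, math.dist(p1, p2)
```

A candidate would be kept only when its error measure falls under the chosen threshold.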

Dot Labeling (continued)

Project stored 3D points into a reference view

Keep points that are within 2 pixels of dots in reference view

These points are potential 3D matches for a given 2D dot

Compute average as final 3D position

Assign to 2D dot in reference view


Need to assign consistent labels to 3D dot locations across entire sequence

Define a reference set of dots D (frame 0)

Let d_j ∈ D be the neutral location for dot j

Position of d_j at frame i is d_j^i = d_j + v_j^i

For each reference dot, find the closest 3D dot of same color within some distance ε
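A minimal sketch of the nearest-neighbor step, assuming dots are stored as (color, position) pairs. The data layout and the name `match_reference_dots` are hypothetical, and this simplified version does not resolve the case where two reference dots claim the same 3D dot.

```python
import math
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Dot = Tuple[str, Sequence[float]]  # (color, 3D position)


def match_reference_dots(
    reference: Dict[Hashable, Dot],
    detections: List[Dot],
    epsilon: float,
) -> Dict[Hashable, Optional[int]]:
    """For each reference dot, return the index of the closest detected
    3D dot of the same color within epsilon, or None if there is none."""
    matches: Dict[Hashable, Optional[int]] = {}
    for label, (color, ref_pos) in reference.items():
        best_index, best_dist = None, epsilon
        for index, (det_color, det_pos) in enumerate(detections):
            if det_color != color:
                continue
            dist = math.dist(ref_pos, det_pos)
            if dist <= best_dist:
                best_index, best_dist = index, dist
        matches[label] = best_index
    return matches
```

Reference dots that come back as None are the unmatched dots that still need a position for the current frame.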

Moving the Dots

Move reference dot to matched location

For unmatched reference dot d_k, let n_k be the set of neighbor dots with match in current frame i

$$v_k^i = \frac{1}{\lvert n_k \rvert} \sum_{d_j^i \in n_k} v_j^i$$
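One reading of this slide is that an unmatched dot inherits the mean offset of its matched neighbors. The sketch below follows that reading; the function names and the tuple representation of offsets are assumptions.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def estimate_offset(neighbor_offsets: Sequence[Sequence[float]]) -> Vec3:
    """Mean of the offsets v_j^i of the matched neighbor dots in n_k."""
    if not neighbor_offsets:
        raise ValueError("an unmatched dot needs at least one matched neighbor")
    count = len(neighbor_offsets)
    return tuple(
        sum(v[axis] for v in neighbor_offsets) / count for axis in range(3)
    )


def moved_position(
    neutral: Sequence[float], neighbor_offsets: Sequence[Sequence[float]]
) -> Vec3:
    """Position of an unmatched dot: neutral location plus estimated offset."""
    offset = estimate_offset(neighbor_offsets)
    return tuple(n + o for n, o in zip(neutral, offset))
```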

Constructing the Mesh

Cyberware scan has problems:

  Fluorescent markers cause bumps on mesh

  No mouth opening

  Too many polygons

Bumps removed manually

Split mouth polygons, add teeth and tongue polygons

Run mesh simplification algorithm (Hoppe’s algorithm: 460k to 4800 polys)

Moving the Mesh

Move vertices by linear combination of offsets of nearest dots

$$p_j^i = p_j + \sum_k \alpha_k^j \left( d_k^i - d_k \right), \quad \text{where } d_k \in D \text{ and } \sum_k \alpha_k^j = 1$$
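As a sketch, moving one vertex amounts to adding a weighted sum of dot offsets to its rest position. The dictionary layout (dot label to blend coefficient, dot label to position) is an assumption made for this example.

```python
from typing import Dict, Hashable, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def move_vertex(
    rest_vertex: Sequence[float],
    blend: Dict[Hashable, float],
    neutral_dots: Dict[Hashable, Sequence[float]],
    frame_dots: Dict[Hashable, Sequence[float]],
) -> Vec3:
    """Rest position plus the blend-weighted offsets of the listed dots.

    `blend` maps a dot label to its coefficient for this vertex; the
    coefficients are expected to sum to 1.
    """
    moved = [float(c) for c in rest_vertex]
    for label, alpha in blend.items():
        for axis in range(3):
            offset = frame_dots[label][axis] - neutral_dots[label][axis]
            moved[axis] += alpha * offset
    return tuple(moved)
```

Dots with a zero coefficient can simply be left out of `blend`, so each vertex only touches its nearby dots.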

Assigning Blend Coefficients

Assign blend coefficients for a grid of 1400 evenly distributed points on face

Assigning Blend Coefficients

Label each dot, vertex, grid point as above, below, or neither

Find 2 closest dots to each grid point p

D_n is set of dots within 1.8(d_1 + d_2)/2 of p

Remove dots that lie in roughly the same direction from p

Assign blend values based on distance from p

Assign Blend Coefficients (cont.)

If dot not in D_n, then α is 0

If dot in D_n:

$$\text{let } l_i = \frac{1.0}{\lVert d_i - p \rVert}, \quad \text{then } \alpha_i = \frac{l_i}{\sum_{d_i \in D_n} l_i}$$

For vertices, find closest grid points

Copy blend coefficients
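A hedged sketch of the distance-based weighting for one grid point: dots inside the 1.8(d_1 + d_2)/2 radius get weights proportional to inverse distance, normalized to sum to 1, and all other dots get 0. The above/below labeling and the removal of dots in the same direction are left out, and the handling of a dot that coincides with p is an assumption.

```python
import math
from typing import Dict, Hashable, Sequence


def blend_coefficients(
    p: Sequence[float],
    dots: Dict[Hashable, Sequence[float]],
    radius_scale: float = 1.8,
) -> Dict[Hashable, float]:
    """Inverse-distance blend coefficients of grid point p for every dot."""
    if len(dots) < 2:
        raise ValueError("need at least two dots")
    distances = {label: math.dist(p, pos) for label, pos in dots.items()}

    # Assumption: a dot exactly at p takes all the weight.
    coincident = [label for label, dist in distances.items() if dist == 0.0]
    if coincident:
        return {label: (1.0 if label == coincident[0] else 0.0) for label in dots}

    d1, d2 = sorted(distances.values())[:2]
    radius = radius_scale * (d1 + d2) / 2.0
    inverse = {
        label: 1.0 / dist for label, dist in distances.items() if dist <= radius
    }
    total = sum(inverse.values())
    return {label: inverse.get(label, 0.0) / total for label in dots}
```

For example, with dots at distances 1, 2 and 10 from p, the radius is 2.7, so the far dot gets 0 and the two near dots get 2/3 and 1/3.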

Dot Removal

Substitute skin color for dot colors

First low-pass filter the image

Directional filter prevents color bleeding

Black=symmetric, White=directional

[Figure: face with dots; low-pass mask]

Dot Removal (continued)

Extract rectangular patch of dot-free skin

High-pass filter this patch

Register patch to center of dot regions

Blend it with the low-frequency skin

Clamp hue values to narrow range

Dot Removal (continued)

[Figure panels: original; low-pass; high-pass; hue clamped]

Texture Generation

Texture map generated for each frame

Project mesh onto cylinder

Compute mesh location (k, β_1, β_2) for each texel (u,v)

For each camera, transform mesh into view

For each texel, get 3D coordinates in mesh

Project 3D point to camera plane

Get color at (x,y) and store as texel color

Texture Generation (continued)

Compute a texel weight (dot product between texel normal on mesh and direction to camera)

Merge the texture maps from all the cameras based on the weight map
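The weighting and merge for a single texel might look like the sketch below. Clamping back-facing weights to 0 and normalizing by the sum of weights are assumptions; the slide only states that the merge is based on the weight map.

```python
import math
from typing import Optional, Sequence, Tuple

Color = Tuple[float, float, float]


def _unit(v: Sequence[float]) -> Tuple[float, ...]:
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / length for c in v)


def texel_weight(normal: Sequence[float], to_camera: Sequence[float]) -> float:
    """Dot product of the unit surface normal and the unit direction to
    the camera, clamped at 0 for surfaces facing away (assumption)."""
    n, c = _unit(normal), _unit(to_camera)
    return max(0.0, sum(a * b for a, b in zip(n, c)))


def merge_texel(
    colors: Sequence[Sequence[float]], weights: Sequence[float]
) -> Optional[Color]:
    """Weighted average of one texel's color across cameras, or None if
    no camera sees the texel."""
    total = sum(weights)
    if total == 0.0:
        return None
    return tuple(
        sum(w * col[ch] for col, w in zip(colors, weights)) / total
        for ch in range(3)
    )
```

A camera looking straight at the surface gets weight 1, while a camera at a grazing angle contributes almost nothing.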

Results

Making Faces Summary

3D geometry and texture of a face

Data generated for each frame of video

Shading and highlights “glued” to face

Nice results, but not totally automated

Need to repeat entire process for every new face we want to animate

Expression Cloning

Allows facial expressions to be mapped from one model to another

[Figure: source model; animation; target model]

Expression Cloning Outline

[Diagram labels: motion capture data or any animation mechanism; source model; source animation; deform; dense surface correspondences; vertex displacements; motion transfer; target model; target animation; cloned expressions]

Source Animation Creation

Use any existing facial animation method

  E.g. “Making Faces” paper described earlier

[Figure: motion capture data; source model; source animation]

Dense Surface Correspondence

Manually select 15-35 correspondences

Morph the source model using RBFs

Perform cylindrical projection (ray through source vertex, into target mesh)

Compute barycentric coords of intersection

[Figure: initial features; after RBF; after projection]
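For the last step, barycentric coordinates of an intersection point inside a triangle can be computed as sketched below. This uses a standard dot-product formulation and assumes the point already lies in the triangle's plane; the ray casting itself is not shown.

```python
from typing import Sequence, Tuple


def _sub(a: Sequence[float], b: Sequence[float]) -> Tuple[float, ...]:
    return tuple(x - y for x, y in zip(a, b))


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def barycentric(
    p: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> Tuple[float, float, float]:
    """Weights (u, v, w) such that p = u*a + v*b + w*c and u + v + w = 1."""
    v0, v1, v2 = _sub(b, a), _sub(c, a), _sub(p, a)
    d00, d01, d11 = _dot(v0, v0), _dot(v0, v1), _dot(v1, v1)
    d20, d21 = _dot(v2, v0), _dot(v2, v1)
    denom = d00 * d11 - d01 * d01
    if denom == 0.0:
        raise ValueError("degenerate triangle")
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return 1.0 - v - w, v, w
```

All three weights lie in [0, 1] exactly when the point is inside the triangle, which gives a cheap hit test as well.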

Automatic Feature Selection

Can also automate initial correspondences

Use basic facts about human geometry:

  Tip of Nose = point with highest Z value

  Top of Head = point with highest Y value

  Currently use around 15 such heuristic rules
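The two rules named above are simple enough to sketch directly. The code assumes the head model faces +Z with +Y up, as the rules imply, and that vertices are (x, y, z) tuples; the other heuristic rules are not listed on the slide and are not guessed at here.

```python
from typing import Sequence


def tip_of_nose(vertices: Sequence[Sequence[float]]) -> int:
    """Index of the vertex with the highest Z value."""
    return max(range(len(vertices)), key=lambda i: vertices[i][2])


def top_of_head(vertices: Sequence[Sequence[float]]) -> int:
    """Index of the vertex with the highest Y value."""
    return max(range(len(vertices)), key=lambda i: vertices[i][1])


# Toy vertex list standing in for a head mesh
head = [(0.0, 0.0, 0.0), (0.0, 9.0, 1.0), (0.0, 4.0, 6.0), (3.0, 4.0, 2.0)]
print(tip_of_nose(head), top_of_head(head))
```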

Example Deformations

[Figure: source; deformed source; target]

Deformed source closely approximates the target models

Animation with Motion Vectors

Animate by displacing target vertex by motion of corresponding source point

Interpolate barycentric coordinates of target vertices based on source vertices

Need to project target model onto source model (opposite of what we did before)
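Once a target vertex has barycentric coordinates inside a source triangle, its motion can be interpolated from the motion vectors of that triangle's three vertices. A minimal sketch, with hypothetical names and before any direction or magnitude adjustment:

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def interpolate_motion(
    bary: Sequence[float], motions: Sequence[Sequence[float]]
) -> Vec3:
    """Blend the motion vectors of a source triangle's three vertices
    using the barycentric coordinates of the projected target vertex."""
    if len(bary) != 3 or len(motions) != 3:
        raise ValueError("expected three weights and three motion vectors")
    return tuple(
        sum(w * m[axis] for w, m in zip(bary, motions)) for axis in range(3)
    )
```

A target vertex that lands exactly on a source vertex simply copies that vertex's motion.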

Motion Vector Transfer

Need to adjust direction and magnitude

[Figure: source motion vector on the source model; proper target motion vector on the target model]

Motion Vector Transfer (cont.)

Attach local coordinate system for each vertex in source and deformed source

  X-axis = average normal

  Y-axis = proj. of any adjacent edge onto plane with X-axis as normal

  Z-axis = cross product of X and Y

[Figure: source, deformed source, and target models with local X, Y, Z axes; labels m, T, M]

Motion Vector Transfer

Compute transformation between two coordinate systems

Mapping determines the deformed source model motion vectors
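The local frames and the change of coordinates can be sketched as follows. The frame construction follows the X/Y/Z rules from the slides. Expressing the motion in the source frame and rebuilding it in the deformed-source frame is one way to realize the transformation and only adjusts direction; any magnitude scaling is left out. All function names are assumptions.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Frame = Tuple[Vec3, Vec3, Vec3]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _unit(v: Sequence[float]) -> Vec3:
    length = math.sqrt(_dot(v, v))
    if length == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / length for c in v)


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def local_frame(average_normal: Sequence[float], adjacent_edge: Sequence[float]) -> Frame:
    """X = unit average normal, Y = adjacent edge projected onto the plane
    with X as normal, Z = X cross Y."""
    x = _unit(average_normal)
    along = _dot(adjacent_edge, x)
    y = _unit(tuple(e - along * c for e, c in zip(adjacent_edge, x)))
    return x, y, _cross(x, y)


def transfer_motion(motion: Sequence[float], source_frame: Frame, deformed_frame: Frame) -> Vec3:
    """Express `motion` in the source frame, then rebuild it from the
    same coordinates in the deformed-source frame."""
    coords = [_dot(motion, axis) for axis in source_frame]
    return tuple(
        sum(c * axis[i] for c, axis in zip(coords, deformed_frame))
        for i in range(3)
    )
```

Because both frames are orthonormal, this mapping is a pure rotation and preserves the length of the motion vector.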

Example Motion Transfer

[Figure: models and motion vectors for source and target; adjusted motions on the target are labeled “more horizontal” and “smaller”]

Results

[Figure: source and targets for an angry expression, a distorted mouth, and a big open mouth]

Summary

EC can animate new models using a library of existing expressions

Transfers motion vectors from source animations to target models

Process is fast and can be fully automated
