facial animation by: shahzad malik csc2529 presentation march 5, 2003
TRANSCRIPT
Facial Animation
By: Shahzad Malik
CSC2529 Presentation
March 5, 2003
Motivation
  Realistic human facial animation is a challenging problem (DOF, deformations)
  Would like expressive and plausible animations of a photorealistic 3D face
  Useful for virtual characters in film, video games
Papers
  Three major areas are Lip Syncing, Face Modeling, and Expression Synthesis
    Video Rewrite (Bregler – SIGGRAPH 1997)
    Making Faces (Guenter – SIGGRAPH 1998)
    Expression Cloning (Noh – SIGGRAPH 2001)
Video Rewrite
  Generate a new video of an actor mouthing a new utterance by piecing together old footage
  Two stages:
    Analysis
    Synthesis
Analysis Stage
  Given footage of the subject speaking, extract mouth position and lip shape
  Hand label 26 training images:
    34 points on mouth (20 outer boundary, 12 inner boundary, 1 at bottom of upper teeth, 1 at top of lower teeth)
    20 points on chin and jaw line
  Morph training set to get to 351 images
EigenPoints
  Create EigenPoint models using this set
  Use derived EigenPoints model to label features in all frames of the training video
EigenPoints (continued)
  Problem: EigenPoints assumes features are undergoing pure translation
Face Warping
  Before EigenPoints labeling, warp each image into a reference plane
  Use a minimization algorithm to register images
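$$E(M) = \sum_i \left[ I_T(\mathbf{x}'_i) - I_F(\mathbf{x}_i) \right]^2$$
$$\mathbf{x}' = M\mathbf{x}, \qquad \mathbf{x} = \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad M = \begin{bmatrix} m_0 & m_1 & m_2 \\ m_3 & m_4 & m_5 \\ m_6 & m_7 & m_8 \end{bmatrix}$$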
Face Warping (continued)
  Use rigid parts of face to estimate warp $M$
  Warp face by $M^{-1}$
  Perform eigenpoint analysis
  Back-project features by $M$ onto face
Audio Analysis
  Want to capture visual dynamics of speech
  Phonemes are not enough
  Consider coarticulation
    Lip shapes for many phonemes are modified based on phoneme’s context (e.g. /T/ in “beet” vs. /T/ in “boot”)
Audio Analysis (continued)
  Segment speech into triphones
    e.g. “teapot” becomes /SIL-T-IY/, /T-IY-P/, /IY-P-AA/, /P-AA-T/, and /AA-T-SIL/
  Emphasize middle of each triphone
  Effectively captures forward and backward coarticulation
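The triphone segmentation can be illustrated with a minimal sketch. It assumes the utterance is already available as a list of phoneme labels and is padded with silence on both ends, as in the “teapot” example; the function name `to_triphones` is hypothetical.

```python
def to_triphones(phonemes, silence="SIL"):
    """Return the overlapping triphones of an utterance.

    The utterance is padded with a silence label on both sides so that
    every phoneme appears once as the center of a triphone.
    """
    padded = [silence] + list(phonemes) + [silence]
    return ["-".join(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]


# "teapot" as a phoneme sequence
print(to_triphones(["T", "IY", "P", "AA", "T"]))
```

Each phoneme is the center of exactly one triphone, so its left and right neighbors carry the context on both sides.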
Audio Analysis (continued)
  Training footage audio is labeled with phonemes and associated timing
  Use gender-specific HMMs for segmentation
  Convert transcript into triphones
Synthesis Stage
  Given some new speech utterance
    Mark it with phoneme labels
    Determine triphones
    Find a video example with the desired transition in database
  Compute a matching distance to each triphone:
    error = αDp + (1 − α)Ds
Viseme Classes
  Cluster phonemes into viseme classes
  Use 26 viseme classes (10 consonant, 15 vowel, 1 silence):
    (1) /CH/, /JH/, /SH/, /ZH/
    (2) /K/, /G/, /N/, /L/
    …
    (25) /IH/, /AE/, /AH/
    (26) /SIL/
Phoneme Context Distance
  Dp is phoneme context distance
    Distance is 0 if phonemic categories are the same (e.g. /P/ and /P/)
    Distance is 1 if viseme classes are different (e.g. /P/ and /IY/)
    Distance is between 0 and 1 if different phonemic classes but same viseme class (e.g. /P/ and /B/)
  Compute for the entire triphone
  Weight the center phoneme most
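A rough sketch of how such a context distance could be computed is shown below. The viseme table contains only the four classes listed on the Viseme Classes slide. The intermediate value `SAME_VISEME_DISTANCE` and the per-position `WEIGHTS` are assumptions chosen for illustration, not values from the paper.

```python
# Only the viseme classes spelled out on the slide; the others are elided there.
VISEME_CLASS = {
    "CH": 1, "JH": 1, "SH": 1, "ZH": 1,
    "K": 2, "G": 2, "N": 2, "L": 2,
    "IH": 25, "AE": 25, "AH": 25,
    "SIL": 26,
}
SAME_VISEME_DISTANCE = 0.5    # assumed value "between 0 and 1"
WEIGHTS = (0.25, 0.5, 0.25)   # assumed weights; the center phoneme counts most


def phoneme_distance(a, b):
    """0 for identical phonemes, 1 across viseme classes, in between otherwise."""
    if a == b:
        return 0.0
    if VISEME_CLASS[a] == VISEME_CLASS[b]:
        return SAME_VISEME_DISTANCE
    return 1.0


def context_distance(target, candidate):
    """Weighted phoneme distance over the three positions of two triphones."""
    return sum(w * phoneme_distance(a, b)
               for w, a, b in zip(WEIGHTS, target, candidate))


print(context_distance(("K", "IH", "SH"), ("G", "IH", "SH")))
```

Because the weights sum to 1, the result stays in the range 0 to 1 like the per-phoneme distance.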
Lip Shape Distance
  Ds is distance between lip shapes in overlapping triphones
    E.g. for “teapot”, contours for /IY/ and /P/ should match between /T-IY-P/ and /IY-P-AA/
  Compute Euclidean distance between 4-element vectors (lip width, lip height, inner lip height, height of visible teeth)
  Solution depends on neighbors in both directions (use dynamic programming)
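As a small illustration, the lip shape distance and the combined matching error (error = αDp + (1 − α)Ds from the Synthesis Stage slide) might be computed as follows. The default `alpha=0.5` is an assumption, and the dynamic-programming search over candidate triphones is not shown.

```python
import math


def lip_shape_distance(shape_a, shape_b):
    """Euclidean distance between two 4-element lip vectors:
    (lip width, lip height, inner lip height, height of visible teeth)."""
    return math.dist(shape_a, shape_b)


def matching_error(dp, ds, alpha=0.5):
    """Combine phoneme context distance dp and lip shape distance ds.
    alpha=0.5 is an assumed default, not a value from the paper."""
    return alpha * dp + (1 - alpha) * ds


ds = lip_shape_distance((4.0, 2.0, 1.0, 0.0), (1.0, 6.0, 1.0, 0.0))
print(ds, matching_error(0.5, ds))
```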
Time Alignment of Triphone Videos
  Need to combine triphone videos
  Choose portion of overlapping triphones where lip shapes are as close as possible
  Already done when computing Ds
Time Alignment to Utterance
  Still need to time align with target audio
    Compare corresponding phoneme transcripts
    Start time of center phoneme in triphone is aligned with label in target transcript
    Video is then stretched/compressed to fit time needed between target phoneme boundaries
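The stretch/compress step can be sketched as a time remapping between phoneme boundaries. The version below assumes a purely linear mapping and picks source frames by flooring; both are simplifications for illustration, and the function names are hypothetical.

```python
def source_time(t, target_span, source_span):
    """Map a time t inside the target phoneme span to the matching time
    in the source clip, assuming a linear stretch or compression."""
    t0, t1 = target_span
    s0, s1 = source_span
    return s0 + (t - t0) * (s1 - s0) / (t1 - t0)


def resample_frames(target_span, source_span, fps=30.0):
    """Source frame index to show for each output frame of the target span."""
    t0, t1 = target_span
    n_frames = round((t1 - t0) * fps)
    return [int(source_time(t0 + i / fps, target_span, source_span) * fps)
            for i in range(n_frames)]


# A 2-second source clip stretched to fill 4 seconds (1 frame per second here)
print(resample_frames((0.0, 4.0), (0.0, 2.0), fps=1.0))
```

When the target span is longer than the source span, source frames repeat; when it is shorter, frames are skipped.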
Combining Lips and Background
  Need to stitch new mouth movie into background original face sequence
  Compute transform M as before
  Warping replacement mask defines mouth and background portions in final video
[Figure: mouth mask and background mask]
Combining Lips and Background
  Mouth shape comes from triphone image, and is warped using M
  Jaw shape is combination of background jaw and triphone jaw lines
  Near ears, jaw depends on background; near chin, jaw depends on mouth
  Illumination matching is used to avoid seams between mouth and background
Video Rewrite Results
  Video: 8 minutes of video, 109 sentences
  Training Data: front-facing segments of video, around 1700 triphones
[Figure: “Emily” sequences]
Video Rewrite Results
  2 minutes of video, 1157 triphones
[Figure: JFK sequences]
Video Rewrite
  Image-based facial animation system
  Driven by audio
  Output sequence created from real video
  Allows natural facial movements (eye blinks, head motions)
Making Faces
  Allows capturing facial expressions in 3D from a video sequence
  Provides a 3D model and texture that can be rendered on 3D hardware
Data Capture
  Actor’s face digitized using a Cyberware scanner to get a base 3D mesh
  Six calibrated video cameras capture actor’s expressions
[Figure: six camera views]
Data Capture
  182 dots are glued to actor’s face
  Each dot is one of six colors with fluorescent pigment
  Dots of same color are placed as far apart as possible
  Dots follow the contours of the face (eyes, lips, nasolabial furrows, etc.)
Dot Labeling
  Each dot needs a unique label
  Dots will be used to warp the 3D mesh
    Also used later for texture generation from the six views
  For each frame in each camera:
    Classify each pixel as belonging to one of six categories
    Find connected components
    Compute the centroid
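The per-frame steps (connected components, then centroids) can be sketched as below. The sketch assumes the pixel classification has already produced a grid of integer labels, with 0 standing for "no dot" and 1 to 6 for the dot colors; that encoding and the use of 4-connectivity are assumptions made for illustration.

```python
from collections import deque


def dot_centroids(labels):
    """Find 4-connected components of equal non-zero label and return
    one (color, centroid_x, centroid_y) tuple per component.

    labels[y][x] is 0 for background or 1..6 for a dot color (assumed encoding).
    """
    height, width = len(labels), len(labels[0])
    seen = [[False] * width for _ in range(height)]
    centroids = []
    for y in range(height):
        for x in range(width):
            color = labels[y][x]
            if color == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill over pixels of the same color
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:
                px, py = queue.popleft()
                pixels.append((px, py))
                for nx, ny in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and labels[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            cx = sum(p[0] for p in pixels) / len(pixels)
            cy = sum(p[1] for p in pixels) / len(pixels)
            centroids.append((color, cx, cy))
    return centroids


frame = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
print(dot_centroids(frame))
```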
Dot Labeling (continued)
  Need to compute dot correspondences between camera views
  Must handle occlusions, false matches
  Compute all point correspondences between k cameras and n 2D dots:
$$\binom{k}{2} n^2 \text{ point correspondences}$$
Dot Labeling (continued)
  For each correspondence
    Triangulate a 3D point based on closest intersection of rays cast through 2D dots
    Check if back-projection error is above some threshold
  All 3D candidates below threshold are stored
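One way to picture the triangulation step is the midpoint of the shortest segment between two rays. The sketch below takes ray origins and directions directly, since camera calibration data is not given on the slides. It returns the gap between the rays so a caller can apply a threshold; using that gap instead of a true back-projection error is an assumption. `candidate_pairs` evaluates the upper bound of (k choose 2)·n² correspondences for k cameras with n dots each.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate(o1, d1, o2, d2):
    """Closest 'intersection' of rays o1 + s*d1 and o2 + t*d2.

    Returns (midpoint, gap), where gap is the distance between the two
    closest points, or None if the rays are parallel.
    """
    w = tuple(a - b for a, b in zip(o1, o2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if denom == 0:
        return None
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    midpoint = tuple((u + v) / 2 for u, v in zip(p1, p2))
    return midpoint, math.dist(p1, p2)


def candidate_pairs(k, n):
    """Upper bound on point correspondences for k cameras and n dots per view."""
    return math.comb(k, 2) * n * n


print(triangulate((0, 0, 0), (1, 0, 0), (1, -1, 1), (0, 1, 0)))
```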
Dot Labeling (continued)
  Project stored 3D points into a reference view
  Keep points that are within 2 pixels of dots in reference view
  These points are potential 3D matches for a given 2D dot
    Compute average as final 3D position
    Assign to 2D dot in reference view
Dot Labeling (continued)
  Need to assign consistent labels to 3D dot locations across entire sequence
  Define a reference set of dots $D$ (frame 0)
  Let $d_j \in D$ be the neutral location for dot $j$
  Position of $d_j$ at frame $i$ is $d_j^i = d_j + v_j^i$
  For each reference dot, find the closest 3D dot of same color within some distance $\varepsilon$
Moving the Dots
  Move reference dot to matched location
  For unmatched reference dot $d_k$, let $n_k$ be the set of neighbor dots with match in current frame $i$
$$v_k^i = \frac{1}{\|n_k\|} \sum_{d_j^i \in n_k} v_j^i$$
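A minimal sketch of the fallback for an unmatched dot: its offset is taken as the mean of the offsets of its matched neighbors, and the dot is then moved from its neutral location by that offset. Function names and the tuple-based 3D vectors are choices made for this illustration.

```python
def neighbor_average_offset(neighbor_offsets):
    """Mean of the 3D offsets of the neighbor dots matched in this frame."""
    count = len(neighbor_offsets)
    return tuple(sum(v[axis] for v in neighbor_offsets) / count
                 for axis in range(3))


def move_dot(neutral_position, offset):
    """Position of a dot in the current frame: neutral location plus offset."""
    return tuple(p + v for p, v in zip(neutral_position, offset))


offset = neighbor_average_offset([(1.0, 0.0, 0.0), (3.0, 2.0, 0.0)])
print(move_dot((10.0, 10.0, 10.0), offset))
```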
Constructing the Mesh
  Cyberware scan has problems:
    Fluorescent markers cause bumps on mesh
    No mouth opening
    Too many polygons
  Bumps removed manually
  Split mouth polygons, add teeth and tongue polygons
  Run mesh simplification algorithm (Hoppe’s algorithm: 460k to 4800 polys)
Moving the Mesh
  Move vertices by linear combination of offsets of nearest dots
$$p_j^i = p_j + \sum_k \alpha_k^j \left( d_k^i - d_k \right), \quad \text{where} \quad \sum_{d_k \in D} \alpha_k^j = 1$$
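The vertex update can be sketched as follows: each vertex moves by the blend-weighted sum of the offsets of its nearby dots, with the weights summing to 1. The dictionary-based data layout and the name `move_vertex` are assumptions for illustration.

```python
def move_vertex(vertex, blend, neutral_dots, frame_dots):
    """Displace a mesh vertex by a linear combination of dot offsets.

    blend maps dot index -> blend coefficient (coefficients sum to 1);
    neutral_dots and frame_dots map dot index -> 3D position in the
    reference frame and in the current frame.
    """
    moved = list(vertex)
    for k, alpha in blend.items():
        for axis in range(3):
            moved[axis] += alpha * (frame_dots[k][axis] - neutral_dots[k][axis])
    return tuple(moved)


neutral = {0: (1.0, 0.0, 0.0), 1: (0.0, 1.0, 0.0)}
current = {0: (1.0, 2.0, 0.0), 1: (0.0, 1.0, 4.0)}
print(move_vertex((0.0, 0.0, 0.0), {0: 0.5, 1: 0.5}, neutral, current))
```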
Assigning Blend Coefficients
  Assign blend coefficients for a grid of 1400 evenly distributed points on face
Assigning Blend Coefficients
  Label each dot, vertex, grid point as above, below, or neither
  Find 2 closest dots to each grid point $p$
  $D_n$ is set of dots within $1.8(d_1 + d_2)/2$ of $p$
  Remove points in relatively same direction
  Assign blend values based on distance from $p$
Assign Blend Coefficients (cont.)
  If dot not in $D_n$, then $\alpha$ is 0
  If dot in $D_n$:
$$\text{let } l_i = \frac{1.0}{\| d_i - p \|}, \text{ then } \alpha_i = \frac{l_i}{\sum_{d_i \in D_n} l_i}$$
  For vertices, find closest grid points
  Copy blend coefficients
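The inverse-distance weighting for dots inside the neighborhood set can be sketched like this. It assumes the set of nearby dots has already been selected and that no dot coincides exactly with the grid point (which would divide by zero); the function name is hypothetical.

```python
import math


def blend_coefficients(grid_point, nearby_dots):
    """Normalized inverse-distance weights for the dots near a grid point.

    nearby_dots maps dot index -> 3D position; dots outside this set
    implicitly get a coefficient of 0.
    """
    inverse = {k: 1.0 / math.dist(d, grid_point) for k, d in nearby_dots.items()}
    total = sum(inverse.values())
    return {k: value / total for k, value in inverse.items()}


weights = blend_coefficients((0.0, 0.0, 0.0), {0: (1.0, 0.0, 0.0), 1: (0.0, 2.0, 0.0)})
print(weights)
```

Closer dots receive larger coefficients, and the coefficients always sum to 1.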
Dot Removal
  Substitute skin color for dot colors
  First low-pass filter the image
  Directional filter prevents color bleeding
  Black=symmetric, White=directional
[Figure panels: Face with dots, Low-pass mask]
Dot Removal (continued)
  Extract rectangular patch of dot-free skin
  High-pass filter this patch
  Register patch to center of dot regions
  Blend it with the low-frequency skin
  Clamp hue values to narrow range
Dot Removal (continued)
[Figure panels: Original, Low-pass, High-pass, Hue clamped]
Texture Generation
  Texture map generated for each frame
  Project mesh onto cylinder
  Compute mesh location (k, β1, β2) for each texel (u,v)
  For each camera, transform mesh into view
  For each texel, get 3D coordinates in mesh
  Project 3D point to camera plane
  Get color at (x,y) and store as texel color
Texture Generation (continued)
  Compute a texel weight (dot product between texel normal on mesh and direction to camera)
  Merge the texture maps from all the cameras based on the weight map
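A possible reading of the merge step is a per-texel weighted average over cameras. In the sketch below, the weight is the dot product of the unit texel normal and the unit direction to the camera. Clamping negative weights to zero and merging by weighted average are assumptions made for illustration.

```python
def texel_weight(normal, to_camera):
    """Dot product of the unit surface normal and the unit direction to the
    camera; back-facing texels are clamped to 0 (assumed behavior)."""
    return max(0.0, sum(n * c for n, c in zip(normal, to_camera)))


def merge_texel(colors, weights):
    """Weighted average of one texel's RGB color across cameras.
    Returns None if no camera sees the texel."""
    total = sum(weights)
    if total == 0:
        return None
    return tuple(sum(w * color[ch] for w, color in zip(weights, colors)) / total
                 for ch in range(3))


print(merge_texel([(100, 0, 0), (0, 100, 0)], [1.0, 3.0]))
```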
Results
Making Faces Summary
  3D geometry and texture of a face
  Data generated for each frame of video
  Shading and highlights “glued” to face
  Nice results, but not totally automated
  Need to repeat entire process for every new face we want to animate
Expression Cloning
  Allows facial expressions to be mapped from one model to another
[Figure: source model, animation, target model]
Expression Cloning Outline
[Diagram: motion capture data or any animation mechanism drives the source model to give the source animation; the source model is deformed via dense surface correspondences with the target model; vertex displacements go through motion transfer to give cloned expressions (the target animation)]
Source Animation Creation
  Use any existing facial animation method
  E.g. “Making Faces” paper described earlier
[Figure: motion capture data applied to the source model gives the source animation]
Dense Surface Correspondence
  Manually select 15-35 correspondences
  Morph the source model using RBFs
  Perform cylindrical projection (ray through source vertex, into target mesh)
  Compute Barycentric coords of intersection
[Figure panels: Initial Features, After RBF, After projection]
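For the barycentric step, a standard way to compute the coordinates of a point that lies in a triangle's plane, and to use them to interpolate per-vertex quantities, is sketched below. The ray casting and RBF morphing are not shown, and the helper names are hypothetical.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def barycentric(p, a, b, c):
    """Barycentric coordinates (u, v, w) of p with respect to triangle abc,
    so that p = u*a + v*b + w*c for a point in the triangle's plane."""
    v0, v1, v2 = _sub(b, a), _sub(c, a), _sub(p, a)
    d00, d01, d11 = _dot(v0, v0), _dot(v0, v1), _dot(v1, v1)
    d20, d21 = _dot(v2, v0), _dot(v2, v1)
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return (1.0 - v - w, v, w)


def interpolate(bary, value_a, value_b, value_c):
    """Blend three per-vertex 3D values (e.g. motion vectors) with the weights."""
    return tuple(bary[0] * x + bary[1] * y + bary[2] * z
                 for x, y, z in zip(value_a, value_b, value_c))


coords = barycentric((0.25, 0.25, 0.0), (0, 0, 0), (1, 0, 0), (0, 1, 0))
print(coords, interpolate(coords, (4, 0, 0), (0, 4, 0), (0, 0, 4)))
```

The same weights can later blend the motion of the three triangle vertices at the intersection point.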
Automatic Feature Selection
  Can also automate initial correspondences
  Use basic facts about human geometry:
    Tip of Nose = point with highest Z value
    Top of Head = point with highest Y value
  Currently use around 15 such heuristic rules
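The two rules named on the slide can be expressed directly. The sketch assumes vertices are (x, y, z) tuples with the face looking along +Z and +Y pointing up; the remaining heuristic rules are not listed in the slides and are not reproduced here.

```python
def tip_of_nose(vertices):
    """Vertex with the highest Z value (assumes the face looks along +Z)."""
    return max(vertices, key=lambda v: v[2])


def top_of_head(vertices):
    """Vertex with the highest Y value (assumes +Y is up)."""
    return max(vertices, key=lambda v: v[1])


head = [(0.0, 0.0, 1.0), (0.0, 2.0, 0.0), (0.5, 0.5, 0.2)]
print(tip_of_nose(head), top_of_head(head))
```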
Example Deformations
[Figure panels: Source, Deformed source, Target]
  Closely approximates the target models
Animation with Motion Vectors
  Animate by displacing target vertex by motion of corresponding source point
  Interpolate Barycentric coordinates of target vertices based on source vertices
  Need to project target model onto source model (opposite of what we did before)
Motion Vector Transfer
  Need to adjust direction and magnitude
[Figure: source motion vector and proper target motion vector, shown on source and target models]
Motion Vector Transfer (cont.)
  Attach local coordinate system for each vertex in source and deformed source
    X-axis = average normal
    Y-axis = proj. of any adjacent edge onto plane with X-axis as normal
    Z-axis = cross product of X and Y
[Figure: local X, Y, Z axes at a vertex on the source, deformed source, and target models, with motion vectors m and transformation M]
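The per-vertex local coordinate system can be sketched as below, following the three axis rules on the slide. The inputs are the averaged vertex normal and one adjacent edge vector; normalizing the axes to unit length is an assumption, and the edge must not be parallel to the normal.

```python
import math


def _normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def local_frame(average_normal, adjacent_edge):
    """Build (X, Y, Z) axes at a vertex.

    X is the unit average normal, Y is the adjacent edge projected onto
    the plane perpendicular to X, and Z is the cross product of X and Y.
    """
    x_axis = _normalize(average_normal)
    along_normal = sum(e * x for e, x in zip(adjacent_edge, x_axis))
    projected = tuple(e - along_normal * x for e, x in zip(adjacent_edge, x_axis))
    y_axis = _normalize(projected)
    z_axis = _cross(x_axis, y_axis)
    return x_axis, y_axis, z_axis


print(local_frame((0.0, 0.0, 2.0), (1.0, 0.0, 1.0)))
```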
Motion Vector Transfer
  Compute transformation between two coordinate systems
  Mapping determines the deformed source model motion vectors
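One way to picture the mapping between the two coordinate systems: express the source motion vector in the source vertex's local frame, then rebuild it from the same components in the deformed source vertex's frame. The sketch assumes both frames are orthonormal. The optional `scale` factor for the magnitude adjustment is a hypothetical parameter, since the slides do not say how the magnitude is computed.

```python
def transfer_motion(motion, source_frame, deformed_frame, scale=1.0):
    """Re-express a motion vector from one local frame in another.

    source_frame and deformed_frame are (X, Y, Z) triples of orthonormal
    axes; scale is an assumed magnitude adjustment.
    """
    # Components of the motion along the source frame's axes
    components = [sum(m * a for m, a in zip(motion, axis)) for axis in source_frame]
    # Same components, rebuilt along the deformed frame's axes
    return tuple(
        scale * sum(c * axis[i] for c, axis in zip(components, deformed_frame))
        for i in range(3)
    )


identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
rotated = ((0, 1, 0), (-1, 0, 0), (0, 0, 1))   # 90 degree turn about Z
print(transfer_motion((1, 0, 0), identity, rotated, scale=0.5))
```

In the usage line, a motion along the source X-axis comes out along the rotated X-axis at half its length.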
Example Motion Transfer
[Figure: source and target models with their motion vectors; the adjusted motions on the target are more horizontal and smaller]
Results
[Figure: source and targets showing an angry expression, a distorted mouth, and a big open mouth]
Summary
  EC can animate new models using a library of existing expressions
  Transfers motion vectors from source animations to target models
  Process is fast and can be fully automated