
Video Rewrite: Driving Visual Speech with Audio

Christoph Bregler

Michele Covell

Malcolm Slaney

Interval Research Corporation


Goal: Photo-realistic Talking Face

Hand-coded 3D Model

Video Rewrite

Facial Animation History:

• Parke (1972)

• Cohen & Massaro, Benoit et al. (1993)

• Waters & Terzopoulos (1990), DECface

• Lewis (1991)

• Litwinowicz & Williams (1994)

• Chen, Graf, Petajan, et al. (1995)

• Scott et al. (1994)

• Ezzat & Poggio (1997)

• Pighin et al. + Guenter et al. (1998)

• Brand (1999)

• Cosatto & Graf (2000)

Video Rewrite: Overview

Analysis

/D/ /IY/ /P/ /AH/

Synthesis

Annotation

• Phonetic

• Head Pose

• Mouth Shape

/D/ /OH/ /N/ /AH/

Phonetic Annotation

HMM labels: /D/ /IY/ /P/ /AH/

Triphones: /D-IY-P/ /IY-P-AH/
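The triphone labels follow from the phone labels by sliding a three-phone window along the sequence. A minimal sketch; the function name `triphones` is hypothetical, and phones at the edges of the sequence, which lack a full left or right neighbour, are simply skipped here (an assumption; the original system may treat boundaries differently):

```python
def triphones(phones):
    """Return the overlapping triphone contexts of a phone sequence.

    Each triphone is a centre phone with its left and right neighbours,
    written in the slide's /L-C-R/ notation. Edge phones are skipped
    (hypothetical choice for this sketch).
    """
    return ["/" + "-".join(phones[i:i + 3]) + "/" for i in range(len(phones) - 2)]


print(triphones(["D", "IY", "P", "AH"]))  # ['/D-IY-P/', '/IY-P-AH/']
```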

Phonetic Annotation

• Acoustic Front-End: RASTA-PLP (Channel Invariant)

• HMMs with Gaussian mixture models (HTK)

• Phoneme Set: 56 categories (CMU)

• Triphone models trained on TIMIT

• Annotation using forced Viterbi alignment (with the CMU pronunciation dictionary)
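Forced Viterbi alignment differs from free recognition in that the phone sequence is already known (from the transcript and the pronunciation dictionary), so the search only decides where each phone starts and ends. A minimal sketch, assuming the per-frame log-likelihood of each transcript phone has already been computed by the acoustic models; the function name and input layout are hypothetical, and real HMM aligners work over HMM states rather than whole phones:

```python
import math


def forced_align(loglik):
    """Forced Viterbi alignment of a known phone sequence.

    loglik[t][k] is the log-likelihood of frame t under the k-th phone of the
    transcript (hypothetical input). Phones must appear in transcript order and
    each must cover at least one frame. Returns the phone index for each frame.
    """
    T, K = len(loglik), len(loglik[0])
    if T < K:
        raise ValueError("need at least one frame per phone")
    NEG = -math.inf
    score = [[NEG] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]
    score[0][0] = loglik[0][0]  # alignment must start in the first phone
    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1][k]                          # remain in phone k
            advance = score[t - 1][k - 1] if k > 0 else NEG  # move on from k-1
            if advance > stay:
                score[t][k], back[t][k] = advance + loglik[t][k], k - 1
            else:
                score[t][k], back[t][k] = stay + loglik[t][k], k
    # Backtrack from the last phone at the last frame.
    path, k = [], K - 1
    for t in range(T - 1, -1, -1):
        path.append(k)
        k = back[t][k]
    return path[::-1]
```

With five frames and three phones whose scores peak in frames 0-1, 2, and 3-4 respectively, `forced_align` returns `[0, 0, 1, 2, 2]`; the phone boundaries are wherever the index changes.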

Annotation

/D/ /OH/ /N/ /AH/

Head Pose Annotation

Match a planar template
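Template matching can be illustrated with its simplest case: slide the template over the frame and keep the placement with the smallest sum of squared differences. This sketch is a deliberate simplification that searches integer translation only; matching a planar template under head rotation would also have to search the warp parameters (for example an affine or projective transform). The function name is hypothetical:

```python
def match_template(image, template):
    """Exhaustive translation search minimising the sum of squared differences.

    image and template are 2-D lists of grey levels. Returns the (row, col) of
    the best top-left placement. Only integer translation is searched here;
    a full planar pose fit would also search rotation, scale, and shear.
    """
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    best, best_pos = float("inf"), None
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(h)
                for j in range(w)
            )
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos
```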

Annotation

/D/ /OH/ /N/ /AH/

Mouth / Chin Annotation

Eigenpoints

Eigenpoints - Training -

Graylevel + XY control points

Eigenpoints - Mapping -

Graylevel + XY control-point space
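One way to read the training and mapping steps is as a coupled principal component analysis: gray levels and control-point coordinates are stacked into one vector per training image, and at run time the gray-level half of the eigenvectors is fit to the new image while the control-point half supplies the coordinates. The sketch below (NumPy, hypothetical function names) implements that simplified reading; it is not the published eigenpoints estimator, which may differ in its details:

```python
import numpy as np


def train_eigenpoints(images, points, k):
    """Joint PCA of stacked [grey levels, control-point XY] training vectors.

    images: (N, P) array of flattened grey-level patches.
    points: (N, 2M) array of control-point coordinates (x1, y1, ..., xM, yM).
    Returns the joint mean and the top-k joint eigenvectors as columns.
    """
    joint = np.hstack([images, points])
    mean = joint.mean(axis=0)
    # Rows of vt are the principal directions of the centred joint data.
    _, _, vt = np.linalg.svd(joint - mean, full_matrices=False)
    return mean, vt[:k].T  # shape (P + 2M, k)


def map_to_points(image, mean, basis, n_pixels):
    """Estimate control points for a new patch from its grey levels alone."""
    img_basis, pt_basis = basis[:n_pixels], basis[n_pixels:]
    # Least-squares coefficients that best explain the grey-level part...
    coeffs, *_ = np.linalg.lstsq(img_basis, image - mean[:n_pixels], rcond=None)
    # ...then read the control-point part out of the same eigenvectors.
    return mean[n_pixels:] + pt_basis @ coeffs
```

If the training data lie exactly on a k-dimensional subspace, this mapping recovers the control points of a new patch from that subspace exactly; on real faces, k trades robustness against detail.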


Synthesis: Overview

background face

Synthesis:

• Transcribe

• Find Lip Clips

• Stitch Together

/J/ /EH/ /L/ /IY/

Matching:

/T/ /AA/ /AA/

Matching: Co-Articulation

/T/ /AA/ /AA/

/UW-T-UW/

Matching: Co-Articulation

/UW-T-UW/

/T/ /AA/ /AA/

Match: /AA-T-AA/

Co-Articulation: Triphones

/AA-S-AA/

/AA-T-AA/

/UW-T-UW/

More than 20,000 triphones in English

Viseme-Based Perceptual Match

P B S T K …

Owens (1985) Confusion Matrix

11 Consonant Clusters:

- CH, JH, SH, ZH
- K, G, N, L
- T, D, S, Z
- P, B, M
- F, V
- TH, DH

McGurk effect demo: Baldi by Cohen & Massaro

Matching: Viseme Distance

/T/ /AA/ /AA/

/UW-T-UW/: correct phone, wrong context

/AA-S-AA/: correct viseme, correct context

Matching: Viseme Distance

/UW-T-UW/

/T/ /AA/ /AA/

Approximate match: /AA-S-AA/
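The preference for /AA-S-AA/ over /UW-T-UW/ can be reproduced with a distance that charges less for a same-viseme substitution than for a visibly different phone. A minimal sketch using the consonant clusters listed on the viseme slide; the 0/0.5/1 weights and all function names are hypothetical, chosen only to illustrate the ordering:

```python
# Consonant clusters listed on the slide (after Owens 1985). Phones not in
# these clusters (vowels, unlisted consonants) form singleton classes here.
VISEME_CLASSES = [
    frozenset({"CH", "JH", "SH", "ZH"}),
    frozenset({"K", "G", "N", "L"}),
    frozenset({"T", "D", "S", "Z"}),
    frozenset({"P", "B", "M"}),
    frozenset({"F", "V"}),
    frozenset({"TH", "DH"}),
]


def viseme_class(phone):
    """Return the visually confusable class that contains phone."""
    for cls in VISEME_CLASSES:
        if phone in cls:
            return cls
    return frozenset({phone})


def phone_distance(a, b):
    """0 for the same phone, 0.5 for the same viseme class, 1 otherwise.

    The 0.5 weight is illustrative, not taken from the slides.
    """
    if a == b:
        return 0.0
    return 0.5 if viseme_class(a) == viseme_class(b) else 1.0


def triphone_distance(target, candidate):
    """Sum of phone distances over the (left, centre, right) positions."""
    return sum(phone_distance(t, c) for t, c in zip(target, candidate))


target = ("AA", "T", "AA")
candidates = [("UW", "T", "UW"), ("AA", "S", "AA")]
print(min(candidates, key=lambda c: triphone_distance(target, c)))  # ('AA', 'S', 'AA')
```

Here /UW-T-UW/ scores 2.0 (two wrong context phones) while /AA-S-AA/ scores 0.5 (one same-viseme substitution), so the visually closer clip wins.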

Matching: Overlapping Triphones

Shape Distance
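Because neighboring triphone clips overlap, a shape distance can compare the lip shapes the two clips show over their shared segment. A minimal sketch, assuming each frame's lip shape is a vector of control-point coordinates and that the two overlapping segments are brought to a common length by nearest-neighbor time resampling; both the resampling and the mean Euclidean distance are assumptions, and the function names are hypothetical:

```python
import math


def resample(seq, n):
    """Nearest-neighbour time resampling of a list of frames to n frames."""
    if n == 1:
        return [seq[0]]
    return [seq[round(i * (len(seq) - 1) / (n - 1))] for i in range(n)]


def shape_distance(overlap_a, overlap_b):
    """Mean Euclidean distance between lip-shape vectors over the overlap.

    overlap_a, overlap_b: lists of per-frame lip-shape vectors taken from the
    overlapping segment of two adjacent clips (lengths may differ).
    """
    n = max(len(overlap_a), len(overlap_b))
    a, b = resample(overlap_a, n), resample(overlap_b, n)
    return sum(math.dist(fa, fb) for fa, fb in zip(a, b)) / n
```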

Matching: Trade-Offs

/T/ /AA/ /AA/ /P/ /IY/

Shape Distance

N-Viseme Distance

Rate-of-Speech Distance

Matching: N-Best Dynamic Programming

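Error = V(t) + R(t) + S(t-1, t), where V is the viseme distance, R the rate-of-speech distance, and S the lip-shape distance between the clips chosen at t-1 and t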

N-best
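The error decomposes into a per-slot term, V(t) + R(t), and a term between consecutive choices, S(t-1, t), which is exactly the structure dynamic programming exploits. A minimal N-best sketch, assuming the per-slot costs are precomputed and the transition cost is supplied as a function; names and input layout are hypothetical. Keeping the N best partial paths at every lattice node is enough to recover the N best complete paths:

```python
import heapq


def n_best_paths(unary, pairwise, n):
    """N-best dynamic programming over a lattice of candidate clips.

    unary[t][j]       : V(t) + R(t) for candidate clip j at target slot t
    pairwise(t, i, j) : S(t-1, t) between candidate i at t-1 and j at t
    Returns up to n (total_error, [choice per slot]) pairs, lowest error first.
    """
    # Each lattice node keeps its n best partial paths as (error, path) pairs.
    beams = [[(unary[0][j], [j])] for j in range(len(unary[0]))]
    for t in range(1, len(unary)):
        new_beams = []
        for j, u in enumerate(unary[t]):
            extended = [
                (err + pairwise(t, path[-1], j) + u, path + [j])
                for beam in beams
                for err, path in beam
            ]
            new_beams.append(heapq.nsmallest(n, extended, key=lambda e: e[0]))
        beams = new_beams
    finals = [entry for beam in beams for entry in beam]
    return heapq.nsmallest(n, finals, key=lambda e: e[0])
```

For two slots with per-slot costs `[[1, 2], [1, 1]]` and a transition cost of 0 for staying with the same candidate index and 5 otherwise, the two best sequences are `[0, 0]` (error 2) and `[1, 1]` (error 3).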

Stitching

Morphing

Affine warp + Beier-Neely
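Beier-Neely morphing maps each destination pixel back into the source image using its position relative to corresponding feature lines. A minimal sketch of the single-line-pair case only; the weighted averaging over several line pairs and the preceding affine warp are omitted, and the function names are hypothetical:

```python
def perp(v):
    """Vector perpendicular to v with the same length."""
    return (-v[1], v[0])


def beier_neely_one_line(x, p, q, p_src, q_src):
    """Map destination pixel x to its source location for one line pair.

    (p, q) is the feature line in the destination image and (p_src, q_src)
    the corresponding line in the source image.
    """
    d = (q[0] - p[0], q[1] - p[1])
    length_sq = d[0] ** 2 + d[1] ** 2
    rel = (x[0] - p[0], x[1] - p[1])
    # u: position along the line (0 at p, 1 at q); v: signed distance from it.
    u = (rel[0] * d[0] + rel[1] * d[1]) / length_sq
    pd = perp(d)
    v = (rel[0] * pd[0] + rel[1] * pd[1]) / length_sq ** 0.5
    # Rebuild the point with the same (u, v) relative to the source line.
    d_src = (q_src[0] - p_src[0], q_src[1] - p_src[1])
    pd_src = perp(d_src)
    len_src = (d_src[0] ** 2 + d_src[1] ** 2) ** 0.5
    return (
        p_src[0] + u * d_src[0] + v * pd_src[0] / len_src,
        p_src[1] + u * d_src[1] + v * pd_src[1] / len_src,
    )
```

Translating the line pair translates every mapped point by the same offset, and rotating it rotates the points, which is why a single line pair behaves like a rigid transform and several pairs are needed for local lip deformations.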

Simple Lighting Correction

Alpha Blending

Intensity
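A lighting correction followed by alpha blending can be sketched in a few lines. This sketch assumes the correction is a single global gain that matches the mouth patch's mean intensity to the background's, and that the blend weights come from a precomputed mask; both are assumptions rather than the method shown on the slide, and the function name is hypothetical:

```python
def correct_and_blend(background, mouth, alpha):
    """Match mean intensity, then alpha-blend a mouth patch onto the background.

    background, mouth : equal-sized 2-D lists of grey levels
    alpha             : same-sized 2-D list of weights in [0, 1] (1 = mouth)
    Assumes the mouth patch is not entirely black (its mean is nonzero).
    """
    flat_b = [v for row in background for v in row]
    flat_m = [v for row in mouth for v in row]
    # Global gain so the mouth's mean brightness matches the background's.
    gain = (sum(flat_b) / len(flat_b)) / (sum(flat_m) / len(flat_m))
    return [
        [a * gain * m + (1 - a) * b for b, m, a in zip(brow, mrow, arow)]
        for brow, mrow, arow in zip(background, mouth, alpha)
    ]
```

A feathered alpha mask (weights falling smoothly from 1 inside the mouth to 0 at the patch border) hides the seam that a hard cut would leave.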

Video Rewrite Results

JFK - Video Model

2 minutes of data

Ellen - Video Model

8 minutes of data

Contributions

• Data-driven lip animation

• Automatic, using vision and speech recognition

• Photo-realistic: implicitly captures specific appearance + dynamics

Video Rewrite

Thanks!

Acknowledgments: S. Ahmad, M. Bajura, F. Crow, T. Darrell, M. Davis, G. Gordon, K. Force, B. Fuson, B. Lassiter, J. Lewis, K. Rahardja, S. Snibbe, C. Sequine, E. Tauber, B. Verplank, S. White, J. Woodfill

John F. Kennedy

1994: Scott et al. (JPL + Graphco Technologies)

Matching Video Snippets with Context

/AA-S-AA/

/AA-T-AA/

/UW-T-UW/

“Video Model”

N-phone context

/T/ /AA/ /UW/ /S/

2000: Cosatto & Graf, AT&T Research

Rewrite Techniques: Future

Model Data

Video Rewrite
