Video Rewrite: Driving Visual Speech with Audio


Page 1: Video Rewrite: Driving Visual Speech with Audio

Video Rewrite: Driving Visual Speech with Audio

Christoph Bregler

Michele Covell

Malcolm Slaney

Interval Research Corporation

Page 3: Video Rewrite: Driving Visual Speech with Audio

Goal: Photo-realistic Talking Face

Hand-coded 3D Model OR Video Rewrite

Page 4: Video Rewrite: Driving Visual Speech with Audio

Facial Animation History:

• Parke (1972)

• Cohen & Massaro, Benoit et al. (1993)

• Waters & Terzopoulos (1990), DEC-Face

• Lewis (1991)

• Litwinowicz & Williams (1994)

• Chen, Graf, Petajan, et al. (1995)

• Scott et al. (1994)

• Ezzat & Poggio (1997)

• Pighin et al. + Guenter et al. (1998)

• Brand (1999)

• Cosatto, Graf (2000)

Page 5: Video Rewrite: Driving Visual Speech with Audio

Video Rewrite: Overview

Analysis

/D/ /IY/ /P/ /AH/

Synthesis

Page 6: Video Rewrite: Driving Visual Speech with Audio

Video Rewrite: Overview

Analysis

/D/ /IY/ /P/ /AH/

Synthesis

Page 7: Video Rewrite: Driving Visual Speech with Audio

Annotation

• Phonetic

• Head Pose

• Mouth Shape

/D/ /OH/ /N/ /AH/

Page 9: Video Rewrite: Driving Visual Speech with Audio

Phonetic Annotation

HMM labels: /D/ /IY/ /P/ /AH/

Triphones: /D-IY-P/ /IY-P-AH/
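
The step from phone labels to overlapping triphone labels can be sketched as follows. This is only an illustration: the function name `to_triphones` and the `SIL` boundary symbol are assumptions for the example, not names taken from the slides.

```python
def to_triphones(phones, boundary="SIL"):
    """Return one left-center-right label for every phone in the sequence.

    `boundary` is a hypothetical padding symbol used at both ends so that
    the first and last phone also get a full three-part label.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}-{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


labels = ["D", "IY", "P", "AH"]
for triphone in to_triphones(labels):
    print(triphone)
```

The two interior labels produced for this input are D-IY-P and IY-P-AH, the same pair shown on the slide; adjacent triphones share two phones, which is what makes them overlap.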

Page 10: Video Rewrite: Driving Visual Speech with Audio

Phonetic Annotation

• Acoustic Front-End: RASTA-PLP (Channel Invariant)

• HMM Models / Gaussian Mixture Models (HTK)

• Phoneme Set: 56 categories (CMU)

• Triphone models trained on TIMIT

• Annotation using Forced-Viterbi (and CMU pronunciation dictionary)

Page 12: Video Rewrite: Driving Visual Speech with Audio

Annotation

• Phonetic

• Head Pose

• Mouth Shape

/D/ /OH/ /N/ /AH/

Page 13: Video Rewrite: Driving Visual Speech with Audio

Head Pose Annotation

Match planar template

Page 15: Video Rewrite: Driving Visual Speech with Audio

Annotation

• Phonetic

• Head Pose

• Mouth Shape

/D/ /OH/ /N/ /AH/

Page 16: Video Rewrite: Driving Visual Speech with Audio

Mouth / Chin Annotation

Eigenpoints

Page 17: Video Rewrite: Driving Visual Speech with Audio

Eigenpoints - Training -

Gray-level + XY control points

Page 18: Video Rewrite: Driving Visual Speech with Audio

Eigenpoints - Mapping -

Gray-level + XY control-point space

Page 21: Video Rewrite: Driving Visual Speech with Audio

Video Rewrite: Overview

Analysis

/D/ /IY/ /P/ /AH/

Synthesis

Page 22: Video Rewrite: Driving Visual Speech with Audio

Video Rewrite: Overview

Analysis

/D/ /IY/ /P/ /AH/

Synthesis

Page 23: Video Rewrite: Driving Visual Speech with Audio

Synthesis - Overview -

background face

Page 25: Video Rewrite: Driving Visual Speech with Audio

Synthesis:

• Transcribe

• Find Lip Clips

• Stitch Together

/J/ /EH/ /L/ /IY/

Page 26: Video Rewrite: Driving Visual Speech with Audio

Matching:

/T/ /AA/ /AA/

Page 27: Video Rewrite: Driving Visual Speech with Audio

Matching: Co-Articulation

/T/ /AA/ /AA/

?

/UW-T-UW/

Page 28: Video Rewrite: Driving Visual Speech with Audio

Matching: Co-Articulation

/UW-T-UW/

/T/ /AA/ /AA/

Match: /AA-T-AA/

Page 29: Video Rewrite: Driving Visual Speech with Audio

Co-Articulation: Tri-Phones

/AA-S-AA/

/AA-T-AA/

/UW-T-UW/

…

More than 20,000 tri-phones in English

Page 30: Video Rewrite: Driving Visual Speech with Audio

Viseme-based Perceptual Match

Owens (1985) confusion matrix (rows and columns: P, B, S, T, K, …)

11 Consonant Clusters:

- CH, JH, SH, ZH
- K, G, N, L
- T, D, S, Z
- P, B, M
- F, V
- TH, DH
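
A minimal sketch of how such clusters could feed a triphone distance is shown below. Only the six clusters listed on the slide are included. The cost values (0, 0.2, 1), the equal weighting of the three positions, and the function names are assumptions made for illustration, not the authors' actual metric; phones outside the listed clusters (including all vowels) are simply treated as matching only themselves.

```python
# Consonant clusters as listed on the slide (six of the eleven are shown there).
CLUSTERS = [
    {"CH", "JH", "SH", "ZH"},
    {"K", "G", "N", "L"},
    {"T", "D", "S", "Z"},
    {"P", "B", "M"},
    {"F", "V"},
    {"TH", "DH"},
]


def viseme_class(phone):
    """Index of the cluster containing `phone`, or None if it is not listed."""
    for index, cluster in enumerate(CLUSTERS):
        if phone in cluster:
            return index
    return None


def phone_distance(a, b, same_viseme_cost=0.2):
    """0 for identical phones, a small (assumed) cost within a cluster, else 1."""
    if a == b:
        return 0.0
    class_a = viseme_class(a)
    if class_a is not None and class_a == viseme_class(b):
        return same_viseme_cost
    return 1.0


def triphone_distance(target, candidate):
    """Sum of per-position phone distances between two 'L-C-R' labels."""
    return sum(
        phone_distance(a, b)
        for a, b in zip(target.split("-"), candidate.split("-"))
    )


target = "AA-T-AA"
for candidate in ["AA-T-AA", "AA-S-AA", "UW-T-UW"]:
    print(candidate, triphone_distance(target, candidate))
```

Under these assumed costs, a clip with the right viseme in the right context (AA-S-AA) scores closer to the target than a clip with the right phone in the wrong context (UW-T-UW).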

Page 31: Video Rewrite: Driving Visual Speech with Audio

McGurk Effect -- Baldy by Cohen & Massaro

Page 32: Video Rewrite: Driving Visual Speech with Audio

Matching: Viseme-Distance

/T/ /AA/ /AA/

/UW-T-UW/: correct phone, wrong context

/AA-S-AA/: correct viseme, correct context

Page 33: Video Rewrite: Driving Visual Speech with Audio

Matching: Viseme-Distance

/UW-T-UW/

/T/ /AA/ /AA/

Approximate match: /AA-S-AA/

Page 34: Video Rewrite: Driving Visual Speech with Audio

Matching: Overlapping Triphones

Shape Distance

Page 35: Video Rewrite: Driving Visual Speech with Audio

Matching: Trade-Offs

/T/ /AA/ /AA/ /P/ /IY/

• Shape Distance

• N-Viseme Distance

• Rate of Speech Distance

Page 36: Video Rewrite: Driving Visual Speech with Audio

Matching: N-Best Dynamic Programming

Error = V(t) + R(t) + S(t-1, t)

(Figure labels: time step t; N-best candidates)
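
The error term above can be minimized over a whole utterance with a Viterbi-style dynamic program. The sketch below is one possible reading, not the authors' implementation: it assumes the per-step errors are summed over t, that each step t has a short list of candidate clips, that `unary[t][i]` already holds V(t) + R(t) for candidate i, and that `pairwise(t, i, j)` returns S(t-1, t) between consecutive candidates. All names are hypothetical.

```python
def best_clip_sequence(unary, pairwise):
    """Pick one candidate clip per step so that the summed error is minimal.

    unary[t][i]       : assumed to hold V(t) + R(t) for candidate i at step t
    pairwise(t, i, j) : assumed to return S(t-1, t) for candidate i at step
                        t-1 followed by candidate j at step t
    Returns (total_error, list of chosen candidate indices).
    """
    cost = list(unary[0])
    back = []
    for t in range(1, len(unary)):
        new_cost, pointers = [], []
        for j, u in enumerate(unary[t]):
            # Cheapest predecessor for candidate j at this step.
            i_best = min(
                range(len(cost)),
                key=lambda i: cost[i] + pairwise(t, i, j),
            )
            new_cost.append(cost[i_best] + pairwise(t, i_best, j) + u)
            pointers.append(i_best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path back from the last step.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, path


# Toy example: two candidates per step, a flat penalty for switching clips.
unary = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
total, path = best_clip_sequence(unary, lambda t, i, j: 0.0 if i == j else 2.0)
print(total, path)
```

With a high switching penalty the search stays on one candidate even where its unary cost is worse; with a low penalty it switches. This is the trade-off between the viseme, rate-of-speech and shape terms.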

Page 38: Video Rewrite: Driving Visual Speech with Audio

Stitching

Page 40: Video Rewrite: Driving Visual Speech with Audio

Stitching

Morphing

Page 41: Video Rewrite: Driving Visual Speech with Audio

Morphing

Affine warp + Beier-Neely
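
The affine part of this step can be illustrated with a few lines of plain Python. This is only a sketch of the general technique: the 2x3 matrix layout, the function names, and the idea of moving mouth control points between a clip's pose and the background face's pose are assumptions for the example. The Beier-Neely morph itself is not shown.

```python
def apply_affine(matrix, points):
    """Map (x, y) points through a 2x3 affine matrix [[a, b, tx], [c, d, ty]]."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def invert_affine(matrix):
    """Return the 2x3 matrix that undoes `matrix`."""
    (a, b, tx), (c, d, ty) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine matrix is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [
        [ia, ib, -(ia * tx + ib * ty)],
        [ic, id_, -(ic * tx + id_ * ty)],
    ]


# Hypothetical pose: scale by 2 and shift by (10, -4).
clip_to_background = [[2.0, 0.0, 10.0], [0.0, 2.0, -4.0]]
mouth_points = [(1.0, 2.0), (3.0, 2.5)]

warped = apply_affine(clip_to_background, mouth_points)
restored = apply_affine(invert_affine(clip_to_background), warped)
print(warped)
print(restored)
```

Warping control points forward and then back through the inverse returns the original coordinates, which is a quick sanity check on any pose estimate.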

Page 42: Video Rewrite: Driving Visual Speech with Audio

Simple Lighting Correction

• Intensity

• Alpha Blending
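
The two operations named on this slide can be sketched on small gray-level patches. The slide does not say how the intensity correction is computed, so the single-gain mean matching below is an assumption, as are the function names and the toy pixel values. The alpha blend is the standard per-pixel formula alpha * foreground + (1 - alpha) * background.

```python
def match_intensity(patch, target_mean):
    """Scale a gray-level patch so that its mean equals `target_mean`.

    A single multiplicative gain is an assumed, deliberately simple
    correction; values are clipped at 255.
    """
    flat = [p for row in patch for p in row]
    mean = sum(flat) / len(flat)
    gain = target_mean / mean if mean else 1.0
    return [[min(255.0, p * gain) for p in row] for row in patch]


def alpha_blend(foreground, background, alpha):
    """Per-pixel blend: alpha * foreground + (1 - alpha) * background."""
    return [
        [a * f + (1.0 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha)
    ]


mouth = [[100.0, 100.0], [100.0, 100.0]]   # toy mouth patch
face = [[200.0, 200.0], [200.0, 200.0]]    # toy background face patch
mask = [[1.0, 0.5], [0.5, 0.0]]            # 1 = keep mouth, 0 = keep face

corrected = match_intensity(mouth, target_mean=150.0)
print(alpha_blend(corrected, face, mask))
```

A mask that falls smoothly from 1 to 0 toward the patch border hides the seam between the inserted mouth region and the background face.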

Page 44: Video Rewrite: Driving Visual Speech with Audio

Video Rewrite Results

• JFK video model: 2 minutes of data

• Ellen video model: 8 minutes of data

Page 46: Video Rewrite: Driving Visual Speech with Audio

Contributions

• Data-driven lip animation

• Automatic, using vision and speech recognition

• Photo-realistic: implicitly captures specific appearance + dynamics

Page 47: Video Rewrite: Driving Visual Speech with Audio

Video Rewrite

Thanks!

John F. Kennedy

Acknowledgments: S. Ahmad, M. Bajura, F. Crow, T. Darrell, M. Davis, G. Gordon, K. Force, B. Fuson, B. Lassiter, J. Lewis, K. Rahardja, S. Snibbe, C. Sequine, E. Tauber, B. Verplank, S. White, J. Woodfill

Page 48: Video Rewrite: Driving Visual Speech with Audio

1994: Scott et al (JPL + Graphco Technologies)

/o/

/n/

/e/

Page 51: Video Rewrite: Driving Visual Speech with Audio

Matching Video-Snippets with Context

/AA-S-AA/

/AA-T-AA/

/UW-T-UW/

…

“Video Model”

N-phone context

/T/ /AA/ /UW/ /S/

Page 52: Video Rewrite: Driving Visual Speech with Audio

2000: Cosatto, Graf, AT&T Research

Page 54: Video Rewrite: Driving Visual Speech with Audio

Rewrite Techniques -- Future --

Model Data

Video Rewrite