An Automatic Lip-reading Method Based on Polynomial Fitting

Meng LI
Supervisor: Dr. Yiu-ming CHEUNG
Department of Computer Science, Hong Kong Baptist University




Content

Introduction

Lip segmentation

Visual speech recognition

Experiment

Conclusion and future work


Introduction

Figure: the audio channel and the video channel both feed into perception.

Speech perception is multimodal: it involves information from at least two sensory modalities.


Introduction

Figure: recognition accuracy by input modality.

Silent environment: visual only 73%, audio only 91%, visual-audio 97%.

Noisy environment: visual only 73%, audio only 47%, visual-audio 87%.


Introduction

The most active research direction in lip-reading is visual speech recognition, either combined with audio information or using visual information only.

Figure: distribution of lip-reading research topics: visual-only speech recognition 63%, speech recognition in noisy environments 31%, identification 5%, others 1%.


Introduction

Figure: the basic structure of a typical AVSR (audio-visual speech recognition) system. The audio and video streams pass through three stages: preprocessing (acoustic processing of the audio, lip capturing from the video), feature extraction (audio feature extraction, visual feature extraction), and fusion and recognition (AV fusion).


Introduction

Visual features for lip-reading fall into four categories:

Pixel based: uses all pixels in the lip region as the feature.

Motion based: captures the movement of all or part of the lips during pronunciation.

Shape based: extracts the lip boundary as the feature.

Model based: assumes a lip model, matches the observed lip shape against the model, and represents the lip shape with a few model parameters.


Introduction

Pixel based: uses all pixels in the lip region as the feature.


Introduction

Advantages: all information in the lip region is utilized, and recognition accuracy is the highest under ideal illumination conditions.

Disadvantages: sensitive to illumination conditions; sensitive to rotation and scale transformations; speaker dependent; high-dimensional feature data.


Introduction

Motion based: captures the movement of all or part of the lips during pronunciation.


Introduction

Advantages: represents the motion of the lips directly and completely.

Disadvantages: sensitive to illumination conditions; sensitive to rotation and scale transformations; speaker dependent; high-dimensional feature data.


Introduction

Shape based and model based: shape-based methods extract the lip boundary as the feature; model-based methods assume a lip model, match the observed lip shape against it, and represent the lip shape with a few model parameters.


Introduction

Advantages: low-dimensional feature data; robust to rotation and scale transformations; speaker independence can be achieved if the model is appropriate; convenient to match with classical methods (e.g. HMM).

Disadvantages: high computational complexity.

Tip: so far, model-based feature extraction is the most common method.


Introduction

Lip segmentation under gray level (presented in the rest of this presentation):

Based on gray-level images.

Locates the minimum enclosing rectangle of the mouth.

High processing speed and low computational complexity.


Introduction

Lip segmentation in colour space (presented in the rest of this presentation):

Based on the RGB, HSV and L*a*b* colour spaces.

Can extract the outer boundary of the lip.

High accuracy, but high computational complexity.


Introduction

Visual-only speech recognition (presented in the rest of this presentation):

Based on polynomial fitting.

High processing speed, suitable for real-time systems, and performs well with a limited training set.


Lip segmentation (1)

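The gray-level segmentation slides contain figures only, so the authors' procedure is not recoverable from this transcript. As a purely hypothetical illustration of the step "locate the minimum enclosing rectangle of the mouth", the sketch below assumes the mouth pixels are simply those darker than a fixed threshold (the threshold value and the function name `min_enclosing_rectangle` are our assumptions) and returns the smallest axis-aligned rectangle containing them.

```python
from typing import List, Optional, Tuple


def min_enclosing_rectangle(
    gray: List[List[int]], threshold: int = 80
) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, of pixels darker than threshold.

    Hypothetical sketch: the fixed dark-pixel threshold is an assumption,
    not the gray-level method described in the slides.
    """
    # Row indices that contain at least one candidate mouth pixel
    rows = [i for i, row in enumerate(gray) if any(p < threshold for p in row)]
    if not rows:
        return None  # no candidate mouth pixels found
    # Column indices of every candidate mouth pixel
    cols = [j for row in gray for j, p in enumerate(row) if p < threshold]
    return min(rows), min(cols), max(rows), max(cols)
```

In practice a real system would clean the binary mask (e.g. with morphological operations) before taking the bounding box; that step is omitted here.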


Lip segmentation (2)

First, we transform the source image from RGB color space into L*a*b* space.

In the a* channel, negative values indicate green, while positive values indicate magenta.

This helps to separate the lip region from the surrounding skin.


Lip segmentation (2)

$$a^* = \frac{377\,(14503R - 22218G + 7714B)}{2^{24}} + 128$$

$$a^*_{norm} = \frac{a^* - a^*_{min}}{a^*_{max} - a^*_{min}}$$

2552 * GaI normmask

GRGRrg R

GI 125610/
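A minimal sketch of the a* step, assuming the fixed-point formula a* = 377(14503R − 22218G + 7714B)/2^24 + 128 on 8-bit RGB values, followed by min-max normalization over the image. For a gray pixel the weights nearly cancel (14503 − 22218 + 7714 = −1), so a* stays close to the neutral value 128; reddish pixels move above it and greenish pixels below it. The mask step that follows on the slide is not reproduced, and the function names are illustrative.

```python
from typing import List


def a_star(r: int, g: int, b: int) -> float:
    """Approximate 8-bit a* from 8-bit RGB with the fixed-point formula above."""
    return 377 * (14503 * r - 22218 * g + 7714 * b) / 2 ** 24 + 128


def normalize(values: List[float]) -> List[float]:
    """Min-max normalize a* values over the image to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # flat image: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]
```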


Lip segmentation (2)

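In the source image, we take the pixels located in the non-black area and transform them into HSV color space. For each such pixel $i$, with hue $h_i$ and saturation $s_i$, we obtain the vector

$$I_i = \left(s_i\cos(2\pi h_i),\ s_i\sin(2\pi h_i)\right)$$

We assume the data follow a normal distribution and estimate its mean and covariance by maximum likelihood (ML):

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} I_i, \qquad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (I_i - \hat{\mu})(I_i - \hat{\mu})^T$$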
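A sketch of this step, assuming each pixel's feature is $(s_i\cos 2\pi h_i,\ s_i\sin 2\pi h_i)$ with hue and saturation in [0, 1], as returned by Python's standard `colorsys.rgb_to_hsv`. Treating hue as an angle avoids the wrap-around at red, where hue values near 0 and near 1 describe the same colour. The ML estimates divide by $n$, not $n - 1$. Function names are illustrative.

```python
import colorsys
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def hs_vector(r: int, g: int, b: int) -> Vec:
    """Map an 8-bit RGB pixel to (s*cos(2*pi*h), s*sin(2*pi*h)) via HSV."""
    h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return (s * math.cos(2 * math.pi * h), s * math.sin(2 * math.pi * h))


def ml_gaussian(samples: List[Vec]) -> Tuple[Vec, List[List[float]]]:
    """Maximum-likelihood mean and 2x2 covariance (divides by n, not n - 1)."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / n
    syy = sum((y - my) ** 2 for _, y in samples) / n
    sxy = sum((x - mx) * (y - my) for x, y in samples) / n
    return (mx, my), [[sxx, sxy], [sxy, syy]]
```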


Lip segmentation (2)

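We then transform the whole source image into HSV color space and compute, for every pixel $i$, the vector

$$I_i^{global} = \left(s_i\cos(2\pi h_i),\ s_i\sin(2\pi h_i)\right)$$

This yields a new image $I_{seg}$, whose value at pixel $i$ is

$$I_{seg} = \frac{255}{2\pi|\hat{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2}(I_i^{global} - \hat{\mu})^T \hat{\Sigma}^{-1} (I_i^{global} - \hat{\mu})\right)$$

Lighter pixels in $I_{seg}$ are more similar in color to the lip region.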
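A sketch of the per-pixel similarity, assuming it is 255 times the 2-D Gaussian density $N(\hat{\mu}, \hat{\Sigma})$ evaluated at the pixel's hue-saturation vector. The 2×2 inverse is written out explicitly, and the covariance is assumed to be non-singular. The function name `lip_similarity` is illustrative.

```python
import math
from typing import List, Tuple


def lip_similarity(
    v: Tuple[float, float], mean: Tuple[float, float], cov: List[List[float]]
) -> float:
    """255 times the 2-D Gaussian density of v under N(mean, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of a 2x2 matrix [[a, b], [c, d]]
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx, dy = v[0] - mean[0], v[1] - mean[1]
    # Mahalanobis term (v - mean)^T cov^{-1} (v - mean)
    m = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return 255 / (2 * math.pi * math.sqrt(det)) * math.exp(-0.5 * m)
```

With this scaling the peak value is $255 / (2\pi|\hat{\Sigma}|^{1/2})$, which is not necessarily 255, so the image may need rescaling before it is displayed.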


Lip segmentation (2)

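$$x_g = \frac{\sum_{i=1}^{row}\sum_{j=1}^{col} j\, I_{seg}(i,j)}{\sum_{i=1}^{row}\sum_{j=1}^{col} I_{seg}(i,j)}, \qquad y_g = \frac{\sum_{i=1}^{row}\sum_{j=1}^{col} i\, I_{seg}(i,j)}{\sum_{i=1}^{row}\sum_{j=1}^{col} I_{seg}(i,j)}$$

We select the block of $I_{mask}$ that contains this "gravity center" $(x_g, y_g)$ as the lip region.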
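The gravity center is an intensity-weighted centroid. Following the formulas above, $x_g$ is weighted by the column index $j$ and $y_g$ by the row index $i$, both 1-based. The sketch below assumes the image is given as a list of rows and has non-zero total intensity. The function name is illustrative.

```python
from typing import List, Tuple


def gravity_center(img: List[List[float]]) -> Tuple[float, float]:
    """Intensity-weighted centroid (x_g, y_g) with 1-based row i and column j."""
    total = sum(sum(row) for row in img)
    if total == 0:
        raise ValueError("image has zero total intensity")
    # x_g: column index j weighted by pixel intensity
    x_g = sum(j * p for row in img for j, p in enumerate(row, start=1)) / total
    # y_g: row index i weighted by pixel intensity
    y_g = sum(i * p for i, row in enumerate(img, start=1) for p in row) / total
    return x_g, y_g
```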


Visual speech recognition


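For each utterance, we obtain two curves that describe how the width and the height of the lip change over time.

We employ least-squares estimation (LSE) to fit a polynomial to each curve:

$$P(x) = \sum_{k=0}^{n} a_k x^k$$

$$I = \sum_{i=1}^{m} \left(\sum_{k=0}^{n} a_k x_i^k - y_i\right)^2, \qquad \frac{\partial I}{\partial a_k} = 0, \quad k = 0, \dots, n$$

where $(x_i, y_i)$, $i = 1, \dots, m$, are the sample points of the curve and $n$ is the polynomial degree.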
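Setting every $\partial I / \partial a_k$ to zero gives the normal equations $(V^T V)\,a = V^T y$, where $V_{ik} = x_i^k$. The sketch below builds and solves them with Gaussian elimination and partial pivoting, using only the standard library. The function name `fit_polynomial` is illustrative. For long or badly scaled curves, a QR-based solver (such as NumPy's `polyfit`) is numerically safer than forming $V^T V$ directly.

```python
from typing import List, Sequence


def fit_polynomial(xs: Sequence[float], ys: Sequence[float], n: int = 3) -> List[float]:
    """Least-squares fit of P(x) = sum_{k=0}^{n} a_k x^k; returns [a_0, ..., a_n]."""
    size = n + 1
    # Normal equations: A[j][k] = sum_i x_i^(j+k), b[j] = sum_i y_i * x_i^j
    A = [[sum(x ** (j + k) for x in xs) for k in range(size)] for j in range(size)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(size)]
    # Forward elimination with partial pivoting
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, size):
            factor = A[r][col] / A[col][col]
            for c in range(col, size):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    a = [0.0] * size
    for r in reversed(range(size)):
        s = sum(A[r][c] * a[c] for c in range(r + 1, size))
        a[r] = (b[r] - s) / A[r][r]
    return a
```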


Visual speech recognition

In this work, we set $n = 3$. For each fitted polynomial, the minimum, the maximum and the rightmost point are recorded as the feature vector:

$$F_w = [x^w_{min}, y^w_{min}, x^w_{max}, y^w_{max}, y^w_{bound}]^T$$

$$F_h = [x^h_{min}, y^h_{min}, x^h_{max}, y^h_{max}, y^h_{bound}]^T$$
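A sketch of the feature extraction for a fitted cubic, under our own interpretation of the slide. The minimum and maximum are searched over the fitted interval $[x_{start}, x_{end}]$: the candidates are the two endpoints plus the real roots of $P'(x) = a_1 + 2a_2x + 3a_3x^2$ that fall inside the interval. The "rightmost point" is taken as $y_{bound} = P(x_{end})$. The interval endpoints and both function names are assumptions, and the coefficient list must have at least four entries ($n = 3$).

```python
import math
from typing import List


def poly_eval(a: List[float], x: float) -> float:
    """Evaluate sum_k a[k] * x**k with Horner's rule."""
    result = 0.0
    for coeff in reversed(a):
        result = result * x + coeff
    return result


def cubic_features(a: List[float], x_start: float, x_end: float) -> List[float]:
    """Return [x_min, y_min, x_max, y_max, y_bound] of a fitted cubic on [x_start, x_end]."""
    candidates = [x_start, x_end]
    # Critical points: roots of P'(x) = a1 + 2*a2*x + 3*a3*x**2
    c2, c1, c0 = 3 * a[3], 2 * a[2], a[1]
    if c2 != 0:
        disc = c1 * c1 - 4 * c2 * c0
        if disc >= 0:
            root = math.sqrt(disc)
            candidates += [(-c1 + root) / (2 * c2), (-c1 - root) / (2 * c2)]
    elif c1 != 0:
        candidates.append(-c0 / c1)  # derivative is linear
    candidates = [x for x in candidates if x_start <= x <= x_end]
    x_min = min(candidates, key=lambda x: poly_eval(a, x))
    x_max = max(candidates, key=lambda x: poly_eval(a, x))
    # "Rightmost point": the curve value at the right boundary of the interval
    y_bound = poly_eval(a, x_end)
    return [x_min, poly_eval(a, x_min), x_max, poly_eval(a, x_max), y_bound]
```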

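Each training utterance is assigned a label $j$, and the templates are trained with:

$$T_{w,j} \leftarrow \frac{1}{2}\left(T_{w,j} + F_{w,i}\right), \qquad T_{h,j} \leftarrow \frac{1}{2}\left(T_{h,j} + F_{h,i}\right)$$

where $F_{w,i}$ and $F_{h,i}$ are the feature vectors of the $i$-th training utterance with label $j$.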
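A sketch of template training, assuming the update rule $T \leftarrow \frac{1}{2}(T + F)$ applied once per training sample. How a template is initialized is not stated on the slide; here the first sample of each label initializes it, which is our assumption. The names `train` and `average` are illustrative.

```python
from typing import Dict, List, Tuple

Features = List[float]
Templates = Dict[str, Tuple[Features, Features]]


def average(t: Features, f: Features) -> Features:
    """Element-wise (t + f) / 2."""
    return [(ti + fi) / 2 for ti, fi in zip(t, f)]


def train(samples: List[Tuple[str, Features, Features]]) -> Templates:
    """Build width/height templates per label j from (label, F_w, F_h) samples.

    Assumed rule: T <- (T + F) / 2 for each new sample; the first sample
    of a label initializes its templates (the initialization is an assumption).
    """
    templates: Templates = {}
    for label, f_w, f_h in samples:
        if label not in templates:
            templates[label] = (list(f_w), list(f_h))
        else:
            t_w, t_h = templates[label]
            templates[label] = (average(t_w, f_w), average(t_h, f_h))
    return templates
```

Under this rule, the most recent sample always carries weight 1/2, the one before it 1/4, and so on, so later repetitions influence the template more than a plain mean would allow.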

For testing, where $F$ is the input feature vector and $T$ is a trained template feature vector, the recognized label is:

$$J = \arg\min_j \left(\|F_w - T_{w,j}\| + \|F_h - T_{h,j}\|\right)$$
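The test rule is a nearest-template classifier. The sketch below computes $\arg\min_j (\|F_w - T_{w,j}\| + \|F_h - T_{h,j}\|)$, assuming Euclidean norms and templates stored as a dictionary from label to a (width template, height template) pair. The names `recognize` and `distance` are illustrative.

```python
import math
from typing import Dict, List, Tuple

Features = List[float]


def distance(f: Features, t: Features) -> float:
    """Euclidean norm ||f - t||."""
    return math.sqrt(sum((fi - ti) ** 2 for fi, ti in zip(f, t)))


def recognize(
    f_w: Features, f_h: Features, templates: Dict[str, Tuple[Features, Features]]
) -> str:
    """Return the label j minimizing ||F_w - T_{w,j}|| + ||F_h - T_{h,j}||."""
    return min(
        templates,
        key=lambda j: distance(f_w, templates[j][0]) + distance(f_h, templates[j][1]),
    )
```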


Experiment

The illumination source is an 18 W fluorescent lamp, and the camera captures 320×240 video at 30 FPS. The recording environment is shown in the accompanying figure.

Our task is to recognize the 10 isolated digits (0 to 9) spoken in Mandarin Chinese.

Five speakers (4 males and 1 female) took part in the experiment. For each digit, each speaker repeated the utterance 10 times for training and 50 times for testing.


Experiment

The recognition accuracy for each digit is shown below:

Digit Accuracy Digit Accuracy

0 0.972 5 0.912

1 0.952 6 0.964

2 0.976 7 0.744

3 0.964 8 0.952

4 0.788 9 0.932
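As a quick arithmetic check, the overall accuracy of 0.9156 reported for our approach in the comparison below equals the unweighted mean of the ten per-digit accuracies in this table:

```python
# Per-digit accuracies from the results table above
accuracy = {
    0: 0.972, 1: 0.952, 2: 0.976, 3: 0.964, 4: 0.788,
    5: 0.912, 6: 0.964, 7: 0.744, 8: 0.952, 9: 0.932,
}
# Unweighted mean over the ten digits
mean_accuracy = sum(accuracy.values()) / len(accuracy)
print(f"{mean_accuracy:.4f}")  # prints 0.9156
```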


Experiment

Comparison with existing approaches that also use the width and height of the lip as visual features:

Method Accuracy

1 0.8127

2 0.7741

3 0.9149

4 0.7720

Our approach 0.9156

Methods 1, 2 and 3: S.L. Wang, W.H. Lau, A.W.C. Liew, and S.H. Leung. Automatic lipreading with limited training data. In Proc. ICPR 2006, pp. 881-884, 2006.

Method 4: A.R. Baig, R. Seguier, and G. Vaucher. Image sequence analysis using a spatio-temporal coding for automatic lipreading. In Proc. ICIAP 1999, pp. 544-549, 1999.


Conclusion & Future work

In this paper, we have proposed a new approach to automatic lip-reading based on polynomial fitting. The feature vectors of our approach are low-dimensional, and the approach requires only a small training set. In our experiments on the ten Mandarin digits, the proposed approach reached an average accuracy of 0.9156, slightly higher than the compared methods.

However, more difficult tasks, such as recognizing words or sentences, will require an appropriate model. This is the focus of the next stage of our research.


Thank you!

31-08-2009