
Page 1: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

Reporter: CHEN, TZAN HWEI

Author: M. Afify, F. Liu, H. Jiang and O. Siohan

Page 2: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

2

Reference

M. Afify, F. Liu, H. Jiang and O. Siohan, "A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition", SAP 2005.

H. Ney and S. Ortmanns, "Progress in dynamic programming search for LVCSR", Proc. IEEE, 2000.

S. Ortmanns, A. Eiden, H. Ney, and N. Coenen, "Look-ahead techniques for fast beam search", ICASSP 1997.

Page 3: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

3

Outline

Introduction to LVCSR

Proposed fast-match

Implementation of the fast-match

Experiment

Conclusion

Page 4: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

4

Introduction to LVCSR

In the statistical approach to automatic speech recognition, the best word sequence is chosen by

$\tilde{W} = \arg\max_{W} P(W \mid X)$

Large vocabulary applications have vocabularies on the order of several thousand words, resulting in a very large state-space.

Look-ahead techniques are among the popular ways of reducing the search space.
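For reference, the standard Bayes decomposition behind this criterion (not spelled out on the slide) separates the acoustic model from the language model:

$\tilde{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \frac{P(X \mid W)\,P(W)}{P(X)} = \arg\max_{W} P(X \mid W)\,P(W)$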

Page 5: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

5

Introduction to LVCSR (cont)

Structure of a phoneme and search space

Page 6: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

6

Introduction to LVCSR (cont)

Tree organized pronunciation lexicon

Page 7: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

7

Introduction to LVCSR (cont)

Look-ahead: We "look ahead" in time using some acoustic and/or language model probabilities to predict hypotheses that will score poorly in the future, and hence discard them from detailed evaluation.

In this paper, we only discuss "acoustic look-ahead", also simply called "fast-match" (FM).

A “good” FM should accelerate the computation with minimal loss of accuracy.

Page 8: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

8

Introduction to LVCSR (cont)

Look-ahead (cont)

Page 9: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

9

Introduction to LVCSR (cont)

Look-ahead (cont)

Global fast-match (GFM): combines the node score with the look-ahead score in making the pruning decision.

Local fast-match (LFM): only the local look-ahead score is used in making the fast-match pruning decision.
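As a rough illustration of the difference (the function and variable names below are hypothetical, assuming log scores where larger is better and a beam relative to the best active hypothesis):

def prune_gfm(node_score, lookahead_score, best_score, beam):
    # Global fast-match: combine the accumulated node score with the
    # look-ahead score, then apply the usual beam criterion.
    combined = node_score + lookahead_score
    return combined < best_score - beam    # True -> prune this hypothesis

def prune_lfm(lookahead_score, threshold):
    # Local fast-match: the look-ahead score alone decides.
    return lookahead_score < threshold     # True -> prune this hypothesis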

Page 10: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

10

Introduction to LVCSR (cont)

Global fast-match (GFM)

Page 11: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

11

Proposed fast-match

Hypothesis testing: It is a general statistical framework for deciding among several hypotheses based on some observations.

In general, binary hypothesis testing chooses one among two hypotheses, usually referred to as the null and alternative hypotheses $H_0$ and $H_1$, based on the likelihood ratio

$\frac{P(X \mid H_0)}{P(X \mid H_1)}$

Page 12: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

12

Proposed fast-match (cont)

Hypothesis testing (cont): Two types of errors can occur:

The probability of false alarm $p_F = \Pr(\text{Say } H_0 \mid H_1 \text{ is true})$

The probability of miss $p_M = \Pr(\text{Say } H_1 \mid H_0 \text{ is true})$

Here, we want to minimize $p_M$.

Page 13: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

13

Proposed fast-match (cont)

For a fast-match, the null and alternative hypotheses for a phoneme $\alpha$ can be written as:

$H_0$: $\alpha$ starts at time $t$

$H_1$: $\alpha$ does not start at time $t$

The first step toward developing a likelihood ratio test for the above hypotheses is to define suitable probability distributions for both hypotheses.

Page 14: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

14

Proposed fast-match (cont)

For $H_0$: $P(X_t^{t+d_1} \mid \alpha \text{ starts at } t)$

For $H_1$: $P(X_t^{t+d_2} \mid \alpha \text{ does not start at } t)$

where $X_a^b$ represents the acoustic observations in the interval $[a, b]$, and $d_1$ and $d_2$ are (not necessarily equal) possible durations of both events. As phonemes are known to have variable length, it is not possible to determine beforehand the values of both durations. Here, we adopt the fixed-duration approach. In this case the two distributions reduce to $P(X_t^{t+d} \mid \lambda_\alpha)$ and $P(X_t^{t+d} \mid \lambda_{\bar{\alpha}})$, where we set $d_1 = d_2 = d$, and $\lambda_\alpha$ and $\lambda_{\bar{\alpha}}$ denote the models of the null and alternative hypotheses, respectively.

Page 15: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

15

Proposed fast-match (cont)

We define $S_t(\alpha)$ as the log probability that $\alpha$ starts at $t$ and ends at $t + d_\alpha$. Hence, we write $S_t(\alpha) = \log P(X_t^{t+d_\alpha} \mid \lambda_\alpha)$, where $X_t^{t+d_\alpha}$ is the observation sequence in $[t, t+d_\alpha]$, and $d_\alpha$ is the look-ahead duration of $\alpha$. The log likelihood ratio $L_t(\alpha)$ can be written as

$L_t(\alpha) = S_t(\alpha) - \bar{S}_t(\alpha) \quad (1)$

where $\bar{S}_t(\alpha) = \log P(X_t^{t+d_\alpha} \mid \lambda_{\bar{\alpha}})$, $\lambda_{\bar{\alpha}}$ stands for the alternate hypothesis model of phoneme $\alpha$, and hence $\bar{S}_t(\alpha)$ represents the alternate hypothesis $H_1$. The likelihood ratio test reduces to accepting $H_0$ when $L_t(\alpha) > \tau$, and deciding $H_1$ otherwise, where $\tau$ is the decision threshold.
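A minimal sketch of this test (the function and variable names are illustrative, not from the paper), assuming the window score is a sum of per-frame log-likelihoods as in the 1-state case discussed on the next slide:

import numpy as np

def fast_match_accept(frame_ll_phone, frame_ll_anti, threshold):
    # frame_ll_phone: log p(x_tau | lambda_alpha) for tau = t .. t + d_alpha
    # frame_ll_anti:  log p(x_tau | lambda_alpha_bar) over the same window
    S_t = np.sum(frame_ll_phone)       # S_t(alpha), null-hypothesis score
    S_bar_t = np.sum(frame_ll_anti)    # S-bar_t(alpha), alternate-hypothesis score
    L_t = S_t - S_bar_t                # log likelihood ratio, Eq. (1)
    return L_t > threshold             # True: accept H0, keep the phoneme hypothesis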

Page 16: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

16

Proposed fast-match (cont)

The null and alternate hypotheses scores are calculated for every time instant, and hence it would be interesting to incrementally calculate the score at time $t$ from the corresponding score at time $t-1$.

If a phoneme is represented by a 1-state HMM, the incremental calculation reduces to the following very simple formula:

$S_t(\alpha) = S_{t-1}(\alpha) + \log p(x_{t+d_\alpha} \mid \lambda_\alpha) - \log p(x_{t-1} \mid \lambda_\alpha) \quad (2)$
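A small sketch of the sliding-window update in Eq. (2); frame_ll is an illustrative list of precomputed per-frame log-likelihoods log p(x_tau | lambda_alpha), assuming the 1-state case where the window score is a sum of frame log-likelihoods:

def incremental_scores(frame_ll, d):
    # Yield S_t for t = 0, 1, ...; the window covers frames t .. t + d.
    S = sum(frame_ll[0:d + 1])          # initialise S_0 by direct summation
    yield S
    for t in range(1, len(frame_ll) - d):
        # Add the frame that enters the window, subtract the one that leaves, Eq. (2).
        S = S + frame_ll[t + d] - frame_ll[t - 1]
        yield S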

Page 17: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

17

Proposed fast-match (cont)

In turn, the likelihood ratio can also be incrementally calculated as

$L_t(\alpha) = L_{t-1}(\alpha) + q(x_{t+d_\alpha} \mid \alpha) - q(x_{t-1} \mid \alpha) \quad (3)$

where

$q(x \mid \alpha) = \log p(x \mid \lambda_\alpha) - \log p(x \mid \lambda_{\bar{\alpha}}) \quad (4)$

The probability $p(x \mid \lambda_\alpha)$ can be calculated as

$p(x \mid \lambda_\alpha) = \sum_{m=1}^{M} c_m\, N(x; \mu_m, \Sigma_m) \approx \max_{m}\, c_m\, N(x; \mu_m, \Sigma_m)$
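A sketch of the per-frame score $q(x \mid \alpha)$ of Eq. (4), computing each Gaussian mixture log-likelihood with the max-component approximation shown above (diagonal covariances and the model containers are illustrative assumptions):

import numpy as np

def gmm_loglike_max(x, weights, means, variances):
    # log p(x | GMM) using the max-component approximation, diagonal covariances.
    x = np.asarray(x)
    log_norm = -0.5 * (np.log(2 * np.pi * variances) + (x - means) ** 2 / variances).sum(axis=1)
    comp_ll = np.log(weights) + log_norm    # log c_m + log N(x; mu_m, Sigma_m) per component
    return comp_ll.max()                    # max_m instead of summing over the mixture

def q_score(x, phone_gmm, anti_gmm):
    # q(x | alpha) = log p(x | lambda_alpha) - log p(x | lambda_alpha_bar), Eq. (4).
    return gmm_loglike_max(x, *phone_gmm) - gmm_loglike_max(x, *anti_gmm)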

Page 18: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

18

Implementation of the fast-match

Definition of alternate hypothesis or anti-phoneme models.

Parameter estimation of the phoneme and anti-phoneme Gaussian mixture models.

Determining the phoneme look-ahead durations and decision thresholds.

Page 19: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

19

Implementation of the fast-match (cont)

Design of anti-phoneme models: A general trend in their design is to consider either phoneme-specific models or a shared model (background model).

In initial experiments we obtained similar results, in terms of speed-accuracy trade-off, for both the background model and the phoneme specific model.

Page 20: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

20

Implementation of the fast-match (cont)

Parameter estimation of phoneme and anti-phoneme models:

First, a set of training utterances is segmented into phoneme units using forced alignment.

The training data for each phoneme is then defined by collecting all segments belonging to this phoneme.

For constructing a general background model, all training data are put together.

The models are trained by ML estimation.
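A rough sketch of this training recipe, assuming the forced-alignment segments are already available as per-phoneme lists of feature arrays; scikit-learn's GaussianMixture is used here purely for illustration (the paper does not specify a toolkit):

import numpy as np
from sklearn.mixture import GaussianMixture

def train_fast_match_models(segments_by_phoneme, n_components=8):
    # segments_by_phoneme: dict mapping phoneme -> list of (frames, dim) feature arrays
    phone_models = {}
    for phone, segments in segments_by_phoneme.items():
        data = np.vstack(segments)                        # pool all segments of this phoneme
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        phone_models[phone] = gmm.fit(data)               # ML estimation via EM
    # Background (shared anti-phoneme) model: pool all training data together.
    all_data = np.vstack([np.vstack(segs) for segs in segments_by_phoneme.values()])
    background = GaussianMixture(n_components=n_components, covariance_type="diag").fit(all_data)
    return phone_models, background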

Page 21: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

21

Implementation of the fast-match (cont)

Calculation of look-ahead duration and decision thresholds:

After the segments belonging to each phoneme are identified using forced alignment of the training data, the look-ahead duration is computed as the average duration of these segments.

Page 22: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

22

Implementation of the fast-match (cont)

Calculation of look-ahead duration and decision thresholds (cont):

For each phoneme we evaluate the score, as in (1), of all segments in the training set belonging to this phoneme.

We calculate the mean score $\mu$ and the standard deviation $\sigma$ of the scores of these segments.

The threshold is calculated as $\mu - n\sigma$, where $n$ is used to trade off speed and accuracy.
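A small sketch of this step, reusing the hypothetical segments_by_phoneme structure from above and an illustrative score_segment() helper that evaluates Eq. (1) on one segment (neither name comes from the paper):

import numpy as np

def durations_and_thresholds(segments_by_phoneme, score_segment, n=3.5):
    # Per-phoneme look-ahead duration (average segment length) and threshold mu - n*sigma.
    durations, thresholds = {}, {}
    for phone, segments in segments_by_phoneme.items():
        # Look-ahead duration: average number of frames of the aligned segments.
        durations[phone] = int(round(np.mean([len(seg) for seg in segments])))
        # Decision threshold from the distribution of true-segment scores.
        scores = np.array([score_segment(phone, seg) for seg in segments])
        thresholds[phone] = scores.mean() - n * scores.std()
    return durations, thresholds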

Page 23: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

23

Experiment

Tested on a Japanese broadcast news transcription task with a vocabulary of 20,000 words.

Training and test speech data, in addition to the trigram language model, are provided by the Japan Broadcasting Corporation (NHK).

Page 24: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

24

Experiments (cont)

First, we illustrate the behavior of the fast match algorithm on a small development set, and describe how tuning the threshold affects the performance of the system.

The training data consists of 90 hours of speech.

The test set consists of 162 utterances from male speakers in a clean studio environment.

Page 25: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

25

Experiments (cont)

The development set perplexity is about 34 and the out-of-vocabulary (OOV) rate is 0.76%

The baseline system runs in about 0.79 times real-time, for a WER of 4.04%

The Gaussian mixture size is set to 8, 16, and 32, while the threshold is taken as discussed before, with $n \in \{4, 3.5, 3, 2, 1\}$.

Page 26: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

26

Experiments (cont)

Fig. 2. Percentage word error rate (WER) and real time factor (RT) for the fast-match with mixture sizes 8, 16, and 32, and for different thresholds on the development set.

Page 27: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

27

Experiments (cont)

Fig. 3. Percentage word error rate (WER) and real-time factor (RT); both likelihood ratio and likelihood scores are used here for the fast-match, for comparison.

Page 28: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

28

Experiments (cont)

In a second series of experiments, we illustrate the performance of the proposed fast-match algorithm on a much larger data set.

TABLE I: LIST OF ALL EVALUATION CONDITIONS AND NUMBER OF TEST UTTERANCES PER CONDITION

Page 29: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

29

Experiments (cont)

The test perplexity varies from less than 10 to about 80 depending on the environment, with OOV rates ranging from 0.25% to 2.5%

The acoustic models used for recognition are built on about 170 hours of training data.

The threshold parameter n is set to 3.5.

Page 30: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

30

Experiments (cont)

TABLE II: WORD ERROR RATE (WER) AND REAL-TIME (RT) FACTOR ON NHK EVALUATION TEST SET WITH AND WITHOUT FAST-MATCH

Page 31: A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition

31

Conclusion

It is shown that the test at the current frame can be incrementally calculated from the previous frame using a very simple computation.

It offers robustness, as evidenced in the multi-environment experiments.