
Speech recognition in MUMIS

Eric Sanders (KUN)

March 2003

People involved at KUN

Helmer Strik

Judith Kessens

Mirjam Wester

Janienke Sturm

Eric Sanders

Febe de Wet

Paul Tielen

Overview

Speech data

Baseline recognition

Adding data

Noise robustness

Word types

Conclusions

Examples of Data

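Dutch: “op 't ogenblik wordt in dit stadion de opstelling voorgelezen” (“at the moment, the line-up is being read out in this stadium”)

English: “and they wanna make the change before the corner”

German: “und die beiden Tore die die Hollaender bekommen hat haben” (roughly: “and the two goals that the Dutch have conceded”)

All three examples are from the match Yugoslavia – The Netherlands.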

Speech Data

All data

Language     Dutch    English  German
# matches    6        3        21
# words      40,296   34,684   127,265

Speech Data

Test data (# words)

Match                          Dutch    English  German
Yugoslavia – The Netherlands   5,922    10,188   3,998
England – Germany              5,798    13,488   7,280


Baseline Recognition

PMs (phone models):
- trained on the other test match

Lex (lexicon):
- based on the other test set
- match-specific words added

LM (language model):
- category LM
- based on the other test match
- match-specific words added
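The LM is described only as a category LM with match-specific words added. One common way to build such a model is a class-based bigram, P(w | v) = P(c(w) | c(v)) * P(w | c(w)), in which match-specific words such as player names share one category. The sketch below assumes exactly that; the category name `PLAYER`, the uniform within-category probability, the helper names and the toy sentences are hypothetical illustrations rather than the MUMIS setup, and no smoothing is applied.

```python
from collections import Counter


def category_of(word, word2cat):
    # Words outside any category act as their own singleton category.
    return word2cat.get(word, word)


def train_category_bigrams(sentences, word2cat):
    """Count category bigrams over training sentences (lists of words)."""
    bigrams, history = Counter(), Counter()
    for sent in sentences:
        cats = ["<s>"] + [category_of(w, word2cat) for w in sent] + ["</s>"]
        for prev, cur in zip(cats, cats[1:]):
            bigrams[(prev, cur)] += 1
            history[prev] += 1
    return bigrams, history


def bigram_prob(word, prev_word, model, word2cat, members):
    """P(w | v) = P(c(w) | c(v)) * P(w | c(w)).

    Unsmoothed maximum-likelihood estimates; P(w | c) is uniform over the
    current members of an open category (a hypothetical choice)."""
    bigrams, history = model
    c_prev = category_of(prev_word, word2cat)
    c = category_of(word, word2cat)
    if history[c_prev] == 0:
        return 0.0  # unseen history; a real LM would back off here
    p_cat = bigrams[(c_prev, c)] / history[c_prev]
    p_word = 1.0 / len(members[c]) if c in members else 1.0
    return p_cat * p_word


# Illustrative toy data, not MUMIS transcriptions.
members = {"PLAYER": ["Kluivert", "Davids"]}
word2cat = {w: c for c, ws in members.items() for w in ws}
model = train_category_bigrams(
    [["goal", "by", "Kluivert"], ["pass", "to", "Davids"]], word2cat)

# Adding a match-specific name only changes the category membership.
members["PLAYER"].append("Mihajlovic")
word2cat["Mihajlovic"] = "PLAYER"
print(bigram_prob("Mihajlovic", "to", model, word2cat, members))  # prints 0.3333333333333333
```

Because a new name enters only through its category, adding players for another match changes P(w | c) but leaves the category bigram counts untouched, so the LM does not have to be retrained.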


[Figure: baseline WER (%) for the two test matches (YugNL, EngGer) in Dutch, German and English; bar labels: 83.28, 84.91, 86.84, 93.16, 85.71, 85.21]
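All results in these slides are word error rates (WER). The slides do not say which scoring tool was used; as a reminder of the standard definition, WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment and N is the number of reference words. A minimal sketch, assuming unit cost for every edit type; the hypothesis string is made up.

```python
def word_error_rate(reference, hypothesis):
    """WER (%) = 100 * (S + D + I) / N from a word-level Levenshtein alignment.

    N is the number of reference words; every edit type costs 1."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)


ref = "and they wanna make the change before the corner"
hyp = "and they want to make a change before corner"  # made-up recogniser output
print(round(word_error_rate(ref, hyp), 2))  # 44.44: 2 substitutions, 1 insertion, 1 deletion
```

Since insertions are counted, WER is not bounded by 100%: a recogniser that outputs many spurious words in heavy noise can exceed it.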

Adding Data

Extra training data:
- Dutch: 4 matches
- German: 19 matches
- English: 1 match

The extra training data is used to train the lexicon and the language models; the phone models are still trained on 1 match.
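Assuming the "All data" totals include the two test matches (the match counts fit: 6 = 2 + 4, 3 = 2 + 1, 21 = 2 + 19), the number of transcribed words in the extra training matches follows from the two Speech Data tables. A small sketch of that arithmetic; the function name is hypothetical.

```python
# Word counts copied from the Speech Data tables.
all_words = {"Dutch": 40_296, "English": 34_684, "German": 127_265}
test_words = {  # (Yugoslavia - The Netherlands, England - Germany)
    "Dutch": (5_922, 5_798),
    "English": (10_188, 13_488),
    "German": (3_998, 7_280),
}


def extra_training_words(language):
    # Assumption: the "All data" totals include both test matches.
    return all_words[language] - sum(test_words[language])


for language in all_words:
    print(language, extra_training_words(language))
# Dutch 28576
# English 11008
# German 115987
```

Under this assumption, German has roughly ten times as many extra transcribed words as English.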

Adding Data (German)

[Figure: German WER (%) versus the number of words used to train the LM (0 to 300,000). Curves: Yug-NL with the lexicon trained on 1, 7 and 19 matches; Eng-Ger with the lexicon trained on 7 and 19 matches]

Noise Robustness

[Figure: WER (%) versus SNR (dB, 0 to 30) for each test match and language: YugNL_NL, EngGer_NL, YugNL_ENG, EngGer_ENG, YugNL_GER (A), YugNL_GER (B), Eng-Ger_GER]
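The plot relates WER to SNR, which in dB is ten times the base-10 logarithm of the ratio of speech power to noise power. The slides do not say how SNR was measured on the match recordings; the sketch below assumes separate speech and noise-only segments are available (for example, speech versus pauses), which is a simplification. The toy signals are made up.

```python
import math


def mean_power(samples):
    # Average energy per sample.
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech, noise):
    """SNR in dB = 10 * log10(P_speech / P_noise)."""
    p_noise = mean_power(noise)
    if p_noise == 0:
        return math.inf
    return 10.0 * math.log10(mean_power(speech) / p_noise)


# Toy signals: speech amplitude 1.0, noise amplitude 0.1 -> power ratio 100 -> 20 dB.
speech = [1.0, -1.0] * 80
noise = [0.1, -0.1] * 80
print(round(snr_db(speech, noise), 1))  # 20.0
```

Every 10 dB step on the x-axis is a tenfold change in the speech-to-noise power ratio.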

Noise Robustness

Possible solutions:
- Matching the acoustic properties of training and test material
- Training SNR-dependent phone models
- Applying noise-robust feature extraction: Histogram Normalisation (HN) and FTNR
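Histogram normalisation is named but not described. In its common form it maps the distribution of each feature dimension onto a fixed reference distribution, which cancels any monotonic distortion of that feature. The sketch below assumes a standard normal reference, per-utterance statistics and a rank-based estimate of the empirical CDF; the actual MUMIS configuration is not given, the function names are hypothetical, and FTNR is not sketched.

```python
from statistics import NormalDist


def histogram_normalise(values):
    """Map one feature dimension onto N(0, 1) through its rank-based empirical CDF."""
    n = len(values)
    target = NormalDist(0.0, 1.0)
    order = sorted(range(n), key=lambda i: values[i])
    normalised = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n stays strictly inside (0, 1), where inv_cdf is defined
        normalised[i] = target.inv_cdf((rank + 0.5) / n)
    return normalised


def normalise_features(frames):
    """frames: one feature vector per frame; each dimension is normalised independently."""
    columns = [histogram_normalise(list(col)) for col in zip(*frames)]
    return [list(row) for row in zip(*columns)]


# A monotonic distortion (scaling plus offset) of a feature track disappears after HN.
clean = [0.2, 1.5, -0.7, 0.9]
distorted = [3.0 * x + 2.0 for x in clean]
print(histogram_normalise(clean) == histogram_normalise(distorted))  # True
```

Only the ranks of the values survive, so train and test features end up with the same distribution regardless of channel or noise level; the price is that any information carried by the absolute feature values within an utterance is discarded.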

Noise Robustness: YUG-NL, very noisy

[Figure: WER (%) in the Semi-clean, Noisy and Very noisy conditions for Baseline features, HN, and HN + FTNR]

Word Types

Not all words are equally important for an information retrieval task

Categories:
- function words (prepositions, pronouns)
- application-specific words (player names)
- other content words
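The slides do not define a per-category WER. One simple reading, assumed here, is to align reference and hypothesis once, mark each reference word as wrong if it is substituted or deleted, and report the error rate per category of reference words; insertions are ignored because they have no reference category. The category lexicon, the example sentence and the function names below are hypothetical.

```python
def reference_word_errors(ref, hyp):
    """Levenshtein-align ref and hyp (lists of words); return, per reference
    word, True if it was substituted or deleted. Insertions are not attributed."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                          d[i - 1][j] + 1, d[i][j - 1] + 1)
    errors = [False] * n
    i, j = n, m
    while i > 0:  # backtrace one optimal alignment
        if j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            errors[i - 1] = ref[i - 1] != hyp[j - 1]  # match or substitution
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            errors[i - 1] = True  # deletion
            i -= 1
        else:
            j -= 1  # insertion
    return errors


def category_error_rates(ref, hyp, word2cat, default="content"):
    """Error rate (%) per word category, over reference words only."""
    totals, wrong = {}, {}
    for word, err in zip(ref, reference_word_errors(ref, hyp)):
        cat = word2cat.get(word, default)
        totals[cat] = totals.get(cat, 0) + 1
        wrong[cat] = wrong.get(cat, 0) + err
    return {cat: 100.0 * wrong[cat] / totals[cat] for cat in totals}


word2cat = {"the": "function", "to": "function", "Kluivert": "player"}  # hypothetical
ref = "the pass goes to Kluivert".split()
hyp = "a pass goes Kluivert".split()  # made-up recogniser output
print(category_error_rates(ref, hyp, word2cat))
# {'function': 100.0, 'content': 0.0, 'player': 0.0}
```

Because each category is scored against its own reference words, a category can have a low error rate even when the overall WER is high.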

WERs for different categories

[Figure: WER (%) per language (NL, Ger, Eng) and test match (YugNL, EngGer) for all words, content words, function words and player names]


Conclusions

SNR values explain the WERs to a large extent

More data is not necessarily better

Applying noise-robust features leads to the best results

Overall WERs are very high, but application-specific words are recognised relatively well

The end
