Speech recognition in MUMIS Eric Sanders (KUN) March 2003

Page 1: Speech recognition in MUMIS Eric Sanders (KUN) March 2003

Speech recognition in MUMIS

Eric Sanders (KUN)

March 2003

Page 2

People involved at KUN

Helmer Strik

Judith Kessens

Mirjam Wester

Janienke Sturm

Eric Sanders

Febe de Wet

Paul Tielen

Page 3

Overview

Speech data

Baseline recognition

Adding data

Noise robustness

Word types

Conclusions

Page 4

Examples of Data

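Dutch: “op _t ogenblik wordt in dit stadion de opstelling voorgelezen” (English: “at the moment the line-up is being read out in this stadium”)

English: “and they wanna make the change before the corner”

German: “und die beiden Tore die die Hollaender bekommen hat haben” (English, keeping the speaker's self-correction: “and the two goals that the Dutch has have conceded”)

All three examples are from the match Yugoslavia – The Netherlands.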

Page 5

Speech Data

All data

Language    Dutch     English   German
# matches   6         3         21
# words     40,296    34,684    127,265

Page 6

Speech Data

Test data (# words)

Match                          Dutch    English   German
Yugoslavia – The Netherlands   5,922    10,188    3,998
England – Germany              5,798    13,488    7,280

Page 7

Baseline recognition

Phone models (PMs):
- trained on the other test match

Lexicon (Lex):
- based on the other test set
- match-specific words added

Language model (LM):
- category LM
- based on the other test match
- match-specific words added
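The slides do not say how the category LM is built. As a rough illustration only, the sketch below assumes a bigram model over category tokens in which all player names share one category and are equally likely within it. The names `player_a`, `player_b`, `player_c`, the `<PLAYER>` token and both functions are hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_category_bigram(sentences, categories):
    """Count bigrams over category tokens (assumption: unsmoothed bigram LM)."""
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        # Words without a category entry act as their own category.
        tokens = ["<s>"] + [categories.get(w, w) for w in sentence.split()] + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[prev][cur] += 1
    return bigrams


def word_probability(bigrams, categories, prev_word, word):
    """P(word | prev) = P(cat(word) | cat(prev)) * P(word | cat(word))."""
    prev_cat = categories.get(prev_word, prev_word)
    cat = categories.get(word, word)
    history_total = sum(bigrams[prev_cat].values())
    if history_total == 0:
        return 0.0
    p_category = bigrams[prev_cat][cat] / history_total
    members = [w for w, c in categories.items() if c == cat]
    # Assumption: members of a category are equally likely.
    p_member = 1.0 / len(members) if members else 1.0
    return p_category * p_member


# Hypothetical training text from "the other test match".
categories = {"player_a": "<PLAYER>", "player_b": "<PLAYER>"}
lm = train_category_bigram(
    ["pass from player_a", "shot by player_a", "pass from player_b"], categories
)
print(word_probability(lm, categories, "from", "player_a"))

# Adding a match-specific word: a name never seen in training still gets
# probability mass once it joins the category, without retraining the counts.
categories["player_c"] = "<PLAYER>"
print(word_probability(lm, categories, "from", "player_c"))
```

Under these assumptions, adding match-specific words amounts to extending the category membership list, which is one plausible reading of "match-specific words added".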

Page 8

Baseline recognition

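[Figure: bar chart of WER (%) for the two test matches (YugNL, EngGer), one bar per language (Dutch, German, English); vertical axis from 78 to 94. Bar labels in extraction order: 83.28, 84.91, 86.84, 93.16, 85.71, 85.21. The assignment of values to match and language is not recoverable from the transcript.]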
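The chart reports word error rates (WER). For reference, the usual definition is the word-level edit distance between reference transcription and recogniser output (substitutions + deletions + insertions) divided by the number of reference words. The function below is a minimal sketch of that definition; the function name and the hypothesis string are made up for illustration, and the scoring tool actually used in MUMIS is not stated in the slides.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return 100.0 * prev[-1] / len(ref)


# Reference taken from the English data example; the hypothesis is invented.
reference = "and they wanna make the change before the corner"
hypothesis = "and they want to make the change before corner"
print(round(word_error_rate(reference, hypothesis), 2))  # 3 errors / 9 words -> 33.33
```

Because insertions count as errors, WER can exceed 100%.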

Page 9

Adding Data

Extra training data:
- Dutch = 4 matches
- German = 19 matches
- English = 1 match

Adding training data to train the lexicon and the language models (phone models trained on 1 match)

Page 10

Adding Data (German)

[Figure: WER (%) (axis from 75 to 95) against the number of words used to train the LM (axis from 0 to 300,000), German. Curves: Yug-NL, lex: 1 match; Yug-NL, lex: 7 matches; Yug-NL, lex: 19 matches; Eng-Ger, lex: 7 matches; Eng-Ger, lex: 19 matches.]

Page 11

Noise Robustness

Dutch, English, German

Page 12

Noise Robustness

[Figure: WER (%) (axis from 0 to 100) against SNR (dB) (axis from 0 to 30). Data series: YugNL_NL, EngGer_NL, YugNL_ENG, EngGer_ENG, YugNL_GER (A), YugNL_GER (B), Eng-Ger_GER.]
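The horizontal axis is the signal-to-noise ratio in dB, i.e. 10·log10 of the ratio of speech power to noise power. The slides do not say how SNR was estimated for the commentary recordings; the sketch below simply assumes that separate speech and noise-only sample segments are available, and the function names and sample values are illustrative.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech_samples, noise_samples):
    """SNR in dB, assuming separate speech and noise-only segments."""
    noise_power = mean_power(noise_samples)
    if noise_power == 0:
        raise ValueError("noise segment has zero power")
    return 10.0 * math.log10(mean_power(speech_samples) / noise_power)


# Toy signals: speech amplitude 2.0, noise amplitude 0.2 -> power ratio 100.
speech = [2.0, -2.0, 2.0, -2.0]
noise = [0.2, -0.2, 0.2, -0.2]
print(round(snr_db(speech, noise), 2))  # 20.0
```

In real commentary audio the speech segments also contain stadium noise, so a practical estimate would have to account for that; this sketch ignores it.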

Page 13

Noise Robustness

Possible solutions:
- Matching acoustic properties of train and test material
- Training SNR dependent phone models
- Applying noise robust feature extraction: Histogram Normalisation & FTNR
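Of the options on this slide, histogram normalisation is easy to illustrate: each feature value is replaced by the value that sits at the same cumulative-distribution position in a reference distribution. The sketch below is a generic version under stated assumptions (one feature dimension, a standard normal as reference, rank-based CDF estimate); the slides do not give the reference distribution or the exact procedure used at KUN, and FTNR is not covered here.

```python
from statistics import NormalDist


def histogram_normalise(values, reference=NormalDist(0.0, 1.0)):
    """Map each value to the reference quantile of its empirical CDF position."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    normalised = [0.0] * n
    for rank, index in enumerate(order):
        # (rank + 0.5) / n keeps the CDF estimate strictly inside (0, 1).
        normalised[index] = reference.inv_cdf((rank + 0.5) / n)
    return normalised


# Hypothetical values of one feature dimension over a stretch of speech.
clean = [0.3, -1.2, 0.8, 2.1, -0.4]
# The same values after a made-up distortion (scaling plus offset).
distorted = [2.0 * v + 10.0 for v in clean]

# Both map to the same normalised values, because only the rank order matters.
print(histogram_normalise(clean) == histogram_normalise(distorted))  # True
```

The example shows why the technique can help against noise: any distortion that preserves the ordering of the feature values is removed by the mapping.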

Page 14

Noise Robustness

YUG-NL, very noisy

[Figure: WER (%) (axis from 66 to 82) for three conditions (Semi-clean, Noisy, Very noisy), comparing Baseline, HN, and HN + FTNR.]

Page 15

Word Types

Not all words are equally important for an information retrieval task

Categories:
- function words (prepositions, pronouns)
- application specific words (player names)
- other content words

WERs for different categories
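One way to obtain an error rate per word category is to align reference and recogniser output and count, for each category, how many reference words were not recognised correctly. The sketch below does this with a Levenshtein alignment. It is an assumption-laden illustration: insertions are ignored because they have no reference word to attribute them to, the word lists and player names are hypothetical, and the slides do not describe the scoring actually used.

```python
from collections import Counter


def align(ref, hyp):
    """Levenshtein alignment; returns (reference word, correct?) pairs."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + cost)  # match or substitution
    pairs = []
    i, j = len(ref), len(hyp)
    while i > 0:  # backtrace; leftover insertions at the start are skipped
        same = j > 0 and ref[i - 1] == hyp[j - 1]
        if j > 0 and dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
            pairs.append((ref[i - 1], same))   # match or substitution
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((ref[i - 1], False))  # deletion
            i -= 1
        else:
            j -= 1                             # insertion: not attributed
    pairs.reverse()
    return pairs


def category_error_rates(reference, hypothesis, category_of):
    """Percentage of misrecognised reference words per category."""
    totals, errors = Counter(), Counter()
    for word, correct in align(reference.split(), hypothesis.split()):
        category = category_of(word)
        totals[category] += 1
        if not correct:
            errors[category] += 1
    return {c: 100.0 * errors[c] / totals[c] for c in totals}


FUNCTION_WORDS = {"the", "to", "a"}      # hypothetical list
PLAYER_NAMES = {"player_a", "player_b"}  # hypothetical names


def category_of(word):
    if word in PLAYER_NAMES:
        return "player names"
    return "function words" if word in FUNCTION_WORDS else "content words"


print(category_error_rates("player_a passes the ball to player_b",
                           "player_a passes a ball player_b", category_of))
```

In this toy example both errors fall on function words, so player names and content words score 0% while function words score 100%.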

Page 16

Word Types

[Figure: bar chart of WER (%) (axis from 0 to 100) per language (NL, Ger, Eng) for each test match (YugNL, EngGer), with separate bars for all words, content words, function words, and player names.]

Page 17

Conclusions

SNR values explain the WERs to a large extent

More data is not necessarily better

Applying noise robust features leads to best results

Overall WERs are very high, but application specific words are recognised relatively well

Page 18

The end