Page 1: Knowledge of Language Origin Improves Pronunciation Accuracy

Knowledge of Language Origin Improves Pronunciation Accuracy

Ariadna Font Llitjos

April 13, 2001

Advisor: Alan W Black

Page 2

Motivation

It is impossible to have a lexicon with complete coverage, and a high proportion of unknown words are proper names:

In an experiment by [Black, Lenzo and Pagel, 1998] on the first section of the WSJ Penn Treebank (about 40,000 words), 4.6% of the words (1,775) were out-of-vocabulary (OOV) with respect to OALD, and 76.6% of those were proper names.

Page 3

Motivation cont.

We need an automatic way of learning an acceptable pronunciation for OOV words, most of which are proper names.

General approach: letter-to-sound (LTS) rules learned with CART decision trees

Specifically, add language probability information

Page 4

Data and limits

- 56,000 proper names from the CMUDICT lexicon, with stress [originally from Bell Labs directory listings, ~20 years ago]
- 90% training set & 10% test set
- We only looked at the educated native American English pronunciation of proper names: e.g. for ‘Van Gogh’, we don’t want our system to say /F AE1 N G O K/ or /F AE1 N G O G/, which some people may claim is the correct way of pronouncing it, but rather the educated American pronunciation of it: /V AE1 N . G OW1/.

Page 5

Baseline Technique

Decision trees to predict phones based on letters and their context (n-grams). In English, letters map to epsilon, a phone or occasionally two phones:

(a) Monongahela: m ah n oa1 ng g ah hh ey1 l ax
(b) Pittsburgh: p ih1 t s b er g
(c) exchange: ih k-s ch ey1 n jh

Allowables (45 -> 101 phones) and alignments (stress & epsilon misplacements affect accuracy)
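To illustrate how letter contexts and aligned phones pair up as training data for the trees, here is a minimal Python sketch. The alignment used for "exchange" (which letters map to _epsilon_) is an assumption made for illustration, as are the helper name `context_features`, the window width of 3, and the '#'/'0' padding convention, which is only inferred from the feature values that appear in the CART example.

```python
EPSILON = "_epsilon_"

def context_features(word, width=3):
    """Build one feature row per letter: the letter plus `width` neighbours on each side.

    Assumed padding: '#' marks the word boundary and '0' anything beyond it.
    """
    left_pad = ["0"] * (width - 1) + ["#"]
    letters = left_pad + list(word.lower()) + left_pad[::-1]
    rows = []
    for i in range(width, width + len(word)):
        row = {"name": letters[i]}
        for k in range(1, width + 1):
            row["p." * k + "name"] = letters[i - k]  # k-th previous letter
            row["n." * k + "name"] = letters[i + k]  # k-th next letter
        rows.append(row)
    return rows

# Assumed alignment for example (c): 'x' takes the dual phone k-s,
# while 'h' and the final 'e' map to epsilon.
aligned_phones = ["ih", "k-s", "ch", EPSILON, "ey1", "n", "jh", EPSILON]
rows = context_features("exchange")
training_pairs = list(zip(rows, aligned_phones))

for row, phone in training_pairs:
    print(row["p.name"], row["name"], row["n.name"], "->", phone)
```

Each (row, phone) pair is one training instance; a misplaced epsilon in the alignment changes which letter a phone is attached to, which is one way alignment errors can affect accuracy.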

Page 6

Origin Class info

What does origin class mean?
- geographic?
- etymologic? [Church, 2000]
- language (our 1st approach)
- data driven (what we really want, current work)

Page 7

Letter language models (LLMs) for 26 languages

- European Corpus IMC I: English, French, German, Spanish, Croatian, Czech, Danish, Dutch, Estonian, Hebrew, Italian, Malaysian, Norwegian, Portuguese, Serbian, Slovenian, Swedish, Turkish

- using the Corpusbuilder + manually: Catalan, Chinese, Japanese, Korean, Polish, Thai, Tamil, and Indian languages other than Tamil.

Page 8

Language Identifier

An implementation of a variation of the algorithm presented in Cavnar, W.B., and Trenkle, J.M. N-Gram-Based Text Categorization, in Proceedings of 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994.

Page 9

Language Identifier cont.

The language identifier creates an LLM on the fly for the input word (or document). For every trigram in the input, it estimates how likely the trigram is to belong to each language by multiplying the trigram's weight in the input by its relative frequency in that language's LLM.
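The scoring step can be sketched in a few lines of Python. This is only one reading of the description, not the actual classify.pl: the function names, the '#' word-boundary padding, the sum over trigrams, and the normalisation of scores to sum to 1 are all assumptions, and the three tiny "training" word lists are invented for the example.

```python
from collections import Counter

def trigram_freqs(text):
    """Relative frequencies of letter trigrams, with '#' marking word boundaries."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"#{word}#"
        counts.update(padded[i:i + 3] for i in range(len(padded) - 2))
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}

def classify(text, models):
    """Score each language by matching the input's trigram model against its model."""
    query = trigram_freqs(text)  # model built on the fly for the input
    raw = {
        lang: sum(weight * model.get(tri, 0.0) for tri, weight in query.items())
        for lang, model in models.items()
    }
    total = sum(raw.values())
    if total == 0:
        return {lang: 0.0 for lang in raw}
    ranked = sorted(raw.items(), key=lambda kv: kv[1], reverse=True)
    return {lang: score / total for lang, score in ranked}

# Toy models built from invented word lists (hypothetical data).
models = {
    "chinese-pn": trigram_freqs("zhang wang ying li chen zhao"),
    "english.train": trigram_freqs("king young long smith"),
    "german-pn": trigram_freqs("schmidt schneider fischer weber"),
}
scores = classify("Ying Zhang", models)
for lang, score in scores.items():
    print(f"{lang}: {score:.3f}")
```

With real per-language corpora in place of the toy lists, the output has the same shape as the classifier listing: one normalised score per language, highest first.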

Page 10

LI example

./classify.pl -t "Ying Zhang"
chinese-pn: 0.730594870150084
english.train: 0.0525988955766553
german-pn: 0.0506847882275029
british-pn: 0.0378543572677309
german.train: 0.0303455616225699
tamil-pn: 0.029581372574322
french-pn: 0.0201655107720744
spanish-pn: 0.0185146818045872
catalan-pn: 0.0162318631058251
japanese-pn: 0.00851225092810786
french.train: 0.002861385664355
spanish.train: 0.00205446230618505

Page 11

Indirect use of the Language Identifier

Instead of building trees explicitly for each language (data sparseness problem), we use the results from the language identification process as features within the CART build process, allowing those features to affect the tree building only when their information is relevant.

Page 12

Features for our pronunciation model

We decided to add the following to the n-gram features:
- most probable language, with its probability
- 2nd most likely language, with its probability
- difference between the 2 highest probabilities

(zysk ((best-lang slovenian.train)
       (higher-prob 0.18471)
       (2nd-best-lang czech.train)
       (2nd-higher-prob 0.18428)
       (prob-difference 0.00043)))
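Deriving these five features from the language identifier's scores is straightforward; a minimal Python sketch follows. The function name `language_features` and the dictionary representation are assumptions for illustration. The two scores are the ones shown for "zysk"; the scores of the remaining languages are not given on the slide and are simply omitted here.

```python
def language_features(scores):
    """Pick the two most probable languages and the gap between their scores."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, p_best), (second, p_second) = ranked[0], ranked[1]
    return {
        "best-lang": best,
        "higher-prob": p_best,
        "2nd-best-lang": second,
        "2nd-higher-prob": p_second,
        "prob-difference": round(p_best - p_second, 5),
    }

# Scores for "zysk" as listed in the feature example (other languages omitted).
zysk_scores = {"czech.train": 0.18428, "slovenian.train": 0.18471}
features = language_features(zysk_scores)
print(features)
```

A small difference, as here, signals that the identifier is nearly undecided between the two languages, which is information the tree builder can use or ignore.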

Page 13

CART example

((a
 ((n.n.n.name is 0)
  ((n.name is #)
   ((p.name is e)
    ((p.p.p.name is #)
     ((_epsilon_))
     ((p.p.p.name is c) ((_epsilon_)) ((ax))))
    ((ax)))
   ((n.name is y)
    ((p.p.p.name is #)
     ((ey1))
     ((p.p.p.name is 0)
      ((ey1))
      ((p.name is w)
       ((p.p.p.name is e)
        ((ey1))
        ((p.p.p.name is t)
         ((ey))
         ((p.p.p.name is n)
          ((2nd-best-lang is "english.train") ((ey)) ((ey1)))
          ((2nd-best-lang is "czech.train")
           ((p.p.p.name is d) ((ey1)) ((ey)))
           ((ey1))))))
       ((p.name is d)
        ((2nd-best-lang is "english.train")
         ((p.p.p.name is l) ((ey)) ((ey1)))
         ((ey1)))
        ((p.p.p.name is c)
         ((ey1))
         ((2nd-best-lang is "malaysian.train")
          ((p.p.p.name is m) ((ey1)) ((_epsilon_)))
          ((2nd-best-lang is "czech.train") ((_epsilon_)) ((ey)))))))))

Page 14

Results

Lexicons     Letters   Words
PN-base-5    89.02%    54.08%
PN-lang-5    91.23%    61.72%
PN-base-8    90.29%    52.88%
PN-lang-8    90.63%    59.77%
CMUDICT      91.99%    57.80%
OALD         95.80%    74.56%

Page 15

Rho’s example

Cepstral’s talking head ./oscars-example

Page 16

User Studies

From the names that both PN-base-8 and PN-lang-8 got “wrong” (i.e. did not exactly match the CMUDICT pronunciation in the test set), I selected the ones for which the two models assigned different pronunciations (112), and from those, I selected 20 at random to run perceived-accuracy user studies.

Overall, the perceived accuracy of the PN-lang-8 model was 17 percentage points higher (PN-lang-8: 46%, PN-base-8: 29%, no preference: 25%).

… or roughly a 60% relative improvement
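The relative figure follows directly from the two percentages reported for the user study; a short Python check (variable names are just for illustration):

```python
lang_pct, base_pct = 46, 29               # perceived accuracy from the user study (%)
absolute_gain = lang_pct - base_pct       # gain in percentage points
relative_gain = absolute_gain / base_pct  # gain relative to the baseline

print(f"absolute: {absolute_gain} points, relative: {relative_gain:.1%}")
# prints "absolute: 17 points, relative: 58.6%"
```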

Page 17

Upper bound

The upper bound (UB) is determined by:
- how noisy the data is
- how much language origin info can really help us in this task [hard to estimate without having reliably labeled data]

… - what about adding prior probabilities?

Page 18

Priors

For each language, we could have a prior probability that would tell us how likely it is to find a name in that language, independently of the name. If our model were trained on newswire data instead of directory listings, it would be relatively easy to determine such priors. E.g.:

“Yesterday in Barcelona, the mayor Joan Clos inaugurated the Forum of Cultures…”,

P(Catalan) = 0.8
P(Spanish) = 0.15
P(all other languages) ~ 0
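One way such priors could be combined with the language identifier's output is a Bayes-style reweighting, sketched below in Python. This is a possible design, not something the slides specify: the function name `apply_priors`, the small floor for languages without a prior, and the identifier scores for the name are all hypothetical; only the two prior values come from the example above.

```python
def apply_priors(li_scores, priors, floor=1e-6):
    """Reweight language-identifier scores by context priors and renormalise."""
    weighted = {
        lang: score * priors.get(lang, floor)  # unseen languages get a tiny prior
        for lang, score in li_scores.items()
    }
    total = sum(weighted.values())
    if total == 0:
        return dict(li_scores)  # nothing to reweight; fall back to the raw scores
    return {lang: w / total for lang, w in weighted.items()}

# Hypothetical identifier scores for a name appearing in the Barcelona sentence.
li_scores = {"spanish-pn": 0.45, "catalan-pn": 0.30, "french-pn": 0.25}
# Priors from the example: P(Catalan) = 0.8, P(Spanish) = 0.15.
priors = {"catalan-pn": 0.8, "spanish-pn": 0.15}

posterior = apply_priors(li_scores, priors)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

In this made-up case the identifier alone prefers Spanish, while the context prior tips the decision to Catalan.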

Page 19

What I’m working on now

Unsupervised clustering of proper names taking the pronunciation into account.

Traditionally, people working on grapheme-to-phoneme conversion have looked only at the written words, not at the actual pronunciation.

Page 20

Second approach

- Convert a word into a bunch of features of the form: l1 l2 l3 ph2, i.e. a letter in context (trigram) and the phone it is aligned to

- Bottom-up unsupervised clustering

Criterion: merge two clusters unless there is a clash

Page 21

Defining clash

Two clusters will be merged if their contexts (trigrams) are different or if, for every context they share, that context is aligned to the same phone in both clusters. A clash is the remaining case: a shared context aligned to different phones.
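A minimal Python sketch of this merge criterion, under stated assumptions: each cluster is represented as a dictionary from a letter trigram (l1, l2, l3) to the phone aligned with the middle letter, each word starts as its own cluster, and merging is greedy in list order. The function names and the three one-feature "words" (with illustrative pronunciations for the initial letter) are hypothetical.

```python
def clashes(a, b):
    """True if some shared trigram context is aligned to different phones."""
    return any(ctx in b and b[ctx] != phone for ctx, phone in a.items())

def cluster(words):
    """Bottom-up clustering: keep merging any two clusters that do not clash."""
    clusters = [dict(features) for features in words]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if not clashes(clusters[i], clusters[j]):
                    clusters[i].update(clusters.pop(j))  # absorb cluster j into i
                    merged = True
                    break
            if merged:
                break
    return clusters

# Hypothetical features for the first letter of three names.
john = {("#", "j", "o"): "jh"}
jose = {("#", "j", "o"): "hh"}   # same context as "john", different phone: clash
james = {("#", "j", "a"): "jh"}  # different context: no clash with either

result = cluster([john, jose, james])
print(len(result), result)
```

Because the merging is greedy, the final clusters can depend on the order in which words are visited; a real implementation would need a rule for choosing which non-clashing pair to merge first.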

Example

Page 22

References

- Black, A., Lenzo, K. and Pagel, V. Issues in Building General Letter to Sound Rules. 3rd ESCA Speech Synthesis Workshop, pp. 77-80, Jenolan Caves, Australia, 1998.

- CMUDICT. Carnegie Mellon Pronunciation Dictionary. 1998. http://www.speech.cs.cmu.edu/cgibin/cmudict

- Church, K. Stress Assignment in Letter to Sound Rules for Speech Synthesis (Technical Memorandum). AT&T Labs Research. November 27, 2000.

- Chotimongkol, A. and Black, A. Statistically trained orthographic to sound models for Thai. Beijing October 2000.

- Tomokiyo, T. Applying Maximum Entropy to English Grapheme-to-Phoneme Conversion. LTI, CMU. Project for 11-744, unpublished. May 9, 2000.

- Ghani R., Jones R. and Mladenic D. Building Minority Language Corpora by Learning to Generate Web Search Queries. Technical Report CMU-CALD-01-100, 2001. http://www.cs.cmu.edu/~TextLearning/corpusbuilder/

Page 23

Questions & Ideas

… Thanks