phonology from a computational point of view


Page 1: Phonology from a computational point of view

Phonology from a computational point of view

Phonemes, dialects, letter-to-sound conversion

March 2001

Page 2: Phonology from a computational point of view

Phonology:

The study of the sound patterns of languages.

We will extend this to include the letter patterns of languages.

Page 3: Phonology from a computational point of view

Sound

Phonemic representation K AO1 T

Spelling caught

Morphology catch + PAST

Syntax

Information Retrieval

Page 4: Phonology from a computational point of view

Why study phonology in this course?

Text to speech (TTS) applications include a component which converts spelled words to sequences of phonemes ( = sound representations).

E.g.,
sight  S AY1 T
John   JH AA1 N

Page 5: Phonology from a computational point of view

Keep separate:

Spelling ( = “orthography”)
Detailed description of pronunciation
Abstract description of pronunciation, called “phonemic representation”

Page 6: Phonology from a computational point of view

Agenda:

1. Phonology: set of phonemes; their realizations as phones;
2. The phonemes are reasonably constant across a language.
3. The phones vary a lot within a speaker and across speakers.
4. Some of that variation is extremely rule-governed and must be understood: for example, the English “flap” (in butter).

Page 7: Phonology from a computational point of view

5. In addition to the phonemes: syllable structure, and
6. Prosody. Today: stress levels: 0, 1, 2
7. Text’s discussion of spelling errors, as a lead-in to Viterbi-ing the Minimum Edit Distance
8. Letter to sound (LTS)

Page 8: Phonology from a computational point of view

All speakers have a set of several dozen basic pronunciation units (“phonemes”) to which they do not add, and from which they do not delete, during their adult lifetimes. There are 39 phonemes in American English.

This phonemic inventory is not completely fixed and stable across the United States, but it is much more fixed and stable than is the pronunciation of these phonemes.

Page 9: Phonology from a computational point of view

How is that possible?

I’m from New York; the vowel that I have in cat is very different from the vowel in a south Chicago native’s cat – but the phonemes are the same – they correspond across thousands of words.

Page 10: Phonology from a computational point of view

Phonemic inventory

In computational circles, the phonemic inventory is described in DARPAbet. Some words from the CMU dictionary:

THE      DH AH0
THE(2)   DH AH1
THE(3)   DH IY0
THEA     TH IY1 AH0
THEALL   TH IY1 L
THEANO   TH IY1 N OW0
THEATER  TH IY1 AH0 T ER0
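Entries in this format are easy to load into a lookup table. Here is a minimal Python sketch, assuming each line is a word followed by space-separated phoneme symbols and that alternate pronunciations are marked WORD(2), WORD(3), as in the sample above. The names `parse_entries` and `strip_stress` are made up for illustration.

```python
import re
from collections import defaultdict

# A few lines in the format shown on the slide.
SAMPLE = """\
THE DH AH0
THE(2) DH AH1
THE(3) DH IY0
THEATER TH IY1 AH0 T ER0
"""

def parse_entries(text):
    """Map each word to a list of pronunciations (lists of phoneme symbols)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        head, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", head)  # THE(2) -> THE
        lexicon[word].append(phones)
    return dict(lexicon)

def strip_stress(phones):
    """Drop the trailing stress digit (0, 1, 2) from vowel symbols."""
    return [p.rstrip("012") for p in phones]

lexicon = parse_entries(SAMPLE)
print(lexicon["THE"])                       # three variant pronunciations
print(strip_stress(lexicon["THEATER"][0]))  # phonemes without stress marks
```

Keeping the stress digits in the table and stripping them only on demand lets the same data serve both the phoneme inventory and the stress discussion later on.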

Page 11: Phonology from a computational point of view

Darpabet

AA  odd    AA D
AE  at     AE T
AH  hut    HH AH T
AO  ought  AO T
AW  cow    K AW
AY  hide   HH AY D

Page 12: Phonology from a computational point of view

15 Vowels

AA  odd    AA D
AE  at     AE T
AH  hut    HH AH T
AO  ought  AO T
AW  cow    K AW
AY  hide   HH AY D
EH  Ed     EH D
ER  hurt   HH ER T
EY  ate    EY T
IH  it     IH T
IY  eat    IY T
OW  oat    OW T
OY  toy    T OY
UH  hood   HH UH D
UW  two    T UW

Page 13: Phonology from a computational point of view

24 Consonants

B   be       B IY
D   dee      D IY
G   green    G R IY N
P   pee      P IY
T   tea      T IY
K   key      K IY
S   sea      S IY
SH  she      SH IY
F   fee      F IY
V   vee      V IY
DH  thee     DH IY
TH  theta    TH EY T AH
Z   zee      Z IY
ZH  seizure  S IY ZH ER
HH  he       HH IY
CH  cheese   CH IY Z
JH  gee      JH IY
L   lee      L IY
M   me       M IY
N   knee     N IY
NG  ping     P IH NG
R   read     R IY D
W   we       W IY
Y   yield    Y IY L D

Page 14: Phonology from a computational point of view

Moby system http://www.dcs.shef.ac.uk/research/ilash/Moby/

/&/ sounds like the "a" in "dab"
/(@)/ sounds like the "a" in "air"
/A/ sounds like the "a" in "far"
/eI/ sounds like the "a" in "day"
/@/ sounds like the "a" in "ado" or the glide "e" in "system" (diphthong schwa)
/-/ sounds like the "ir" glide in "tire" or the "dl" glide in "handle" or the "den" glide in "sodden" (diphthong little schwa)
/Oi/ sounds like the "oi" in "oil"
/A/ sounds like the "o" in "bob"
/AU/ sounds like the "ow" in "how"
/O/ sounds like the "o" in "dog"

Page 16: Phonology from a computational point of view

The variety of actual pronunciations that native speakers can blissfully ignore is staggering.

But speech recognition systems need to be trained on this, just as people are in their youth.

Page 17: Phonology from a computational point of view

Varieties of sounds in everyone’s speech

Most phonemes have several different pronunciations (called their allophones), determined by nearby sounds, most usually by the following sound.

The most striking instance of such variation is in the realization of the phoneme /T/ in American English.

Page 18: Phonology from a computational point of view

We’ll return to the flap after the syllable.

Page 19: Phonology from a computational point of view

The syllable

Structure of the syllable help:

S
  onset: h
  rhyme
    nucleus: e
    coda: l p

Page 20: Phonology from a computational point of view

Flap (D) in American English

We find the flap of water (wa[D]er) under these conditions strictly inside a word:

                             Following vowel stressed        Following vowel unstressed
Preceding vowel stressed     rare or impossible: Beethoven   obligatory: atom
Preceding vowel unstressed   impossible: attire, atomic      optional: sanity

Page 21: Phonology from a computational point of view

But across words:

Word-initial t never flaps, regardless of stresses before or after*: eat my tomato, see Topeka...

Word-final t followed by a vowel-initial word normally does flap, regardless of stresses before or after: at all, sit on it...

*But in the words to, tonight, today, tomorrow, the to acts as if it were linked to the preceding word: “go [D]o bed”

Page 22: Phonology from a computational point of view

Generalization

English permits phonemes to belong simultaneously to two syllables ( = be ambisyllabic) under certain conditions.

Ambisyllabic t's convert to flaps.

Generally speaking:

Page 23: Phonology from a computational point of view

B AH1 T ER

onset rhyme onset rhyme

This is where we get a flap in American English

Page 24: Phonology from a computational point of view

Within a word:

C becomes part of syllable with a following onset ("maximize syllable onset"):

Page 25: Phonology from a computational point of view

...within a word:

C V

Page 26: Phonology from a computational point of view

This also applies across words --in English, and in many languages, but not (e.g.) in German

V C [ #

Page 27: Phonology from a computational point of view

Within a word, ambisyllabification before an unstressed vowel

V          C    V
+stress         -stress

e.g., atom

Page 28: Phonology from a computational point of view

But not across word boundaries

we don't say my tomato as my [D]omato

Page 29: Phonology from a computational point of view

/T/ as flap: inside words

                       following stressed            following unstressed
preceding stressed     no flap: Beethoven, attar     flap: matter, cattle
preceding unstressed   no flap: return, Mattel       optional: sanity
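The word-internal table can be written down as a small rule. This is only a sketch under some assumptions: the /T/ sits directly between two vowels, a vowel counts as stressed if its stress digit is 1 or 2, and the transcriptions in the usage lines are supplied here for illustration. The name `flap_status` is made up.

```python
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

def is_vowel(phone):
    return phone.rstrip("012") in VOWELS

def is_stressed(phone):
    # Assumption: primary (1) and secondary (2) stress both count as stressed.
    return phone[-1] in "12"

def flap_status(phones):
    """Return (index, status) for every word-internal T between two vowels."""
    results = []
    for i in range(1, len(phones) - 1):
        if phones[i] != "T":
            continue
        prev, nxt = phones[i - 1], phones[i + 1]
        if not (is_vowel(prev) and is_vowel(nxt)):
            continue
        if is_stressed(nxt):
            status = "no flap"    # following vowel stressed
        elif is_stressed(prev):
            status = "flap"       # stressed before, unstressed after
        else:
            status = "optional"   # unstressed on both sides
        results.append((i, status))
    return results

print(flap_status("AE1 T AH0 M".split()))         # atom
print(flap_status("AH0 T AY1 R".split()))         # attire
print(flap_status("S AE1 N AH0 T IY0".split()))   # sanity
```

The rule says nothing about word edges; those cases need the word-boundary conditions discussed separately.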

Page 30: Phonology from a computational point of view

/T/ as flap at word-edge

If a word ends in a /t/ and the next word starts with a vowel, flap is normal: at [D] all, What [D] is your name?, etc.

If a word ends in a vowel and the next word starts with a /t/, never a flap, unless the second word starts with the prefix to-!

the [t] tomato, the [t] topology of… but go [D] to the moon, go [D] tomorrow…

Page 31: Phonology from a computational point of view

Most computational devices avoid worrying about these issues… by (always) treating phonemes in the context of their left- and right-hand neighbors.

Need to produce an AE? Find out what neighbors it needs to be produced next to. H AE T? Find an AE that was produced after an H and before a T.
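The lookup described here can be sketched as a table keyed on (left neighbor, phoneme, right neighbor). The sketch below assumes word edges are padded with a "#" boundary symbol; the inventory contents and the names `context_units` and `find_unit` are hypothetical.

```python
def context_units(phones, boundary="#"):
    """Turn a phoneme sequence into (left, phone, right) context triples."""
    padded = [boundary] + list(phones) + [boundary]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]

def find_unit(inventory, left, phone, right):
    """Look up a stored unit for this phoneme in exactly this context, if any."""
    return inventory.get((left, phone, right))

# Hypothetical inventory: context triple -> identifier of a recorded unit.
inventory = {("HH", "AE", "T"): "unit_0417"}

for left, phone, right in context_units(["HH", "AE", "T"]):
    print(left, phone, right, "->", find_unit(inventory, left, phone, right))
```

A real system also needs a fallback for contexts that were never recorded, which this sketch simply reports as None.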

Page 33: Phonology from a computational point of view

Ongoing changes in American English pronunciation

1. Loss of difference between AA (cot) and AO (caught). See also hot dog (h AA t d AO g).

Some speakers produce these vowels differently (I do). Others do not.

Labov’s group has produced the following map:

Page 34: Phonology from a computational point of view

AA / AO distinction/collapse:

Page 35: Phonology from a computational point of view

Distinction between vowels IH and EH before n

ink-pen versus baby-pin: distinction lost in the South.

Page 36: Phonology from a computational point of view

in/en distinction (pin/pen)

Page 37: Phonology from a computational point of view

Variation in AE phoneme (“hat”)

A very wide range of American speakers do NOT have the same vowels in sand and sang.

The vowels in cat and sang are the same, but in sand the vowel is much higher.

However, in the Northern Cities shift, all AE is pronounced like the last two syllables of idea – this is prevalent right here in the south Chicago area.

Page 38: Phonology from a computational point of view
Page 39: Phonology from a computational point of view

Sound – Letter relationships

LTS: Letter to sound, or Phoneme-Grapheme relationships.

In most languages, this is simple.

But in English and in French, it’s very messy.

Why? Because the spelling system in both is based on how the language used to be pronounced, and the pronunciation has since changed.

Page 40: Phonology from a computational point of view

Other languages

In most other languages, spelling reflects current pronunciation much more accurately.

Stress: most languages don’t mark which syllable is stressed. In some languages, there are simple principles that tell us which syllable is stressed, but when there are no such principles (e.g. English, Russian), then you need to build word-lists with the stress indicated.

Page 41: Phonology from a computational point of view

Letter to sound for English

Letter >> phoneme for speech synthesis

Phoneme >> letter for speech recognition

Page 42: Phonology from a computational point of view

Challenges to Letter-to-Sound

There are always new words being found, and most of them are new proper names (people, places, products, companies, etc.)

Page 43: Phonology from a computational point of view

Damper, Marchand, Adamson and Gustafson 1998: Testing Letter to Sound

Third ESCA/COCOSDA Workshop on SPEECH SYNTHESIS, November 1998

They contest Liberman and Church’s statement in 1991:

“We will describe algorithms for pronunciation of English words… that reduce the error rate to only a few tenths of a percent for ordinary text, about two orders of magnitude better than the word error rates of 15% or so that were common a decade ago.”

They write,

“In this paper, we have shown that automatic pronunciation of novel words is not a solved problem in TTS synthesis. The best that can be done is about 70% words correct using PbA [Pronunciation by Analogy]…traditional rules…perform very badly – much worse than pronunciation by analogy and other data-driven approaches….”

Page 44: Phonology from a computational point of view

Damper et al.

Compare 4 approaches:

1. Hand-written phonological rules
2. Pronunciation by analogy (based on Dedina and Nusbaum 1991)
3. Neural networks (based on Sejnowski and Rosenberg’s NETtalk)
4. Information theory-based approach (“Nearest neighbor”)

Page 45: Phonology from a computational point of view

How to evaluate LTS?

Systems typically use

1. a large dictionary
2. a set of “exceptional words”
3. a backoff strategy for words that slip through the first 2 steps.

Is it fair to test the backoff strategy on words in the first two sets, then?
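The three-step arrangement can be sketched as a simple cascade. Everything below is a toy: the dictionary and exception entries are a handful of illustrative transcriptions, and `letter_rules` is a placeholder backoff that maps single letters to phonemes, not a real rule set.

```python
# Toy data for illustration only.
DICTIONARY = {"SIGHT": ["S", "AY1", "T"], "JOHN": ["JH", "AA1", "N"]}
EXCEPTIONS = {"COLONEL": ["K", "ER1", "N", "AH0", "L"]}
LETTER_RULES = {"A": "AE", "B": "B", "T": "T"}  # hypothetical one-letter rules

def letter_rules(word):
    """Placeholder backoff: one phoneme per letter, '?' when no rule applies."""
    return [LETTER_RULES.get(ch, "?") for ch in word]

def pronounce(word, dictionary=DICTIONARY, exceptions=EXCEPTIONS,
              backoff=letter_rules):
    """Return (phonemes, source), trying each step in the order listed above."""
    key = word.upper()
    if key in dictionary:
        return dictionary[key], "dictionary"
    if key in exceptions:
        return exceptions[key], "exceptions"
    return backoff(key), "backoff"

for w in ["sight", "colonel", "bat"]:
    print(w, pronounce(w))
```

Returning the source alongside the phonemes makes it easy to score the backoff step only on the words that actually reach it.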

Page 46: Phonology from a computational point of view

Damper et al propose:

Test on a single, entire, large dictionary;
Strict scoring, not frequency-weighted, giving credit only for full-word correct;
A standardized phoneme output set should be employed.
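Strict, unweighted scoring is short to state in code. This is a sketch assuming predictions and reference pronunciations are both stored as word-to-phoneme-list tables over the same phoneme set; the name `words_correct` is made up.

```python
def words_correct(predicted, gold):
    """Fraction of words whose whole phoneme string matches the reference.

    Every word counts once (no frequency weighting), and a word with a single
    wrong phoneme earns no credit.
    """
    if not gold:
        raise ValueError("reference dictionary is empty")
    hits = sum(1 for word, ref in gold.items() if predicted.get(word) == ref)
    return hits / len(gold)

gold = {"cat": ["K", "AE", "T"], "caught": ["K", "AO", "T"]}
predicted = {"cat": ["K", "AE", "T"], "caught": ["K", "AA", "T"]}
print(words_correct(predicted, gold))  # one of two words fully correct
```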

Page 47: Phonology from a computational point of view

Evaluation

In reality, different descriptions of English use different sets of phonemes (e.g., is stress marked on the vowels? British versus American)

Testing data-driven methods raises its own issues, because the performance of a data-driven method is tightly linked to the data it was trained on.

Page 48: Phonology from a computational point of view

Data-driven method

Data -> Learning method -> Letter-to-sound conversion system

Page 49: Phonology from a computational point of view

In theory, you should never test a data-driven method on data that it was trained on….

In theory, if you want to test the performance of the method on the whole dictionary, you can train the system on the whole dictionary less one word, and then test it on that word; and do all of that each time for each word.

But that takes too long! and we’re also interested in the relationship between training corpus size and total performance.
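The leave-one-out procedure looks like this in outline. The learner here is a deliberately trivial stand-in (most frequent phoneme per letter, assuming one phoneme per letter in the toy data), and the names `train`, `predict` and `leave_one_out` are hypothetical. The point is the loop: one full training run per dictionary word.

```python
from collections import Counter, defaultdict

def train(lexicon):
    """Toy learner: remember the most frequent phoneme for each letter."""
    counts = defaultdict(Counter)
    for word, phones in lexicon.items():
        for letter, phone in zip(word, phones):
            counts[letter][phone] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}

def predict(model, word):
    """Unseen letters come out as None, so the word is scored as wrong."""
    return [model.get(letter) for letter in word]

def leave_one_out(lexicon):
    """Train on everything except one word, test on that word, repeat."""
    correct = 0
    for word, gold in lexicon.items():
        rest = {w: p for w, p in lexicon.items() if w != word}
        if predict(train(rest), word) == gold:
            correct += 1
    return correct / len(lexicon)

toy = {"cat": ["K", "AE", "T"], "cab": ["K", "AE", "B"],
       "bat": ["B", "AE", "T"], "tab": ["T", "AE", "B"],
       "kit": ["K", "IH", "T"]}
print(leave_one_out(toy))
```

With a dictionary of 16,280 words this means 16,280 training runs, which is why it takes too long for anything but a very cheap learner.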

Page 50: Phonology from a computational point of view

Damper et al’s work-around

For various values of N (up to half the size of the dictionary):

Take two random samples of the dictionary, each of size N. Train on one set, test on the other.

N = 100, 500, 1000, 2000, 5000 and 8,140.

Dictionary is of size 16,280.
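Drawing the two samples can be sketched as follows, assuming the samples are meant to be disjoint (at N = 8,140 they would then split the 16,280-word dictionary exactly in half). The function name and the fixed seed are illustrative choices.

```python
import random

def split_samples(words, n, seed=0):
    """Draw two disjoint random samples of size n: one to train on, one to test on."""
    words = list(words)
    if 2 * n > len(words):
        raise ValueError("n may be at most half the dictionary size")
    drawn = random.Random(seed).sample(words, 2 * n)  # sampled without replacement
    return drawn[:n], drawn[n:]

dictionary = [f"word{i}" for i in range(16280)]  # stand-in for the real word list
for n in (100, 500, 1000, 2000, 5000, 8140):
    train_set, test_set = split_samples(dictionary, n)
    print(n, len(train_set), len(test_set))
```

Repeating the split for each N gives the curve relating training corpus size to performance.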

Page 51: Phonology from a computational point of view

Results: Hand-written rules

Elovitz et al: hand-written rules for this purpose. 25.7% of words were entirely correct. “Length errors (especially due to geminate consonants), /g/-/j/ confusions and vowel substitutions abound.” Extensive efforts were made to make sure that this low figure was not an error!

Page 52: Phonology from a computational point of view

Pronunciation by analogy

Begin with a (hand-made) alignment of letters to sounds. For every observed string of letters, gather the set of phonemes that it can be associated with, and store in a data structure along with their frequency.

For the test word, find all ways of dividing the word up into pieces that are present in the data structure. Weight the resulting analyses by (1) how many subpieces are involved, and (2) frequencies of the subpieces, and choose the best.
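A heavily simplified sketch of this idea follows. It is not Dedina and Nusbaum's algorithm: it assumes a toy lexicon aligned one phoneme per letter, splits the test word into non-overlapping pieces only, prefers the fewest pieces, and breaks ties by the summed frequency of the pieces. All names and data are illustrative.

```python
from collections import Counter, defaultdict
from functools import lru_cache

# Toy hand-aligned lexicon (assumption): exactly one phoneme per letter.
ALIGNED = [("cat", ["K", "AE", "T"]),
           ("hat", ["HH", "AE", "T"]),
           ("cab", ["K", "AE", "B"])]

def build_table(aligned):
    """For every letter substring, count the phoneme strings it was aligned with."""
    table = defaultdict(Counter)
    for letters, phones in aligned:
        for i in range(len(letters)):
            for j in range(i + 1, len(letters) + 1):
                table[letters[i:j]][tuple(phones[i:j])] += 1
    return table

def pronounce(word, table):
    """Best split of word into known pieces; None if no split exists."""
    @lru_cache(maxsize=None)
    def best(start):
        # Returns (number of pieces, -summed frequency, phonemes) for word[start:].
        if start == len(word):
            return (0, 0, ())
        candidates = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece not in table:
                continue
            rest = best(end)
            if rest is None:
                continue
            phones, freq = table[piece].most_common(1)[0]
            candidates.append((rest[0] + 1, rest[1] - freq, phones + rest[2]))
        return min(candidates) if candidates else None

    result = best(0)
    return list(result[2]) if result else None

table = build_table(ALIGNED)
print(pronounce("hab", table))  # built from pieces of "hat" and "cab"
```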

Page 53: Phonology from a computational point of view

Results: PbA; neural net

PbA: 71.8% correct.
Neural net: 54.4%, when trained on the whole dictionary

Page 54: Phonology from a computational point of view

Information-Gain trees

IB1-IG: 57.4% correct

This approach is a variant on decision-tree learning (an important paradigm in machine learning)….

Page 55: Phonology from a computational point of view

In simplest terms, a decision-tree approach studies a problem like, “What phoneme realizes this letter in this context?” by looking at all relevant examples in the data, and considering all context data (what precedes, what follows, etc.) and deciding, first, which factor “gives the most information”:

Measure the uncertainty first: uncertainty of how this “t” should be pronounced;

Measure the uncertainty if you know what the following letter is.

Measuring uncertainty…

Page 56: Phonology from a computational point of view

Entropy as measure of uncertainty

Set of possibilities for realizing ‘t’:

T   64%
TH  36%

Calculate (base 2 logs): -1 * (0.64 * log(0.64) + 0.36 * log(0.36)) = 0.94268

Page 57: Phonology from a computational point of view

Realization of ‘t’:

If the following letter is ‘h’ (36%):
T   .02
TH  .98
Entropy: -1 * (.02 * log(.02) + .98 * log(.98)) = .14144 (base 2 logs!)

If the following letter is anything else (64%):
T   1.00
TH  .00
Entropy: -1 * (1 * log(1) + 0 * log(0)) = 0 (taking 0 * log(0) to be 0)

Total entropy now: 0.36 * .14144 + 0.64 * 0 = .05092, a huge decrease from 0.94268!
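The arithmetic can be checked with a few lines of Python. The sketch uses base 2 logs and treats 0 * log(0) as 0; the function names are made up.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; terms with probability 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(branches):
    """Weighted average entropy over (weight, distribution) branches."""
    return sum(weight * entropy(dist) for weight, dist in branches)

before = entropy([0.64, 0.36])                   # T vs. TH, no context
after = conditional_entropy([
    (0.36, [0.02, 0.98]),                        # following letter is 'h'
    (0.64, [1.00, 0.00]),                        # following letter is anything else
])
print(round(before, 5), round(after, 5), round(before - after, 5))
```

The difference between the two figures is the information gain from knowing the following letter, which is what the tree-building method uses to choose which context feature to test first.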

Page 58: Phonology from a computational point of view

Information gain and LTS

The idea is to use this method of testing to automatically determine which aspects of a letter’s neighborhood are most revealing in determining how that letter should be realized in that word.

But: 57.4% fully correct results in this experiment.

Page 59: Phonology from a computational point of view

Bottom line

Still a lot of work to be done – both in getting results and testing how well various methods work.

Page 60: Phonology from a computational point of view

Minimal Edit Distance

A first look at Viterbi in action

Page 61: Phonology from a computational point of view

What’s the best way to line up two different strings? To answer that question, we have to make some specifications.

One specification (p. 53ff in textbook, Section 5.6) could be that perfect alignments are “free”, while a deletion or insertion (non-alignment) costs 1 and a substitution costs 2.

Page 62: Phonology from a computational point of view

E X E C U T I O N

I N T E N T I O N

These are free; and there are no reduced fares for any kind of partial match for the others.

Page 63: Phonology from a computational point of view

E X E C U T I O N

I N T E N T I O N

Cost: 3 substitutions + 2 hangings = 8

Page 64: Phonology from a computational point of view

E X E C U T I O N

I N T E N T I O N

Same cost – that’s how we’ve set up the problem.

Cost: 1 substitution + 6 hangings = 8

Page 65: Phonology from a computational point of view

N   9   8   9  10  11  12  11  10   9   8
O   8   7   8   9  10  11  10   9   8   9
I   7   6   7   8   9  10   9   8   9  10
T   6   5   6   7   8   9   8   9  10  11
N   5   4   5   6   7   8   9  10  11  10
E   4   3   4   5   6   7   8   9  10   9
T   3   4   5   6   7   8   7   8   9   8
N   2   3   4   5   6   7   8   7   8   7
I   1   2   3   4   5   6   7   6   7   8
#   0   1   2   3   4   5   6   7   8   9
    #   E   X   E   C   U   T   I   O   N

Page 66: Phonology from a computational point of view

The chart tells us something about how we walk through it, but (the book’s not clear on this) we also have to keep track, on a memo-pad, of the best path that got us to each box.

We need to find a path that only goes Right, Up, or Both (Up & Right) and leads us to the best final box.

Page 67: Phonology from a computational point of view

We can arbitrarily choose one of the best ways to get to a box in this case, because the problem at hand doesn’t set different costs depending on the row-transitions. But very frequently such costs must be borne in mind.
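Filling the chart and keeping the memo-pad can be sketched together. The sketch assumes the costs stated earlier (1 for a non-alignment, 2 for a substitution, 0 for a match), puts EXECUTION on the columns and INTENTION on the rows, and records one arbitrarily chosen best move per box. The function name is made up.

```python
def min_edit_distance(source, target, gap_cost=1, sub_cost=2):
    """Return (chart, path); chart[i][j] aligns target[:i] with source[:j]."""
    n, m = len(source), len(target)
    chart = [[0] * (n + 1) for _ in range(m + 1)]
    memo = [[None] * (n + 1) for _ in range(m + 1)]   # best move into each box
    for j in range(1, n + 1):
        chart[0][j], memo[0][j] = j * gap_cost, "right"
    for i in range(1, m + 1):
        chart[i][0], memo[i][0] = i * gap_cost, "up"
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = source[j - 1] == target[i - 1]
            options = [
                (chart[i - 1][j - 1] + (0 if match else sub_cost), "both"),
                (chart[i][j - 1] + gap_cost, "right"),
                (chart[i - 1][j] + gap_cost, "up"),
            ]
            chart[i][j], memo[i][j] = min(options)    # ties broken arbitrarily
    # Walk the memo-pad backwards from the final box to recover one best path.
    path, i, j = [], m, n
    while (i, j) != (0, 0):
        move = memo[i][j]
        path.append(move)
        if move == "both":
            i, j = i - 1, j - 1
        elif move == "right":
            j -= 1
        else:
            i -= 1
    path.reverse()
    return chart, path

chart, path = min_edit_distance("EXECUTION", "INTENTION")
print(chart[-1][-1])   # cost in the final (top right) box
print(path)
```

Each box looks only at its three already-filled neighbors, which is what makes this a first look at Viterbi-style dynamic programming.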

Page 68: Phonology from a computational point of view

N   9   8   9  10  11  12  11  10   9   8
O   8   7   8   9  10  11  10   9   8   9
I   7   6   7   8   9  10   9   8   9  10
T   6   5   6   7   8   9   8   9  10  11
N   5   4   5   6   7   8   9  10  11  10
E   4   3   4   5   6   7   8   9  10   9
T   3   4   5   6   7   8   7   8   9   8
N   2   3   4   5   6   7   8   7   8   7
I   1   2   3   4   5   6   7   6   7   8
#   0   1   2   3   4   5   6   7   8   9
    #   E   X   E   C   U   T   I   O   N