The long-term retention of fine-grained phonetic details: evidence from a second language voice identification training task

Steve Winters

CAA Presentation

Victoria, BC

October 13, 2010

Basic Precepts

• Exemplar theory: listeners store in memory every speech experience they have in their lifetime (Johnson, 2007).

• Including all details of those experiences.

• Variability forms an inherent (and informative) part of linguistic representations.

• Evidence: interactions in speech processing between indexical and linguistic information.

1. Word recognition is easier for familiar voices. (Nygaard and Pisoni, 1998)

2. Talker recognition is easier in familiar languages. (Goggin et al., 1991; Perrachione et al., 2009)

Bilingual Talker Interactions

• Winters et al. (2008) tested generalization of bilingual voice recognition across languages.

1. Listeners trained to identify voices speaking in English:

• Showed reduced identification accuracy in German

• (language-dependent knowledge)

2. Listeners trained to identify voices speaking in German:

• Showed equivalent ID accuracy in English

• (language-independent knowledge)

• Levi et al. (submitted): listeners trained to identify talkers speaking in German do not show a word recognition advantage for those talkers in English.

L2 Speech Perception

• Indexical and linguistic information do not seem to interact when listeners learn to identify German voices.

• Q: Are L2 stimuli not stored in exemplar fashion?

• I.e., are phonetic details lost in memory?

• Note: non-native sound contrasts can often be difficult for second language learners to acquire.

• Japanese listeners have difficulty discriminating between English /l/ + /r/ (Miyawaki et al., 1975).

• English listeners have difficulty discriminating between Thai voiced + unaspirated stops (Abramson + Lisker, 1970).

• Perhaps listeners only store in memory what they know how to label. (Pierrehumbert, 2001)

Empirical Ambitions

• Thai contains a variety of phonetic features which are not contrastive in English:

• Lexical tones, vowel length, three-way VOT contrast (voiced ~ unaspirated ~ aspirated stops)…

• Can listeners encode this information in long-term memory?

• Experimental goal: train listeners to identify Thai voices which are associated with a particular phonetic property.

• (an implicit perception task)

Experimental Design

• Example talker identification training paradigm:

• Talker A is associated with Tone 1

• Talker B is associated with Tone 2

• Talker C is associated with Tone 3, etc.

• Q1: How much do these phonetic associations improve talker identification accuracy over a control condition?

• Q2: How much is identification accuracy impaired when the tone-talker associations no longer hold?

• Generalization:

• Talker A is presented with not-Tone 1

• Talker B is presented with not-Tone 2, etc.
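The design above can be sketched as two pairing rules: in training each talker appears with exactly one tone, and in generalization each talker appears only with the tones it was not trained on. This is a minimal illustration, not the actual stimulus lists: the talker labels and the particular talker-to-tone assignment are hypothetical.

```python
from itertools import product

# Tone names come from the slides; talker labels and the specific
# talker-to-tone assignment are hypothetical.
TONES = ["High", "Mid", "Low", "Falling", "Rising"]
TALKERS = ["A", "B", "C", "D", "E"]


def training_pairs(talkers, tones):
    """One-to-one talker-tone associations used during training."""
    return list(zip(talkers, tones))


def generalization_pairs(talkers, tones):
    """Every talker-tone pairing EXCEPT the trained association."""
    trained = dict(zip(talkers, tones))
    return [
        (talker, tone)
        for talker, tone in product(talkers, tones)
        if trained[talker] != tone
    ]


train = training_pairs(TALKERS, TONES)
gen = generalization_pairs(TALKERS, TONES)

print(train)     # each talker with its single trained tone
print(len(gen))  # number of talker-tone pairings that break the association
```

With five talkers and five tones, the generalization set contains every pairing in which the trained association no longer holds.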

Experimental Design

• Four different training conditions:

1. Tone-talker associations

2. VOT-talker associations

3. (Vowel-talker associations)

4. Control: no consistent associations between talkers and phonetic properties

• Anticipated hierarchy of talker ID accuracy:

• Tone associations > Vowel associations > VOT associations

• (primarily for reasons of cue duration)

Exp. 1: Talker-Tone Associations

• 21 native English listeners learned to identify 5 Thai/English bilingual voices.

• Training paradigm: 6 learning sessions (2 on each day)

• familiarization, training w/feedback, testing

• In these training sessions, each voice produced only Thai words with a particular tone.

• High, Mid, Low, Falling, Rising

• Final day of experiment: generalization

1. English words

2. Novel Thai words in which previous tone-talker associations no longer held.

Talker-Tone Demo

[Demo: five talkers, each labeled with one tone: Rising, Mid, Low, High, Falling]

Talker-Tone Results

[Figure: Talker-Tone Learning. x-axis: Testing Session (1-6, Thai, English); y-axis: Percent Correct (0% to 100%); series: Tone]

Talker-Tone Results

• Rapid (and consistent) learning of voices during training

• Generalization:

• No effect of language

• Worse performance than on initial session

• Note: Thai generalization performance statistically equivalent to performance on first feedback session.

• Generalization mistakes:

• 37.6% gave the talker associated with the stimulus tone in training.

• (remember that chance = 1/4 = 25%)

• Conclusion: listeners used tone as a cue to voice identity.
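The chance figure quoted above follows from the design: with five talkers, any mistake is one of four wrong responses, and only one of those is the talker trained with the stimulus tone. A small sketch of that arithmetic, assuming mistakes would otherwise be spread evenly over the wrong talkers (the function name is hypothetical):

```python
def chance_trained_talker_error(n_talkers):
    """Probability that a random wrong response names the talker
    who was trained with the stimulus tone (one of n_talkers - 1)."""
    return 1 / (n_talkers - 1)


observed = 0.376  # share of generalization mistakes, from the slide
chance = chance_trained_talker_error(5)

print(f"chance = {chance:.0%}, observed = {observed:.1%}")
print(f"observed / chance = {observed / chance:.2f}")
```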

Exp. 2: Talker-VOT Associations

• 20 native English listeners learned to identify six Thai/English bilingual voices.

• Identical training paradigm

• (with a few more stimuli)

• In training sessions, each voice produced only Thai words with a particular Voice Onset Time:

• Voiced, unaspirated, aspirated

• Note: two voices associated with each VOT type

• Generalization: novel English + novel Thai words

• (without the same Talker-VOT associations)

Talker-VOT Demo

[Demo: six talkers, each labeled with one VOT type: Aspirated, Unaspirated, Aspirated, Voiced, Voiced, Unaspirated]

Talker-VOT Results

[Figure: Thai Voice Learning. x-axis: Testing Session; y-axis: Percent Correct (0% to 100%); series: Tone, VOT]

Talker-VOT Results

• Result #1: Listeners do learn to identify the voices.

• Although pace of learning is slower than in Tone condition.

• Possible confounds:

• More voices to learn in VOT condition (6)

• Two voices associated with each VOT type

• Result #2: Performance does drop off significantly in generalization.

• Listeners use VOT distinctions to identify voices.

• VOT distinctions are encoded in memory.

• Note: Allen & Miller, 2004; Francis and Driscoll, 2006

Talker-VOT Mistakes

• In generalization, there are three potential mistake types.

• Stimuli: Talker (VOT Type A) - Word (VOT Type B)

Mistake #1: Respond with other talker of Type A. (1/5)

Mistake #2: Respond with talker of Type B. (2/5)

Mistake #3: Respond with unrelated talker. (2/5)

• Totals:

Mistake #1 (talker bias): 20.2%

Mistake #2 (stimulus bias): 46.3%

Mistake #3 (neither): 33.4%

• VOT similarities are more salient than voice similarities.
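The chance rates in parentheses above (1/5, 2/5, 2/5) can be derived from the design of six talkers with two per VOT type. The sketch below classifies each possible wrong response and compares the observed shares against chance. Talker labels T1 to T6 and the function names are hypothetical, and the even-spread chance model is an assumption.

```python
from collections import Counter

# Six talkers, two per VOT type (from the slides); labels are hypothetical.
TALKER_VOT = {
    "T1": "voiced", "T2": "voiced",
    "T3": "unaspirated", "T4": "unaspirated",
    "T5": "aspirated", "T6": "aspirated",
}
KINDS = ("talker bias", "stimulus bias", "neither")


def classify_mistake(true_talker, word_vot, response):
    """Label a wrong response by which VOT type the chosen talker had in training."""
    if response == true_talker:
        raise ValueError("a correct response is not a mistake")
    response_vot = TALKER_VOT[response]
    if response_vot == TALKER_VOT[true_talker]:
        return "talker bias"    # Mistake #1: other talker of Type A
    if response_vot == word_vot:
        return "stimulus bias"  # Mistake #2: talker of Type B
    return "neither"            # Mistake #3: unrelated talker


def chance_rates(true_talker, word_vot):
    """Share of each mistake type if wrong responses were chosen at random."""
    wrong = [t for t in TALKER_VOT if t != true_talker]
    counts = Counter(classify_mistake(true_talker, word_vot, r) for r in wrong)
    return {kind: counts[kind] / len(wrong) for kind in KINDS}


# Observed shares of generalization mistakes, from the slide.
OBSERVED = {"talker bias": 0.202, "stimulus bias": 0.463, "neither": 0.334}

# Example generalization trial: a "voiced" talker producing an aspirated word.
chance = chance_rates("T1", "aspirated")
for kind in KINDS:
    ratio = OBSERVED[kind] / chance[kind]
    print(f"{kind}: observed {OBSERVED[kind]:.1%}, chance {chance[kind]:.0%}, ratio {ratio:.2f}")
```

Only the stimulus-bias mistakes clearly exceed their chance share, which is the pattern the slide summarizes.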

Exp. 3: Control Condition

• 20 native English listeners learned to identify six Thai/English bilingual voices.

• Identical training paradigm to Experiment 2.

• No consistent associations in training between voices and particular phonetic properties.

• Note: essentially equivalent to German training in Winters et al. (2008)…

• with fewer speakers

• and with a different language.

[Figure: x-axis: Testing Session; y-axis: % Talkers Correctly Identified (0% to 100%); series: Tone, VOT, Control; three asterisk markers]

Results: Experiments 1-3

• In Training:

• Tone accuracy > Control + VOT accuracy in all six sessions.

• VOT accuracy > Control in sessions 3-6.

• In all conditions: accuracy is higher in session 6 than in session 1.

• In Generalization:

• No differences between learning conditions.

• But in Control: accuracy is higher for Thai stimuli than for English stimuli.

Discussion

• Listeners are storing in memory low-level acoustic cues to non-native sound contrasts.

• When they are associated with talker identity.

• Lexical tones provide more salient cues than VOT, but even VOT distinctions can be a cue to talker identity.

• Generalization to novel tokens works best in the Control condition.

• …even though the rate of learning is slower in this condition.

Conclusions

• These results provide further evidence for exemplar-based speech processing.

• Listeners encode in memory any potential cue which can be used to perform a listening task;

• Even if those cues are not distinctive in the listener’s native language…

• Or are not necessarily accessible to conscious reflection.

• Note: a perceptual reliance on highly specific phonetic details…

• Can make generalization hard.

Thanks!

• Thanks go to Kelly-Ann Casey, Tara Dainton and Sue Jackson for all of their work in recording speakers, editing stimuli, analyzing data and running subjects through the listening experiments.

• This work was supported by a University of Calgary University Research Grants Committee starter grant.

Future Directions

1. Stronger test of exemplar-based memory:

• token recognition of training items

2. Is knowledge of talkers’ voices generalizable across different voice qualities?

3. Which phonetic properties support a familiar talker advantage in word recognition across languages?

4. Does learning to identify talkers associated with particular phonetic properties facilitate the learning of non-native sound contrasts?

Experiment 4: Vowels

• Still in progress!

• 9 native English listeners learned to identify six Thai/English bilingual voices.

• Identical training paradigm to Experiment 2

• Each talker consistently produced only front, central, or back Thai vowels.

• In Generalization: talker-vowel quality associations no longer held.

• Voice/name labels were randomized between listeners.

The Thai Vowel Space

[Figure: vowel chart showing i, u, e, o, a, divided into three regions with two talkers assigned to each region]

• Note: there are also long/short vowel contrasts

Talker Identification Accuracy by Learning Condition

[Figure: x-axis: Testing Session; y-axis: % Correct (0% to 100%); series: Tone, VOT, Control, Vowel]

• Performance in the Vowel condition is no better (or worse) than the Control…yet.

One Persistent Issue: Talker Distinctiveness

[Figure: Talker Distinctiveness, Tone Training. x-axis: Testing Session (1-6, English, Thai); y-axis: D-Prime (0.00 to 4.00); legend: 1-5]

[Figure: Talker Distinctiveness, VOT Training. x-axis: Testing Session (1-6, ge, gt); y-axis: D-Prime (0.00 to 3.50); legend: 1-6]

[Figure: Talker Distinctiveness, Control Condition. x-axis: Testing Session (1-6, ge, gt); y-axis: D-Prime (0.00 to 3.50); legend: 1-6]

• One future direction: How much do talker representations depend on voice quality?
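The talker distinctiveness plots report d-prime, but the slides do not say how it was computed. As a rough illustration only, the sketch below uses the standard signal-detection definition, d' = z(hit rate) - z(false-alarm rate), computed per talker. Treating "responded talker X when X was speaking" as a hit, and the log-linear correction (adding 0.5 to each count), are assumptions rather than the author's stated method.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Standard d' = z(H) - z(FA) for one talker.

    hits: trials where the talker spoke and was named
    misses: trials where the talker spoke and another name was given
    false_alarms: trials where another talker spoke but this name was given
    correct_rejections: trials where another talker spoke and this name was not given

    Assumption: a log-linear correction (+0.5 per cell) keeps the rates
    away from 0 and 1, where the z-transform is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for a single talker in a single session.
print(round(d_prime(hits=18, misses=2, false_alarms=4, correct_rejections=76), 2))
```

A talker who is named equally often whether or not they are speaking gets a d-prime of zero; larger values mean the talker is more distinctive to listeners.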

Imponderables

• Q: What cues do the listeners use to make the cross-language transfer?

• One future direction:

• Copy Thai Tones onto English words.

• Do language-dependent effects emerge:

• English word recognition?

• English talker identification?

• Also try the same trick with vowel-talker associations.

• “Linguistically irrelevant” vs. “Linguistically relevant” language-independent talker information.

More Future Directions

• A stronger test of exemplar memory:

• Listeners store in memory consistent cues to talker identity…

• Do they also store in memory inconsistent talker cues (found in particular tokens)?

• Plan: train listeners to identify talkers with particular (focused) phonetic associations

• Test them on training token recognition with:

• Words that differ in focused and unfocused phonetic properties.

More Future Directions

• Could talker identification training, with talker-property associations, aid L2 learners in the acquisition of non-native sound contrasts?

• Compare sound identification training regimen that:

1. alternates with talker identification training

2. alternates with a different listening task

• Does learning improve more with:

1. One-to-one talker-property associations?

2. Many-to-many talker-property associations?
