The long-term retention of fine-grained phonetic details:
evidence from a second language voice identification training task
Steve Winters
CAA Presentation
Victoria, BC
October 13, 2010
Basic Precepts
• Exemplar theory: listeners store in memory every speech experience they have in their lifetime (Johnson, 2007).
• Including all details of those experiences.
• Variability forms an inherent (and informative) part of linguistic representations.
• Evidence: interactions in speech processing between indexical and linguistic information.
1. Word recognition is easier for familiar voices. (Nygaard and Pisoni, 1998)
2. Talker recognition is easier in familiar languages. (Goggin et al., 1991; Perrachione et al., 2009)
Bilingual Talker Interactions
• Winters et al. (2008) tested generalization of bilingual voice recognition across languages.
1. Listeners trained to identify voices speaking in English:
• Showed reduced identification accuracy in German
• (language-dependent knowledge)
2. Listeners trained to identify voices speaking in German:
• Showed equivalent ID accuracy in English
• (language-independent knowledge)
• Levi et al. (submitted): listeners trained to identify talkers speaking in German do not show a word recognition advantage for those talkers in English.
L2 Speech Perception
• Indexical and linguistic information do not seem to interact when listeners learn to identify German voices.
• Q: Are L2 stimuli not stored in exemplar fashion?
• I.e., are phonetic details lost in memory?
• Note: non-native sound contrasts can often be difficult for second language learners to acquire.
• Japanese listeners have difficulty discriminating between English /l/ + /r/ (Miyawaki et al., 1975).
• English listeners have difficulty discriminating between Thai voiced + unaspirated stops (Abramson + Lisker, 1970).
• Perhaps listeners only store in memory what they know how to label. (Pierrehumbert, 2001)
Empirical Ambitions
• Thai contains a variety of phonetic features which are not contrastive in English:
• Lexical tones, vowel length, three-way VOT contrast (voiced ~ unaspirated ~ aspirated stops)…
• Can listeners encode this information in long-term memory?
• Experimental goal: train listeners to identify Thai voices which are associated with a particular phonetic property.
• (an implicit perception task)
Experimental Design
• Example talker identification training paradigm:
• Talker A is associated with Tone 1
• Talker B is associated with Tone 2
• Talker C is associated with Tone 3, etc.
• Q1: How much do these phonetic associations improve talker identification accuracy over a control condition?
• Q2: How much is identification accuracy impaired when the tone-talker associations no longer hold?
• Generalization:
• Talker A is presented with not-Tone 1
• Talker B is presented with not-Tone 2, etc.
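The training and generalization pairings above can be sketched as a trial-list builder. This is only an illustration of the design logic, not the experiment's actual stimulus script: the talker labels, the number of trials per talker, and the random seed are all assumptions.

```python
import random

TONES = ["Tone 1", "Tone 2", "Tone 3", "Tone 4", "Tone 5"]
TALKERS = ["A", "B", "C", "D", "E"]  # hypothetical talker labels

# One-to-one talker-tone associations: A -> Tone 1, B -> Tone 2, ...
TALKER_TONE = dict(zip(TALKERS, TONES))


def build_trials(phase, n_per_talker=4, seed=0):
    """Return a shuffled list of (talker, tone) trials for one phase.

    training:       each talker is heard only with its associated tone.
    generalization: each talker is heard only with the other tones.
    """
    if phase not in ("training", "generalization"):
        raise ValueError(f"unknown phase: {phase!r}")
    rng = random.Random(seed)  # fixed seed so the list is reproducible
    trials = []
    for talker, own_tone in TALKER_TONE.items():
        if phase == "training":
            allowed = [own_tone]
        else:
            allowed = [tone for tone in TONES if tone != own_tone]
        for _ in range(n_per_talker):
            trials.append((talker, rng.choice(allowed)))
    rng.shuffle(trials)
    return trials


training = build_trials("training")
generalization = build_trials("generalization")
print(training[:3])
print(generalization[:3])
```

In training, tone is a perfectly reliable cue to the talker; in generalization, the same cue always points to a wrong talker, which is what makes the drop in accuracy informative.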
Experimental Design
• Four different training conditions:
1. Tone-talker associations
2. VOT-talker associations
3. (Vowel-talker associations)
4. Control: no consistent associations between talkers and phonetic properties
• Anticipated hierarchy of talker ID accuracy:
• Tone associations > Vowel associations > VOT associations
• (primarily for reasons of cue duration)
Exp. 1: Talker-Tone Associations
• 21 native English listeners learned to identify 5 Thai/English bilingual voices.
• Training paradigm: 6 learning sessions (2 on each day)
• familiarization, training w/feedback, testing
• In these training sessions, each voice produced only Thai words with a particular tone.
• High, Mid, Low, Falling, Rising
• Final day of experiment: generalization
1. English words
2. Novel Thai words in which previous tone-talker associations no longer held.
Talker-Tone Demo
[Demo: talkers labeled by tone: Rising, Mid, Low, High, Falling]
Talker-Tone Results
[Figure: Talker-Tone Learning. Percent correct (0-100%) by testing session (1-6, Thai, English). Series: Tone.]
Talker-Tone Results
• Rapid (and consistent) learning of voices during training
• Generalization:
• No effect of language
• Worse performance than on initial session
• Note: Thai generalization performance statistically equivalent to performance on first feedback session.
• Generalization mistakes:
• 37.6% gave the talker associated with the stimulus tone in training.
• (remember that chance = 1/4 = 25%)
• Conclusion: listeners used tone as a cue to voice identity.
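The 25% chance figure follows from counting the wrong answers available: with 5 talkers, a mistaken response can land on any of the 4 remaining talkers, and exactly one of them is the talker tied to the stimulus tone in training. A minimal sketch of that arithmetic, assuming mistakes would otherwise be spread evenly over the wrong talkers:

```python
from fractions import Fraction

n_talkers = 5      # Exp. 1: five Thai/English bilingual voices
observed = 0.376   # share of mistakes naming the tone-associated talker

# A mistake excludes the correct talker, leaving n_talkers - 1 options.
# Only one of those options is the talker trained with the stimulus tone.
chance = Fraction(1, n_talkers - 1)

ratio = observed / float(chance)
print(f"chance = {float(chance):.1%}, observed = {observed:.1%}, ratio = {ratio:.2f}")
# prints: chance = 25.0%, observed = 37.6%, ratio = 1.50
```

The even-spread baseline is an assumption; it ignores any similarity between particular voices.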
Exp. 2: Talker-VOT Associations
• 20 native English listeners learned to identify six Thai/English bilingual voices.
• Identical training paradigm
• (with a few more stimuli)
• In training sessions, each voice produced only Thai words with a particular Voice Onset Time (VOT) type:
• Voiced, unaspirated, aspirated
• Note: two voices associated with each VOT type
• Generalization: novel English + novel Thai words
• (without the same Talker-VOT associations)
Talker-VOT Demo
[Demo: talkers labeled by VOT type: Aspirated, Unaspirated, Aspirated, Voiced, Voiced, Unaspirated]
Talker-VOT Results
[Figure: Thai Voice Learning. Percent correct (0-100%) by testing session (1-6, Thai, English). Series: Tone, VOT.]
Talker-VOT Results
• Result #1: Listeners do learn to identify the voices.
• Although pace of learning is slower than in Tone condition.
• Possible confounds:
• More voices to learn in VOT condition (6)
• Two voices associated with each VOT type
• Result #2: Performance does drop off significantly in generalization.
• Listeners use VOT distinctions to identify voices.
• VOT distinctions are encoded in memory.
• Note: Allen & Miller, 2004; Francis and Driscoll, 2006
Talker-VOT Mistakes
• In generalization, there are three potential mistake types.
• Stimuli: Talker (VOT Type A) - Word (VOT Type B)
Mistake #1: Respond with other talker of Type A. (1/5)
Mistake #2: Respond with talker of Type B. (2/5)
Mistake #3: Respond with unrelated talker. (2/5)
• Totals:
Mistake #1 (talker bias): 20.2%
Mistake #2 (stimulus bias): 46.3%
Mistake #3 (neither): 33.4%
• VOT similarities are more salient than voice similarities.
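The 1/5, 2/5, 2/5 baselines for the three mistake types come from enumerating the five wrong talkers for any one stimulus. The sketch below reproduces that count and sets it beside the reported totals. The talker labels are hypothetical, and the even-spread baseline is an assumption.

```python
from fractions import Fraction

# Hypothetical talker labels; two talkers per VOT type, as in Exp. 2.
TALKER_VOT = {
    "T1": "voiced", "T2": "voiced",
    "T3": "unaspirated", "T4": "unaspirated",
    "T5": "aspirated", "T6": "aspirated",
}
MISTAKE_TYPES = ("talker bias", "stimulus bias", "neither")


def classify_mistake(true_talker, word_vot, response):
    """Label a wrong response by which VOT type the named talker belongs to."""
    if response == true_talker:
        raise ValueError("a correct response is not a mistake")
    response_vot = TALKER_VOT[response]
    if response_vot == TALKER_VOT[true_talker]:
        return "talker bias"    # Mistake #1: other talker of Type A
    if response_vot == word_vot:
        return "stimulus bias"  # Mistake #2: talker of Type B
    return "neither"            # Mistake #3: unrelated talker


def chance_rates(true_talker, word_vot):
    """Share of wrong talkers falling under each mistake type."""
    wrong = [t for t in TALKER_VOT if t != true_talker]
    counts = dict.fromkeys(MISTAKE_TYPES, 0)
    for response in wrong:
        counts[classify_mistake(true_talker, word_vot, response)] += 1
    return {kind: Fraction(n, len(wrong)) for kind, n in counts.items()}


# Generalization stimulus: a "voiced" talker producing an aspirated word.
chance = chance_rates("T1", "aspirated")
observed = {"talker bias": 0.202, "stimulus bias": 0.463, "neither": 0.334}
for kind in MISTAKE_TYPES:
    print(f"{kind}: chance {float(chance[kind]):.1%}, observed {observed[kind]:.1%}")
```

Only the stimulus-bias mistakes exceed their baseline (46.3% against 40%), while talker-bias mistakes sit at theirs (20.2% against 20%).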
Exp. 3: Control Condition
• 20 native English listeners learned to identify six Thai/English bilingual voices.
• Identical training paradigm to Experiment 2.
• No consistent associations in training between voices and particular phonetic properties.
• Note: essentially equivalent to German training in Winters et al. (2008)…
• with fewer speakers
• and with a different language.
[Figure: % talkers correctly identified (0-100%) by testing session (1-6, Thai, English). Series: Tone, VOT, Control. Three asterisks appear on the plot.]
Results: Experiments 1-3
• In Training:
• Tone accuracy > Control + VOT accuracy in all six sessions.
• VOT accuracy > Control in sessions 3-6.
• In all conditions: accuracy is higher in session 6 than in session 1.
• In Generalization:
• No differences between learning conditions.
• But in Control: accuracy is higher for Thai stimuli than for English stimuli.
Discussion
• Listeners are storing in memory low-level acoustic cues to non-native sound contrasts.
• When they are associated with talker identity.
• Lexical tones provide more salient cues than VOT, but even VOT distinctions can be a cue to talker identity.
• Generalization to novel tokens works best in the Control condition.
• …even though the rate of learning is slower in this condition.
Conclusions
• These results provide further evidence for exemplar-based speech processing.
• Listeners encode in memory any potential cue which can be used to perform a listening task;
• Even if those cues are not distinctive in the listener’s native language…
• Or are not necessarily accessible to conscious reflection.
• Note: a perceptual reliance on highly specific phonetic details…
• Can make generalization hard.
Thanks!
• Thanks go to Kelly-Ann Casey, Tara Dainton and Sue Jackson, for all of their work in recording speakers, editing stimuli, analyzing data and running subjects through the listening experiments.
• This work was supported by a University of Calgary University Research Grants Committee starter grant.
Future Directions
1. Stronger test of exemplar-based memory:
• token recognition of training items
2. Is knowledge of talkers’ voices generalizable across different voice qualities?
3. Which phonetic properties support a familiar talker advantage in word recognition across languages?
4. Does learning to identify talkers associated with particular phonetic properties facilitate the learning of non-native sound contrasts?
Experiment 4: Vowels
• Still in progress!
• 9 native English listeners learned to identify six Thai/English bilingual voices.
• Identical training paradigm to Experiment 2
• Each talker consistently produced only front, central, or back Thai vowels.
• In Generalization: talker-vowel quality associations no longer held.
• Voice/name labels were randomized between listeners.
The Thai Vowel Space
[Figure: Thai vowel chart (i, u, e, o, a shown), with two talkers assigned to each of three regions.]
• Note: there are also long/short vowel contrasts
Talker Identification Accuracy by Learning Condition
[Figure: % correct (0-100%) by testing session (1-6, Thai, English). Series: Tone, VOT, Control, Vowel.]
• Performance in the Vowel condition is no better (or worse) than the Control…yet.
One Persistent Issue: Talker Distinctiveness
[Figure: Talker Distinctiveness, Tone Training. D-prime (0.00-4.00) by testing session (1-6, English, Thai), one series per talker (1-5).]
[Figure: Talker Distinctiveness, VOT Training. D-prime (0.00-3.50) by testing session (1-6, ge, gt), one series per talker (1-6).]
[Figure: Talker Distinctiveness, Control Condition. D-prime (0.00-3.50) by testing session (1-6, ge, gt), one series per talker (1-6).]
• One future direction: How much do talker representations depend on voice quality?
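The talker distinctiveness plots report D-prime for each talker, but the slides do not say how it was computed. One common approach treats each talker as a yes/no detection problem, with d' = z(hit rate) - z(false alarm rate). The sketch below assumes that approach, plus a log-linear correction to avoid infinite z-scores. The function name and the counts are hypothetical, not data from the study.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Per-talker sensitivity: z(hit rate) - z(false alarm rate).

    hits:               talker X presented, listener answered X
    misses:             talker X presented, listener answered someone else
    false_alarms:       another talker presented, listener answered X
    correct_rejections: another talker presented, listener did not answer X

    Adds 0.5 to each cell (log-linear correction, an assumption here)
    so that rates of exactly 0 or 1 do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one talker in one session (not data from the study).
example = d_prime(hits=18, misses=2, false_alarms=5, correct_rejections=75)
print(round(example, 2))
```

A talker whose name is chosen no more often when they are speaking than when they are not gets a d' of 0; larger values mean a more distinctive voice.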
Imponderables
• Q: What cues do the listeners use to make the cross-language transfer?
• One future direction:
• Copy Thai Tones onto English words.
• Do language-dependent effects emerge:
• English word recognition?
• English talker identification?
• Also try the same trick with vowel-talker associations.
• “Linguistically irrelevant” vs. “Linguistically relevant” language-independent talker information.
More Future Directions
• A stronger test of exemplar memory:
• Listeners store in memory consistent cues to talker identity…
• Do they also store in memory inconsistent talker cues (found in particular tokens)?
• Plan: train listeners to identify talkers with particular (focused) phonetic associations
• Test them on training token recognition with:
• Words that differ in focused and unfocused phonetic properties.
More Future Directions
• Could talker identification training (with talker-property associations) aid L2 learners in the acquisition of non-native sound contrasts?
• Compare sound identification training regimens that:
1. alternates with talker identification training
2. alternates with a different listening task
• Does learning improve more with:
1. One-to-one talker-property associations?
2. Many-to-many talker-property associations?