Voice source characteristics in speaker segregation
Patti Adank
• Some speaker-related characteristics have been found to help listeners:
- Darwin et al. (2003): differences in F0 (pitch) and vocal tract length (VTL) between concurrent speakers help listeners attend to the target speaker
• Aim of the project:
to establish whether the voice source characteristics of speakers can be useful to listeners attending to a target speaker in a multi-speaker situation
• Speaker-related differences that might aid listeners:
- style of speech
- voice quality: creaky voice, roughness, breathiness
• My experiments:
- establish the possible relevance of one acoustic aspect of creaky voice: jitter
• Speaker-related differences that aid listeners:
- F0 difference (if > 2 semitones)
- vocal tract length (VTL) difference (if the VTL ratio exceeds 1.08)
- the effects of F0 and VTL differences are superadditive (Darwin et al., 2003)
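"Superadditive" means that the benefit of combining an F0 difference with a VTL difference is larger than the sum of the benefits each difference gives on its own, relative to a no-difference baseline. A minimal sketch of that check, assuming performance is expressed as proportion correct per condition; the function name and the example scores are hypothetical and are not data from Darwin et al. (2003):

```python
def is_superadditive(baseline: float, f0_only: float, vtl_only: float, both: float) -> bool:
    """Return True if the combined gain exceeds the sum of the single-cue gains.

    Arguments are proportions correct (0..1): baseline = no F0/VTL difference,
    f0_only / vtl_only = one cue varied, both = F0 and VTL varied together.
    Hypothetical helper for illustration only.
    """
    gain_f0 = f0_only - baseline
    gain_vtl = vtl_only - baseline
    gain_both = both - baseline
    return gain_both > gain_f0 + gain_vtl


# Illustrative numbers only (not measured results):
print(is_superadditive(baseline=0.40, f0_only=0.50, vtl_only=0.45, both=0.70))  # True
print(is_superadditive(baseline=0.40, f0_only=0.50, vtl_only=0.45, both=0.52))  # False
```

In the first call the combined gain (0.30) exceeds the summed single-cue gains (0.10 + 0.05); in the second it does not.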
Pitch: periodicity of the voice source
[Figure: two example waveforms, amplitude against time (s)]
Jitter: aperiodicity of the voice source
[Figure: example waveform, amplitude against time (s)]
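The contrast between periodicity (pitch) and aperiodicity (jitter) can be made concrete with a jitter measure computed from a sequence of glottal periods. The slides do not say which measure was used; this sketch assumes "local jitter" as defined in the Praat manual (mean absolute difference between consecutive periods, divided by the mean period), and the function name is hypothetical:

```python
from statistics import mean


def local_jitter(periods: list[float]) -> float:
    """Local jitter: mean absolute difference between consecutive periods,
    divided by the mean period (a fraction, e.g. 0.01 = 1%)."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)


# A perfectly periodic 100 Hz source: every period is 10 ms, so jitter is 0.
print(local_jitter([0.010] * 20))
# Periods alternating between 10 ms and 11 ms: an aperiodic source.
print(round(local_jitter([0.010, 0.011] * 10), 4))
```

The periodic sequence gives 0.0; the alternating one gives about 0.0952 (1 ms mean difference over a 10.5 ms mean period).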
• Literature:
- McAdams (1989): the natural jitter present in a speaker’s voice may help listeners
- Ellis (1993): a computational model segregated simultaneously presented vowels using jitter differences alone
How could jitter help listeners?
• Auditory Scene Analysis (Bregman, 1990):
- primitive segregation cues: bottom-up, involuntary listening
- schema-driven segregation cues: top-down, voluntary/effortful listening
• Pitch is both a primitive segregation cue (Scheffers, 1983; Assmann & Summerfield, 1990; and others) and a schema-driven segregation cue (Darwin et al., 2003)
• Hypotheses:
0. jitter does not aid the auditory system
1. jitter is only a primitive segregation cue
2. jitter is a primitive cue AND schema-driven cue
3. jitter is only a schema-driven segregation cue
• Experiments:
1. a double-vowel experiment with pitch as the experimental factor, to replicate earlier results for pitch as a primitive cue
2. a double-vowel experiment with jitter as the experimental factor, to establish whether jitter is a primitive cue
3. an experiment modelled on Darwin et al. (2003), with pitch and jitter as factors, to establish whether jitter is a schema-driven cue
• Experiment 1:
- double-vowel experiment to test the effect of pitch
- synthetic vowels (Klatt, 1990): AH, EE, ER, OO, OR; 200 ms each
- five versions of each vowel: F0 of 100 Hz, +1/4 semitone (st), +1/2 st, +1 st, +2 st
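The F0 conditions are given in semitones above a 100 Hz baseline. Assuming equal-tempered semitones (a frequency ratio of 2^(1/12) per semitone), the actual F0 of each version can be computed as follows; the helper name is hypothetical:

```python
def raise_by_semitones(f0_hz: float, semitones: float) -> float:
    """Frequency after raising f0_hz by the given number of equal-tempered semitones."""
    return f0_hz * 2 ** (semitones / 12)


base_f0 = 100.0  # Hz, the baseline F0 of Experiment 1
for st in (0, 0.25, 0.5, 1, 2):
    print(f"+{st} st -> {raise_by_semitones(base_f0, st):.2f} Hz")
```

This prints 100.00, 101.45, 102.93, 105.95 and 112.25 Hz, so the largest difference tested (+2 st) corresponds to a frequency ratio of about 1.12.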
• Experiment 2:
- double-vowel experiment to test the effect of jitter
- synthetic vowels (Klatt, 1990), altered version: AH, EE, ER, OO, OR; 200 ms each
- five versions of each vowel: F0 of 100 Hz with no jitter, or with ±1%, ±2%, ±4%, ±8% jitter
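The slides do not describe how jitter was added to the synthesizer. One simple scheme, given here only as an assumption, draws each glottal period uniformly within ±x% of the nominal 10 ms period (100 Hz) and places pulses until the 200 ms vowel duration is reached. The function name and the uniform distribution are hypothetical:

```python
import random


def jittered_pulse_times(f0_hz: float, jitter_frac: float, duration_s: float,
                         seed: int = 0) -> list[float]:
    """Pulse onset times (s) for a voice source whose periods are randomly perturbed.

    Each period is drawn uniformly from [T * (1 - jitter_frac), T * (1 + jitter_frac)],
    with T = 1 / f0_hz. The uniform distribution is an assumption for illustration.
    """
    rng = random.Random(seed)  # seeded so the stimulus is reproducible
    nominal = 1.0 / f0_hz
    times = [0.0]
    while True:
        period = nominal * (1.0 + rng.uniform(-jitter_frac, jitter_frac))
        next_time = times[-1] + period
        if next_time > duration_s:  # stop before exceeding the vowel duration
            break
        times.append(next_time)
    return times


# Five conditions: 100 Hz with 0, ±1%, ±2%, ±4% and ±8% jitter, 200 ms each.
for jitter in (0.0, 0.01, 0.02, 0.04, 0.08):
    pulses = jittered_pulse_times(100.0, jitter, 0.200)
    periods_ms = [(b - a) * 1000 for a, b in zip(pulses, pulses[1:])]
    print(f"±{jitter:.0%}: {len(pulses)} pulses, "
          f"periods {min(periods_ms):.2f}-{max(periods_ms):.2f} ms")
```

The pulse times would then drive the source of a formant synthesizer; a different jitter model (for example Gaussian perturbation) would change only the line that draws `period`.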
• Procedure (Experiments 1 & 2):
- 7 listeners (5 British English speakers, 2 bilingual)
- categorization pre-test (45 stimuli)
- experiment 1 (or 2):
presentation of a double vowel (125 combinations)
listener selects one of 15 response options
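The counts on this slide are consistent with five vowels: there are 15 unordered vowel pairs when a vowel may be paired with itself (10 distinct pairs + 5 identical pairs), and 5 x 5 ordered pairs x 5 conditions = 125 combinations. This sketch assumes that reading, and also assumes a response is scored correct only if it names both presented vowels; the names `VOWELS`, `LEVELS` and `is_correct` are hypothetical:

```python
from itertools import combinations_with_replacement, product

VOWELS = ["AH", "EE", "ER", "OO", "OR"]
LEVELS = ["100 Hz", "+1/4 st", "+1/2 st", "+1 st", "+2 st"]  # Experiment 1 F0 conditions

# Response options: every unordered pair of vowels, including identical pairs.
# A same-vowel pair such as AH+AH is stored as the one-element set {"AH"}.
options = [frozenset(pair) for pair in combinations_with_replacement(VOWELS, 2)]

# Assumed trial list: each ordered vowel pair (5 x 5 = 25) at each condition (x 5) = 125.
trials = list(product(VOWELS, VOWELS, LEVELS))


def is_correct(presented: tuple[str, str], response: frozenset) -> bool:
    """Score a response as correct if it names both presented vowels (order ignored)."""
    return frozenset(presented) == response


print(len(options), len(trials))
print(is_correct(("AH", "EE"), frozenset({"EE", "AH"})))
```

This prints `15 125` and then `True`, matching the 15 options and 125 combinations on the slide.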
Results pitch
[Figure: percent correct with 95% CI (N = 26 per condition) for F0 differences of 0 (100 Hz), +1/4, +1/2, +1 and +2 st]
Results jitter
[Figure: percent correct with 95% CI (N = 30 per condition) for jitter of 0, ±1%, ±2%, ±4% and ±8%]
• Hypotheses:
0. jitter does not aid the auditory system
1. jitter is only a primitive segregation cue
2. jitter is a primitive cue AND schema-driven cue
3. jitter is only a schema-driven segregation cue
4. jitter is a primitive segregation cue if there is also a pitch
difference.
Results jitter & pitch
[Figure: percent correct with 95% CI (N = 10 per condition) for five conditions: baseline; F0 max; jitter max; vowel 1 = jitter max, vowel 2 = F0 max; vowel 2 = jitter & F0 max]
Is there still hope for jitter?
• Next experiment: test whether jitter is a schema-driven cue
Setup as in Darwin et al. (2003):
- two sentences from the same speaker are presented simultaneously
- the listener attends to the target sentence
- the listener reports the target words
- the jitter and pitch of the sentences are varied