COST2102 International School - Development of Multimodal Interfaces slide 1
Analyzing complementary acoustic cues for signalling prominence in different languages
William J. Barry Bistra Andreeva
Jacques Koreman
Basis for this presentation
This talk presents the related results from three recent presentations:
• Koreman, J., Andreeva, B. & Barry, W.J. (2008). Accentuation cues in French and German. In: P.A. Barbosa, S. Madureira & C. Reis (eds.), Proc. Speech Prosody 2008, Campinas (Brazil), 613-616. Campinas: Editora RG/CNPq.
• Koreman, J., Van Dommelen, W., Sikveland, R., Andreeva, B. & Barry, W.J. (in print). Cross-language differences in the production of phrasal prominence in Norwegian and German. Proc. Nordic Prosody 2008, Helsinki (Finland).
• Barry, W.J. & Andreeva, B. (2009). Cross-language and individual differences in the production and perception of syllabic prominence. Annual Meeting SPP 1234 Sprachlautliche Kompetenz 2009, Cologne (Germany).
Why present this here?
• Björn Granström: “Coherence between audio and video?”, e.g. between nodding and F0 in “Båten seglede forbi”.
• Kristiina Jokinen: “To what extent does non-verbal activity, esp. gestures and facial expressions, co-occur with verbal expressions?” (culture-dependence, communicative function)
Are there cross-cultural (-language) differences in importance of acoustic and visual cues? (There are for prosodic dimensions.)
Are they complementary? (Prosodic dimensions are.)
What does that mean for synchrony detection? (Trouble?)
This talk only deals with the acoustics of prominence. But because that involves several prosodic dimensions, the data analysis may also be relevant to multi-modal speech.
Outline
The ideas about the acoustic realization of prominence that I present here are mainly Bill Barry’s and Bistra Andreeva’s. (This is an acknowledgement, not an attempt to evade responsibility.)
from each of the three presentations
• Research questions
• Recordings
• Measurements
• Statistical analysis
• Results
• Discussion
• Conclusion and possible relevance to COST 2102
Research questions
• How do different languages exploit the universal means of signalling the varying prominence of words in an utterance?
  • duration
  • fundamental frequency
  • energy
  • spectral properties
• Do the different word-phonological requirements of a language affect the degree to which the properties are exploited?
  • duration (length opposition; word stress)
  • fundamental frequency (tonal word-accent)
  • spectral properties (phonologized vowel reduction)
Project
• The present work is part of a larger project funded by the German Research Council: Cross-language and individual differences in the production and perception of syllabic prominence. Rhythm-typology revisited.
• The languages investigated in the project are (slide table columns: article 1, article 2, article 3):
  • German
  • English
  • Norwegian
  • Bulgarian
  • Russian
  • French
  • Japanese
Recordings
• Six speakers from homogeneous groups in each language
• Comparable production task across languages: varying accentuation due to different focus on critical words (CWs) elicited by questions:
  • broad
  • narrow non-contrastive (early or late)
  • narrow contrastive (early or late)
• Text replies to questions followed by “dada” version
  (Results given here, but checked with text versions)
Norwegian sentences:
1. Hun Siv drar med skipet snart.
2. Han Karl tenker på fag nå.
3. Hans far brukte sagen da.
4. Min pasta blir kald til da.
6. Min stabsmann forblir bak nå.
7. Han Krister fikk skiftet mitt.
German sentences:
1. Der Mann fuhr den Wagen vor.
2. Das Bild soll nicht hässlich sein.
3. Das Kind sollte im Bett sein.
4. Der Peter kann den Film gucken.
5. Das Mädchen soll ein Bild malen.
6. Mein Vater kann Türkisch lesen.
Measurements
Duration
• Duration (ms) of stressed vowels, stressed syllables, CWs, feet
F0
• Mean F0 (semitones) across stressed vowel of CW
• F0 contour by comparison of stressed vowel in CW with preceding/following vowels
Intensity
• Mean intensity (dB) of stressed vowel in CW
• Spectral balance = difference between 70-1000 Hz band and 1200-5000 Hz band in stressed vowel of CW
(Normalized relative to mean across corresponding units in sentence)
Spectr. def.
• F1–F3 at middle of stressed nucleus of CW
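The semitone scale and the normalization relative to the sentence mean can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the slide does not say whether normalization is a ratio or a difference, nor which reference frequency the semitone scale uses. The ratio form (for durations), the difference form (for dB and semitone values), the default 1 Hz reference and all function names are assumptions.

```python
import math

def hz_to_semitones(f0_hz, ref_hz=1.0):
    """Convert F0 in Hz to semitones re ref_hz (the reference is an assumption)."""
    return 12.0 * math.log2(f0_hz / ref_hz)

def normalize_ratio(value, sentence_values):
    """Assumed form for durations: value divided by the mean over the
    corresponding units (e.g. all stressed vowels) of the same sentence."""
    return value / (sum(sentence_values) / len(sentence_values))

def normalize_difference(value, sentence_values):
    """Assumed form for log-scaled measures (dB, semitones): value minus
    the mean over the corresponding units of the same sentence."""
    return value - sum(sentence_values) / len(sentence_values)

# Invented vowel durations (ms) for one sentence, for illustration only
durations_ms = [80.0, 120.0, 100.0]
print(normalize_ratio(120.0, durations_ms))
print(hz_to_semitones(220.0, ref_hz=110.0))  # one octave above the reference
```

A ratio keeps durations comparable across speaking rates, while a difference is the natural choice for measures that are already logarithmic.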
Statistical analysis FR-GE (Speech Prosody data)
• Multivariate ANOVAs for CW1 and CW2 separately, with independent variables:
  • language (FR, GE)
  • focus (accented, deaccented)
  • number of syllables in CW (1, 2)
• Multivariate ANOVAs per language (FR, GE)
• Stepwise discriminant analyses: cue weighting
  • for CW1 and CW2 separately
  • for each language separately
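To make "cue weighting" concrete, the sketch below computes weights for a plain two-group Fisher discriminant over two cues. It is not the stepwise procedure used in the study, and the data points are invented for illustration. Each raw weight is scaled by the pooled within-group standard deviation of its cue, which gives standardized coefficients up to an overall scale factor. All function and variable names are hypothetical.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def pooled_within_cov(groups):
    """Pooled within-group covariance (sxx, sxy, syy) for groups of (x, y) pairs."""
    sxx = sxy = syy = 0.0
    dof = 0
    for group in groups:
        mx = mean([x for x, _ in group])
        my = mean([y for _, y in group])
        for x, y in group:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        dof += len(group) - 1
    return sxx / dof, sxy / dof, syy / dof

def standardized_weights(accented, deaccented):
    """Fisher weights w = S_w^-1 (m_acc - m_deacc), each scaled by its cue's pooled SD."""
    sxx, sxy, syy = pooled_within_cov([accented, deaccented])
    dx = mean([x for x, _ in accented]) - mean([x for x, _ in deaccented])
    dy = mean([y for _, y in accented]) - mean([y for _, y in deaccented])
    det = sxx * syy - sxy * sxy  # determinant of the 2x2 pooled covariance
    wx = (syy * dx - sxy * dy) / det
    wy = (sxx * dy - sxy * dx) / det
    return wx * math.sqrt(sxx), wy * math.sqrt(syy)

# Invented (normalized mean F0, normalized syllable duration) pairs, not study data
accented = [(2.0, 1.3), (3.0, 1.1), (2.5, 1.4), (3.5, 1.2)]
deaccented = [(0.5, 1.0), (1.0, 0.9), (0.0, 1.1), (1.5, 0.8)]
print(standardized_weights(accented, deaccented))
```

The larger a cue's standardized weight, the more it contributes to separating accented from deaccented tokens, given the other cue.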
Results: MANOVAs
(Significance marks run together in parameter order within each row.)
Main effects for language
Parameter                                          CW1        CW2
vowel dur., syllable dur., word dur., foot dur.    *******-   ***********
F0 mean, F0 difference                             ******     --
intensity, spect. bal.                             ******     ****
F1, F2, F3                                         ***-*      *****-
Interactions language × accentuation
Parameter                                          CW1        CW2
vowel dur., syllable dur., word dur., foot dur.    ****-      *********-
F0 mean, F0 difference                             ******     ******
intensity, spect. bal.                             --         --
F1, F2, F3                                         -**-       **-
Results for duration
[Figure: CW1 syllable duration and CW1 word duration, GE vs. FR]
Results for duration
[Figure: CW2 syllable duration and CW2 word duration (in final foot), GE vs. FR]
Effects greater for French than for German
Results: discriminant analyses
CW1
Language   Parameter         sdc
French     mean F0            0.709
           syllable dur.      0.665
           vowel dur.        -0.379
           intensity          0.328
German     intensity          0.683
           mean F0            0.575
           word duration      0.399
           spect. balance    -0.209
           vowel dur.         0.171
           foot dur.          0.158
CW2
Language   Parameter         sdc
French     mean F0            0.962
           intensity          0.576
           F0 change         -0.419
           vowel dur.         0.279
           word dur.          0.164
German     intensity          0.932
           vowel dur.         0.671
           mean F0            0.515
           syllable dur.     -0.430
           spect. balance    -0.345
Discussion
• Duration effects (accented vs. deaccented) in the ANOVAs are greater for French than for German: is exploitation in German constrained by the segmental vowel length opposition?
• Spectral balance is included as a DA predictor in German: reduction increases the accented-deaccented opposition (but there is no language × accentuation interaction in the ANOVAs).
• But the importance of duration in French compared to German is not so clear in the DA, probably due to correlation between the acoustic cues. DA is therefore not very suitable for analyzing these data.
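One simple way to see how strongly two cues covary before interpreting discriminant weights is a Pearson correlation. The sketch below is a generic check, not part of the reported analysis. The values are invented for illustration and the function name is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of measurements."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented normalized durations for six tokens (illustration only, not study data)
syllable_dur = [1.30, 1.10, 1.40, 1.20, 1.00, 0.90]
vowel_dur = [1.25, 1.05, 1.45, 1.15, 0.95, 0.90]
print(round(pearson_r(syllable_dur, vowel_dur), 2))
```

When two predictors are this closely related, a discriminant analysis can assign most of the shared weight to either one, so their individual coefficients say little about their importance.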
Statistical analysis NO-GE (Nordic Prosody data)
• Multivariate ANOVAs for CW1 and CW2 separately, with independent variables:
  • language (NO, GE)
  • focus (broad, early narrow, late narrow)
  • number of syllables in CW (1, 2)
• Multivariate ANOVAs per language (NO, GE)
Results
[Table: main effects for language, and interactions language × accentuation, for CW1 and CW2. Rows: vowel dur., syllable dur., word dur., foot dur.; F0 mean, F0 difference; intensity, spect. balance; F1, F2, F3. Only the n.s. entries survive: two among the duration main effects, one among the intensity/spect. balance interactions, and all six F1-F3 interactions (CW1 and CW2).]
Results: MANOVAs per language
η² values for accentuation (for both CWs, NO and GE)*
* η² = ratio of treatment / total variances
(On the slide: η² > 0.5 in red; n.s. values in grey.)
Parameter            NO CW1   NO CW2   GE CW1   GE CW2
Vowel duration       .556     .669     .038     .020
Syllable duration    .684     .756     .390     .168
Word duration        .527     .627     .335     .243
Foot duration        .155     .454     .035     .067
F0 mean              .576     .246     .837     .884
F0 difference        .145     .053     .709     .702
Intensity            .433     .428     .756     .884
Spectral balance     .057     .112     .123     .437
F1                   .134     .058     .331     .392
F2                   .012     .095     .022     .013
F3                   .003     .007     .026     .004
Results
• η² values are a ratio of treatment and total variance, and thus indicate the part of the total variance explained by the focus conditions.
• In Norwegian, durational cues (esp. syllable duration) distinguish the three conditions.
• In German, intensity and F0 are the strongest cues to distinguish the three conditions.
• The lack of importance of F0 in Norwegian is most likely an artefact of the different realizations of the lexical tone 1 for mono- and disyllabic stimuli.
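The η² ratio described above can be sketched for the simplest case, a single factor with one list of measurements per focus condition: the between-condition sum of squares divided by the total sum of squares. This is a minimal illustration, not the multivariate analysis itself. The numbers are invented and the function name is hypothetical.

```python
def eta_squared(groups):
    """eta^2 = SS_between / SS_total for one factor.

    groups: one list of measurements per condition (e.g. broad, early, late).
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total

# Invented values for three focus conditions (illustration only)
print(eta_squared([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]))
```

A value near 1 means the focus conditions account for almost all of the variation in that parameter; a value near 0 means they account for almost none.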
Results for intensity
[Figure: normalized mean vowel intensity of CW1 and CW2 by focus (early, late, broad), German vs. Norwegian; bars show means]
• Similar patterns for (normalized) intensity for German and Norwegian
• But greater differences between early, late and broad focus in German than in Norwegian
• In Norwegian late and broad focus, intensity of CW2 less than that of CW1, but not in German
Results for duration critical word 1
[Figure: normalized syllable duration and word duration of CW1 by focus (early, late, broad) and number of syllables (1 σ, 2 σ), German vs. Norwegian; bars show means]
• Greater (normalized) durational differences between early, broad and late focus in Norwegian than in German
• Similar effect for CW2
Results: summary
• German strongly uses intensity to signal prominence
• Norwegian uses duration more
  → but Norwegian also has a vowel length opposition and is classified as the same rhythm type as German (stress-timed), so this disconfirms the hypothesis that the use of acoustic cues depends on their phonological status in a language!
• F0 does play a role (esp. for German), but our measures do not reflect the different accent types well.
  → There is a difference in peak alignment of early and late/broad focus between Norwegian and German
Analysis 6 languages (SPP1234 data)
• ANOVAs with languages as independent variables
• Dependent variable is mean change in values from broad to contrastive focus
• Mean change is expressed as a percentage (duration, F0) or in dB (intensity)
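The two ways of expressing the change from broad to contrastive focus can be sketched as follows. The slide does not give the exact formula, so taking the broad-focus value as the baseline for the percentage is an assumption, as are the function names and the example values.

```python
def percent_change(broad, contrastive):
    """Relative change from broad to contrastive focus, in percent
    (assumes the broad-focus value is the baseline)."""
    return 100.0 * (contrastive - broad) / broad

def db_change(broad_db, contrastive_db):
    """Change in intensity; dB values are already logarithmic, so subtract."""
    return contrastive_db - broad_db

def mean_change(pairs, change):
    """Average a change measure over (broad, contrastive) pairs, e.g. one per speaker."""
    return sum(change(b, c) for b, c in pairs) / len(pairs)

# Invented syllable durations in ms (illustration only)
print(percent_change(200.0, 250.0))
print(mean_change([(60.0, 63.0), (62.0, 64.0)], db_change))
```

A percentage makes duration and F0 changes comparable across speakers with different baselines; a dB difference already is a ratio measure.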
Results for syllable duration of [da]
Languages use the acoustic carriers of prominence to different degrees (CS=Critical Syllable):
CS1: NO (46%) > FR (32%) > RU (25%) ~ GE (22%) > EN (17%) ~ BU (16%)
CS2: NO (53%) > FR (38%) > RU (26%) > GE (17%) ~ BU (17%) ~ EN (14%)
Note: No apparent connection between vowel length opposition and use of duration for accentuation (in contrast to Rebecca Dauer’s claim)
Results for F0 in text recordings
Languages use the acoustic carriers of prominence to different degrees:
CS1: FR (72%) > EN (61%) ~ GE (58%) > BU (28%) ~ NO (27%) > RU (20%)
CS2: GE (64%) ~ FR (62%) > EN (51%) > BU (38%) > RU (31%) > NO (10%)
Note: Despite some shift in rank between FR, EN, GE and between NO and RU for the early (CS1) and the late position (CS2), the generally high vs. low dynamics for the groups remain (the ranking for [dada] is even more consistent)
Results for intensity in [dada] recordings
Languages use the acoustic carriers of prominence to different degrees (intensities in dB):
CS1: BU (5.8) > FR (3.2) ~ GE (3.0) > RU (2.7) ~ EN (2.5) > NO (1.6)
CS2: BU (6.5) > FR (5.6) = GE (5.6) > EN (4.2) > RU (3.7) > NO (2.8)
Note: Larger intensity differences for CS2 than CS1.
Conclusion and possible relevance
• For each acoustic parameter, there is a hierarchy of its exploitation for signalling focus-induced prominence in different languages.
• Similar differences may exist between languages/cultures in the way they exploit different gestures (face, hand, arm, etc.) and/or in the relative exploitation of acoustic/visual cues, e.g. to signal focus or other communicative functions.
• Possibly not only correlation (synchrony), but also complementarity of parameters.
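One descriptive way to look at the hierarchies and at possible complementarity is to rank the languages per parameter and compare the rankings. The sketch below does this with a Spearman rank correlation on the CS1 values reported above (duration and intensity from the [dada] recordings, F0 from the text recordings). It is an illustration added here, not an analysis from the talk, and with only six languages the coefficient carries no statistical weight. Function names are hypothetical.

```python
def ranks(values):
    """Rank 1 = largest value; assumes no ties (true for the CS1 values below)."""
    order = sorted(values, key=values.get, reverse=True)
    return {lang: i + 1 for i, lang in enumerate(order)}

def spearman_rho(a, b):
    """Spearman rank correlation between two {language: value} mappings (no ties)."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    d2 = sum((ra[lang] - rb[lang]) ** 2 for lang in ra)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# CS1 values as given on the results slides (% change, % change, dB)
duration_cs1 = {"NO": 46, "FR": 32, "RU": 25, "GE": 22, "EN": 17, "BU": 16}
f0_cs1 = {"FR": 72, "EN": 61, "GE": 58, "BU": 28, "NO": 27, "RU": 20}
intensity_cs1 = {"BU": 5.8, "FR": 3.2, "GE": 3.0, "RU": 2.7, "EN": 2.5, "NO": 1.6}

print(ranks(duration_cs1))
print(round(spearman_rho(duration_cs1, intensity_cs1), 3))
print(round(spearman_rho(duration_cs1, f0_cs1), 3))
```

For duration against intensity the coefficient comes out negative (about -0.49): Norwegian tops the duration hierarchy and closes the intensity hierarchy, while Bulgarian does the reverse, which is the kind of pattern complementarity would predict.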
Thank you for your attention