intrinsic microprosodic variations in estonian and finnish. acoustic analysis

Fonetiikan päivät 2006 — The Phonetics Symposium 2006

INTRINSIC MICROPROSODIC VARIATIONS IN ESTONIAN AND FINNISH: ACOUSTIC ANALYSIS

Einar Meister*, Stefan Werner**

*Laboratory of Phonetics and Speech Technology, Institute of Cybernetics at Tallinn University of Technology

**General Linguistics & Language Technology, University of Joensuu

Abstract

The aim of our joint research work is to provide comparative data on the intrinsic characteristics of Estonian and Finnish vowels in order to test their role in perception and to apply the results in prosody models for TTS. This paper reports some preliminary results of an acoustic analysis of the intrinsic duration and fundamental frequency of Estonian and Finnish vowels.

Keywords: intrinsic duration, intrinsic fundamental frequency, microprosody.

1 General

In general, the prosody models of text-to-speech (TTS) systems handle larger units (words, sentences, paragraphs) well, but microprosodic features of phonemes are poorly controlled. It is often assumed that only good suprasegmental modeling is of importance, whereas intrinsic microprosodic variations of pitch, duration and intensity can largely be neglected. And while there seems to be broad agreement, not only in speech technology but also in phonetics (as can be seen from current textbooks in these fields), about the factual existence of a family of phenomena most often called microprosody, surprisingly little has been published in terms of systematic empirical descriptions of these phenomena based on more than small and often accidental samples.

The objective of our paper is to report on the initial results of a project aimed at providing just such a reliable empirical basis for the study of microprosody in Estonian and Finnish. We analyzed recorded read speech of Estonian and Finnish speakers for vowel F0 and duration in order to compare the results with claims about microintonation and microduration made in the literature. Our first results partly confirm these claims but also show more variation and fuzzier category boundaries than was to be expected on the basis of earlier research.

Meister & Werner: Intrinsic Microprosodic Variations

2 Background

2.1 Microprosody

Microprosody is widely considered to be a universal feature of human speech (seminal research in the area includes Meyer 1937, Black 1949, Peterson and Lehiste 1960, Lehiste and Peterson 1961). It is assumed that, due to anatomical and physiological factors, the articulation of different vowels affects prosodic parameters in specific systematic ways, independently of speakers' intentional control of their articulation processes. The same is held to be true for the influence of consonant articulation on the prosodic realization of adjacent vowels. Since it is believed that all such minor but measurable local prosodic modifications are caused by articulatory constraints and are thus 'hardwired' into the speech sounds, researchers also refer to these characteristics as intrinsic and co-intrinsic F0, duration and intensity of vowels.

Microprosodic variation is often assumed to be irrelevant for perception. Hardly any speech synthesis system, for example, provides means for microprosodic control or takes this variation into account at all (notable exceptions include Kohler 1990 and Vainio et al. 1997). Instead, intrinsic variation is regarded as negligible, a kind of noise. Likewise, empirical models of prosody aim at filtering out microprosody, seeing its effects merely as local perturbations of the higher-domain trends to be captured by the model (see e.g. Hirst's MOMEL algorithm (Hirst et al. 2000), which separates micro- from macro-F0).

2.2 Assumed universals

Intrinsic F0 was reported more than one hundred years ago (Meyer 1897), and findings on intrinsic duration and intensity have by now also acquired the status of widely shared assumptions. They are summarized for vowels in Table 1.

Table 1. Intrinsic features of vowels

            Open vowels   Close vowels
F0          lower         higher
Duration    longer        shorter
Intensity   higher        lower

Voiced consonants tend to exhibit lower F0 than neighboring vowels. As to co-intrinsic effects, vowel F0 tends to be higher after unvoiced than after voiced consonants (Löfqvist et al. 1989), and vowel duration tends to be shorter before an unvoiced consonant than before a voiced one. The most comprehensive descriptions of intrinsic prosody phenomena to date can be found in Di Cristo 1985 and, for intrinsic F0 only, Whalen and Levitt 1995. Unfortunately, most of the accounts are based on very sparse data.

2.3 Physiological motivation(s)

The intrinsic duration of vowels is explained by the different articulatory effort necessary for the production of different vowels. The more energy has to be spent in a certain time interval, the longer the interval seems. Extending this psychological observation to vowel production, Meyer (1903) concludes that the higher energy consumption needed for the articulation of high vowels results in a longer subjective interval; consequently, high vowels are produced shorter.

Another hypothesis claims that the longer duration of low vowels is the result of the longer distance the articulatory organs have to traverse during the production of low vowels (Jespersen 1920).

According to the so-called tongue-pull hypothesis, the angle between the cricoid and thyroid cartilages changes, thus modifying F0, as the larynx position shifts vertically with tongue movement (Honda 2004). This could explain the F0 difference between close and open vowels.

Increasing subglottal pressure has been shown to be used as a compensation for shortness of vowels (Fischer-Jørgensen 1990), and changes in subglottal pressure have also been connected to the intrinsic F0 phenomenon (e.g. Vilkman et al. 1991). Other factors that have been suggested as causal include hyoid-laryngeal changes and cricothyroid muscle activity (Vilkman et al. 1989).

Both intrinsic F0 and intrinsic duration have also been attributed to a compensation conditioned by different resonance factors of the vocal tract (Neweklowsky 1975). Conclusive evidence in favor of any of the explanations mentioned (or perhaps an entirely new insight) is still lacking.

2.4 Microprosody in Estonian and Finnish

Very little analysis data is available on Estonian and Finnish microprosodic features. For Estonian, the analysis of segmental durations and F0 has been carried out mainly in the context of word prosody, with the focus on the Estonian quantity degrees (Liiv 1961, Eek & Meister 1998, Eek & Meister 2003). For Finnish, experiments on modeling microprosodic features in speech synthesis using artificial neural networks have been reported (Aulanko 1985; Vainio & Altosaar 1996, 1998; Vainio et al. 1997; Vainio et al. 1999). However, none of these studies was designed specifically for the measurement of Estonian or Finnish microprosodic features; the available data therefore represent higher-level prosodic phenomena and cannot be interpreted as “purely” intrinsic.

In the case of quantity languages like Finnish and Estonian, the question of intrinsic duration is of special interest, as the speakers have to carefully control segment durations in order to distinguish between short and long sounds. Whether intrinsic duration also manifests itself in the different quantity oppositions is an additional question for study in the case of Finnish and Estonian.

3 Methodological issues

Intrinsic duration and F0 should manifest themselves as a function of vowel quality if other factors are kept constant. The crucial problem in the acoustic analysis of intrinsic features lies in the appropriate design of the speech material, i.e. in how to keep the different possible influencing factors constant. In spontaneous speech and in reading aloud meaningful sentences with variable content, the higher prosodic levels dominate and the intrinsic features are probably not “visible”. Instead, carefully controlled laboratory speech should be used for acoustic analysis. Ideally, the speech samples should be recorded by reading nonsense CVC words in a short frame sentence at a constant articulation rate and at a constant fundamental frequency.

The other important issue concerns the methods of segmentation and F0 extraction. As the inter-vowel differences of intrinsic duration lie in the range of 5 to 15 ms, the results are very sensitive to segmentation errors. Thus, only manual segmentation can provide reliable results. For F0 extraction, different algorithms should be tested, and manual correction can be applied when necessary. The experimenter must also decide which value of a vowel's F0 curve (start, mid, end, min, max, median or mean) best represents intrinsic F0.
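
The candidate values listed above can be made concrete with a small sketch. It assumes that a vowel's F0 curve is available as a time-ordered list of values in Hz with unvoiced frames already removed; the function name `f0_summaries` and the example contour are hypothetical and are not taken from the study.

```python
from statistics import mean, median


def f0_summaries(contour):
    """Return candidate single-value summaries of one vowel's F0 contour.

    `contour` is a time-ordered list of F0 values in Hz; unvoiced frames
    are assumed to have been removed beforehand (assumption of this sketch).
    """
    if not contour:
        raise ValueError("empty F0 contour")
    return {
        "start": contour[0],
        "mid": contour[len(contour) // 2],  # sample at the temporal midpoint
        "end": contour[-1],
        "min": min(contour),
        "max": max(contour),
        "median": median(contour),
        "mean": mean(contour),
    }


# Hypothetical contour of one short vowel (values invented for illustration).
example = [168.0, 171.0, 172.0, 170.0, 165.0]
for name, value in f0_summaries(example).items():
    print(f"{name:>6}: {value:.1f} Hz")
```

Even in this invented contour the seven candidates span several Hz, which is of the same order as the intrinsic differences under study, so the choice should be fixed before the analysis and reported with the results.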

4 Material

4.1 Estonian

To investigate the intrinsic characteristics in Estonian, the CVC blocks of two female (speaker codes HH and KV) and two male (speaker codes AE and PE) speakers from the Estonian BABEL Database (Meister & Eek 1999) were used. The CVC blocks include all Estonian vowels in the context of plosives:

Block V1, short vowels (in SAMPA transcription)
tit:t tet:t t{t:t tyt:t t2t:t t7t:t tut:t tot:t tAt:t
tit’:t tet’:t t{t’:t tyt’:t t2t’:t t7t’:t tut’:t tot’:t tAt’:t
kik:k kek:k k{k:k kyk:k k2k:k k7k:k kuk:k kok:k kAk:k
pip:p pep:p p{p:p pyp:p p2p:p p7p:p pup:p pop:p pAp:p

Block V2, long vowels (in SAMPA transcription)
tiit:t teet:t t{{t:t tyyt:t t22t:t t77t:t tuut:t toot:t tAAt:t
tiit’:t teet’:t t{{t’:t tyyt’:t t22t’:t t77t’:t tuut’:t toot’:t tAAt’:t
kiik:k keek:k k{{k:k kyyk:k k22k:k k77k:k kuuk:k kook:k kAAk:k
piip:p peep:p p{{p:p pyyp:p p22p:p p77p:p puup:p poop:p pAAp:p
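
The regular structure of the two blocks can be illustrated by generating them from the nine SAMPA vowel symbols and the four consonant frames. This is only a sketch of how the lists are built up; the names `VOWELS`, `FRAMES` and `cvc_block` are hypothetical and do not describe how the BABEL material was actually prepared.

```python
# SAMPA vowel symbols in the order in which they appear in the blocks.
VOWELS = ["i", "e", "{", "y", "2", "7", "u", "o", "A"]

# (onset, coda) pairs, one per row of a block; the second row uses the
# coda "t’:t" exactly as it is transcribed in the lists above.
FRAMES = [("t", "t:t"), ("t", "t’:t"), ("k", "k:k"), ("p", "p:p")]


def cvc_block(long_vowels=False):
    """Return one block as a list of rows, each a list of CVC items."""
    rows = []
    for onset, coda in FRAMES:
        row = []
        for vowel in VOWELS:
            # Long vowels are written by doubling the vowel symbol.
            nucleus = vowel * 2 if long_vowels else vowel
            row.append(onset + nucleus + coda)
        rows.append(row)
    return rows


for title, block in (("V1", cvc_block()), ("V2", cvc_block(long_vowels=True))):
    print(f"Block {title}")
    for row in block:
        print(" ".join(row))
```

Each block thus contains 4 x 9 = 36 items, so every vowel occurs in the same four consonantal frames in both quantities.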

The blocks were read and digitally recorded (16 bit, sampling frequency 20 kHz) in a sound-treated room using a high-quality microphone. The subjects were instructed to read the words line by line at a suitable speaking rate, keeping F0 at a constant level and avoiding an F0 fall at the end of lines.

The signals were manually segmented at the phoneme level using Praat. F0 extraction was also carried out in Praat, applying its autocorrelation method.
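
The principle behind autocorrelation-based F0 extraction can be sketched as follows. This is a deliberately simplified illustration and not Praat's algorithm, which adds windowing, normalization, interpolation and path finding; the function `estimate_f0`, its default search range of 75 to 400 Hz and the synthetic test signal are assumptions made for this example.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate F0 (Hz) of one voiced frame from its autocorrelation peak.

    Simplified sketch: the lag with the largest raw autocorrelation inside
    the allowed period range is taken as the fundamental period.
    """
    lag_min = int(sample_rate / f0_max)  # shortest period considered
    lag_max = int(sample_rate / f0_min)  # longest period considered
    n = len(frame)
    if n <= lag_max:
        raise ValueError("frame too short for the requested F0 range")
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


SAMPLE_RATE = 20000  # Hz, the sampling frequency of the recordings
# Synthetic 150 Hz sine of 40 ms, a stand-in for a real vowel frame.
frame = [math.sin(2 * math.pi * 150.0 * i / SAMPLE_RATE) for i in range(800)]
print(f"estimated F0: {estimate_f0(frame, SAMPLE_RATE):.1f} Hz")
```

Because the period is found as a whole number of samples, the estimate is quantized; at 20 kHz and around 150 Hz one sample of lag corresponds to roughly 1 Hz, which is one reason why practical implementations interpolate around the peak.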

4.2 Finnish

For the acoustic analysis of Finnish, no CVC material was available. Instead, recordings from the Suopuhe research corpora for speech synthesis were used. They consist of the text of 25 newspaper articles, each read by one female and one male professional newsreader and manually segmented and annotated at the phoneme level. Both short and long vowels in different consonantal contexts were acoustically analyzed.

5 Preliminary results

As expected, the preliminary analysis results of Estonian and Finnish speech both exhibit systematic differences between open and close vowels and agree with the data of other languages studied earlier. The Estonian results show the intrinsic effects more distinctly, as the Estonian speech material is especially appropriate for this kind of study. As expected, the Finnish results are more ambiguous.

5.1 Estonian results

F0 analysis of vowels in CVC context shows that there are systematic variations in the fundamental frequency of high, mid and low vowels in both short and long vowels. Based on the current data, the F0 difference is on average around 6 Hz both between the high and mid and between the mid and low vowel groups. The intrinsic F0 values given in Table 2 are averaged over different plosive contexts. The influence of context as well as inter-speaker variability needs further detailed analysis. The distribution of the measurement data is illustrated in Figure 1 (male speakers) and Figure 2 (female speakers).

Table 2. Intrinsic F0 values of Estonian short and long vowels in CVC context

Average F0 of short vowels, Hz
Vowel   AE    PE    HH    KV
i       170   133   205   262
ü       157   123   201   260
u       161   125   197   257
e       158   120   201   255
ö       152   119   196   246
õ       161   126   195   260
o       159   117   192   251
ä       155   112   197   246
a       151   105   190   247

Average F0 of vowel groups (short vowels), Hz
Group   AE    PE    HH    KV
High    162   127   201   260
Mid     158   120   196   253
Low     153   109   194   246

Average F0 of long vowels, Hz
Vowel   AE    PE    HH    KV
ii      180   130   184   225
üü      171   118   174   221
uu      168   122   177   218
ee      166   120   173   218
öö      167   117   171   216
õõ      171   123   174   218
oo      169   117   172   210
ää      165   115   172   213
aa      153   104   166   208

Average F0 of vowel groups (long vowels), Hz
Group   AE    PE    HH    KV
High    173   123   178   222
Mid     168   119   172   215
Low     159   110   169   211
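
As a cross-check, the average step between adjacent vowel groups can be recomputed from the group rows of Table 2. The sketch below pools the per-speaker differences over short and long vowels; this pooling and the names `GROUP_F0` and `mean_step` are assumptions of the example, since the paper does not state how its average was formed.

```python
from statistics import mean

# Group means from Table 2 in Hz; speakers in the order AE, PE, HH, KV.
GROUP_F0 = {
    "short": {
        "High": [162, 127, 201, 260],
        "Mid": [158, 120, 196, 253],
        "Low": [153, 109, 194, 246],
    },
    "long": {
        "High": [173, 123, 178, 222],
        "Mid": [168, 119, 172, 215],
        "Low": [159, 110, 169, 211],
    },
}


def mean_step(upper, lower):
    """Mean per-speaker F0 difference upper - lower, pooled over lengths."""
    diffs = [
        hi - lo
        for groups in GROUP_F0.values()
        for hi, lo in zip(groups[upper], groups[lower])
    ]
    return mean(diffs)


print(f"High - Mid: {mean_step('High', 'Mid'):.2f} Hz")
print(f"Mid - Low:  {mean_step('Mid', 'Low'):.2f} Hz")
```

Under this pooling both steps come out close to 6 Hz, in line with the figure quoted in the text, although the individual speaker differences range from 2 to 11 Hz.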

Figure 1. Box plots (indicating median, upper and lower quartile as well as minimum and maximum) of intrinsic F0 values of Estonian vowel groups in the case of short and long vowels. Left: male speaker AE; right: male speaker PE
[Plots not reproduced. Y-axis: F0, Hz; x-axis: High, Mid, Low vowel groups; panels: short vowels, long vowels.]
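
The five values that such a box plot displays can be computed as in the following sketch. The quartile convention used here (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the paper does not say which convention its plots follow, and the sample values are invented for illustration.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return minimum, lower quartile, median, upper quartile and maximum."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two measurements")
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": q2, "q3": q3, "max": data[-1]}


# Hypothetical F0 measurements (Hz) of one vowel group for one speaker.
sample = [125, 118, 133, 121, 128]
print(five_number_summary(sample))
```

Other quartile conventions give slightly different box edges for small samples, so the convention matters when only a few tokens per vowel group are available.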

Figure 2. Intrinsic F0 values of Estonian vowel groups in the case of short and long vowels. Left: female speaker HH; right: female speaker KV

The duration data obtained in the study agree well with previous knowledge about intrinsic duration, but there are substantial differences between short and long vowels. The initial hypothesis, that intrinsic durations occur in short vowels and appear only vaguely in long ones, is supported by the data. In the case of short vowels, the difference between adjacent vowel groups is on average around 6 ms; in the data of long vowels, the difference between high and mid vowels is even larger, around 15 ms, whereas low vowels tend to be about 5 ms shorter than mid vowels.

Although contextual differences exist, the data from different plosive contexts are averaged and presented in Table 3. Individual variations are due to the speakers' different speech rates; the variability in the speakers' data is shown in Figure 3 and Figure 4.

Table 3. Intrinsic durations of Estonian short and long vowels in CVC context

Average duration of short vowels, ms
Vowel   AE    PE    HH    KV
i       102   66    59    71
ü       106   65    59    77
u       101   63    60    80
e       115   70    62    77
ö       127   71    69    88
õ       114   66    61    83
o       108   69    63    83
ä       122   74    72    88
a       109   74    66    90

Average duration of vowel groups (short vowels), ms
Group   AE    PE    HH    KV
High    103   65    59    76
Mid     116   69    64    83
Low     116   74    69    89

Average duration of long vowels, ms
Vowel   AE    PE    HH    KV
ii      218   200   181   290
üü      231   212   228   329
uu      221   232   205   334
ee      242   217   206   329
öö      261   230   219   369
õõ      244   212   211   347
oo      253   232   211   334
ää      259   219   221   354
aa      238   205   203   315

Average duration of vowel groups (long vowels), ms
Group   AE    PE    HH    KV
High    223   215   205   318
Mid     250   222   212   345
Low     248   212   212   334
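
The group differences discussed in the text can be recomputed from the group rows of Table 3. The sketch averages the per-speaker differences separately for short and long vowels; the names `GROUP_MS` and `mean_diff` are hypothetical, and averaging over the four speakers with equal weight is an assumption of this example.

```python
from statistics import mean

# Group means from Table 3 in ms; speakers in the order AE, PE, HH, KV.
GROUP_MS = {
    "short": {
        "High": [103, 65, 59, 76],
        "Mid": [116, 69, 64, 83],
        "Low": [116, 74, 69, 89],
    },
    "long": {
        "High": [223, 215, 205, 318],
        "Mid": [250, 222, 212, 345],
        "Low": [248, 212, 212, 334],
    },
}


def mean_diff(length, first, second):
    """Mean per-speaker duration difference first - second for one length."""
    groups = GROUP_MS[length]
    return mean([a - b for a, b in zip(groups[first], groups[second])])


for length, first, second in [
    ("short", "Mid", "High"),
    ("short", "Low", "Mid"),
    ("long", "Mid", "High"),
    ("long", "Mid", "Low"),
]:
    diff = mean_diff(length, first, second)
    print(f"{length:>5} vowels, {first} - {second}: {diff:.2f} ms")
```

With equal speaker weights the two short-vowel steps average to roughly 6 ms, the long mid vowels exceed the long high vowels by somewhat more than 15 ms, and the long low vowels are about 6 ms shorter than the mid vowels, which is consistent with the rounded figures given in the text.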

[Figure 2 plots not reproduced. Y-axis: F0, Hz; panels: short vowels, long vowels.]

Figure 3. Intrinsic durations of Estonian vowel groups in the case of short and long vowels; male speakers

Figure 4. Intrinsic durations of Estonian vowel groups in the case of short and long vowels; female speakers

5.2 Finnish results

Corresponding tables and diagrams of the measurement value distributions for Finnish are given in Tables 4 and 5 and Figures 5 to 7. The results for Finnish are overall less clear than for Estonian, which was to be expected since the material used was considerably less controlled and thus contained more variation than the Estonian CVC corpus. In particular, the dispersion of the Finnish F0 values is much higher, not only because of the more complex material but probably also due to the fact that vowel position within word and utterance was not marked in the transcription and thus could not be taken into account in the F0 summary calculations.

[Figures 3 and 4 plots not reproduced. Y-axis: duration, ms; x-axis: High, Mid, Low vowel groups; panels: short vowels and long vowels for speakers AE and PE (Figure 3) and for speakers HH and KV (Figure 4).]

Table 4. Intrinsic F0 values of Finnish short and long vowels

Intrinsic F0, Hz
Short vowels   Male   Female      Long vowels   Male   Female
i              88     157         i:            91     164
y              87     153         y:            87     147
u              88     160         u:            90     164
e              87     153         e:            85     152
ö              87     156         ö:            80     152
o              90     160         o:            83     147
ä              87     150         ä:            86     152
a              88     154         a:            85     151

High           88     157         High          89     158
Mid            88     156         Mid           83     150
Low            87     152         Low           86     152

Figure 5. Intrinsic F0 values of Finnish vowel groups in short and long vowels. Left: male speaker; right: female speaker

Table 5. Intrinsic durations of Finnish short and long vowels

Intrinsic duration, ms
Short vowels   Male   Female      Long vowels   Male   Female
i              52     68          i:            109    122
y              62     70          y:            98     112
u              60     67          u:            128    136
e              60     74          e:            106    122
ö              47     62          ö:            162    172
o              74     81          o:            102    139
ä              59     75          ä:            128    144
a              62     80          a:            120    139

High           58     68          High          112    123
Mid            60     72          Mid           123    144
Low            61     78          Low           124    142

[Figure 5 plots not reproduced. Y-axis: F0, Hz; panels: short vowels, long vowels.]

Figure 6. Intrinsic durations of Finnish vowel groups: short vowels
[Plot not reproduced. Y-axis: duration, ms; panels: male speaker, female speaker.]

Figure 7. Intrinsic durations of Finnish vowel groups: long vowels
[Plot not reproduced. Y-axis: duration, ms; panels: male speaker, female speaker.]

6 Summary

For most of our data, the duration of open short vowels is 10 to 15 ms longer than the duration of close short vowels, and F0 is correspondingly approx. 10 to 15 Hz lower. However, there are considerable contextual and individual differences for both parameters. More data will have to be collected and analyzed before a reliable description of these microprosodic features for various speaking styles can be formulated.

7 References

Aulanko, R. (1985). Microprosodic features in speech: experiments on Finnish. In XIII Fonetiikan päivät Turku 1985 / XIII Meeting of Finnish Phoneticians — Turku 1985 (eds. Aaltonen, O. & Hulkko, T.). Publications of the Department of Finnish and General Linguistics of the University of Turku, pp. 33-54.

Black, J. W. (1949). Natural frequency, duration, and intensity of vowels in reading. Journal of Speech and Hearing Disorders 14: 216-221.

Di Cristo, A. (1985). De la microprosodie à l'intonosyntaxe. Publications Université de Provence.

Eek, A. & Meister, E. (1998). Quality of standard Estonian vowels in stressed and unstressed syllables of the feet in three distinctive quantity degrees. Linguistica Uralica 3: 226-233.

Eek, A. & Meister, E. (2003). Foneetilisi katseid ja arutlusi kvantiteedi alalt. Häälikukestusi muutvad kontekstid ja välde. Keel ja Kirjandus 11: 815-837, 12: 904-918.

Fischer-Jørgensen, E. (1990). Intrinsic F0 in tense and lax vowels with special reference to German. Phonetica 47: 99-140.

Hirst, D. J., Di Cristo, A. & Espesser, R. (2000). Levels of representation and levels of analysis for intonation. In M. Horne (ed.), Prosody: Theory and Experiment (pp. 37-88). Dordrecht: Kluwer.

Honda, K. (2004). Physiological factors causing tonal characteristics of speech: from global to local prosody. Proceedings of Speech Prosody, Nara.

Jespersen, O. (1920). Lehrbuch der Phonetik. Berlin.

Kohler, K. J. (1990). Macro and micro F0 in the synthesis of intonation. In Papers in Laboratory Phonology I (eds. J. Kingston & M. E. Beckman), Cambridge: Cambridge University Press, pp. 115-138.

Lehiste, I. & Peterson, G. E. (1961). Some basic considerations in the analysis of intonation. Journal of the Acoustical Society of America 33(4): 419-425.

Liiv, G. (1961). Eesti keele kolme vältusastme vokaalide kestus ja meloodiatüübid. Keel ja Kirjandus 1961, nr 7, lk 412-424; nr 8, lk 480-490.

Löfqvist, A., Baer, T., McGarr, N. & Story, R. S. (1989). The cricothyroid muscle in voicing control. Journal of the Acoustical Society of America 85(3): 1314-1321.

Meister, E. & Eek, A. (1999). Estonian Phonetic Database. EU Copernicus Programme, Project No. 1304 “BABEL – A Multi-Language Database”. Tallinn.

Meyer, E. A. (1897). Zur Tonbewegung des Vokals im gesprochenen und gesungenen Einzelwort. Phonetische Studien (Beiblatt zu der Zeitschrift Die Neueren Sprachen) 10: 1-21.

Meyer, E. A. (1903). Englische Lautdauer. Uppsala.

Meyer, E. A. (1937). Die Intonation im Schwedischen. Stockholm.

Neweklowsky, G. (1975). Specific duration and specific tongue height of vowels. Phonetica 32(1): 38-60.

Peterson, G. E. & Lehiste, I. (1960). Duration of syllable nuclei in English. Journal of the Acoustical Society of America 32(6): 693-703.

Vainio, M. & Altosaar, T. (1996). Pitch, loudness, and segmental duration correlates: towards a model for the phonetic aspects of Finnish prosody. In Proceedings ICSLP 96: the Fourth International Conference on Spoken Language Processing, Philadelphia, PA, October 3-6, 1996, pp. 2052-2055.

Vainio, M. & Altosaar, T. (1998). Pitch, loudness, and segmental duration correlates in Finnish prosody. In Nordic prosody: proceedings of the VIIth conference, Joensuu 1996 (ed. S. Werner), Frankfurt a.M.: Peter Lang, pp. 247-255.

Vainio, M., Altosaar, T., Karjalainen, M. & Aulanko, R. (1997). Modeling Finnish microprosody for speech synthesis. In Intonation: theory, models and applications. Proceedings of an ESCA Workshop, September 18-20, 1997, Athens, Greece, pp. 309-312.

Vainio, M., Altosaar, T., Karjalainen, M., Aulanko, R. & Werner, S. (1999). Neural Network Models for Finnish Prosody. Proceedings of the XIVth ICPhS, pp. 2347-2350.

Vilkman, E., Aaltonen, O. & Raimo, I. (1991). Is subglottal pressure a contributing factor to the intrinsic F0 phenomenon? In Proceedings of the XIIth ICPhS 19.-24.8.1991, Aix-en-Provence.

Vilkman, E., Aaltonen, O., Raimo, I., Arajärvi, P. & Oksanen, H. (1989). Articulatory hyoid-laryngeal changes vs. cricothyroid muscle activity in the control of intrinsic F0 of vowels. Journal of Phonetics 17: 193-203.

Whalen, D. H. & Levitt, A. G. (1995). The universality of intrinsic F0 of vowels. Journal of Phonetics 23: 349-366.

PUBLICATIONS OF THE DEPARTMENT OF SPEECH SCIENCES UNIVERSITY OF HELSINKI

HELSINGIN YLIOPISTON PUHETIETEIDEN LAITOKSEN JULKAISUJA

* 53 *

FONETIIKAN PÄIVÄT 2006 THE PHONETICS SYMPOSIUM 2006

toim./ed. Reijo Aulanko, Leena Wahlberg & Martti Vainio

2006

Puhetieteiden laitos, Helsingin yliopisto, PL 9 (Siltavuorenpenger 20 A), 00014 Helsingin yliopisto
Department of Speech Sciences, University of Helsinki, P.O.Box 9 (Siltavuorenpenger 20 A), FI-00014 University of Helsinki
ISSN 1795-2425
ISBN 978-952-10-3663-7 (nid./paperback)
ISBN 978-952-10-3664-4 (PDF, http://ethesis.helsinki.fi)
Hakapaino Oy, Helsinki 2006
Copyright © The Authors and the Department of Speech Sciences, University of Helsinki 2006
