ACOUSTIC CHARACTERISTICS OF ARABIC FRICATIVES

By

MOHAMED ALI AL-KHAIRY

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2005

Copyright 2005

by

Mohamed Ali Al-Khairy

To my father who did not live to see the fruit of his work.

ACKNOWLEDGMENTS

After finishing writing this dissertation on a rainy summer night I decided

not to bother with a lengthy acknowledgment section. After all I was the one who

wrote it. Well, leaving ego and false pride aside, this work could not have been

done without the help of many. First and foremost, thanks go to The Almighty

GOD for His guidance and blessings without which graduate school would have

been a worse nightmare. My gratitude goes also to my wonderful supervisor and

mentor Dr. Ratree Wayland whose dedication to her students, teaching, and

research is beyond highest expectations. Without her help, guidelines, constant

encouragement, and support, this work would not have been possible. Members

of my supervisory committee (Dr. Gillian Lord and Dr. Caroline Wiltshire

from Linguistics, and Dr. Rahul Shirvastav from Communication Sciences and

Disorders) were of the utmost help in the process of finishing this work.

My stay in Gainesville introduced me to many people. Most were nice and

cheerful, and some one could definitely live without. I will skip the latter

group to save space. However, among such nice and wonderful people I got

to know during this journey are the wonderful students, faculty, and staff of

the Linguistics Department who were of tremendous help both personally and

academically. My special thanks and gratitude go also to Dr. Aida Bamia and Dr.

Haig Der-Houssikian from the Department of African and Asian Languages and

Literature. Their supervision, friendship, and encouragement went far beyond the

responsibilities of mentors to those of parents. For that I will be eternally grateful.

I also would like to thank my study partners, Yousef Al-Dlaigan, who was unjustly

forced to change his career, and AbdulWaheed Al-Saadi, who was brave enough

to finish his Ph.D. I regret to say that I am still unclear of the process of gene

transformation in strawberry and citrus. I hope though you learned from me how

to read a spectrogram. I tried my best.

Now is the fun part: thanking my friends in the phonetics lab. Listed in

chronological order of their liberation from school are Rebecca Hill, Jodi Bray,

Philip Monahan, Sang-Hee Yeon, HeeNam Park, Victor Prieto, and Manjula

Shinge. Yet to feel the wonderful breeze outside Turlington basement are my great

friends Andrea Dallas, Bin Li, and Priyankoo Sarmah. I thank them for all the

cheerful moments and laughs we shared at the University of Florida. Although life

might take us into different routes, our friendship is eternal.

Although they are in a different time zone, I thank my friends on the west

coast and across the Atlantic for their great advice and emotional support, without

which long nights would definitely have been longer. I will send them my phone

bills later. I am sure that I left out some names; for those unintentionally missed I

extend my apologies and sincere thanks.

The acoustic analyses in this dissertation were carried out in a timely manner

thanks to the existence of the wonderful free PRAAT program and the abundant

help and suggestions from its authors and the PRAAT user community. Also, I was

extremely fortunate to escape the nightmare of typesetting using the popular-

but-not-really-friendly commercial software. I thank Ron Smith for making his

ufthesis LaTeX class freely available.

Across oceans and continents, the prayers and encouragement of my parents

and siblings were a driving force and endless motivation to finish and join them

back home. Although God had other plans for my father and older brother, I am

sure they are proud of what their prayers from high above have accomplished.

Finally, words fall short in describing my gratitude and thanks toward my wife,

Nadaa; and kids, Faisal and Farah. They have suffered through this dissertation

almost as much as I have; maybe even more. Through the many nights I spent at

the lab, they have shown endless patience, love, and understanding. I truly cannot

imagine having gone through this process without such amazing love and support.

Parts of this work were supported by a McLaughlin Dissertation Fellowship

from the College of Liberal Arts and Sciences, University of Florida.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION

2 LITERATURE REVIEW
  2.1 Introduction
  2.2 Fricative Production
  2.3 Acoustic Cues to Fricative Place of Articulation
    2.3.1 Amplitude Cues
    2.3.2 Duration Cues
    2.3.3 Spectral Cues
    2.3.4 Formant Transition Cues
  2.4 Studies of Arabic Fricatives

3 METHODOLOGY
  3.1 Data Collection
    3.1.1 Participants
    3.1.2 Materials
    3.1.3 Recording
  3.2 Data Analysis
    3.2.1 Segmentation of Speech
    3.2.2 Acoustic Analyses
  3.3 Statistical Analyses

4 AMPLITUDE AND DURATION
  4.1 Amplitude Measurements
    4.1.1 Normalized Frication Noise RMS Amplitude
    4.1.2 Relative Amplitude of Frication Noise
  4.2 Temporal Measurements
    4.2.1 Absolute Duration of Frication Noise
    4.2.2 Normalized Duration of Frication Noise

5 SPECTRAL MEASUREMENTS
  5.1 Spectral Peak Location
  5.2 Spectral Moments
    5.2.1 Spectral Mean
    5.2.2 Spectral Variance
    5.2.3 Spectral Skewness
    5.2.4 Spectral Kurtosis

6 FORMANT TRANSITION
  6.1 Second Formant (F2) at Transition
  6.2 Locus Equation

7 STATISTICAL CLASSIFICATION OF FRICATIVES
  7.1 Discriminant Function Analysis
  7.2 Classification Accuracy of DFA
  7.3 Classification Power of Predictors
  7.4 Classification Results

8 GENERAL DISCUSSION
  8.1 Temporal Measurement
  8.2 Amplitude Measurement
  8.3 Spectral Measurement
  8.4 Transition Information
  8.5 Discriminant Analysis
  8.6 Conclusion

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

Table

1–1 Arabic Fricatives
4–1 Relative Amplitude: Vowel Context
4–2 Mean Relative Amplitude
5–1 Spectral Peak Location
5–2 Spectral Moments
5–3 Spectral Skewness: Significant Contrasts for Voiced Fricatives
5–4 Spectral Skewness: Significant Contrasts for Voiceless Fricatives
6–1 Second Formant at Transition
6–2 Locus Equation: Slope and y-intercept
7–1 Prior Probabilities for Group Membership
7–2 Variance Accounted for by DFA Functions
7–3 Overall Voiceless Classification
7–4 Cross-Validated Classification Results
7–5 Overall Voiced Classification
7–6 Cross-Validated Voiced Classification
7–7 Overall Voiceless Classification
7–8 Cross-Validated Voiceless Classification

LIST OF FIGURES

Figure

3–1 Example of Segmentation
3–2 Segmentation of /ʕ/
3–3 Hamming vs. Kaiser Window
3–4 Duration
4–1 Frication Noise RMS Amplitude
4–2 Frication Noise RMS Amplitude: Vowel Context
4–3 Frication Noise RMS Amplitude: Place and Voicing
4–4 Relative Amplitude
4–5 Relative Amplitude: Place and Voicing
4–6 Relative Amplitude; Place and Short Vowels
4–7 Relative Amplitude; Place and Long Vowels
4–8 Relative Amplitude: Voicing and Short Vowels
4–9 Relative Amplitude: Voicing and Long Vowels
4–10 Fricative Duration: Place and Voicing
4–11 Fricative Duration: Place and Voicing Interactions
4–12 Fricative Duration: Vowel Context
4–13 Normalized Frication Noise: Place and Voicing
4–14 Normalized Fricative Duration: Place and Voicing Interactions
4–15 Normalized Frication Noise: Vowel Context
5–1 Spectral Peak Location: Place and Voicing
5–2 Spectral Peak Location: Place × Voicing Interaction
5–3 Spectral Peak Location: Place × Vowels
5–4 Spectral Peak Location: Place × Short Vowel Interaction
5–5 Spectral Peak Location: Place × Long Vowel Interaction
5–6 Spectral Mean: Place and Voicing
5–7 Spectral Mean: Voice
5–8 Spectral Mean: Place × Voicing Interaction
5–9 Spectral Mean: Vowel
5–10 Spectral Variance: Place and Voicing
5–11 Spectral Variance: Place × Voicing Interaction
5–12 Spectral Variance: Vowel
5–13 Spectral Skewness: Place and Voicing
5–14 Spectral Skewness: Voice
5–15 Spectral Skewness: Place × Voicing Interaction
5–16 Spectral Skewness: Vowel
5–17 Spectral Kurtosis: Place and Voicing
5–18 Spectral Kurtosis: Voicing
5–19 Spectral Kurtosis: Place × Voice interaction
5–20 Spectral Kurtosis: Vowel
6–1 Second Formant: Place × Voicing Interaction
6–2 Second Formant: Vowel Context
6–3 Locus Equation
7–1 Discrimination Plane
7–2 Discrimination Plane by Voicing

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

ACOUSTIC CHARACTERISTICS OF ARABIC FRICATIVES

By

Mohamed Ali Al-Khairy

August 2005

Chair: Ratree Wayland
Major Department: Linguistics

The acoustic characteristics of fricatives were investigated with the aim

of finding invariant cues that classify fricatives into their place of articulation.

However, such invariant cues are hard to recognize because of the long-noticed

problem of variability in the acoustic signal. Both intrinsic and extrinsic sources

of variability in the speech signal lead to a defective match between a signal

and its percept. Nevertheless, such variability can be circumvented by using

appropriate analysis methods. The 13 fricatives of Modern Standard Arabic

(/f, θ, ð, ðˤ, s, sˤ, z, ʃ, χ, ʁ, ħ, ʕ, h/) were elicited from 8 male adult speakers

in 6 vowel contexts (/i, i:, a, a:, u, u:/). The acoustic cues investigated included

amplitude measurements (normalized and relative frication noise amplitude),

spectral measurements (spectral peak location and spectral moments), temporal

measurements (absolute and normalized frication noise duration), and formant

information at fricative-vowel transition (F2 at vowel onset and locus equation).

For the most part, fricatives in Arabic had patterns similar to those reported

for similar fricatives in other languages (e.g., English, Spanish, Portuguese). A

discriminant function analysis showed that among all the cues investigated, spectral

mean, skewness, second formant at vowel onset, normalized RMS amplitude,

relative amplitude, and spectral peak location were the variables contributing

the most to overall classification with a success rate of 83.2%. When voicing was

specified in the model, the correct classification rate increased to 92.9% for voiced

and 93.5% for voiceless fricatives.

CHAPTER 1
INTRODUCTION

Since the early years of speech research, studies (using various models and

methods) have focused on finding the properties that distinguish among naturally

produced speech sounds. Many such studies investigated the properties of the

acoustic signal through which sound is transmitted from speaker to hearer.

However, the task is complicated by the long-noticed problem of variability in

the acoustic signal resulting in a defective match between a signal and its percept

(Liberman, Cooper, Shankweiler, and Studdert-Kennedy 1967). The production

mechanism of speech sounds, particularly fricatives, involves intrinsic sources of

variability arising from changes in the shape of the vocal tract and the rate of air

flow (Strevens 1960; Tjaden and Turner 1997). Variability in the speech signal also

arises from extrinsic sources including speaker age (Pentz, Gilbert, and Zawadzki

1979), vocal tract size (Hughes and Halle 1956), speaking rate (Nittrouer 1995),

and linguistic context (Tabain 2001). Variability in speech also is often a result of a

combination of these factors.

Notwithstanding the variability found in the speech signal, numerous studies

(Stevens 1985; Behrens and Blumstein 1988a,b; Forrest, Weismer, Milenkovic, and

Dougall 1988; Sussman, McCaffrey, and Matthews 1991; Hedrick and Ohde 1993;

Jongman, Wayland, and Wong 2000; Abdelatty Ali, Van der Spiegel, and Mueller

2001; Nissen 2003) found invariant cues in the speech signal when the appropriate

analyses are carried out. Along this line of research, our study investigated the

defining properties of fricative sounds as produced in Modern Standard Arabic

(MSA).

We used Arabic fricatives for three equally important reasons. First, the

articulatory space of fricatives in Arabic spans across most of the places of

articulation in the vocal tract, starting from the lips and ending at the glottis.

Second, unlike most of the languages used in acoustic studies of fricatives,

Arabic has two unique features that serve a phonemic distinction: pharyngeal

co-articulation and segment length. Specifically, a phonemic distinction exists

between plain fricatives (/ð/ and /s/) and their pharyngealized counterparts

/ðˤ/ and /sˤ/ in Arabic. Furthermore, although governed by some phonological

distribution rules, consonant and vowel length in Arabic are phonemic. Third, most

studies on the acoustic characteristics of fricatives were conducted predominantly

with reference to English fricatives. Given the phonetic status of Arabic and

the gap in the literature due to the lack of Arabic-related research, our study is

theoretically and empirically important. Our findings will contribute generally

to the way fricative production is viewed and specifically to the way languages

differ in that respect. Further, such findings will aid speech synthesis and parsing

software related to the less-understood, yet important, Arabic language.

As mentioned, both consonant and vowel length are phonemic in Arabic.

However, to compare and contrast the performance of cues used in our study with

those reported in the literature for other languages, we examined only vowel-length

variations. The inventory of fricatives in Arabic is shown in Table 1–1. Arabic

has 11 fricatives, with only 4 pairs in voicing contrast. In addition, the voiced dental

and voiceless alveolar fricatives each have a pharyngealized counterpart. The

voiced post-alveolar fricative /ʒ/ was excluded, since it was articulated in most

of the elicited data as an affricate /d͡ʒ/. Studies of Standard Arabic and Arabic

dialectology suggest that /ʒ/ is realized as /ʒ/, /d͡ʒ/, /g/, or /j/ depending on the

geographical region in which Arabic is spoken (Kaye 1972).

Table 1–1. Place of articulation of Arabic fricatives

            Labio-dental  Dental  Alveolar  Post-alveolar  Uvular  Pharyngeal  Glottal
voiceless   f             θ       s         ʃ              χ       ħ           h
voiced                    ð       z                        ʁ       ʕ

/ð/ and /s/ have pharyngealized counterparts /ðˤ/ and /sˤ/.

Both local (static) and global (dynamic) cues have been shown to participate

in the identification of (English) fricatives. Specifically, three main acoustic features

have been examined in research aimed to distinguish fricatives: the spectral

properties of the frication noise, the relation between the frequency characteristics

of frication noise versus the vowel, and duration of frication noise. Our study

aimed to describe the acoustic characteristics of Arabic fricatives using many of

the acoustic measurements used in other related studies with specific interest in

finding cues that differentiate between plain and pharyngealized fricatives. Our

study also aimed to see if phonemic differences in vowel length affect the acoustic

cues measured. Our data were elicited from 8 male adult speakers (mean age =

20) who had no history of hearing or speaking impairments and who had limited

experience with English as a second language.

Cues investigated in our study were amplitude measurements (normalized and

relative frication noise amplitude), spectral measurements (spectral peak location

and spectral moments), temporal measurements (absolute and normalized frication

noise duration), and formant information at fricative-vowel transition (F2 at vowel

onset and locus equation). Normalized amplitude is defined here as the ratio

between the average RMS amplitude (in dB) of three consecutive pitch periods

at the point of maximum vowel amplitude and the RMS amplitude of the entire

frication noise. Relative amplitude, on the other hand, is defined as the amplitude

of the frication noise relative to the vowel amplitude measured in certain frequency

regions. Spectral peak location relates the fricative place of articulation to the

frequency location of energy maximum in the frication noise. Spectral moments

analysis is a statistical approach that treats FFT spectra as a random probability

distribution from which the first four moments (mean, variance, skewness, and

kurtosis) are calculated. Spectral mean refers to the average energy concentration

and variance to its range. Skewness, on the other hand, is a measure of spectral tilt

that indicates the frequency of most energy concentration. Kurtosis is an indicator

of the distribution peakedness. Formant transitions were assessed using locus

equations that relate second formant frequency at vowel onset (F2onset) to that at

vowel midpoint (F2vowel).
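
As a rough illustration of the spectral moments analysis described above, the following Python sketch treats the power spectrum of a windowed segment as a probability distribution over frequency and computes its first four moments. The function name `spectral_moments`, the Hamming window, the use of the power (rather than amplitude) spectrum, the reporting of kurtosis as excess kurtosis, and the white-noise input are all assumptions made for illustration; they are not the analysis settings of our study.

```python
import numpy as np

def spectral_moments(signal, sample_rate):
    """Return (mean, variance, skewness, excess kurtosis) of the power spectrum."""
    windowed = signal * np.hamming(len(signal))       # taper to reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2         # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    p = power / power.sum()                            # normalize so the weights sum to 1
    mean = np.sum(freqs * p)                           # first moment (Hz)
    variance = np.sum((freqs - mean) ** 2 * p)         # second central moment (Hz^2)
    sd = np.sqrt(variance)
    skewness = np.sum((freqs - mean) ** 3 * p) / sd ** 3
    kurtosis = np.sum((freqs - mean) ** 4 * p) / sd ** 4 - 3.0
    return mean, variance, skewness, kurtosis

# Hypothetical input: 40 ms of white noise sampled at 22,050 Hz.
rng = np.random.default_rng(0)
noise = rng.standard_normal(882)
m1, m2, m3, m4 = spectral_moments(noise, 22050)
```

For white noise the spectral mean should fall near the middle of the analyzed band and the skewness near zero; a fricative with energy concentrated at low frequencies would instead show a lower mean and positive skewness.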

Along with reporting how each of the acoustic measures mentioned above

differentiates between different places of fricative articulation, we used a statistical

method (discriminant function analysis) to find the most parsimonious combination

of acoustic cues that distinguish among the different places of fricative articulation

and the contribution of each selected cue to the overall classification of fricatives

into their places of articulation.

CHAPTER 2
LITERATURE REVIEW

2.1 Introduction

In this chapter we review relevant literature that deals with the acoustic

characteristics that have been shown to be effective in differentiating among

fricative place of articulation and voicing in the world’s languages. Given the

fact that certain fricatives that exist in Standard Arabic (e.g., pharyngealized vs.

non-pharyngealized) do not occur in other languages of the world, in this chapter,

we also discuss whether these acoustic cues will be effective in differentiating

acoustically among Standard Arabic fricatives.

2.2 Fricative Production

Fricative production is best described in terms of the source-filter theory of

speech production (Fant 1960). According to that theory, speech can be modeled

as a result of two independent components: a source signal (which could be the

glottal source, or noise generated at a compressed level in the vocal tract); and a

filter (reflecting the resonance in the cavities of the vocal tract downstream from

the glottis, or the constriction).

The basic mechanism for fricative production is that a turbulence forms in

the air flow at a point in the oral cavity. To generate such turbulence, a steady

air flow with velocity greater than a critical number¹ passes through a narrow

constriction in the oral cavity and forms a jet that mixes with surrounding air in

the vicinity of a constriction to generate eddies. These eddies, which are random

velocity fluctuations in the air flow, act as the source for frication noise (Stevens

1971). Depending on the nature of the constriction, frication noise can also be

generated at either an obstacle or a wall (Shadle 1990). According to Shadle,

obstacle source refers to fricatives in which sound is generated primarily at a rigid

body perpendicular to the air flow. An example is the production of voiceless

alveolar and voiceless post-alveolar fricatives (/s, ʃ/): the upper and lower teeth,

respectively, act as the spoiler for the airflow. Such sources are characterized by

a maximum source amplitude for a given velocity. On the other hand, wall source

occurs when sound is generated primarily along a rigid body parallel to the air

flow. Spectra of sounds generated by a wall source, like voiced and voiceless

velar fricatives (/x, ɣ/), are characterized by a flat broad peak with less amplitude

than sounds of obstacle sources (Shadle 1990). Vibration of the vocal folds also

adds to the sources responsible for voiced fricative production.

¹ This number is the Reynolds number (Re), a dimensionless quantity that relates the constriction size to the volume velocity needed to produce turbulence in the air. For speech, Re > 1800 (Kent and Read 2002).

Whatever the source, the resulting turbulence is then modified by the

resonance characteristics of the vocal tract (filter). The spectrum of the product

of such a filter represents the effect of the transfer function of the vocal tract, which

in turn depends on 1) the natural frequencies of the cavities anterior to the

constriction (poles), 2) the radiation characteristics of the sound leaving the mouth,

and 3) the resonant frequency of the posterior cavity (zeros). For fricatives, the

vocal tract is tightly constricted and hence the coupling between the front and back

cavities is small (Johnson 1997). Therefore, the transfer function of the vocal tract

for fricatives is largely dependent on the resonances of the front cavity. The nth

resonance can be calculated using Equation (2–1) where c is the speed of sound and

l is the length of the cavity in question. In case a strong coupling occurs between the front

and back cavities, such as when the “constriction is gradually tapered” (Kent and

Read 2002, p. 43), the resonances of the back cavity are calculated using Equation

(2–2). Resonances of the back and front cavities sharing the same frequency and

bandwidth cancel each other out.

\[ f_{n,\mathrm{front}} = \frac{(2n-1)\,c}{4l} \tag{2--1} \]

\[ f_{n,\mathrm{back}} = \frac{n\,c}{2l} \tag{2--2} \]
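
As a worked illustration of Equations (2–1) and (2–2), the sketch below computes the first three resonances of a front cavity (quarter-wavelength pattern) and of a back cavity (half-wavelength pattern). The cavity lengths (2 cm and 12 cm), the speed of sound (35,000 cm/s), and the function names are hypothetical values and labels chosen only for illustration; they are not measurements from this study.

```python
def front_cavity_resonances(length_cm, n_max=3, c=35000.0):
    """Equation (2-1): f_n = (2n - 1) c / (4 l), in Hz, with l in cm and c in cm/s."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_max + 1)]

def back_cavity_resonances(length_cm, n_max=3, c=35000.0):
    """Equation (2-2): f_n = n c / (2 l), in Hz, with l in cm and c in cm/s."""
    return [n * c / (2.0 * length_cm) for n in range(1, n_max + 1)]

front = front_cavity_resonances(2.0)   # hypothetical 2 cm front cavity
back = back_cavity_resonances(12.0)    # hypothetical 12 cm back cavity
print(front)  # [4375.0, 13125.0, 21875.0]
print(back)
```

With these particular lengths, the third back-cavity resonance coincides in frequency with the first front-cavity resonance (4375 Hz), which is the situation in which, as noted above, the two may cancel each other out if their bandwidths also match.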

2.3 Acoustic Cues to Fricative Place of Articulation

Both local (static) and global (dynamic) cues have been shown to participate

to different degrees in the identification of (English) fricatives. The three main

acoustic cues that have been of most interest in the literature on fricatives are the

amplitude and spectral properties of the frication noise, the relationship between

the frequency characteristics of frication noise and those of the vowel, and the role

of duration of frication noise in distinguishing fricative place and voicing.

2.3.1 Amplitude Cues

2.3.1.1 Frication amplitude

Most studies of frication noise amplitude have focused on (English) voiceless

fricatives, and found similar results: sibilants (/s, z, ʃ, ʒ/) have higher amplitude

than nonsibilants (/f, v, θ, ð/) with no differences within each class. This difference

in amplitude between sibilants and nonsibilants is predictable if one looks into the

aerodynamics of producing these fricatives. For example, to examine fricative

production mechanisms, Shadle (1985) used a mechanical model in which

constriction area, length, and location can vary, and the presence or absence of an

obstacle can be manipulated. Based on results from spectra produced using such a

model, Shadle (1985) concluded that the lower teeth act as an obstacle at some 3

cm downstream from the noise source of sibilant constriction. Such configuration

results in an increase in turbulence of the airflow, which in turn causes an increase

in the sibilant amplitude. Nonsibilant fricatives, on the other hand, have no such

obstacle, resulting in very low energy levels. The difference between the sibilant

and nonsibilant fricatives with regard to frication amplitude was also found to have

auditory salience. McCasland (1979) studied the role of amplitude as a perceptual

cue to fricative place of articulation. He cross-spliced naturally spoken syllables

of English /f, θ, s, ʃ/ and /i/ such that the fricative part in /si/ and /ʃi/ was

cross-spliced to the vocalic part of both /fi/ and /θi/. The overall amplitude of the

spliced-in frication noise was adjusted to the same level of intensity as that of the

original nonsibilant fricative by reducing /s, ʃ/ amplitude to that of /f/ and /θ/.

The resulting fricative-vowel syllables sounded like /fi/ and /θi/ when the vocalic

part of the utterance was coming from an original /fi, θi/, respectively. These

findings led McCasland to conclude that the low amplitude of nonsibilant fricatives

was used as a perceptual cue to distinguish them from the sibilants /s, ʃ/. However,

because of the cross-splicing method used, it is not clear whether the results can

be attributed solely to the reduction of /s, ʃ/ amplitude. In fact, Behrens and

Blumstein (1988a) pointed out that the results of McCasland’s method are not

conclusive since the method involves mismatching information from frication noise

and vocalic transition. Specifically, it is not clear whether listeners were using the

reduced noise amplitude of sibilants as a cue for nonsibilants, or they were using

transitional information in the original vocalic part of the nonsibilant to judge the

token to be /f, θ/. Listeners might be using either one of those cues, or both; and

there was no way of telling which, using the cross-splicing methodology.

One way to remedy the shortcomings of the cross-splicing method is to use

synthetic speech. Gurlekian (1981) used synthetic /sa, fa/ syllables in which the

frequency and the amplitude of the vowel were kept constant in order to test

whether the distinction between sibilant and nonsibilant fricatives could be based

solely on differences in their noise amplitude. For fricatives, the center frequency of

the noise was kept fixed at 4500 Hz, while its amplitude was manipulated to vary

relative to the fixed vowel amplitude. The central frequency used was similar to the

range at which /s/ was correctly identified 90% of the time by Argentine Spanish

listeners (Manrique and Massone 1979), and within the range described for English

/s/ (Heinz and Stevens 1961). An identification test with 6 Argentine Spanish and

6 English listeners showed that both groups assigned a /fa/ percept to the tokens

with low noise amplitude and a /sa/ percept to those with high noise amplitude.

Also, Behrens and Blumstein (1988a) investigated the role of fricative noise

amplitude in distinguishing place of articulation among fricatives. Basically,

Behrens and Blumstein altered the amplitude of the frication part of CV syllables,

with the C being one of /f, θ, s, ʃ/, while preserving the vocalic part of the

utterance. This matching was done by raising the noise amplitude of /f, θ/ to

that of /s, ʃ/ and conversely, lowering the noise amplitude of /s, ʃ/ to that of

/f, θ/ without substituting or changing the vocalic part of the utterance. They

found, contrary to previous studies, that the overall amplitude of the fricative noise

relative to the amplitude of the following vowel does not constitute the primary cue

for sibilant/nonsibilant distinction. Therefore, Behrens and Blumstein called for

an integration of spectral properties and amplitude characteristics of fricatives in

order to successfully discriminate among their places of articulation.

Another way to capture classification information found in frication noise

amplitude is to measure the Root-Mean-Square (RMS) amplitude of the fricative

noise normalized relative to the vowel. Jongman et al. (2000) used this method

in their large-scale study of English fricatives. Among the many measures used to

characterize fricatives, Jongman et al. measured the difference between the average

RMS amplitude (in dB) of three consecutive pitch periods at the point of maximum

vowel amplitude and the RMS amplitude of the entire frication noise. Results were

derived from 20 native speakers of American English (10 females and 10 males).

The speakers produced all 8 English fricatives in the onset of CVC syllables with

the rhyme consisting of each of six vowels /i, e, æ, A, o, u/ and /p/. The authors

found that this “normalized RMS amplitude” can differentiate among all four

places of fricatives in English with voiced fricatives having a smaller amplitude

than their voiceless counterparts.
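As an illustration of the normalized RMS amplitude described above, the following Python sketch computes the difference in dB between the RMS amplitude of the frication noise and the mean RMS amplitude of the vowel pitch periods. The function names, the sign convention (fricative minus vowel), and the assumption that the three pitch periods have already been segmented are illustrative choices, not details taken from Jongman et al. (2000).

```python
import math


def rms_db(samples):
    """RMS amplitude of a non-silent sample sequence, in dB re 1.0."""
    mean_square = sum(x * x for x in samples) / len(samples)
    # 10*log10 of the mean square equals 20*log10 of the RMS value.
    return 10.0 * math.log10(mean_square)


def normalized_rms_amplitude(frication, vowel_periods):
    """Fricative RMS (dB) minus the mean RMS (dB) of the vowel pitch periods.

    `frication` holds the samples of the entire frication noise.
    `vowel_periods` is a list of sample sequences, one per pitch period
    (three consecutive periods at the vowel amplitude maximum in the text).
    The fricative-minus-vowel sign convention is an assumption.
    """
    vowel_db = sum(rms_db(p) for p in vowel_periods) / len(vowel_periods)
    return rms_db(frication) - vowel_db


# Hypothetical signals: a weak noise-like segment and three strong vowel periods.
frication = [0.1, -0.1] * 50
vowel_periods = [[1.0, -1.0] * 10 for _ in range(3)]
print(normalized_rms_amplitude(frication, vowel_periods))
```

Under this convention, a fricative that is weaker than the vowel yields a negative value, and a weaker fricative yields a more negative one.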

The integration of fricative and vowel amplitude as a way of normalization

was also used for automatic recognition of continuous speech. Abdelatty Ali et al.

(2001) used Maximum Normalized Spectral Slope (MNSS), which relates the

spectral slope of the frication noise spectrum to the maximum total energy in the

utterance, thus capturing the spectral shape of the fricative and its amplitude in

addition to the vowel amplitude features in one quantity. It differs, however, from

Jongman and colleagues’ normalized amplitude in two ways: first it uses peak

amplitude instead of RMS amplitude for the vowel and the fricative; and second, it

uses only the strongest peak of the fricative (as opposed to whole frication noise)

and normalizes that in relation to the strongest peak of the vowel (as opposed

to the average of the strongest three pitch periods). For MNSS, a statistically

determined threshold (0.01 for voiced and 0.02 for voiceless fricatives) is used

to classify the fricative as nonsibilant if MNSS falls below the threshold, and as

sibilant if it is above it. Using such criteria, Abdelatty Ali et al. obtained a 94%

recognition accuracy of sibilant vs. nonsibilant fricatives. No further information

was given on using MNSS to classify fricatives within these classes.
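The threshold rule described above can be sketched in a few lines of Python. Only the two threshold values and the direction of the decision come from the text; the function name, the dictionary layout, and the treatment of a value exactly equal to the threshold (assigned here to the nonsibilant class) are assumptions, and the computation of MNSS itself is not shown.

```python
# Thresholds reported in the text for the MNSS measure.
MNSS_THRESHOLDS = {"voiced": 0.01, "voiceless": 0.02}


def classify_by_mnss(mnss, voicing):
    """Label a fricative as sibilant or nonsibilant from its MNSS value.

    `voicing` must be "voiced" or "voiceless". A value equal to the
    threshold is treated as nonsibilant, which is an assumption.
    """
    threshold = MNSS_THRESHOLDS[voicing]
    return "sibilant" if mnss > threshold else "nonsibilant"


# The same hypothetical MNSS value falls on different sides of the two thresholds.
print(classify_by_mnss(0.015, "voiced"))
print(classify_by_mnss(0.015, "voiceless"))
```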

2.3.1.2 Relative amplitude

Since amplitude cues from the frication noise and spectral cues of the vocalic

part in a syllable depend on each other (Behrens and Blumstein 1988a; Jongman

et al. 2000), changes in amplitude might carry more perceptual weight if the

frequency range over which such changes occur is taken into consideration. Such

integration was presented by Stevens and Blumstein (1981) as an invariant

property of speech production. They demonstrated theoretically that different

amplitude changes that occur at the consonant-vowel boundary in certain frequency

ranges are related to articulatory mechanisms associated with certain places in the

vocal tract. Therefore, listeners might be using these relational values as a cue for

the place of a consonant production. To test this claim, Stevens (1985) synthesized

sibilant/nonsibilant and anterior/nonanterior continua such that the frication noise

amplitude at certain frequency ranges on the continuum was gradually changed

from one stimulus to the next. Listeners’ judgments shifted abruptly from /T/ to

/s/ when the amplitude of frication noise in the fifth and sixth formant frequency

regions (F5 and F6) was increased relative to the amplitude in the same frequency

regions at vowel onset. On the other hand, listeners identified the consonant to be

/s/ rather than /S/ when the frication noise amplitude at the F3 region, relative

to F3 amplitude of the vowel, rises at the transition and as /S/ if it falls. These

findings led Stevens to hypothesize that the vowel is used as an “anchor against

which the spectrum of the fricative noise is judged or evaluated” (Stevens 1985, p.

249).
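A relative amplitude measure of the kind described above can be sketched as the level of the frication noise in a frequency band minus the level of the vowel onset in the same band. In the Python sketch below, the power spectra are assumed to have been computed already on a shared frequency axis, and the band edges, function names, and example values are hypothetical; they are not the values used by Stevens (1985).

```python
import math


def band_level_db(freqs_hz, power, lo_hz, hi_hz):
    """Summed power (in dB) of the spectral bins between lo_hz and hi_hz.

    Assumes at least one bin with nonzero power lies inside the band.
    """
    band = [p for f, p in zip(freqs_hz, power) if lo_hz <= f <= hi_hz]
    return 10.0 * math.log10(sum(band))


def relative_amplitude_db(freqs_hz, fric_power, vowel_power, lo_hz, hi_hz):
    """Fricative band level minus vowel-onset band level, in dB."""
    fric_db = band_level_db(freqs_hz, fric_power, lo_hz, hi_hz)
    vowel_db = band_level_db(freqs_hz, vowel_power, lo_hz, hi_hz)
    return fric_db - vowel_db


# Hypothetical four-bin spectra; the band 2500-3500 Hz stands in for an F3 region.
freqs = [1000.0, 2000.0, 3000.0, 4000.0]
fric_power = [1.0, 1.0, 10.0, 1.0]
vowel_power = [1.0, 1.0, 100.0, 1.0]
print(relative_amplitude_db(freqs, fric_power, vowel_power, 2500.0, 3500.0))
```

A negative value means that the amplitude in the band rises from the fricative into the vowel; a positive value means that it falls.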

Other researchers tried to test the robustness of this feature in different

contexts. Hedrick and Ohde (1993) looked into the effect of frication duration

and vowel context on the relative amplitude and whether such changes would

affect perception of fricative place of articulation. This was done by varying the

amplitude of the fricative relative to vowel onset amplitude at F3 and F5 for the

contrast /s/-/S/ and /s/-/T/ respectively. Frication duration and vowel context

also varied. Ten adult listeners with no history of speech or hearing disorders who

successfully perceived (with 70% accuracy) the end points of /s - S/ and /s - T/

continua were asked to identify each stimulus as one member of the contrastive

pairs above. In the /s/-/S/ contrast, listeners chose more /s/ responses when

presented with lower relative amplitude and more /S/’s when presented with higher

relative amplitude. These findings held constant across the different vowel and

duration conditions and were in agreement with those obtained by Stevens (1985).

Furthermore, the additional post-fricative vowel contexts in Hedrick and Ohde’s

study influenced only the magnitude of the relative amplitude effect for a given

contrast. Hedrick and Ohde claim that relative amplitude is used as a primary

invariant cue since listeners used relative amplitude information more effectively

than the context-dependent formant transitions. To further test this assumption,

Hedrick and Ohde (1993) also varied along a continuum the appropriate formant

transitions of the contrasts presented above while keeping the relative amplitude

fixed across all stimuli. The hypothesis was that if relative amplitude was indeed

a primary cue, then variation in formant transition would not affect identification

of members of the contrasting pair. Their findings indicate that for the /s/-/S/

contrast, formant transition did affect the identification of at least the end points of

the continua. For the /s/-/T/ contrast, formant transitions had a negligible effect

on the identification of the two fricatives even at boundary points.

Taken together, all these findings indicate that relative amplitude is part of

a primary cue to fricative place of articulation. Such a role becomes more salient

when the contrast involves sibilant vs. nonsibilant fricatives. Additionally, Hedrick

and Ohde’s (1993) findings also suggest that formant transitions do influence the

perception of fricative place of articulation, at least among sibilants.

However, a trading relationship seems to exist between the use of the two

cues in the presence of factors obstructing an effective use of a given cue. Hedrick

(1997) found that listeners with sensorineural hearing loss relied less on formant

transition information than on relative amplitude in discriminating between English

/s/ and /f/. On the other hand, listeners with normal hearing showed the opposite

preference. This was the case even when the formant transition information was

presented at a level audible to listeners with sensorineural hearing loss.

So far, relative amplitude has been shown only to differentiate between

sibilants and nonsibilants as a class, with the exception of Jongman et al.’s (2000)

study, in which they found that relative amplitude, as defined by Hedrick and Ohde

(1993), also differentiates among all four places of fricative articulation in English.

2.3.2 Duration Cues

Fricative duration measures were used in previous research mainly to

differentiate between sibilants and nonsibilants, and to assess the voicing of

fricatives. One such study was conducted by Behrens and Blumstein (1988b)

who recorded three native speakers of English producing each of the 4 English

voiceless fricatives /f, T, s, S/ followed by one of the five vowels /i, e, a, o, u/. They

found that sibilants /s, S/ were longer than nonsibilants /f, T/ with an average

difference of 33 ms. Also, they found no significant differences between the duration

of members of the same class. The vowel effect was found to be minimal and

only among the nonsibilant fricatives. Similar results were obtained by Pirello,

Blumstein, and Kurowski (1997). The researchers also found that alveolar fricatives

were longer on average than labiodental fricatives in English.

Jongman (1989) questioned the importance of frication noise duration as a cue

for fricative identification. He found that listeners can identify fricatives based on a

fraction of their frication noise duration. In a perception test, listeners needed as

little as 50 ms of the initial frication noise of a naturally produced fricative-vowel

syllable to successfully classify fricatives. Although cues like amplitude or spectral

properties localized at the initial parts of the frication noise may have been used

here, it is important to note that such results undermine the significance of an

absolute duration value in classifying fricatives. Temporal features of speech can

vary as a function of speaking rate. In fact, when frication noise duration was

normalized by taking the ratio of fricative duration over word duration, Jongman

et al. (2000) found a significant difference among all places of fricative articulation

with the exception of the labiodental and interdental contrast.
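The normalization mentioned above, fricative duration divided by word duration, can be sketched as follows. The function name and the example durations are hypothetical; the point of the sketch is only that the ratio stays constant when the fricative and the word are shortened proportionally at a faster speaking rate.

```python
def normalized_duration(fricative_s, word_s):
    """Ratio of fricative duration to carrier-word duration (both in seconds)."""
    if fricative_s <= 0 or word_s <= 0:
        raise ValueError("durations must be positive")
    if fricative_s > word_s:
        raise ValueError("the fricative cannot be longer than its word")
    return fricative_s / word_s


# Hypothetical slow and fast productions of the same word.
slow = normalized_duration(0.150, 0.500)
fast = normalized_duration(0.090, 0.300)
print(slow, fast)  # both ratios are approximately 0.3
```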

Frication noise duration has also been used to assess the voicing distinction

between fricatives of the same place of articulation. Cole and Cooper (1975)

examined the role of frication noise duration on the perception of voicing in

fricatives. They found that decreasing the length of frication noise of voiceless

fricatives in syllable-initial position resulted in a shift in their perception toward

their voiced counterparts. They noted also that in syllable-final position, duration

of the frication noise relative to that of the preceding vowel becomes the cue for

fricative voicing (voiced fricatives being shorter than voiceless). Similar findings

were also obtained by Manrique and Massone (1981) for Spanish fricatives /B, f,

D, s, S, Z, x, G/ in three conditions: isolated, in CV syllables, and CVCV words.

Noise duration was significantly shorter for voiced fricatives than for voiceless

fricatives in all three conditions. However, of these fricatives, only /S, Z/ and

/x, G/ are homorganic; while the other two pairs do not share the same place

of articulation (Baum and Blumstein 1987). Therefore, the reported temporal

differences in Manrique and Massone’s study might have been due to factors other

than fricative voicing since, as mentioned previously, durational differences existed

between fricatives sharing the same voicing but belonging to different places of

articulation (Behrens and Blumstein 1988b). Nevertheless, Baum and Blumstein’s

own experiments showed that syllable-initial voiceless English fricatives in citation

forms are longer than their voiced counterparts. However, they noted considerable

overlap in duration distributions of voiced and voiceless fricatives at all places

studied.

Using connected speech, Crystal and House (1988) also found that, on average,

voiceless fricatives in word-initial position are longer than voiced fricatives. Like

Baum and Blumstein’s results, there was a considerable amount of overlap between

the duration distributions of the voiced and voiceless fricatives in connected speech.

Again, the use of duration per se as the sole cue for fricative voicing was questioned

by Jongman (1989), who found that identification of fricative voicing was accurate

(83%) even when only 20 ms of frication noise was used. However, Jongman et al. (2000)

used a relative measure of duration to quantify its use as a cue for fricative voicing.

Normalized fricative noise duration (defined as the ratio of fricative duration over

that of the carrier word) was significantly longer for voiceless than for voiced fricatives.

They also found that such differences are more apparent in nonsibilant than in

sibilant fricatives.

2.3.3 Spectral Cues

In addition to amplitude and duration, spectral properties of the frication

noise have been investigated to find cues that identify fricative place of articulation.

Among the spectral properties previously studied are spectral peak location and

spectral moments measurements.

2.3.3.1 Spectral peak location

One of the early attempts to relate the fricative place of articulation to the

frequency location of energy maximum in the frication noise was the study by

Hughes and Halle (1956). In this study, gated 50 ms windows of the frication noise

were used to produce spectra of English fricatives /f, v, s, z, S, Z/. An investigation

of the fricative spectra revealed that for some speakers a strong energy component

was located at the frequency region below 700 Hz for the spectrum of voiced

fricatives. Such energy concentration was absent at the same region for voiceless

fricatives. However, these findings were not consistent among all speakers. Based

on this inconsistency, in addition to the similarities found between the spectra

of homorganic voiced and voiceless fricatives above 1 kHz, Hughes and Halle

ruled out the use of spectral prominence as a basis for voicing distinction among

fricatives. On the other hand, the distinction of place was found to be related,

to a certain extent, to the location of the most prominent spectral peak. Hughes

and Halle found that /f, v/ had a relatively flat spectrum below 10 kHz, whereas

spectral prominence was observed for /S, Z/ at the region of 2-4 kHz, and for /s,

z/ at the region above 4 kHz. Also, they found that the exact location of the

peak for each fricative was lower for males and higher for females. Based on these

observations, Hughes and Halle concluded that the size and shape of the resonance

chamber in front of the fricative’s point of constriction determine the place of

energy maximum in frication noise spectra. Specifically, they reported that the

length of the vocal tract from the point of constriction to the lips was inversely

related to the frequency of the peak in the spectrum. Thus, the spectral peak

increases as the point of articulation becomes closer to the lips. Such observations

are consistent with predictions made by the source-filter theory of speech

production presented in section 2.2.
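The inverse relation between front-cavity length and peak frequency can be illustrated with the textbook quarter-wavelength approximation for a uniform tube closed at one end and open at the other, f = c / (4L). This idealization, the speed-of-sound value, and the cavity lengths in the Python sketch below are assumptions introduced for illustration; they are not measurements reported by Hughes and Halle (1956).

```python
# Approximate speed of sound in warm, moist air, in cm/s (assumed value).
SPEED_OF_SOUND_CM_S = 35000.0


def front_cavity_resonance_hz(length_cm):
    """Lowest resonance of an idealized quarter-wavelength tube of given length."""
    if length_cm <= 0:
        raise ValueError("cavity length must be positive")
    return SPEED_OF_SOUND_CM_S / (4.0 * length_cm)


# Hypothetical front-cavity lengths: shorter cavities give higher resonances.
for length_cm in (2.5, 2.0, 1.0):
    print(length_cm, front_cavity_resonance_hz(length_cm))
```

With these illustrative lengths, shortening the cavity from 2.5 cm to 2.0 cm raises the resonance from 3500 Hz to 4375 Hz, which is the direction of change the text describes as the constriction moves toward the lips.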

Strevens (1960) also looked into the use of spectral prominence to differentiate

between fricatives through examining the front (/F, f, T/), mid (/s, S, ç/) and back

(/x, X, h/) voiceless fricatives as produced by subjects with professional training in

phonetics. Based on average line spectra, Strevens found that the front fricatives

were characterized by unpatterned low intensity and smooth spectra, the mid

fricatives by high intensity with significant spectral peaks around 3.5 kHz,

and the back fricatives by medium intensity and a marked formant-like structure

with peaks around 1.5 kHz.

The results reported above for front and mid fricatives were also shown to

be perceptually valid (Heinz and Stevens 1961). Using a synthesized continuum

of white noise with spectral peaks in ranges representative of those found in /S, ç,

s, f, T/, Heinz and Stevens found that participants were consistently shifting the

identification of the fricative from /S/ to /ç/ to /s/ to /f, T/ as the peak of the

resonance frequency increased, with no distinction that could be made between /f,

T/.

Similar properties were also found for fricatives in Spanish. In their study of

Spanish fricatives, Manrique and Massone (1981) found that /s/, /f/ and /T/ have

spectral peak values comparable to the English fricatives as reported by Hughes

and Halle (1956). Furthermore, they reported finding that spectral energy in /x/

is concentrated in a low narrow frequency band continuous with the F2 of the

following vowel and that /ç/ spectral frequency is concentrated at a low band

continuous with F3 of the following vowel. Manrique and Massone (1981) also

examined the identification of a subset of Spanish fricatives to see whether changes

in spectral peak location would change the way fricatives are perceived by Spanish

speakers. They synthesized 9 cascade stimuli from the middle 500 ms of each of the

deliberately lengthened fricatives /f, s, S, x/, using a set of low- and high-pass filters so that

only certain spectral zones were present in each stimulus. The unfiltered fricatives

had recognition scores ranging from 95% for /f/ and /s/, to 100% for /S/ and /x/.

For the filtered fricatives, they found that the spectral peak location carries the

perceptual load for the identification of /s/, /S/, and /x/. However, the diffused

spectrum of /f/ was believed to be the characterizing factor of its identifiability.

Other studies of English fricatives confirmed that spectral peak location

can separate sibilants from nonsibilants as a class, but can distinguish place of articulation only among sibilants.

For example, Behrens and Blumstein (1988b) found that for English voiceless

fricatives, major spectral peaks in ranges within 3.5-5 kHz were apparent for /s/

and within 2.5-3.5 kHz for /S/. On the other hand /f/ and /T/ appeared flat with

a diffused spread of energy from 1.8-8.5 kHz with a good deal of variability in their

spectral shape. The same pattern was also observed across age groups. Pentz et al.

(1979), for example, compared the spectral properties of English fricatives (/f,

v, s, z, S, Z/) produced by preadolescent children to those reported for adults. As

reported for adults elsewhere, they found the same pattern of energy localization

and constriction point. However, the values obtained from children in their study

were higher than those obtained for male and female adult speakers in the studies

mentioned above. This difference was attributed in large part to the differences

in vocal tract lengths. Male adult speakers have the longest vocal tract and the

lowest vocal tract resonance, while children have the shortest vocal tract and the

highest vocal tract resonance; female adult speakers fall between the two groups. In

another study, Nissen (2003) investigated, among other metrics, the spectral peak

location of voiceless English obstruents as produced by male and female speakers

of four different age groups. For the fricatives in the study, he found that “the

spectral peak decreased as a function of increased speaker age” (Nissen 2003, p.

139). Besides being age and gender dependent, spectral peak location has also been

found to be vowel dependent (Mann and Repp 1980; Soli 1981) and highly variable

for speakers with neuromotor dysfunction (Chen and Stevens 2001) due to their lack

of control over articulatory muscles.

However, in contrast to all the studies mentioned above, Jongman et al.

(2000) found that across all (male and female) speakers and vowel contexts, all

four places of fricative articulation in English were significantly different from

each other in terms of spectral peak location. Further, they found spectral peak

location to reliably differentiate between /T/ and /D/ and between /f/ and /v/.

The researchers justified the use of the larger analysis window they adopted in their

study, as compared to other studies, as a way to obtain better resolution in the

frequency domain at the expense of temporal domain resolution. They argue that

such a compromise is advantageous due to the stationary nature of frication noise.
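The trade-off between window length and frequency resolution, and the notion of spectral peak location itself, can be sketched in Python as follows. The spacing between DFT bins is the reciprocal of the window duration, so a longer window gives finer frequency resolution. The naive DFT, the function names, the sampling rate, and the synthetic test tone are all illustrative assumptions; this is not the analysis procedure of any study cited here.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate_hz):
    """Naive DFT: return (frequencies, magnitudes) for bins 0..N/2."""
    n = len(samples)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        freqs.append(k * sample_rate_hz / n)
        mags.append(abs(acc))
    return freqs, mags


def spectral_peak_hz(samples, sample_rate_hz):
    """Frequency of the highest-magnitude bin in the spectrum."""
    freqs, mags = magnitude_spectrum(samples, sample_rate_hz)
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return freqs[peak_bin]


def bin_spacing_hz(window_s):
    """Spacing between adjacent DFT bins for a window of the given duration."""
    return 1.0 / window_s


# A hypothetical 10 ms, 4000 Hz tone sampled at 16 kHz stands in for a spectral peak.
tone = [math.sin(2 * math.pi * 4000 * i / 16000) for i in range(160)]
print(spectral_peak_hz(tone, 16000))
print(bin_spacing_hz(0.020), bin_spacing_hz(0.040))
```

Doubling the window from 20 ms to 40 ms halves the bin spacing from 50 Hz to 25 Hz, at the cost of averaging over a longer stretch of the signal.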

In summary, spectral peak location for the fricatives increases as the

constriction becomes closer to the open end of the vocal tract. Also, spectral peak

for back fricatives shows a formant-like structure similar to the following vowel.

Both of these generalizations can be accounted for by the source-filter theory of

speech production. Fricatives are characterized by turbulent airflow through a

narrow constriction in the oral cavity, with the portion of the vocal tract in the

front of the constriction effectively becoming the resonating chamber. For long

and narrow constrictions, like fricatives, the acoustic theory of speech production

predicts that the only present resonance components in the spectrum are those

related to the area in front of the constriction due to lack of acoustic coupling

from the cavity behind the constriction (Heinz and Stevens 1961). The size of the

resonating cavity, therefore, can be inversely correlated with the frequency of the

most prominent peak in the spectrum (Hughes and Halle 1956). As a result of this

correlation, fricatives produced at or behind the alveolar region are characterized

by a well-defined spectrum with peaks around 2.5-3.5 kHz for /S, Z/ and at 3.5-5

kHz for /s, z/. However, due to the very small area in front of the constriction,

fricatives produced at the labial or labiodental area are characterized with a

flat spectrum and a diffused spread of energy between 1.5 and 8.5 kHz. Since

nonsibilant production creates a cavity in close proximity to the open end of the

vocal tract, different degrees of lip rounding (Shadle, Mair, and Carter 1996), and

the additional turbulence produced by the air stream hitting the teeth (Strevens

1960; Behrens and Blumstein 1988a) will introduce a great amount of variability

in the location of the energy concentration. On the other hand, sibilants usually

have a clearly defined spectral peak location. However, for speakers with limited

precision over the placement of the constriction (Chen and Stevens 2001), such

variability also exists for sibilants.

2.3.3.2 Spectral moments

Spectral moments analysis is another metric that has been used for fricative

identification. Unlike spectral peak location analysis, this statistical approach

captures both local (mean frequency and variance) and global (skewness and

kurtosis) aspects of fricative spectra. Spectral mean refers to the average energy

concentration and variance to its range. Skewness, on the other hand, is a measure

of spectral tilt that indicates the frequency region in which most of the energy is concentrated.

Skewness with a positive value indicates a negative spectral tilt with energy

concentration at the lower frequencies, while negative skewness is an indication of

positive tilt with energy concentration at higher frequencies (Jongman et al. 2000).

Kurtosis is an indicator of the distribution’s peakedness.

One of the early applications of spectral moments to classify speech sounds

was the study by Forrest et al. (1988) on English obstruents. For the fricatives

in that study, Forrest et al. generated a series of Fast Fourier Transforms (FFT)

using a 20 ms analysis window with a step-size of 10 ms that started at the

obstruent onset through three pitch periods into the vowel. The FFT-generated

spectra were then treated as a random probability distribution from which the

first four moments (mean, variance, skewness, and kurtosis) were calculated.

The spectral moments obtained from both linear and Bark scales were entered

into a discriminant function analysis in an attempt to classify voiceless fricatives

according to their place of articulation. Classification scores, on both scales, were

good for the sibilants /s/ and /S/ with 85% and 95% respectively. The nonsibilants,

on the other hand, were not as accurately classified using any moment on either of

the two scales (58% for /T/ and 75% for /f/). Subsequent implementations of the

spectral moment analysis tried to extend or replicate Forrest et al.’s approach with

some modifications. The study by Tomiak (1990) of English voiceless fricatives,

for example, used a different analysis window (100 ms) at different locations of

the English voiceless frication noise. As in previous research, spectral moments

were successful in classifying sibilants and /h/ data. In the case of nonsibilants, it

was found that the most useful spectral information is contained in the transition

portion of the frication. Additionally, in contrast to Forrest et al., Tomiak found an

advantage for the linearly derived moment profiles over the Bark-scaled ones.
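The idea of treating a spectrum as a probability distribution and computing its first four moments can be sketched in Python as follows. The power spectrum is assumed to have been computed already on a linear frequency scale; the function name, the use of excess kurtosis (raw kurtosis minus 3), and the example spectra are assumptions for illustration rather than details confirmed for Forrest et al. (1988).

```python
def spectral_moments(freqs_hz, power):
    """Return (mean, variance, skewness, excess kurtosis) of a power spectrum.

    The power values are normalized to sum to 1 so that they act as
    probabilities over the frequency bins. Assumes the power is spread
    over at least two different frequencies, so the variance is nonzero.
    """
    total = sum(power)
    probs = [p / total for p in power]
    mean = sum(f * p for f, p in zip(freqs_hz, probs))
    variance = sum((f - mean) ** 2 * p for f, p in zip(freqs_hz, probs))
    third = sum((f - mean) ** 3 * p for f, p in zip(freqs_hz, probs))
    fourth = sum((f - mean) ** 4 * p for f, p in zip(freqs_hz, probs))
    skewness = third / variance ** 1.5
    kurtosis = fourth / variance ** 2 - 3.0  # excess kurtosis (assumed convention)
    return mean, variance, skewness, kurtosis


# Hypothetical spectrum with energy concentrated at the lower frequencies.
freqs = [1000.0, 2000.0, 3000.0, 4000.0]
low_heavy = [8.0, 4.0, 2.0, 1.0]
print(spectral_moments(freqs, low_heavy))
```

For the low-heavy example the skewness is positive, which matches the interpretation given earlier in this section: positive skewness goes with energy concentrated at the lower frequencies.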

Spectral moments were also used by Shadle et al. (1996) to classify voiced

and voiceless English fricatives. The study involved spectral moments measured

from discrete Fourier transform (DFT) analyses performed at different locations

within the frication noise and at different frequency ranges. They found that

spectral moments provided some information about fricative production but did not

discriminate reliably between their different places of articulation. Furthermore,

their results indicated that spectral moments are sensitive to the frequency range

of the analysis. However, the moments were not sensitive to the analysis position

within the fricative. Similar results were also obtained for children (Nittrouer,

Studdert-Kennedy, and McGowan 1989; Nittrouer 1995). The use of spectral

moments as a tool to distinguish between /s/ and /S/ was also extended to atypical

speech and found to be reliable. Tjaden and Turner (1997), for example, compared

spectral moments obtained from speakers with amyotrophic lateral sclerosis (ALS)

and healthy controls matched for age and gender and found that the first moment

was significantly lower for the ALS group. Tjaden and Turner suggested that the

low mean values found among ALS speakers can be attributed to difficulties they

face in making the appropriate degree of constriction required to produce frication,

or to a weaker subglottal sound source due to the weak respiratory muscles that are

common among ALS speakers.

The studies mentioned so far demonstrate that spectral moments can

distinguish sibilants from nonsibilants as a class but can reliably

distinguish place only among sibilants. However, contrary to the studies mentioned

above, Jongman et al. (2000) found that spectral moments were successful in

capturing the differences between all four places of fricative articulation in English.

Jongman et al.’s study, however, differs from other studies in that it calculated

moments from a 40 ms FFT analysis window placed at four different places in

the frication noise (onset, mid, end, and transition into vowel) and that it uses a

larger and more representative number of speakers and tokens (2880 tokens from

20 speakers) as compared to a smaller population in other studies. Across moments

and window locations, variance and skewness at onset and transition were found

to be the most robust classifiers of all four places. Also, on average, variance was

shown to effectively distinguish between voiced and voiceless fricatives with the

former having greater variance.

2.3.4 Formant Transition Cues

2.3.4.1 Second formant at transition

Early research on formant transition focused on perceptual usefulness of such

information in classifying speech sounds. For example, Harris (1958) recorded the

English fricatives /f, v, T, D, s, z, S, Z/ followed by one of each of the vowels /i,

e, o, u/. Then she spliced and recombined vocalic and frication partitions of all

CV combinations. Listeners correctly identified sibilant fricatives regardless of

the source of the cross-spliced vocalic part. Frication noise alone was sufficient for

correct identification of sibilant fricatives. On the other hand, among nonsibilant

fricatives, a correct identification as /f, v/ occurred only when the vocalic part was

matching (i.e. coming from a /f, v/ syllable), and as /T, D/ with mismatching

vocalic parts. Based on these identification patterns, Harris suggested that

the perception of fricatives occurs at two consecutive stages. In the first stage,

cues from frication noise alone determine whether the fricative is a sibilant or

nonsibilant. If sibilant is the determined class, then cues from the frication

noise alone will differentiate among the sibilant fricatives. However, if the class

is determined to be nonsibilant at the first stage, then the formant transition

information is used for the within-class classification. As was the case with the

cross-splicing methods previously mentioned (section 2.3.1.1), this method does not

rule out the possibility that dynamic coarticulatory information colors

the precut vowel and/or fricative. It is not clear, therefore, that the results

obtained can be attributed solely to the mismatching vocalic part of the cross-

spliced signal. To overcome this problem, Heinz and Stevens (1961) synthesized

stimuli consisting of white noise of varying frequency peaks, similar to peaks found

in English fricatives, followed by four synthetic formant transition values. Listeners

were instructed to label these stimuli as one of the four voiceless English fricatives

/f, T, s, S/. Based on identification scores, the researchers concluded that /f/ is

distinguished from /T/ on the basis of the F2 transition in the following vowel.

There was no apparent effect of formant transition on the distinction between /s/

and /S/. These findings support those of Harris (1958), while using more controlled

stimuli.

The role of formant transition, however, was not found to be as crucial in other

studies. LaRiviere, Winitz, and Herriman (1975) used the fricative noise in its

entirety in a perceptual test and obtained high recognition scores for /s, S/, lower

scores for /f/ and poor scores for /T/. More importantly, when vocalic information

was included for the /f, T/ tokens, no significant increase in their recognition was

obtained. Other studies (Manrique and Massone 1981; Jongman 1989) also found

similar results using different methods.

The perceptual experiments thus far mentioned used a forced-choice technique

that might have biased participants’ responses. For that reason Manrique and

Massone (1981) used a tape splicing paradigm to study the effect of formant

transition on the perception of Spanish fricatives by Spanish listeners. They

constructed their stimuli by splicing CV syllables into their respective frication

and vowel parts. Listeners were asked to choose the fricative when presented with

the frication noise alone and to freely guess the sound that preceded the vowel

when presented with the vocalic part. In the latter case, most tokens were judged

(85% of the responses) to have been preceded with a stop sharing the same place

of articulation as the spliced fricative. Spanish fricatives with no stops sharing

the same place of articulation were perceived as /t/, with the exception of /f/

which was perceived as /p/ 50% of the time. The same listeners were able to

identify the fricative accurately from only the frication part in all cases except

for /x/ and /G/. However, another study found that formant transition was not

crucial for correct identification of fricatives (Jongman 1989). Based only on the

frication noise part of fricative-vowel syllables, Jongman (1989) achieved correct

(92%) fricative identification in a perceptual experiment of English fricatives. More

importantly, there was no significant increase in identification accuracy when the

entire fricative-vowel syllable was presented.

As with results obtained from synthetic speech, measures of formant transition

from naturally produced fricatives are also conflicting. Wilde and Huang (1991), for

example, measured the F2 at the vowel onset for fricatives of only one male speaker

and found that the F2 value did not differentiate systematically between /f/ and

/T/. However, in another study, Wilde (1993) found that transitional information

as measured by F2 value at the fricative-vowel boundary can be used to identify

fricative place of articulation. The measurement she obtained from two speakers

showed that as the place of constriction moves back in the vocal tract, the value of

F2 systematically increases and its range becomes smaller.

2.3.4.2 Locus equations

Locus equations provide a method to quantify the role of formant transition

in the identification of fricative place of articulation by relating second formant

frequency at vowel onset (F2onset) to that at vowel midpoint (F2vowel). Locus

equations are straight line regression fits to data points formed by plotting onsets

of F2 transitions along the y axis and their corresponding vowel nuclei F2 along

the x axis in order to obtain the value of the slope and y-intercept. This metric

has been used primarily to classify English stops (Lindblom 1963; Sussman et al.

1991). It was only recently that this measure was applied to fricatives. Fowler

(1994) investigated the use of locus equations as cues to place of articulation across

different manners of articulation including the fricatives /v, D, z, Z/ as spoken

by five male and five female speakers of English. In this study, Fowler found

that locus equations (in terms of slope and y-intercept) of a homorganic stop and

fricative were significantly different, while those of a stop and a fricative of different

place of articulation were significantly similar. Nevertheless, locus equations were

able to differentiate between members that share the same manner of articulation.

Slopes for fricatives /v, D, z, Z/, for example, were significantly different (slopes

of 0.73, 0.50, 0.42, and 0.34 respectively). In another study, Sussman (1994)

investigated the use of locus equations to classify consonants across manners of

articulation (approximants, fricatives, and nasals). In contrast to Fowler (1994),

he found that fricatives were not distinguishable based on the slope of their locus

equations. Only /v/ had a distinctive slope.

Results of other studies of English fricatives were similar to those of Sussman

(1994). For example, in their large-scale study of English fricatives, Jongman et al.

(2000) calculated the slope and y-intercept for all English fricatives in six vowel

environments. Specifically, Jongman and colleagues measured F2onset and F2vowel

from a 23.3 ms full Hamming window placed at the onset and midpoint of the

vowel respectively. This was the same method used by the previously mentioned

studies. Similar to Sussman (1994), Jongman et al. (2000) found that only the

slope value for /f, v/ was significantly different and that the y-intercepts were

distinct only for /f, v/ and /S, Z/. Locus equations are particularly of interest

here since they have been shown to work across languages (Sussman, Hoemeke,

and Ahmed 1993), gender (Sussman et al. 1991), speaking style (Krull 1989), and

speaking rate (Sussman, Fruchter, Hilbert, and Sirosh 1998).

2.4 Studies of Arabic Fricatives

The use of acoustic cues to distinguish between the different fricatives in

Arabic has been underinvestigated in the literature. Furthermore, the very few

studies dealing with the acoustic characteristics of Arabic fricatives (see below) have

been predominantly concerned with a single acoustic feature and not with the

way multiple cues can be integrated in order to distinguish among fricative

places of articulation. While some of the cues mentioned above seem to distinguish

with a relatively good accuracy between English fricatives, the same cues when

used to classify Arabic fricatives need to take into account acoustic characteristics

particular to Arabic. For example, unlike English, Arabic utilizes durational

differences of both vowels and consonants for phonemic distinctions. It is of

interest, therefore, to see how this durational property would affect voicing and

place classification of Arabic fricatives. Another interesting feature of Arabic is the

existence of co-articulated (pharyngealized) fricatives that are phonemically distinct

from their plain counterparts. Due to their double articulation mechanism, it is

expected in our study that pharyngealized fricatives will have two patterns of peaks

emerging at the middle and near the end of frication. Therefore, it seems necessary

to use a second analysis window at the end of frication noise such that its right

shoulder is aligned with the end of frication noise. Additionally, the two window

locations are suggested because studies of spectral peak location have demonstrated

that high frequency peaks are more likely to emerge at the middle and end of

frication noise (Behrens and Blumstein 1988b). Also, the frequency of the most

prominent peak for the pharyngealized fricatives is expected to be lower than their

plain counterparts because of acoustic coupling resulting from co-articulation.

Spectral moments seem to be another promising technique in classifying

Arabic fricatives if the proper size and location of the analysis windows are used.

In fact, in a study of fricatives in Cairo Arabic, Norlin (1983) found that /s,

sQ, z, zQ/ are characterized by a sharp peak in higher frequencies, and that the

peaks of /sQ, zQ/ are broader than those of /s, z/. Norlin used Center of Gravity (COG)

and dispersion as ways of quantifying the location of the peak and its spread,

respectively. Therefore, it seems that a combination of spectral

mean and variance along with skewness measures would differentiate between

pharyngealized and plain fricatives.

The use of formant transition information was investigated in the literature

in relation to the fricatives articulated at the back of the oral cavity. For example,

El-Halees (1985) found that the F1 value at the transition differentiates between

uvular and pharyngeal fricatives with the former being lower. Also, he found

that listeners can differentiate between the two classes based only on this single

feature. The perceptual salience of F1onset was also demonstrated by Alwan (1989),

who used synthetic speech to test the discrimination between voiced pharyngeal

fricative /Q/ and voiced uvular fricative /K/. She found that the higher F1onset

for the pharyngeal was essential to make the distinction, while F2onset was not.

The relation between back articulation and high F1 was also attested for vowels

following such sounds. Zawaydeh (1997) found that F1 at the middle of the vowel

was raised when preceded by one of the gutturals /sQ, è/ or the glottal /h/ as

compared to non-gutturals.

In addition to first and second formant at transition, locus equations were

also used as a classification metric for Arabic. The first attempt was part of a

cross-linguistic study of locus equations as a cue for stop place of articulation.

Sussman et al. (1993) recorded the voiced stops /b, d, dQ, g/ as produced by

three speakers of the Cairene dialect of Arabic. They found that both slope and

y-intercept for almost all comparisons were significantly different except for the

slope of /d/ and /dQ/, and the y-intercept for /b/ and /g/. The second study

was conducted by Yeou (1997) who elicited both stops and fricatives from nine

Moroccan subjects. Yeou found that y-intercept and slope distinguished between

most fricative comparisons. However, neither slope nor y-intercept distinguished

/S/ from /è/ or /f/ from /X/. More importantly, locus equation slopes were able

to group pharyngealized (/DQ, sQ/) together as a distinct group differing from

their non-pharyngealized counterparts and other fricatives with distinctly low

y-intercepts and flat slopes. Yeou argued that unlike their plain counterparts,

pharyngealized fricatives resist the articulatory effects of the following vowel due

to their double articulation. Instead they induce their coarticulatory effect on

the following vowel by raising its F1 and lowering its F2. This change in F2, as

compared to plain fricatives, causes the slope to be flatter and the intercept to be

lower.

To summarize, several acoustic cues related to spectral, temporal and

amplitude information found in the speech signal were used in different languages

to classify fricatives into their places of articulation. Such cues, alone and

collectively, served to distinguish between different places/classes of fricatives

in English. However, the use of these cues to classify Arabic fricatives has not

received much attention. In our study we attempt to examine how each of the

spectral, temporal and amplitude characteristics mentioned in Section 2.3

would serve alone and collectively to distinguish between place of articulation of

Arabic fricatives. Additionally, of particular importance to our study is to see if

the acoustic cues found to be effective in fricative classification in other languages

will be affected by the vowel length differences present in Arabic; and if such cues

would distinguish between plain and pharyngealized fricatives. In the following

chapter, we will discuss how such cues are investigated and the modifications

implemented in the measurement techniques, if any.

CHAPTER 3 METHODOLOGY

Several spectral, amplitude, and temporal measurements have been used

in previous research to describe the acoustic cues that characterize fricatives in

different languages. The current study investigated Arabic fricatives to find such

acoustic cues. This chapter describes the way in which the speech samples were

elicited, recorded and analyzed. For most of the acoustic analyses, this research

followed the procedures commonly used to study fricatives in English as illustrated

in Jongman et al. (2000). Certain modifications were applied to further investigate

characteristics particular to Arabic. All coding and data analysis was carried

out using the PRAAT software (Boersma and Weenink 2004) and a set of scripts

developed at the phonetics lab of the University of Florida by the author.

3.1 Data Collection

3.1.1 Participants

A group of eight adult male speakers of Modern Standard Arabic (MSA)

were recruited to participate in our study from the general undergraduate student

population of King Saud University¹. The mean age of participants was 20 years.

They did not have any history of hearing or speaking impairments, and all had a

very limited experience with English as a second language. Participants were given

class credit by their instructors for participating in the study.

1 King Saud University, Riyadh, Saudi Arabia

3.1.2 Materials

A gap exists in Arabic between MSA and its vernacular varieties.

Arabic has long been known as a classic example of diglossia, in which two varieties

of the language are used to fulfill different communicative functions (Ferguson

1959). Although participants were all fluent speakers of MSA, additional care

was taken in eliciting speech material in order to ensure that the participants

would stay within the target MSA register. Therefore fricatives were elicited

using screen prompted speech in conjunction with prerecorded audio prompts. A

trained phonetician, who is also a fluent speaker of MSA, produced CVC syllables

where the initial consonant was an MSA fricative /f, T, D, DQ, s, sQ, z, S, X, K, è, Q,

h/ followed by each of the six vowels /i, i:, a, a:, u, u:/. The final consonant was

always /t/. Each resulting word was repeated three times to yield a total of 234

audio prompts (13 fricatives × 6 vowels × 3 repetitions). The recorded prompts

were then edited to be of equal length (≈ 1 second) by adding silence to the end

if needed. The written prompts were constructed using fully vowelled Arabic

orthography on a white background. The participants were instructed to repeat

the word presented in the carrier phrase “qul ___ marratajn” (say ___ twice), with

the audio prompt functioning only as a reference. The prompts were presented

randomly in blocks of 39 words with breaks between blocks. Before the actual

recording of any participant, a practice session with 10 words presented in two

blocks was conducted to familiarize the participants with the task.

3.1.3 Recording

The recording was carried out using the facilities of the Computer &

Electronics Research Institute at KACST². Two adjacent sound-attenuated

booths with a monitoring window between them hosted the data collection process.

2 King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia.

In one booth a PC computer running Microsoft PowerPoint was used to present

the synchronized audio-written production prompts via an LCD screen affixed to

the outside of the monitoring window of the other booth. The text was shown on

the LCD screen while the synchronized audio prompt was fed through headphones

(Sennheiser Noisegard mobile HDC 451). A Kay Elemetrics CSL (Computer

Speech Lab) model 4300B which was connected to another PC computer was

used for in-line recording of the participants’ utterances. It should be pointed out

that anti-aliasing is carried out automatically during data capture through the CSL

external module. All recordings were made at a 22.05 kHz sampling rate and 16-bit

quantization. The participant’s production of the word in the carrier phrase was

captured using a low-impedance, unidirectional head-worn dynamic microphone

(SHURE SM10A) positioned about 20 mm to the left of the participant’s mouth in

order to prevent direct air flow turbulence from impinging on the microphone.

Each word lasted 4 seconds on the screen and then the following word was

shown. In case a participant did not produce the word in the allocated time or

a mispronunciation occurred, the recording was stopped by the author and that

particular word was presented again.

Each block was saved to a separate sound file for easy manipulation. The

resulting sound files were then transferred into PRAAT for segmentation and further

analyses.

3.2 Data Analysis

3.2.1 Segmentation of Speech

Both a wide-band spectrogram and a waveform display were used in the

segmentation of the recorded material into the monosyllabic words containing

the test fricatives. For each token, four points were identified on the waveforms:

the beginning of frication, the offset of fricative/beginning of the vowel, the end

of the vowel, and the end of word. For all these points the nearest zero-crossing

was always used. Fricative onset was taken to be the point in time at which high-

frequency energy appeared on the spectrogram and/or a significant increase in

zero-crossings rate occurred. The offset of the voiceless fricative was taken to be

the point of minimum intensity preceding the periodicity of the vowel. For the

voiced fricatives, the offset was taken to be the zero-crossing of the pulse preceding

the earliest pitch period exhibiting a change in the waveform from that seen

throughout the initial frication (Jongman et al. 2000). The vowel offset was taken

to be the end of periodicity while the end of the segmented token was taken to be

the onset of stop burst release. Figure 3–1 shows an example of these points. The

time indices of the segmentation points were written to a PRAAT TextGrid file. Such

files make it easier to handle the signal independently from the segmentation data

and labels.
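The two zero-crossing operations used in segmentation can be sketched as follows. This is a minimal illustration in Python, not the PRAAT procedure used in our study; the function names and the representation of the signal as a plain list of samples are assumptions made for the example.

```python
def nearest_zero_crossing(samples, index):
    """Return the sample index closest to `index` at which the signal changes sign.

    Hypothetical helper: a boundary placed by eye is snapped to the nearest
    zero-crossing. If the signal never changes sign, `index` is returned as is.
    """
    crossings = [
        i for i in range(1, len(samples))
        if (samples[i - 1] < 0) != (samples[i] < 0)
    ]
    if not crossings:
        return index
    return min(crossings, key=lambda i: abs(i - index))


def zero_crossing_rate(samples, sample_rate):
    """Number of sign changes per second over the whole of `samples`."""
    count = sum(
        1 for i in range(1, len(samples))
        if (samples[i - 1] < 0) != (samples[i] < 0)
    )
    return count * sample_rate / len(samples)
```

A rise in `zero_crossing_rate` computed over short successive stretches of the signal would correspond to the onset criterion described above.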

Figure 3–1. Example of segmentation, with fricative onset, fricative offset, vowel offset, and stop release marked.

The only exception to the above mentioned general rules was with the voiced

pharyngeal fricative /Q/, where it was difficult to visually localize the fricative-

vowel boundary. Pharyngeal fricative /Q/ is known to have a formant-like structure

continuous with that of the following vowel, with the lowest frequency of the

fricative matching that of the second formant of the following vowel (Johnson

1997). Therefore, the frication offset for /Q/ was taken to be the point at which

an upwards intensity-shift occurred with reference to the intensity of the fricative

onset. Such a point indicates the shift from the low intensity found in the frication

noise towards the higher intensity of the vocalic part. Figure 3–2 shows an example

of the segmentation of /Q/. Due to the absence of voicing during frication, such

modification in segmentation criteria was not necessary for either /è/ or /h/.

Figure 3–2. Segmentation of /Q/, with fricative onset, fricative offset, vowel offset, and stop release marked. The dotted line shows the intensity level.

3.2.2 Acoustic Analyses

All measurements described below were obtained using scripts written by the

author for the PRAAT program. All measurements were then entered into a MySQL

database for later querying and statistical analyses. For spectral analyses based

on fast Fourier transform (FFT), a double-Kaiser window was used. A window

is a weighting function applied to the time-domain data to reduce

the spectral leakage associated with finite-duration time signals. This is

achieved by applying a smoothing function that peaks in the middle of the analysis frame

and decreases to near zero at the edges, thus reducing the effects of the

discontinuities that result from finite duration. In the frequency domain, the spectrum of the window has a main lobe flanked by side lobes. The

ideal window is one that has a narrow main lobe and low sidelobes (Harris 1978).

However, there is a tradeoff between these two characteristics, as

narrowing the main lobe raises the sidelobe levels and vice versa.

Traditionally, in speech research, Hamming and Hann windows were used

for spectral analyses. However, the Kaiser window, which offers better sidelobe suppression, was used in our

study. The Kaiser window is the best approximation to a Gaussian window given

a certain ratio between physical length and effective length. More precisely, when

weighting is used, a Kaiser window of double physical length is applied to the

signal (Boersma and Weenink 2004). Such windowing function produces similar

bandwidth as compared to a Hamming window with comparable effective width.

However, with a Hamming window, we end up with sidelobes of about −42 dB on

each side of the main lobe while such windowing artifacts are at a level of −190 dB

for the Kaiser window (Figure 3–3). Most speech analysis software uses a Hamming

(or Hann) window because evaluating a Kaiser window as explained above is slower

by a factor of two since the analysis is performed on twice as many samples per

frame. With modern computers, this speed penalty is negligible,

hence the adoption of this weighting function for our study.

Figure 3–3. Two window functions, each plotted as sound pressure level (dB/Hz) against frequency (980 to 1020 Hz), with the main lobe and side lobes labeled. A) The 0.1-second Hamming window. B) The 0.2-second Kaiser window.

Pre-emphasis of each spectral analysis interval was carried out in order to

correct for the −6 dB per octave falloff in production of voiced speech. This falloff

is a result of the 12 dB per octave decrease due to the excitation source and the 6 dB per

octave increase due to radiation at the lips. With pre-emphasis

applied, the flattened spectrum would be a function of the vocal tract alone. Pre-

emphasis was applied as described in the PRAAT manual as a filter changing each

sample xj of the sound (except for x1) starting from the last sample according

to Equation (3–1) where Δt is the sampling period of the sound and F is the

frequency above which the change is applied. In our study α was set to 0.98 and F

to 50 Hz. The pre-emphasis filter was applied to the signal before windowing.

$$
\begin{aligned}
\alpha &= \exp(-2 \pi F \Delta t) \\
x_j &= x_j - \alpha x_{j-1}
\end{aligned}
\tag{3–1}
$$
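Equation (3–1) can be illustrated with a short sketch. The Python below is not the PRAAT implementation; the function name and the list-based signal are assumptions for the example. It works backwards from the last sample so that each update still uses the unmodified preceding sample, as the description above requires.

```python
import math


def pre_emphasize(samples, sample_rate, cutoff_hz=50.0):
    """Apply x[j] = x[j] - alpha * x[j-1] with alpha = exp(-2*pi*F*dt).

    Returns the filtered copy of `samples` together with alpha.
    The first sample is left unchanged.
    """
    dt = 1.0 / sample_rate                      # sampling period
    alpha = math.exp(-2.0 * math.pi * cutoff_hz * dt)
    out = list(samples)
    # Iterate from the last sample down to the second one, so that
    # out[j - 1] has not yet been modified when out[j] is updated.
    for j in range(len(out) - 1, 0, -1):
        out[j] = out[j] - alpha * out[j - 1]
    return out, alpha
```

With a sampling rate of 22.05 kHz and F = 50 Hz, this formula gives α of roughly 0.986, in line with the value of 0.98 reported above.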

3.2.2.1 Duration

Three temporal measurements were extracted based on the segmentation

criteria mentioned above: fricative, vowel and word duration. Since different tokens

of the same fricative included different stop burst durations, word duration was

measured from fricative onset to the point where the release of stop burst is visible

on the spectrogram (Figure 3–4).
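As a small illustration, the three durations follow directly from the four segmentation points. The sketch below assumes the points are available as times in seconds; the function name and argument names are hypothetical.

```python
def durations_ms(fric_onset, fric_offset, vowel_offset, stop_release):
    """Fricative, vowel, and word durations in ms from four time points (s)."""
    if not fric_onset <= fric_offset <= vowel_offset <= stop_release:
        raise ValueError("segmentation points must be in temporal order")
    return {
        "fricative": (fric_offset - fric_onset) * 1000.0,
        "vowel": (vowel_offset - fric_offset) * 1000.0,
        # Word duration runs from fricative onset to the stop release,
        # so it excludes the burst itself.
        "word": (stop_release - fric_onset) * 1000.0,
    }
```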

Figure 3–4. Duration measurements for the fricative, the vowel, and the word.

3.2.2.2 Spectral Moments

Spectral Moments measurements were modeled after those of Forrest et al.

(1988) with the window length modification employed by Jongman et al. (2000).

After pre-emphasis was applied to the signal, FFT spectra were calculated from

four different locations in the fricative with a 40 ms double-Kaiser window. The

first three windows were aligned so that the first covered the initial 40 ms of the

fricative, the second the middle 40 ms and the third the final 40 ms of frication

noise. The fourth window was centered over the fricative-vowel boundary so that

it covered 20 ms of each, capturing any transitional information. The analysis

windows may or may not overlap based on the length of the frication noise.

Following Forrest et al. (1988), each FFT was treated as a random probability

distribution from which the first four moments (mean, variance, skewness, and

kurtosis) were calculated. Only moments from linear spectra were calculated since

previous research on fricatives (Jongman et al. 2000) reported that there was no

substantial difference between the linear and Bark-transformed spectra. The PRAAT

program measures the first moment (center of gravity) as in Equation (3–2) where

S(f) is the complex spectrum, f is the frequency and the denominator is the

energy. The quantity p was set to 2 in order to weigh the average frequency by the

power spectrum (not by the absolute spectrum).

$$
f_c = \frac{\int_0^\infty f \, |S(f)|^p \, df}{\int_0^\infty |S(f)|^p \, df} \tag{3–2}
$$

The other three moments were first calculated using Equation (3–3) where n

denotes the nth moment. To normalize skewness with regard to different levels of

variance, the result of Equation (3–3), with n = 3, was divided by the 1.5 power of

the second moment. Likewise, to normalize kurtosis, the result of Equation (3–3),

with n = 4, was divided by the square of the second moment and then a value of 3

was subtracted (Forrest et al. 1988).

$$
\frac{\int_0^\infty (f - f_c)^n \, |S(f)|^p \, df}{\int_0^\infty |S(f)|^p \, df} \tag{3–3}
$$
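The window placement and the moment calculations described in this section can be sketched as follows. This is an illustrative Python version, not the PRAAT scripts used in our study. It assumes the spectrum is already available as equally spaced frequency bins with their magnitudes, so that the integrals in Equations (3–2) and (3–3) reduce to sums; the names `moment_windows` and `spectral_moments` are hypothetical.

```python
def moment_windows(fric_onset, fric_offset, width=0.040):
    """Start and end times (s) of the four 40 ms analysis windows."""
    mid = (fric_onset + fric_offset) / 2.0
    half = width / 2.0
    return {
        "onset": (fric_onset, fric_onset + width),
        "middle": (mid - half, mid + half),
        "offset": (fric_offset - width, fric_offset),
        # Centered on the fricative-vowel boundary: 20 ms on each side.
        "transition": (fric_offset - half, fric_offset + half),
    }


def spectral_moments(freqs, magnitudes, p=2.0):
    """Return (mean, variance, skewness, kurtosis) of a spectrum.

    `freqs` are bin frequencies in Hz and `magnitudes` the matching |S(f)|.
    With p = 2 the frequencies are weighted by the power spectrum.
    """
    weights = [m ** p for m in magnitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("spectrum has no energy")
    mean = sum(f * w for f, w in zip(freqs, weights)) / total

    def central(n):
        # n-th central moment, as in Equation (3-3)
        return sum((f - mean) ** n * w for f, w in zip(freqs, weights)) / total

    variance = central(2)
    if variance == 0:
        raise ValueError("all energy lies in a single bin")
    skewness = central(3) / variance ** 1.5
    kurtosis = central(4) / variance ** 2 - 3.0
    return mean, variance, skewness, kurtosis
```

For a fricative shorter than 80 ms, the onset and offset windows returned by `moment_windows` overlap, which corresponds to the overlap noted above.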

3.2.2.3 RMS Amplitude

Root-Mean-Square (RMS) amplitude in dB was measured from the entire

frication noise. Since different speakers and recording sessions may result in

different intensities, direct measures of amplitude cannot be compared across

speakers. Therefore, fricative amplitude was normalized using the method

described by Behrens and Blumstein (1988b). Basically, the average RMS

amplitude (in dB) of three consecutive pitch periods at the point of maximum

vowel amplitude was subtracted from the RMS amplitude of the entire frication

noise. In PRAAT, RMS amplitude was given in pascals and was then

converted into dB following Equation (3–4).

$$
\text{RMS Amplitude}_{\text{dB}} = 20 \times \log_{10} \left\{ \frac{\text{Amplitude}_{\text{pascal}}}{2 \times 10^{-5}} \right\} \tag{3–4}
$$
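A minimal sketch of the conversion and normalization is given below. It is written in Python for illustration and is not the PRAAT script used in our study. It assumes that the samples are in pascals and that the three vowel pitch periods have already been extracted; averaging the three periods in dB (rather than in pascals) is an assumption of the sketch, and all names are hypothetical.

```python
import math

REFERENCE_PA = 2e-5  # reference pressure in Equation (3-4)


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def to_db(amplitude_pa):
    """Convert an amplitude in pascals to dB, following Equation (3-4)."""
    return 20.0 * math.log10(amplitude_pa / REFERENCE_PA)


def normalized_rms_db(frication, vowel_periods):
    """Frication RMS (dB) minus the mean RMS (dB) of the vowel pitch periods."""
    vowel_db = sum(to_db(rms(p)) for p in vowel_periods) / len(vowel_periods)
    return to_db(rms(frication)) - vowel_db
```

Because the normalized value is a difference of two dB values, it is negative whenever the frication noise is weaker than the vowel peak.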

3.2.2.4 Spectral Peak Location

Spectral Peak Location of the fricative was estimated using a 40 ms double-

Kaiser window positioned over the middle of the frication noise. The analysis

window was set this large in order to gain better frequency resolution (Jongman

et al. 2000). Another window was placed at the end of the frication noise such

that its right shoulder was aligned with the end of frication noise. The two window

locations were used because studies of spectral peak location have demonstrated

that high frequency peaks are more likely to emerge at the middle and end of

frication noise (Behrens and Blumstein 1988a). Further, as explained in Section

(2.3.3.1), it was anticipated that two patterns of peaks would emerge for the

co-articulated pharyngealized fricatives, one at the middle and the other at the end of

the frication noise, due to their double articulation. After applying pre-emphasis and

windowing, an FFT spectrum was derived. A script written for PRAAT searched

each spectrum to find the highest amplitude peak and its associated frequency. As

before, the amplitude was converted into dB using Equation (3–4).

3.2.2.5 Relative Amplitude

Relative Amplitude was measured as described in Hedrick and Ohde (1993)

and later in Jongman et al. (2000) with one more modification. An FFT spectrum

was derived at vowel onset with a 23.3 ms double-Kaiser window. The mean values

of the first six formants in the windowed selection were estimated based on the

FFT spectrum. Each spectrum was then filtered using a Hann band-pass filter to

isolate regions of the second, third and fifth formants based on the mean values

obtained above. Each region spanned from the mean frequency of the target

formant to half the distance to the two adjacent formants. A schematic example of

the upper and lower limits of such region is presented in Equation (3–5).

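$$
\begin{aligned}
\max F_i &= \text{mean}F_i + [(\text{mean}F_i - \text{mean}F_{i-1})/2] \\
\min F_i &= \text{mean}F_i - [(\text{mean}F_{i+1} - \text{mean}F_i)/2]
\end{aligned}
\tag{3–5}
$$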

A script written for PRAAT searched each frequency region of the spectrum

to find its spectral peak and associated amplitude as mentioned in Section 3.2.2.4

above. Similar to previous research with (English) fricatives, spectral peak at the

F5 region was used for non-sibilant fricatives /f, T, D/ and spectral peak at F3

region for sibilant fricatives /s, z, S/. However, for the remaining fricatives (/X, K,

Q, h, sQ, DQ/), spectral peak of the F2 region was used.

Another FFT spectrum was derived at the middle of frication noise and

subsequently filtered into frequency regions based on the frequency of amplitude

peaks of F2, F3 and F5 regions of the vowel. Each region spanned 128 Hz on

each of the two sides around the vowel’s frequency regions. The amplitude of the

spectral peak in the said regions was measured using the same procedure outlined

above for the vowel. Relative amplitude was then defined for each frequency region

as the ratio between fricative amplitude and vowel amplitude at that frequency

range. Ratios in log scale are expressed as the difference between the two values.

3.2.2.6 Locus Equations

Following previous research on locus equations (for example Sussman et al.

1991, 1993; Fowler 1994; Sussman 1994; Yeou 1997; Govindarajan 1998; Jongman

1998; Jongman et al. 2000; Tabain 2002), coefficients of locus equations were

derived from scatterplots of F2 values measured at vowel onset and vowel nucleus

for each speaker and place of articulation combination. Specifically, the second

formant at vowel onset as well as at the middle of the vowel were estimated using

the formant tracking procedure implemented in PRAAT. At first, the sound was

resampled to 10 kHz and then pre-emphasized using the algorithm given

above in Equation (3–1). After a Gaussian-like window of 25 ms length was applied to

the signal, the LPC coefficients were calculated for each analysis window using the

algorithm by Burg, as given in Anderson (1978) and Press, Flannery, Teukolsky,

and Vetterling (1992). For each speaker and place combination, linear regression

fits were applied on scatterplots with F2 averaged across all vowel contexts. Each

scatterplot had F2 measured at the onset of the vowel represented on the y-axis

and F2 measured at the mid-point of the vowel represented on the x-axis. The

coefficients of each regression line (the slope ‘k’ and the y-intercept ‘c’) were taken

to be the terms of locus equations.
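The regression step can be sketched as an ordinary least-squares fit of F2 at vowel onset on F2 at vowel midpoint. The Python below is only an illustration of that fit, not the analysis code used in our study; the function name is hypothetical, and the formant values in the usage note are invented for the example.

```python
def locus_equation(f2_vowel, f2_onset):
    """Least-squares fit F2onset = k * F2vowel + c; returns (k, c).

    `f2_vowel` (x-axis) and `f2_onset` (y-axis) are paired F2 values in Hz.
    """
    n = len(f2_vowel)
    if n != len(f2_onset) or n < 2:
        raise ValueError("need two or more paired measurements")
    mean_x = sum(f2_vowel) / n
    mean_y = sum(f2_onset) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel)
    if sxx == 0:
        raise ValueError("F2vowel values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_vowel, f2_onset))
    k = sxy / sxx              # slope
    c = mean_y - k * mean_x    # y-intercept
    return k, c
```

For instance, the made-up pairs (1000, 1300), (1500, 1550), and (2000, 1800) Hz lie exactly on a line with slope 0.5 and y-intercept 800 Hz.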

3.2.2.7 F2 at Transition

Second Formant at the transition was also measured from the first window (at

vowel onset) used to derive F2 for the locus equations above.

3.3 Statistical Analyses

Along with reporting the descriptive statistics for the acoustic measures

mentioned above, measures of significant differences between different places

of articulation for these measures were obtained using appropriate Analysis of

Variance (ANOVA) methods. All reported statistics were calculated from data

points aggregated across the three repetitions for each speaker.

Discriminant function analysis (DFA) was used to measure the contribution

of different cues towards the classification of fricatives into their respective classes.

The DFA procedure reduces the physical space, built by extracted cues, into

subspaces corresponding to the sound classes under consideration (Jassem 1979).

This classification method works first by forming vectors of the metrics mentioned

above. Recall that each cue mentioned above, except for locus equations, represents

a value of some single feature at a given point in time. Therefore, each token can

be represented as a combination of values (a vector) from all these cues. All the

tokens, then, are represented as points defined by their respective vectors in a

multidimensional space. The dimensions of such space depend on the number of

parameters in use.

The goal of DFA is to find the set of parameters that provides the

best classification accuracy of tokens into their pre-defined classes. This process

involves calculating three types of probabilities: the probability of observing a

particular parameter p for a token t (P [ p | t ]), the probability of observing a token

t in the data (P [ t ]) and finally the probability of observing a specific value for

a parameter (P [ p ]). All these probabilities are calculated from training data to

predict the membership of an unknown token in testing data using Bayes’

theorem (Equation 3–6). The value P[t | p] is the probability that an unknown token belongs

to class t given a value for parameter p (Harrington and Cassidy 1999).

$$
P[t \mid p] = \frac{P[p \mid t] \, P[t]}{P[p]} \tag{3–6}
$$

The unknown token is then classified as belonging to class A (t_a) rather than class B

(t_b) if the condition P[p | t_a] P[t_a] > P[p | t_b] P[t_b] is satisfied (Harrington and Cassidy

1999). The traditional way of applying this method to fricative classification (see

for example Shadle and Mair 1996; Tabain 1998; Jongman et al. 2000; Nissen 2003)

involves all-but-one speakers as the training data and tokens from the remaining

speaker as the testing data. The process is repeated so that each speaker will be

in the testing data at a given time. The DFA procedure produces a classification

accuracy score along with a set of coefficients that represent the contribution of the

parameters in the classification.
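The decision rule and the leave-one-speaker-out scheme can be illustrated with a deliberately simplified sketch. The Python below is not the DFA procedure used in our study: it uses a single parameter and assumes Gaussian class-conditional densities estimated from the training tokens, and all function names are hypothetical. It only shows how the comparison of P[p | t] P[t] across classes and the rotation of the held-out speaker fit together.

```python
import math
from collections import defaultdict


def train(tokens):
    """tokens: list of (label, value). Returns {label: (prior, mean, variance)}."""
    by_label = defaultdict(list)
    for label, value in tokens:
        by_label[label].append(value)
    model = {}
    for label, values in by_label.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        # A small floor keeps the variance positive for tiny training sets.
        model[label] = (len(values) / len(tokens), mean, max(var, 1e-6))
    return model


def classify(model, value):
    """Return the class t that maximizes P[p | t] * P[t] (in the log domain)."""
    def score(label):
        prior, mean, var = model[label]
        log_lik = -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return log_lik + math.log(prior)
    return max(model, key=score)


def leave_one_speaker_out(data):
    """data: list of (speaker, label, value). Returns overall accuracy."""
    speakers = sorted({s for s, _, _ in data})
    correct = 0
    for held_out in speakers:
        # Train on all speakers but one, then test on the remaining speaker.
        model = train([(l, v) for s, l, v in data if s != held_out])
        correct += sum(classify(model, v) == l for s, l, v in data if s == held_out)
    return correct / len(data)
```

The denominator P[p] of Equation (3–6) is the same for every class and can therefore be omitted when only the winning class is needed.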

CHAPTER 4 AMPLITUDE AND DURATION

This chapter reports results of the amplitude and duration measurements.

These results were derived from a three-way ANOVA with place of articulation,

voicing, and vowel context as between-subject factors. Post hoc tests of significant

effects were adjusted for multiple comparisons using the Bonferroni method. All

data were aggregated across the three repetitions of each speaker prior to any

statistical analysis.

4.1 Amplitude Measurements

4.1.1 Normalized Frication Noise RMS Amplitude

Normalized frication RMS amplitude was calculated as the difference

between frication noise RMS amplitude and the average RMS amplitude of

three consecutive pitch periods at the point of maximum vowel amplitude.

A three-way Analysis of Variance (ANOVA) with normalized frication noise

RMS as the dependent variable and place of articulation, voicing, and vowel

context as between-subject factors revealed a significant main effect of Place

[F (8, 561) = 75.241, p < 0.001; η² = 0.518]. Due to a lack of voicing contrast

at some places of fricative articulation in Arabic (Labiodental, Post-Alveolar, and

Glottal), differences within voiceless fricatives and within voiced fricatives will

be interpreted separately. For both voiced and voiceless fricatives, subsequent

Bonferroni post hoc tests showed that plain fricatives and their pharyngealized

counterparts (/D - DQ/ and /s - sQ/) did not differ in normalized RMS amplitude

(mean normalized RMS values are reported in Figure 4–1). However, with the

exception of the contrast between voiced alveolar and uvular fricatives (/z -

K/), normalized RMS amplitude significantly (p < 0.0001) distinguished all

places of voiced fricative articulation. Additionally, within voiceless fricatives,

nonsibilant fricatives /f, T/ had the lowest normalized RMS amplitude (−23.94

and −22.50 dB respectively). While such RMS amplitude values for /f/ and /T/

were not statistically different from each other, normalized RMS amplitude values

of both /f/ and /T/ were significantly lower than all other voiceless fricatives.

Additionally, no differences were obtained between /s, S, h/ or between /X, è/. All

other contrasts were significant (Figure 4–1).
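The η2 values reported alongside each F ratio appear to be consistent with partial eta squared recovered from the F ratio and its degrees of freedom. The text does not state the formula, so the sketch below is an assumption, checked only against two of the values reported in this section; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of Place on normalized RMS amplitude: F(8, 561) = 75.241.
place_rms = partial_eta_squared(75.241, 8, 561)

# Main effect of Vowel context on normalized RMS amplitude: F(5, 561) = 16.185.
vowel_rms = partial_eta_squared(16.185, 5, 561)
```

Rounded to three decimals, these give 0.518 and 0.126, matching the reported effect sizes.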

Figure 4–1. Mean frication noise normalized RMS amplitude (dB) by place of articulation and voice. [Plot: Place of Articulation against Normalized RMS Amplitude (dB); series: voiced, voiceless.]

There was also a significant main effect of Vowel context [F (5, 561) =

16.185, p < 0.001; η2 = 0.126]. For short vowels, normalized frication RMS

amplitude tended to be lower as the vowel context changed from /i/ to /u/ to


/a/ with means of −16.51 dB, −17.03 dB, and −17.81 dB respectively. The same

pattern was also observed with long vowels (/i:/ to /u:/ to /a:/ with means of

−14.30 dB, −16 dB, and −18.58 dB respectively). However, statistically significant

differences in terms of vowel context effect, as suggested by post hoc tests, were

observed with long vowels only with p = 0.004 for the /i: -u:/ contrast and

p < 0.001 for all other contrasts. Additionally, as can be seen from Figure 4–2,

when comparing a short vowel to its long variant, we find that only the front

long vowel /i:/ resulted in a significantly (p < 0.001) higher value for normalized

frication RMS amplitude than its short counterpart /i/.

Figure 4–2. Mean frication noise normalized RMS amplitude (dB) by vowel context. [Plot: Vowel Context (/i/, /u/, /a/) against Normalized RMS Amplitude (dB); series: short vowels, long vowels.]

Finally, a significant main effect of Voicing [F (1, 561) = 315.204, p <

0.001; η2 = 0.36] was also found. Normalized RMS amplitude of voiced fricatives


(mean = −14.22 dB) was greater than that of voiceless fricatives (mean = −18.26

dB). In addition to this main effect, there was a significant Place by Voicing

interaction [F (3, 561) = 41.9, p < 0.001; η2 = 0.183]. As can be seen in Figure

4–3, Bonferroni post hoc tests showed that the significant difference in normalized

frication RMS amplitude between voiced and voiceless fricatives noted above was

not present for alveolar fricatives /s, z/.

Figure 4–3. Mean frication noise normalized RMS amplitude (dB) as a function of place of articulation and voicing. [Plot: Place of Articulation (Dental, Alveolar, Uvular, Pharyngeal) against Normalized RMS Amplitude (dB); series: voiced, voiceless.]

4.1.2 Relative Amplitude of Frication Noise

Relative amplitude is defined here as the ratio between the amplitude of

a specific frequency (F3 for /f, T, D/, F5 for /s, z, S/, and F2 for /X, K, sQ, DQ,

è, Q, h/) measured at the frication noise midpoint and the amplitude of the

corresponding frequency measured at vowel onset. Results of a three-way ANOVA


(place × voice × vowel) with relative amplitude as the dependent variable showed a

significant main effect of Place [F (8, 561) = 104.525, p < 0.001; η2 = 0.598].

In general, relative amplitude of a fricative becomes greater as the place of

articulation advances towards the lips (Figure 4–4). The only notable exception

was the post-alveolar fricative (/S/). It was the only fricative in which the frication

amplitude measured at the region of F3 was greater than the amplitude of the

same frequency region at the following vowel onset (i.e., giving a value for relative

amplitude above zero). Collapsed across voicing, differences in relative amplitude

between all places of fricative articulation were significant with the exception of all

possible pairwise comparisons between the following three places: alveolar /s, z/,

pharyngeal /è, Q/, and glottal /X, K/ fricatives. However, since voicing contrast

is not present at all places, Bonferroni post hoc tests carried out on voiced and

voiceless fricatives showed a different pattern. Within voiced fricatives, relative

amplitude of pharyngealized dental fricative /DQ/ was significantly lower than those

of all other voiced fricatives, while those of alveolar /z/, dental /D/, and uvular

/K/ fricatives were not statistically different from one another. Furthermore, the

difference in relative amplitude between /D/ and /Q/ was not significant. All other

contrasts between voiced fricatives were significant (Figure 4–4). Within voiceless

fricatives, relative amplitude differentiated /f/ (−5.22 dB) and /T/ (−5.45 dB)

from all other fricatives; however, no significant difference was observed between

these two nonsibilant fricatives. Additionally, relative amplitude differentiated

between all other voiceless fricatives with the exception of the contrasts between

/s/–/è/, /s/–/h/, and /è/–/h/.
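The relative amplitude measure defined at the start of this section can be sketched as follows. This is an illustration only: it assumes that the ratio of the two linear amplitudes is expressed in dB as 20·log10, which is consistent with the negative and positive dB values reported but is not stated explicitly. The lookup table restates the formant regions listed in the definition, using the same symbols as the text; the function name and amplitude values are hypothetical.

```python
import math

# Formant region measured for each fricative, as listed in the definition above.
FORMANT_REGION = {
    "f": "F3", "T": "F3", "D": "F3",
    "s": "F5", "z": "F5", "S": "F5",
    "X": "F2", "K": "F2", "sQ": "F2", "DQ": "F2",
    "è": "F2", "Q": "F2", "h": "F2",
}


def relative_amplitude_db(fricative_amp, vowel_amp):
    """Ratio of frication to vowel-onset amplitude in dB (assumed 20*log10)."""
    return 20.0 * math.log10(fricative_amp / vowel_amp)


# Hypothetical linear amplitudes in the same frequency region.
weaker = relative_amplitude_db(0.5, 1.0)    # frication weaker than vowel onset
stronger = relative_amplitude_db(1.2, 1.0)  # frication stronger than vowel onset
```

A value above zero, as reported for the post-alveolar fricative, corresponds to the second case, in which the frication amplitude exceeds that of the vowel onset in the measured region.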

There was also a significant main effect for Vowel context [F (5, 561) =

11.642, p < 0.001; η2 = 0.094]. However, the source of this main effect as revealed

by Bonferroni post hoc tests can be solely attributed to differences in the context of

long vowels. Specifically, relative amplitude of fricatives followed by the high back

Figure 4–4. Mean relative amplitude of fricatives. [Plot: Place of Articulation against Relative Amplitude (dB); series: voiced, voiceless.]

vowel /u:/ (mean = −11.31 dB) was significantly higher (p < 0.0001) than relative

amplitude of fricatives preceding any other vowel except /i:/, which has similar

height and length to /u:/. Another source of the main effect obtained above was

the significantly lower (p < 0.016) relative amplitude of fricatives preceding the low

vowel /a:/ (mean = −17.02 dB) in relation to the other long vowels. Furthermore,

there was a general trend for a short vowel to result in a lower relative

amplitude than its long counterpart, with only the /u, u:/ contrast reaching

significance (p < 0.05). Mean values for relative amplitude of fricatives in different vowel

contexts are presented in Table 4–1, where significant differences are

marked.

Table 4–1. Relative amplitude (dB) in different vowel contexts. Means are arranged in descending order.

Vowel   Mean     /i/   /u/   /a/   /i:/  /u:/  /a:/
/u:/    -11.31    ∗     ∗     ∗                 ∗
/i:/    -13.85                ∗                 ∗
/i/     -16.17                            ∗
/u/     -16.33                            ∗
/a:/    -17.02                      ∗     ∗
/a/     -18.61                      ∗     ∗

∗ significant difference at p < 0.05

The ANOVA also revealed a significant Place by Voicing interaction

[F (3, 561) = 20.834, p < 0.001; η2 = 0.10]. Bonferroni post hoc tests showed

that only the differences between voiceless and voiced dental fricatives /T, D/ (9.5

dB) and between voiceless and voiced pharyngeal fricatives /è, Q/ (−5.5 dB) were

significant (Figure 4–5). However, no main effect of voicing was obtained.

A Place by Vowel context interaction was also significant [F (40, 561) =

4.101, p < 0.001; η2 = 0.226]. Multiple one-way ANOVAs, with Bonferroni post

hoc tests corrected for multiple comparisons, were conducted for each place of

articulation in which vowel context was separated as long and short vowels. The

results of these ANOVAs showed that for long vowels, the significant increase

Figure 4–5. Relative amplitude as a function of Place and Voicing. [Plot: Place of Articulation against Relative Amplitude (dB); series: voiced, voiceless.]

of relative amplitude in front of /u:/ mentioned above was present only within

labiodental (/f/) (mean = 5.34 dB) and alveolar (/s, z/) (mean = −6.37 dB)

fricatives. In addition, relative amplitude within pharyngealized alveolars (/sQ/) in

the context of low vowel /a:/ was significantly lower (mean = −38.21 dB) than in

the context of high vowels /i:/ (mean = −21.36 dB) and /u:/ (mean = −22.54 dB).

Finally, unlike the absence of differences between long vowels of the same height

observed above, the relative amplitude of glottal fricative (/h/) in the context

of the front vowel /i:/ (mean = −10.21 dB) was significantly higher than in the

context of back vowel /u:/ (mean = −20 dB) (Figure 4–7). As for short vowels,

a similar pattern of significant differences was obtained. Specifically, the relative

amplitude of labiodental (/f/) and alveolar (/s, z/) fricatives was significantly

higher in the context of /u/ (mean = −1.31 and −10.64 dB respectively) than

either /i/ (mean = −9.77 and −21.58 dB respectively) or /a/ (mean = −9.83

and −20.79 dB respectively). Moreover, the relative amplitude of pharyngealized

alveolar (/sQ/) in the context of low vowel /a/ (mean = −39.07 dB) was

significantly lower only than in the context of high vowel /i/ (mean = −28.02 dB)

(Figure 4–6). Mean values for relative amplitude of fricatives in different vowel

contexts are also presented in Table (4–2).

Finally, a Vowel context by Voicing interaction was also found to be significant

[F (5, 561) = 4.574, p < 0.001; η2 = 0.039]. Bonferroni post hoc tests were carried out

on long and short vowels separately. In general the relative amplitude of voiceless

fricatives in a given vowel context is higher than that of voiced fricatives in the

same context (Figure 4–8 and Figure 4–9); however, this difference was significant

only with /i:/ (mean = −10.80 dB for voiceless and −18.71 dB for voiced).

Figure 4–6. Relative amplitude (dB) as a function of place of articulation and short vowels. [Plot: Place of Articulation (/f/, /T, D/, /DQ/, /s, z/, /sQ/, /S/, /X, K/, /è, Q/, /h/) against Relative Amplitude (dB); series: /i/, /u/, /a/.]

Figure 4–7. Relative amplitude (dB) as a function of place of articulation and long vowels. [Plot: Place of Articulation against Relative Amplitude (dB); series: /i:/, /u:/, /a:/.]

Table 4–2. Mean relative amplitude (dB) of frication noise.

                                            /i/               /u/               /a/
Place                     Voicing      short    long     short    long     short    long
Labiodental               Voiceless    -9.77    -7.12    -1.31     5.34    -9.83    -8.64
Dental                    Voiced      -18.88   -15.22   -14.49    -9.36   -15.85   -15.88
                          Voiceless    -7.13    -5.26    -6.51     0.87    -7.55    -7.12
Alveolar                  Voiced      -21.54   -18.67    -9.83    -6.91   -22.28   -18.44
                          Voiceless   -21.62   -17.87   -11.46    -5.84   -19.30   -18.49
Post-Alveolar             Voiceless    -2.09    -1.05     3.73     7.96    -3.16     0.01
Uvular                    Voiced      -21.31   -22.71   -18.45   -15.10   -22.58   -20.15
                          Voiceless   -16.67   -16.52   -29.88   -22.51   -27.48   -22.90
Pharyngeal                Voiced      -12.98   -12.05   -14.65   -10.58   -10.78    -9.66
                          Voiceless   -10.60    -7.04   -24.63   -21.55   -19.76   -20.35
Pharyngealized Dental     Voiced      -26.30   -24.91   -28.53   -26.76   -32.02   -29.67
Pharyngealized Alveolar   Voiceless   -28.02   -21.36   -38.21   -22.54   -39.07   -38.21
Glottal                   Voiceless   -13.30   -10.21   -18.09   -20.00   -12.20   -11.80

Figure 4–8. Relative amplitude (dB) as a function of voicing and vowel context (short vowels). [Plot: Voicing (voiced, voiceless) against Relative Amplitude (dB); series: /i/, /u/, /a/.]

Figure 4–9. Relative amplitude (dB) as a function of voicing and vowel context (long vowels). [Plot: Voicing (voiced, voiceless) against Relative Amplitude (dB); series: /i:/, /u:/, /a:/.]

4.2 Temporal Measurements

Two measures of fricative noise duration are reported here: absolute fricative

duration and normalized fricative duration. For the latter, the ratio between word

and fricative durations was calculated to normalize and account for the different

speaking rates that might have occurred. For each measure, a three-way ANOVA

(place × voice × vowel context) was carried out. Subsequent post hoc tests were

corrected for multiple comparisons using the Bonferroni method.
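The normalization described above can be sketched as a simple ratio. The reported normalized values (roughly 0.26 to 0.41) indicate that fricative duration is divided by word duration, and the sketch assumes that direction. The function name and the word duration are hypothetical.

```python
def normalized_duration(fricative_ms, word_ms):
    """Fricative duration as a proportion of the duration of the whole word."""
    if word_ms <= 0:
        raise ValueError("word duration must be positive")
    return fricative_ms / word_ms


# Hypothetical token: a 118 ms fricative in a 400 ms word.
slow = normalized_duration(118.0, 400.0)

# The same proportions at a faster (hypothetical) speaking rate give the same ratio.
fast = normalized_duration(88.5, 300.0)
```

Because both durations shrink or stretch together when speaking rate changes, the ratio stays comparable across tokens where the absolute duration does not.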

4.2.1 Absolute Duration of Frication Noise

A three-way ANOVA (place × voice × vowel context) with the duration

of the frication noise as the dependent factor revealed a main effect of Place

[F (8, 561) = 50.092, p < 0.001; η2 = 0.417] with mean frication noise duration

of 117.99 ms. Mean duration of frication noise as a function of place of articulation

and voicing are presented in Figure 4–10. Averaged across voicing and vowel

context, pharyngealized dental /DQ/ and glottal fricative /h/ had the shortest

duration with a mean of 86.47 and 98.55 ms respectively. Due to the well known

effect of voicing on segmental duration (Cole and Cooper 1975; Manrique and

Massone 1981; Baum and Blumstein 1987; Behrens and Blumstein 1988b; Crystal

and House 1988; Pirello et al. 1997, among others), two sets of comparisons were

made, one for voiced and the other for voiceless fricatives. Among voiced fricatives,

alveolar fricative /z/ was significantly longer than all other voiced fricatives with a

mean duration of 110.12 ms. No other differences among voiced fricatives reached

the significance level of p < 0.05.

On the other hand, contrasts within voiceless fricatives revealed that glottal

fricative /h/, with a mean duration of 98.55 ms, was significantly shorter than all

other voiceless fricatives. Although no significant difference between nonsibilants

was observed, each of the nonsibilants /f/ and /T/ (127.86 and 131.68 ms

respectively) was significantly shorter than each of the sibilants /s/, /sQ/, and

/S/. Additionally, alveolar /s/ and its pharyngealized counterpart /sQ/ (mean =

149.86 and 149.70 ms) were significantly longer than all other voiceless fricatives

excluding /S/. As in the case of voiced fricatives, no significant differences were

found among voiceless labiodental, dental, uvular, and pharyngeal fricatives or

between pharyngealized fricatives and their plain counterparts (/sQ-s/).

Figure 4–10. Absolute frication noise duration as a function of place and voice averaged across all vowel contexts and speakers. [Plot: Place of Articulation against Frication Noise Duration (ms); series: voiceless, voiced.]

Also, as expected, a main effect of Voicing was found [F (1, 561) = 721.75, p <

0.001; η2 = 0.563], with voiceless fricatives (mean 134.21 ms) being significantly

longer than voiced fricatives (mean 92.05 ms). A Place by Voice interaction was

also significant [F (3, 561) = 3.327, p < 0.05; η2 = 0.017]. Subsequent Bonferroni post

hoc tests showed that this difference was significant across all places of articulation

with a voicing contrast (Figure 4–11). This interaction is probably

due to variation in the magnitude of duration differences between a voiced and

voiceless fricative in a given place. As is apparent from Figure 4–11, the difference

between voiced and voiceless fricatives was greater for uvular and pharyngeal than

for dental and alveolar fricatives.

Figure 4–11. Mean absolute frication noise duration for places with a voicing contrast. [Plot: Place of Articulation against Frication Noise Duration (ms); series: voiced, voiceless.]

Finally, a main effect of Vowel context [F (5, 561) = 4.708, p < 0.001; η2 = 0.04]

was significant. However, post hoc tests showed that differences in frication

noise duration measured in the context of vowels of the same length were not

significantly different from each other. Moreover, the source of the main effect was

due to the significantly increased duration of fricatives measured in the context of

/i:/ (mean 123.25 ms) as compared to all short vowels; and the significantly longer


duration of frication noise in the context of /u:/ (mean 122.80 ms) when compared

to /a, u/ (Figure 4–12).

Figure 4–12. Mean absolute frication noise duration in different vowel contexts. [Plot: Vowel Context (/i/, /u/, /a/) against Frication Noise Duration (ms).]

4.2.2 Normalized Duration of Frication Noise

Normalized frication noise duration is defined here as the ratio between

fricative duration and word duration. As can be seen from Figure 4–13, normalized

frication noise followed a pattern similar to the one observed with absolute

frication noise duration. Specifically, averaged across voicing and vowel context,

pharyngealized dental /DQ/ and glottal fricative /h/ had the shortest normalized

duration with means of 0.27 and 0.31 respectively. The results of the three-way

ANOVA revealed a main effect of Place [F (8, 561) = 49.82, p < 0.001; η2 = 0.415].

Separated according to voicing, Bonferroni post hoc tests showed, as was the case

with absolute duration, that /z/ (mean 0.34) was significantly longer than all other

voiced fricatives. No significant differences were observed among voiced dental,

uvular, and pharyngeal fricatives or between pharyngealized dental and their plain

counterparts (i.e., /DQ - D/).

As for contrasts within voiceless fricatives, glottal fricative /h/, with the

mean duration of 0.307, was significantly shorter than all other voiceless fricatives.

Moreover, voiceless alveolar /s/ was significantly longer than all other voiceless

fricatives excluding the post-alveolar and pharyngealized alveolar fricatives /S, sQ/,

which in themselves were significantly longer than labiodental, pharyngeal, and

glottal fricatives /f, è, h/. No other difference among voiceless fricatives reached the

significance level of p < 0.05.

Figure 4–13. Mean normalized frication noise duration as a function of place and voice averaged across all vowel contexts and speakers. [Plot: Place of Articulation against Normalized Frication Noise Duration; series: voiceless, voiced.]

The effect of Voicing on normalized fricative duration was also significant

[F (1, 561) = 724.74, p < 0.001; η2 = 0.564]. Averaged across other conditions,

voiced fricatives had significantly shorter normalized durations (mean = 0.29)

than voiceless fricatives (mean = 0.38). In addition, a significant Place by Voicing

interaction [F (3, 561) = 7.079, p < 0.001; η2 = 0.036] and subsequent Bonferroni

post hoc tests showed that this difference was greater for uvular and pharyngeal

than for dental and alveolar fricatives (Figure 4–14).

Figure 4–14. Mean of normalized frication noise duration for places with a voicing contrast. [Plot: Place of Articulation against Normalized Frication Noise Duration; series: voiced, voiceless.]

Finally, as shown in Figure 4–15, normalized frication noise duration was

significantly affected by the Vowel context [F (5, 561) = 8.862, p < 0.001; η2 =

0.073]. However, such effect as suggested by Bonferroni post hoc tests was localized

only with reference to contrasts involving long vowels. Specifically, while no


significant differences were observed within short vowels, normalized frication noise

duration was significantly shorter (mean = 0.32) in the context of /a:/ than all

other vowels. On the other hand, fricatives preceding /i:/ had significantly longer

normalized duration (mean 0.35) than in the context of other long vowels.

Figure 4–15. Mean normalized frication noise duration in different vowel contexts. [Plot: Vowel Context (/i/, /u/, /a/) against Normalized Frication Noise Duration.]

CHAPTER 5

SPECTRAL MEASUREMENTS

5.1 Spectral Peak Location

This chapter reports on results of the spectral measurements which include

spectral peak location (frequency region of energy maximum in frication noise)

and spectral moments (mean, variance, skewness, and kurtosis). As mentioned in

Section (3.2.2.4), spectral peak frequencies were measured at the center as well as

the end of frication noise. First, mean spectral peak location obtained from the two

locations was used in a one-way ANOVA as dependent variable to test for the effect

of the analysis window location. The ANOVA showed a main effect for Window

Location [F (1, 1246) = 1022.9, p < 0.001; η2 = 0.451]. Mean spectral peak location

when measured at the middle of the frication noise (4323 Hz) was higher than when

measured at the end of frication noise. However, a three-way ANOVA (place ×

vowel × voicing) with spectral peak measured at the end of the frication noise as

the dependent variable showed no significant effect for place. Therefore only the

results of measurements derived from the middle of frication noise will be reported

in detail below.

Table 5–1 represents the mean frequency of spectral peak location obtained

from a 40-ms Kaiser window placed at the middle of frication noise of all fricatives

in different vowel contexts averaged across speakers and repetitions. Results of

a three-way ANOVA (place × vowel × voicing) with spectral peak measured at

the middle of frication noise as the dependent variable revealed a main effect for

Place [F (8, 561) = 143.402, p < 0.001; η2 = 0.672]. The observed general trend

of spectral peak location is that, when averaged across speakers and vowel context,


the frequency of the peak tends to decrease as the place of articulation moves

backwards in the oral cavity.
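The spectral peak measurement described above can be sketched as follows: a Kaiser window is applied to a 40-ms stretch of signal, and the frequency of the strongest component of the resulting spectrum is reported. This is an illustration under stated assumptions, not the analysis procedure actually used. The Kaiser shape parameter (beta = 8.0), the 8000 Hz sampling rate, and the test tone are all hypothetical, and a plain discrete Fourier transform is used in place of whatever analysis software produced the reported values.

```python
import cmath
import math


def bessel_i0(x, terms=25):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) / k
        total += term * term
    return total


def kaiser_window(n, beta):
    """Kaiser window of length n with shape parameter beta."""
    denom = bessel_i0(beta)
    window = []
    for i in range(n):
        ratio = 2.0 * i / (n - 1) - 1.0
        window.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - ratio * ratio))) / denom)
    return window


def spectral_peak_hz(samples, sample_rate, beta=8.0):
    """Frequency (Hz) of the strongest DFT bin of the Kaiser-windowed samples."""
    n = len(samples)
    windowed = [s * w for s, w in zip(samples, kaiser_window(n, beta))]
    best_bin, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):  # skip the DC bin
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(windowed))
        power = abs(acc) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate / n


# Hypothetical 40-ms analysis window at an assumed 8000 Hz sampling rate.
sample_rate = 8000
n_samples = int(sample_rate * 0.040)
tone = [math.sin(2.0 * math.pi * 2000.0 * t / sample_rate) for t in range(n_samples)]
peak = spectral_peak_hz(tone, sample_rate)
```

For real frication noise the spectrum is much flatter than for this test tone, so the location of the maximum is more sensitive to the window placement, which is in line with the window location effect reported above.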

Since voicing contrast is not present for some places of fricative articulation

in Arabic, Bonferroni post hoc tests of the simple main effect

of place were conducted separately for voiced and voiceless fricatives. That is,

differences within voiceless fricatives and within voiced fricatives will be interpreted

separately. Mean frequencies of spectral peak of fricatives separated by place

and voicing are presented in Figure (5–1). Among voiceless fricatives, three

homogeneous groups of fricatives articulated at adjacent places emerged, with

differences in spectral peak location significant only for contrasts between members

of different groups. The first group included labiodental, dental, and alveolar

fricatives (/f, T, s/); the second included post-alveolar and uvular fricatives (/S,

X/); and finally the third group consisted of pharyngeal and glottal fricatives (/è,

h/). As for voiced fricatives, only the difference between /K/ and /Q/ was not

significant. Moreover, no significant difference was observed between plain fricatives

and their pharyngealized counterparts (/D - DQ/ or /s - sQ/).

Another main effect was observed for Voicing [F (1, 561) = 152.388, p <

0.001; η2 = 0.214], in which the frequency of spectral peak location for

voiceless fricatives (mean = 4957 Hz) was significantly greater than that of voiced

fricatives (mean = 3279 Hz). However, a significant Place by Voicing interaction

[F (3, 562) = 26.48, p < 0.001; η2 = 0.124] and subsequent Bonferroni post hoc

comparisons within places that have a voicing contrast showed that the difference

between voiceless and voiced fricatives was not significant for alveolar fricatives (/s,

z/). Also, as apparent from Figure (5–2), the difference was most prominent for the

nonsibilant dental fricatives (/T, D/).

A main effect for Vowel context was also significant [F (5, 561) = 8.473, p <

0.001; η2 = 0.07]. While no significant differences between vowels differing only in

Table 5–1. Mean frequency (Hz) of amplitude peak as measured at the middle of frication noise.

                                            /i/             /u/             /a/
Place                     Voicing      short   long    short   long    short   long
Labiodental               Voiceless    8144    7210    7031    6241    7613    7940
Dental                    Voiced       4115    5838    2559    3823    2942    1788
                          Voiceless    7686    8275    7426    7513    7879    7248
Alveolar                  Voiced       6720    8079    5228    5283    7124    7237
                          Voiceless    8016    7686    5583    5801    7389    7270
Post-Alveolar             Voiceless    3486    3690    3327    3668    3348    3495
Uvular                    Voiced       1872    2153    1414    1368    2186    2104
                          Voiceless    3206    3238    3927    3398    3323    3767
Pharyngeal                Voiced        763    1139     640     641     900    1162
                          Voiceless    2493    2545    2651    2414    2203    2298
Pharyngealized Dental     Voiced       3413    4249    3101    2767    3702    4047
Pharyngealized Alveolar   Voiceless    7135    6875    4738    6147    6972    7137
Glottal                   Voiceless    2243    2363     935    1149    1776    2042

Figure 5–1. Mean spectral peak location as a function of place and voicing. [Plot: Place of Articulation against Spectral Peak Location (Hz); series: voiceless, voiced.]

Figure 5–2. Place of articulation and voicing interaction for spectral peak location. [Plot: Place of Articulation against Spectral Peak Location (Hz); series: voiced, voiceless.]

length were present (Figure 5–3), subsequent post hoc tests adjusted for multiple

comparisons using the Bonferroni method showed that frequency of spectral peak

location measured in the context of either /u/ or /u:/ was significantly lower than

spectral peak location measured in the context of either /i/ or /i:/. Moreover,

spectral peak location of fricatives preceding /u/ had significantly lower frequencies

than in the context of all other vowels except as noted above for the /u-u:/

contrast.

Figure 5–3. Frequency of spectral peak location in different vowel contexts. [Plot: Vowel Context (/i/, /a/, /u/) against Spectral Peak Location (Hz); series: short, long.]

A significant [F (40, 561) = 1.441, p < 0.05; η2 = 0.093] Place by Vowel context

interaction with subsequent Bonferroni post hoc tests showed that the effect of

vowel context mentioned above was confined only to alveolar and glottal fricatives.

As apparent from Figure (5–4) and Figure (5–5), both /u/ and /u:/ resulted in a

significantly lower frequency of spectral peak location in alveolar fricatives than


all other vowels. In the case of glottal fricative /h/, the short high back vowel /u/

(mean = 935 Hz) introduced a significantly lower spectral peak frequency only when

compared to /i/ and /i:/ (mean = 2243 Hz and 2363 Hz respectively). Although

the frequency of the spectral peak location of /sQ/ in the context of /u/ was about

2396 Hz lower than that of /a, i/, such a difference was only marginally significant

(p = 0.051).

Figure 5–4. Mean frequency of spectral peak location as a function of place and short vowels. [Plot: Place of Articulation against Spectral Peak Location (Hz); series: /i/, /u/, /a/.]

5.2 Spectral Moments

The first four statistical moments were computed from three 40 ms windows

located at the onset, middle, and offset of the frication and from a 40 ms window

centered at the fricative offset to capture any transitional information into the

vowel. In this section, two analyses are presented for each moment. Specifically, to

capture the general trend of spectral moments, separate one-way ANOVAs were

Figure 5–5. Mean frequency of spectral peak location as a function of place and long vowels. [Plot: Place of Articulation against Spectral Peak Location (Hz); series: /i:/, /u:/, /a:/.]

conducted for place and voice with moments across window locations as dependent

variables. Additionally, a preliminary one-way ANOVA test of differences between

moments computed at different windows showed a main effect for window location

for all moments. Therefore, separate three-way ANOVAs (place × vowel × voicing)

with subsequent Bonferroni post hoc tests were conducted for each moment and

window location combination. A summary of the spectral moments collapsed across

speakers, vowel context, and window locations is presented in Table (5–2).
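The four spectral moments named above can be sketched by treating the power spectrum of an analysis window as a probability distribution over frequency. This is a minimal illustration and not the analysis procedure actually used. Whether kurtosis is reported with 3 subtracted (excess kurtosis) is not stated in the text, so it is exposed here as an option; the function name and the three-bin spectrum are hypothetical.

```python
import math


def spectral_moments(freqs_hz, power, excess=True):
    """Mean (Hz), variance (Hz^2), skewness, and kurtosis of a power spectrum."""
    total = sum(power)
    probs = [p / total for p in power]  # normalize the spectrum to sum to 1
    mean = sum(f * p for f, p in zip(freqs_hz, probs))
    variance = sum((f - mean) ** 2 * p for f, p in zip(freqs_hz, probs))
    sd = math.sqrt(variance)
    skewness = sum((f - mean) ** 3 * p for f, p in zip(freqs_hz, probs)) / sd ** 3
    kurtosis = sum((f - mean) ** 4 * p for f, p in zip(freqs_hz, probs)) / variance ** 2
    if excess:
        kurtosis -= 3.0  # assumed convention: zero for a Gaussian-shaped spectrum
    return mean, variance, skewness, kurtosis


# Hypothetical symmetric three-bin spectrum, for illustration only.
freqs = [1000.0, 2000.0, 3000.0]
power = [1.0, 2.0, 1.0]
mean, variance, skewness, kurtosis = spectral_moments(freqs, power)
variance_mhz = variance / 1e6  # variance rescaled by 10^6, as in the MHz column
```

Positive skewness indicates energy concentrated below the mean with a tail towards higher frequencies, and high kurtosis indicates a peaked spectrum, which is how the values in Table 5–2 can be read.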

5.2.1 Spectral Mean

One-way ANOVAs for place and voicing were carried out utilizing spectral

mean measurements across the four window locations as the dependent

variable. The ANOVA revealed a main effect for Place of articulation

[F (8, 2487) = 210.567, p < 0.001; η2 = 0.403]. Subsequent Bonferroni post

hoc tests were conducted for voiceless and voiced fricatives separately. For voiced

fricatives, spectral mean was highest for alveolar /z/ (5935 Hz) and lowest for

pharyngeal /Q/ (1547 Hz). Differences in spectral means for all contrasts within

voiced fricatives were significant, with the exception of the contrast between plain

dental /D/ and its pharyngealized counterpart /DQ/. As for voiceless fricatives,

alveolar /s/ had the highest spectral mean (5546 Hz), while glottal /h/ had the

lowest (2513 Hz). Also, with the exception of the nonsibilants (/f, T/), spectral

mean tends to decrease as the fricative articulation moves towards the back

of the mouth. Additionally, as was the case in spectral peak location (Section

5.1), three categories containing fricatives articulated in adjacent places (/f, T,

s, sQ/, /S, X/ and /è, h/) were observed to have no within-group differences that

were statistically significant. Only comparisons involving members of different

groups were significant. The only exception to this general observation was with

the first group in which the contrast between labiodental /f/ (4802 Hz) and

alveolar /s/ (5546 Hz) was significant. A main effect was also obtained for Voicing

Table 5–2. Spectral moments for place and voice averaged across all window locations.

Place of Articulation     Voicing      Spectral Mean (Hz)   Variance (MHz)   Skewness   Kurtosis
Labiodental               Voiceless          4802                5.97           0.70       2.96
Dental                    Voiced             3999                6.91           0.65       1.15
                          Voiceless          5266                5.99           0.25       0.72
                          Mean               4633                6.45           0.45       0.93
Alveolar                  Voiced             5935                5.26          -0.06       0.74
                          Voiceless          5546                4.39           0.44       1.05
                          Mean               5740                4.83           0.19       0.89
Post-Alveolar             Voiceless          3888                3.61           1.33       2.38
Uvular                    Voiced             2396                4.38           1.79       6.48
                          Voiceless          3652                4.40           1.36       3.97
                          Mean               3024                4.39           1.57       5.23
Pharyngeal                Voiced             1547                1.46           2.25      13.69
                          Voiceless          2522                2.45           2.42       9.79
                          Mean               2034                1.96           2.34      11.74
Pharyngealized Dental     Voiced             3910                7.45           0.84       2.10
Pharyngealized Alveolar   Voiceless          5257                4.39           0.69       1.51
Glottal                   Voiceless          2513                4.43           1.76       4.56

[F (1, 2494) = 59.025, p < 0.001; η2 = 0.023]. Collapsed across all speakers, place

and vowel contexts, voiceless fricatives had higher values for spectral mean (4181

Hz) than voiced fricatives (3557 Hz).

As mentioned above, values for spectral mean measured at different window

locations were statistically different [F (3, 2492) = 326.978, p < 0.001; η2 = 0.28].

Therefore, separate three-way ANOVAs (place × vowel × voicing) were carried

out for spectral mean at each window location. There was a main effect for place

of articulation for all window locations with η2 values of 0.736 (window 1), 0.830

(window 2), 0.790 (window 3) and 0.602 (window 4). The range of η2 indicates

that spectral information measured at these windows contributed in varying

degrees to the separation of fricatives according to their place of articulation. This

observation was confirmed by post hoc tests for differences performed on voiced

and voiceless fricatives separately. For voiced fricatives, across all windows, alveolar

fricative /z/ had the highest spectral mean while pharyngeal /Q/ had the lowest.

Additionally, spectral mean distinguished between all places of voiced fricatives in

all windows, with the exception of the contrasts between (/D/ and /DQ/) in the first

three windows and between any combination of (/K/, /Q/ and /DQ/) in the fourth

window (Figure 5–6). On the other hand, differences between voiceless fricatives

in terms of spectral mean measured at different windows were not as categorically

distinguishing as in the case of voiced fricatives. Nevertheless, as noted above,

three clusters containing fricatives articulated in adjacent places (/f, T, s, sQ/, /S, X/

and /è, h/) emerged as distinct groups for which no within-group differences were

significant with regard to spectral mean measured at the second (middle) and third

(offset) windows. However, all comparisons between members of different groups

were significant with spectral mean decreasing as the articulation moved backwards

in the mouth (Figure 5–6). Furthermore, spectral mean as measured at the first

(onset) window significantly differentiated between all places with the exception


of all possible contrasts involving (/T, s, sQ/) and the contrast between (/è - h/).

Only alveolar /s/ was significantly different from all other voiceless fricatives at

the fourth (transitional) window. Moreover, at the onset and transitional windows,

differences observed elsewhere between /f/ and /T/ were not significant (Figure

5–6).

There was also a main effect for Voicing in all four windows. As can be seen

from Figure (5–7), spectral mean for voiceless fricatives was significantly higher

than voiced fricatives in the first three windows and significantly lower at the last

(transitional) window. Additionally, a significant Place by Voicing interaction

(Figure 5–8) revealed that alveolar fricatives /s, z/ were not significantly different

from each other in terms of spectral mean in all but the fourth window at which

the /s - z/ contrast was the only one reaching significance level (p < 0.05).

Finally, there was a main effect for Vowel context at all four windows. Spectral

mean was highest for fricatives preceding /i/ and /i:/, and lowest for fricatives

preceding either /u/ or /u:/. Pairwise comparisons for the different vowel contexts

at each window showed that the difference between any of the high front vowels (/i,

i:/) and either of /u/ and /u:/ was significant at all window locations. Additionally,

spectral mean of fricatives in the context of both /i, i:/ was significantly higher

than that in the context of either /a, a:/ at the fourth (transitional) window

(Figure 5–9).

5.2.2 Spectral Variance

One-way ANOVAs for Place and Voice were conducted with spectral variance

averaged across all window locations. A main effect for Place of articulation was

obtained [F (8, 2487) = 206.936, p < 0.001; η2 = 0.399], with the lowest variance

observed for sibilants and back articulated fricatives while the highest variance

was observed for nonsibilants. Table (5–2) shows mean variance values for all

fricatives measured in Megahertz (MHz). Bonferroni post hoc tests showed that

Figure 5–6. Spectral mean (Hz) averaged across vowel contexts for each window as a function of place of articulation. A) voiced. B) voiceless.

Figure 5–7. Spectral mean (Hz) averaged across place and vowel contexts for each window as a function of voicing.

Figure 5–8. Place of articulation and voicing interaction for spectral mean at four window locations. A) onset, B) middle, C) offset, and D) transition.

Figure 5–9. Spectral mean as a function of vowel context at four window locations. A) onset, B) middle, C) offset, and D) transition.

within voiced fricatives, spectral variance did not differentiate between plain dental

(/D/) and its pharyngealized counterpart (/DQ/). However, all other comparisons

within voiced fricatives were significant (p < 0.001). As for voiceless fricatives,

spectral variance for the nonsibilants /f, T/ was significantly higher than those of

all other places. However, spectral variance for the /f/ and /T/ themselves was not

significantly different. Moreover, spectral variance for /S/ and /è/ was significantly

lower than that of all other places. Another main effect was observed for Voicing

[F (1, 2494) = 39.778, p < 0.001; η2 = 0.016] with voiced fricatives having higher

variance (5.09 MHz) than voiceless fricatives (4.45 MHz).
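The Bonferroni post hoc tests referred to above compare every pair of fricatives within a voicing class. A minimal sketch of the bookkeeping follows, assuming a family-wise α of 0.05 divided evenly over all pairs (the usual Bonferroni adjustment; the exact settings of the original analysis are not stated here). The fricative labels follow the transcription used in this chapter, and the function name is hypothetical.

```python
from itertools import combinations


def bonferroni_pairs(labels, family_alpha=0.05):
    """Return all unordered pairs and the per-comparison alpha level."""
    pairs = list(combinations(labels, 2))
    return pairs, family_alpha / len(pairs)


voiced = ["D", "z", "K", "Q", "DQ"]
voiceless = ["f", "T", "s", "S", "X", "è", "sQ", "h"]

voiced_pairs, voiced_alpha = bonferroni_pairs(voiced)
voiceless_pairs, voiceless_alpha = bonferroni_pairs(voiceless)
print(len(voiced_pairs), voiced_alpha)
print(len(voiceless_pairs), voiceless_alpha)
```

With five voiced and eight voiceless fricatives this gives 10 and 28 pairwise contrasts respectively.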

Since a one-way ANOVA showed that overall spectral variance differed

significantly as a function of Window Location [F (3, 2492) = 33.742, p <

0.001; η2 = 0.04], multiple three-way ANOVAs (place × vowel × voicing) were

carried out for spectral variance at each window location. The ANOVAs revealed a

main effect for Place of Articulation [F (8, 561) = 104.502 (onset), 98.597 (middle),

137.024 (offset), 55.05 (transition); p < 0.001; η2 = 0.6 (onset), 0.58 (middle),

0.66 (offset), 0.44 (transition)]. As apparent from Figure (5–10), for both voiced

and voiceless fricatives, nonsibilants (/f, T, D, DQ/) had the highest variance while

pharyngeal fricatives (/è, Q/) had the lowest variance. Pairwise comparisons

within voiced fricatives showed that only the difference between /D - DQ/ was not

significant at all windows. With the exception of the /D - DQ/ contrast, spectral

variance differentiated between all places of articulation within voiced fricatives

at all window locations. On the other hand, spectral variance did not differentiate

between voiceless fricatives in the same manner as it did with voiced fricatives.

Specifically, spectral variance was able to distinguish between any combination

of voiceless fricatives either at the second or the third window (Figure 5–10).

The only exceptions were the expected lack of difference between /s, sQ/ and the

nonsignificant difference between /h, sQ/ at all windows. Additionally, as with voiced


fricatives, nonsibilant fricatives (/f, T/) had significantly higher variance than all

other voiceless fricatives in at least three of the four analysis windows.

As mentioned previously, a main effect of Voicing was observed with the

overall spectral variance. However, ANOVAs conducted for individual windows

revealed that this effect was present only at the second (middle) window

[F (1, 561) = 9.973, p < 0.001; η2 = 0.017] with the expected increase in variance

for voiced fricatives (5.4 MHz compared to 4.5 MHz for voiceless fricatives).

Nevertheless, a significant Place by Voicing interaction was present at all analysis

windows. Bonferroni post hoc tests showed that the increase in spectral variance for

voiced fricatives as compared to voiceless fricatives was significant only for dentals

(/T, D/) at the second window, and for alveolars (/s, z/) at the fourth window. Another

source of the interaction, as can be seen from Figure (5–11), is an increase

in spectral variance for voiceless, rather than voiced, pharyngeal fricatives. Such an

increase, and subsequent shift in the voicing effect, was present at all windows but

significant only at the fricative-vowel boundary (windows three and four).

There was also a main effect for Vowel context (p < 0.0001) in all but the first

analysis window. The source for this effect as revealed by post hoc tests is twofold:

first, there was a significant increase in spectral variance for fricatives preceding

either /u/ or /u:/ as compared to all other vowels in the second (middle) and third

(offset) windows (Figures 5–12A and B); and second, the variance of fricatives

preceding /i/ and /i:/ was significantly higher than that of either /a/ or /a:/ in the

fourth window (Figure 5–12C).

5.2.3 Spectral Skewness

A one-way ANOVA for spectral skewness across all window locations showed a

significant main effect for Place [F (8, 2487) = 137.975, p < 0.001; η2 = 0.31], with

skewness ranging from 2.34 for pharyngeal (/è, Q/) to 0.19 for alveolar fricatives

(/s, z/). Subsequent Bonferroni post hoc tests indicated that for both voiced and

Figure 5–10. Spectral variance (MHz) averaged across vowel contexts for each window as a function of place of articulation. A) voiced. B) voiceless.

Figure 5–11. Place of articulation and voicing interaction for spectral variance at four window locations. A) onset, B) middle, C) offset, and D) transition.

Figure 5–12. Spectral variance as a function of vowel context at three window locations. A) middle, B) offset, and C) transition.

voiceless fricatives, skewness did not differentiate between plain fricatives and

their pharyngealized counterparts (/D - DQ, s - sQ/). However, besides the exception

noted above, all voiced fricatives were significantly different from each other in

terms of skewness (means are reported in Table 5–2). Within voiceless fricatives,

skewness significantly differentiated between the nonsibilants /f/ and /T/ (0.7 and 0.25

respectively). However, skewness did not distinguish nonsibilants from either /s/

or /sQ/ or between /S/ and /X/. All other voiceless fricatives were significantly

different from each other in terms of spectral skewness. The effect of voicing on

spectral skewness was not significant (p = 0.67).

Due to the previously mentioned significant differences between skewness

measured at different windows [F (3, 2492) = 145.382, p < 0.001; η2 = 0.15], a

three-way ANOVA (place × vowel × voicing) was conducted for spectral skewness

at each window location. A main effect for Place was obtained at all window

locations. With the exception of /D - DQ/ contrast, pairwise comparisons showed

that all voiced fricatives were significantly different from each other in terms of

spectral skewness at the second (middle) and third (offset) windows (Figure

5–13). Pharyngeal /Q/ had the highest skewness, indicating a concentration of

energy at frequencies lower than for all other voiced fricatives, while the negative

skewness obtained for /z/ indicates a concentration of energy at higher frequencies.

Interestingly, the difference in skewness between dental and pharyngealized dental

(/D - DQ/) reached significance (p = 0.008) only at the fourth window located at

fricative-vowel transition (Table 5–3). The lack of a significant difference between

plain fricatives and their pharyngealized counterparts was also present for voiceless

fricatives /s - sQ/ at all window locations. As can be seen in Table (5–4), skewness

differentiated between all voiceless fricatives in at least two windows with the

notable exception of the /S - h/ contrast, which was significant only at the fourth

window (transition). If the number of places distinguished in terms of skewness

Figure 5–13. Spectral skewness averaged across vowel contexts for each window as a function of place of articulation. A) voiced. B) voiceless.

differences at a given window is used as an indicator of that window’s distinctive

spectral information, windows placed at the middle and offset of frication noise

were more successful in distinguishing between voiceless fricatives than others

(Tables 5–3 and 5–4).
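The reading of skewness given above (positive values for energy concentrated at low frequencies, negative values for energy concentrated at high frequencies) follows from treating the normalized power spectrum as a probability distribution. The sketch below computes the four moments under that assumption. The function name and the toy spectra are hypothetical, and subtracting 3 from kurtosis is a common convention that may differ from the one used for the values reported here.

```python
def spectral_moments(freqs_hz, power):
    """Return (mean, variance, skewness, excess kurtosis) of a spectrum.

    The power values are normalized to sum to 1 and treated as the
    probabilities of their frequency bins.
    """
    total = sum(power)
    probs = [p / total for p in power]
    mean = sum(f * p for f, p in zip(freqs_hz, probs))
    variance = sum((f - mean) ** 2 * p for f, p in zip(freqs_hz, probs))
    third = sum((f - mean) ** 3 * p for f, p in zip(freqs_hz, probs))
    fourth = sum((f - mean) ** 4 * p for f, p in zip(freqs_hz, probs))
    skewness = third / variance ** 1.5
    kurtosis = fourth / variance ** 2 - 3.0  # assumed "excess" convention
    return mean, variance, skewness, kurtosis


# Toy spectra (not measured data): energy concentrated low vs. high.
freqs = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
low_heavy = [8.0, 4.0, 2.0, 1.0, 1.0]
high_heavy = list(reversed(low_heavy))

print(spectral_moments(freqs, low_heavy))
print(spectral_moments(freqs, high_heavy))
```

The low-frequency-heavy spectrum yields a positive skewness and its mirror image a negative one, which is the pattern described for /Q/ and /z/.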

Table 5–3. Window locations at which a difference between voiced fricatives in terms of spectral skewness is significant.

          /D/        /z/        /K/        /Q/
/z/       1 2 3 4
/K/       1 2 3 4    1 2 3 4
/Q/       1 2 3 4    1 2 3 4    φ 2 3 φ
/DQ/      φ φ φ 4    1 2 3 4    1 2 3 φ    1 2 3 φ

φ indicates absence of significant differences

Table 5–4. Window locations at which a difference between voiceless fricatives in terms of spectral skewness is significant.

          /f/        /T/        /s/        /S/        /X/        /è/        /sQ/
/T/       1 φ φ 4
/s/       φ 2 φ 4    1 φ φ 4
/S/       1 2 3 4    1 2 3 φ    1 2 3 φ
/X/       1 2 3 φ    1 2 3 4    1 2 3 4    φ φ 3 4
/è/       1 2 3 φ    1 2 3 4    1 2 3 4    1 2 3 4    1 2 3 φ
/sQ/      φ 2 φ 4    1 2 3 φ    φ φ φ φ    1 2 3 φ    φ 2 φ 4    1 2 3 4
/h/       1 2 3 φ    1 2 3 4    1 2 3 4    φ φ φ 4    φ 2 3 φ    1 2 3 φ    1 2 3 4

φ indicates absence of significant differences

Although the effect of voicing was not significant for the overall skewness, a

main effect for Voicing was obtained at all but the third (offset) window. At both

frication onset and middle windows, voiceless fricatives had significantly (p < 0.001)

lower skewness than voiced fricatives; while skewness measured at the fricative-

vowel transition was significantly (p < 0.0001) higher for voiceless fricatives than

voiced ones (Figure 5–14). Also, a Place by Voicing interaction was significant

at all but the last (transition) window. In general, the reduction in skewness for

voiceless fricatives when compared to voiced fricatives as noted in the main effect

above was reversed for alveolar and pharyngeal fricatives in the first three windows;

and for all fricatives in the fourth window (Figure 5–15). However, this increase in


skewness for voiceless fricatives was only significant (p < 0.05) for alveolar fricatives

at the fourth (transition) window.

Figure 5–14. Spectral skewness averaged across place and vowel contexts for each window as a function of voicing.

The ANOVAs also revealed a main effect of Vowel context at all window

locations. The magnitude of the effect becomes larger as the window moves closer

to the vowel (η2 = 0.028 at frication mid-point, 0.037 at frication offset and 0.31 at

fricative-vowel transition). The source of this effect, as illustrated in Figure (5–16)

and associated Bonferroni post hoc tests, is attributed to the significant decrease

in fricative skewness in the context of short /i/ and long /i:/. Specifically, long

/i:/ resulted in significantly lower skewness than long /u:/ in all but the second

window, while short /i/ resulted in significantly lower skewness than short /u/

in the first and fourth windows. Additionally, differences between high front and

Figure 5–15. Place of articulation and voicing interaction for spectral skewness at four window locations. A) onset, B) middle, C) offset, and D) transition.

low front vowels (/i, i:/ and /a, a:/) were significant only at the transition window

(Figure 5–16D).

Figure 5–16. Spectral skewness as a function of vowel context at four window locations. A) onset, B) middle, C) offset, and D) transition.

5.2.4 Spectral Kurtosis

One-way ANOVAs testing for effects of place and voice with spectral kurtosis

measurements across the four windows as the dependent variable revealed a main

effect of Place [F (8, 2487) = 99.567, p < 0.001; η2 = 0.24]. Bonferroni post

hoc tests conducted on voiced fricatives showed that only kurtosis of uvular /K/

(6.5) and pharyngeal /Q/ (13.7) were significantly higher than all other voiced


fricatives. Within voiceless fricatives, kurtosis significantly differentiated

between the nonsibilants /f/ and /T/ with a mean of 2.96 and 0.72 respectively.

Moreover, pharyngeal /è/ with kurtosis of 9.8 was significantly higher than all

other voiceless fricatives. The ANOVA also revealed a main effect of Voicing

[F (1, 2494) = 22.922, p < 0.001; η2 = 0.01] in which voiceless fricatives

had significantly lower kurtosis than voiced fricatives (mean of 3.376 and 4.83

respectively).

A one-way ANOVA showed that kurtosis differed significantly as a function

of Window location [F (3, 2492) = 67.968, p < 0.001; η2 = 0.076], with the

fourth (transition) window registering the highest values for kurtosis. Therefore, a

three-way ANOVA (place × vowel × voicing) was conducted for spectral kurtosis

at each window location. The results of the three-way ANOVAs showed a main

effect of Place at all window locations. With the exception of the fourth window,

the magnitude of the effect becomes larger as the window advances towards the

fricative-vowel boundary (η2 of the first three windows was 0.34, 0.46 and 0.51

respectively). Subsequent Bonferroni post hoc tests at each window were carried

out for voiced and voiceless fricatives separately (Figure 5–17). Within voiced

fricatives, no significant differences were observed with all possible contrasts

between /D, DQ, z/ at all windows with the exception of the /DQ - z/ contrast,

which reached significance level (p < 0.05) at the fourth window only. Moreover,

while kurtosis of pharyngeal /Q/ was significantly higher than uvular /K/ in

all but the last (transition) window, each of the two fricatives had significantly

higher (p < 0.01) kurtosis than all other voiced fricatives in the first and third

window. A similar pattern was also observed with voiceless fricatives. Specifically,

voiceless pharyngeal fricative /è/ had significantly higher kurtosis than all other

voiceless fricatives in the second (mean =11.6) and third analysis windows (mean

=10.8). Also, as was the case with /D - DQ/ contrast, no difference was obtained

Figure 5–17. Spectral kurtosis averaged across vowel contexts for each window as a function of place of articulation. A) voiced. B) voiceless.

between plain alveolar /s/ and its pharyngealized counterpart /sQ/ at all windows.

Additionally, while kurtosis of glottal /h/ was significantly lower than that of

pharyngeal /è/ at all windows, it was significantly higher than kurtosis of /S/

in the fourth window and significantly higher than all other remaining voiceless

fricatives in the second and third windows (Figure 5–17).

A main effect of Voicing was also obtained at all but the fourth window.

Similar to the effect observed with the overall kurtosis, voiceless fricatives in the

aforementioned windows had significantly lower kurtosis than voiced fricatives

(Figure 5–18). The size of this effect was rather small and generally decreased

in the middle window (η2 of the first three windows was 0.05, 0.03 and 0.06

respectively). Moreover, a Place by Voicing interaction was also significant at the

first three windows. Basically, as suggested by the corresponding post hoc tests

shown in Figure (5–19), the effect of voicing was significant (p < 0.05) for uvulars

/K, X/ at frication onset, for pharyngeals /è, Q/ at the middle of frication noise and

for both uvular and pharyngeal places of articulation at the frication offset.

Finally, the effect of vowel context was observed only at the edges of the

frication noise: frication onset [F (5, 561) = 3.068, p < 0.001; η2 = 0.03]; and

transition into the vowel [F (5, 561) = 17.406, p < 0.001; η2 = 0.134]. Subsequent

Bonferroni post hoc tests carried out at these windows showed that the main

effect is due to the significant decrease in kurtosis for a fricative preceding

/i:/ as compared only to /u/ at the onset window (Figure 5–20A), and to

the greater decrease in kurtosis for fricatives preceding short /i/ and long /i:/

as compared to all other vowels at the transition window (Figure 5–20B). The

difference between long /i:/ and long /u:/ was marginally significant (p = 0.056) at

the onset window.

Figure 5–18. Spectral kurtosis averaged across place and vowel contexts for each window as a function of voicing.

Figure 5–19. Place of articulation and voicing interaction for spectral kurtosis at four window locations. A) onset, B) middle, and C) offset.

Figure 5–20. Spectral kurtosis as a function of vowel context at two window locations: A) onset and B) transition.

CHAPTER 6
FORMANT TRANSITION

This chapter reports on acoustic measurements related to spectral information

at the fricative-vowel transition that might help distinguish between the different

places of fricative articulation. The first measurement reported is the frequency of

the second formant (F2) measured in Hertz from a 25-ms Kaiser window placed at

the vowel onset. The second measurement is the pair of coefficients (slope and

y-intercept) of regression lines fitted to scatterplots of F2 at the vowel’s onset (y-axis)

against F2 at the vowel’s mid-point (x-axis), derived for each place and speaker and

averaged across voicing and vowel context.

6.1 Second Formant (F2) at Transition

Table (6–1) presents the F2 values at the onset of the vowel for each place of

articulation and voicing, averaged across speakers and vowel context. The results of

a three-way ANOVA (place × voicing × vowel) showed a significant main effect for

Place of articulation [F (8, 561) = 97.988, p < 0.0001; η2 = 0.58]. Subsequent post

hoc tests were carried out separately on voiced and voiceless fricatives. For both

voiced and voiceless fricatives, pharyngealized fricatives (/DQ/ 1164 Hz and /sQ/

1288 Hz) had significantly lower F2 frequencies than their plain counterparts (/D/:

1603 Hz and /s/: 1636 Hz). In fact, within voiced fricatives /DQ/ had a significantly

lower frequency than all voiced fricatives with the exception of uvular /K/. While

upholding the lack of significance between /DQ - K/, voiced uvular /K/ also had

a significantly lower F2 frequency (1171 Hz) than all other voiced fricatives. No

other contrasts within voiced fricatives were statistically significant.

A similar pattern was also observed within voiceless fricatives. Specifically, as

was the case for voiced fricatives, there was a lack of significant difference between

pharyngealized and uvular fricatives (/sQ - X/ in this case), and between dental and


alveolar fricatives (/T - s/). Moreover, the F2 frequencies of both pharyngeal /è/

and glottal /h/ were statistically similar to /f/, /T/ and /s/ (means are reported in

Table (6–1)). Additionally, no significant difference was obtained between uvular

and pharyngeal (/X - è/). All other contrasts between voiceless fricatives were

significant (p < 0.05 for within non-sibilants and p < 0.0001 for other contrasts).

Table 6–1. Mean values of F2 (Hz) at transition averaged across speakers and vowel context as a function of place and voicing.

Place of Articulation      Voicing      F2 at transition (Hz)   Mean
Labiodental                Voiceless    1496
Dental                     Voiced       1603
                           Voiceless    1602                    1602
Alveolar                   Voiced       1633
                           Voiceless    1636                    1634
Post-Alveolar              Voiceless    1742
Uvular                     Voiced       1171
                           Voiceless    1325                    1248
Pharyngeal                 Voiced       1555
                           Voiceless    1589                    1572
Pharyngealized Dental      Voiced       1164
Pharyngealized Alveolar    Voiceless    1288
Glottal                    Voiceless    1565

The ANOVA also revealed a main effect of Voicing [F (1, 561) = 9.145, p <

0.005; η2 = 0.016], with voiceless fricatives registering higher F2 frequencies than

voiced fricatives (mean 1530 and 1425 respectively). However, a significant Place

by Voicing interaction [F (3, 561) = 5.337, p < 0.002; η2 = 0.028] and subsequent

Bonferroni post hoc tests (Figure 6–1) showed that this effect was limited to uvular

fricatives.

Figure 6–1. Place of articulation and voicing interaction for F2 (Hz) measured at vowel onset.

There was also a main effect of Vowel context [F (5, 561) = 221.237, p <

0.0001; η2 = 0.66]. As expected, F2 frequencies measured at the onset of high front vowels (/i,

i:/ with means of 1708 and 1919 Hz respectively) were significantly higher

than those of all other vowels (p < 0.0001). Also, the F2 frequencies of back vowels (/u, u:/

with means of 1209 and 1259 Hz respectively) were significantly lower than those of

all other vowel contexts (p < 0.0001). The mean frequency of F2 at /a/ onset was

1435 Hz and 1409 Hz for /a:/. The effect of vowel length on F2 frequency was not

significant except for the /i -i:/ contrast, for which long vowels introduced higher

F2 frequencies.

Figure 6–2. F2 (Hz) measured at vowel onset as a function of vowel context.

6.2 Locus Equation

Locus equation coefficients for every place of articulation were obtained for

each of the eight speakers in our study (8 speakers × 9 places of articulation).

Specifically, a linear regression fit was applied on scatterplots with F2 values

averaged across all vowel contexts. Each scatterplot had F2 measured at the onset

of the vowel represented on the y-axis and F2 measured at the mid-point of the

vowel represented on the x-axis. The coefficients of each regression line (the slope

‘k’ and the y-intercept ‘c’) were taken to be the terms of locus equations. An

example plot is presented in Figure (6–3).

Regression line shown in the example plot: y = k x + c, here y = 0.5837 x + 666.25 (x: F2 frequency (Hz) at vowel mid-point; y: F2 frequency (Hz) at vowel onset).

Figure 6–3. An example of a scatterplot to derive coefficients of locus equation.
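The derivation of locus equation coefficients described above can be sketched as an ordinary least-squares fit of F2 at vowel onset on F2 at vowel mid-point. The function name and the mid-point values below are hypothetical; the onset values are generated from the example line in Figure (6–3) only so that the fit can be checked against it, and they are not measured data.

```python
def fit_locus_equation(f2_mid, f2_onset):
    """Least-squares fit of F2_onset = k * F2_mid + c; returns (k, c)."""
    n = len(f2_mid)
    mean_x = sum(f2_mid) / n
    mean_y = sum(f2_onset) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_mid)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_mid, f2_onset))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical mid-point F2 values (Hz), one per vowel context.
f2_mid = [900.0, 1200.0, 1500.0, 1800.0, 2100.0, 2400.0]
# Onsets lie exactly on the example line y = 0.5837 x + 666.25.
f2_onset = [0.5837 * x + 666.25 for x in f2_mid]

k, c = fit_locus_equation(f2_mid, f2_onset)
print(round(k, 4), round(c, 2))
```

In the study, one such fit would be made for each speaker and place of articulation, giving the 8 × 9 sets of coefficients that enter the ANOVAs on slope and y-intercept.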

Table (6–2) presents mean slope and y-intercept values for each place of

articulation averaged across vowel contexts. A one-way ANOVA for slope showed


a main effect for Place of Articulation [F (8, 63) = 15.092, p < 0.001; η2 = 0.66].

Pharyngealized fricatives had the lowest slope (0.168 for /DQ/ and 0.399 for /sQ/),

while glottal /h/ had the highest (mean slope of 0.924). However, post hoc tests

revealed that the slope for pharyngealized dental /DQ/ was significantly different

from all other plain (non-pharyngealized) fricatives. Furthermore, the high slope

of /h/ was significantly different from all other fricatives with the exception

of uvular fricatives /X, K/. The slope of pharyngealized alveolar /sQ/ was only

significantly different from uvular fricatives. No other contrasts were significant.

On the other hand, a one-way ANOVA for y-intercept revealed a main effect for

place [F (8, 63) = 10.313, p < 0.001; η2 = 0.57]. Glottal /h/ and uvular fricatives

/X, K/ had the lowest y-intercept values (160 and 289 Hz respectively), while the

highest y-intercept value was observed for post-alveolar fricative /S/ (956 Hz).

Although no significant differences between y-intercept of /h/ and /X, K/ were

observed, Bonferroni post hoc tests showed that y-intercept for /h/ was significantly

lower than all other places of articulation. Additionally, the y-intercept values for

uvular fricatives were significantly lower than all other places of articulation with

the exception of labiodental and pharyngeal fricatives (/f/ and /Q, è/). No other

significant differences were obtained.

Table 6–2. Mean slope and y-intercept values for each place of articulation averaged across vowel contexts.

Place of Articulation      slope    y-intercept
Labiodental                0.565    652
Dental                     0.507    825
Alveolar                   0.451    930
Post-Alveolar              0.502    956
Uvular                     0.692    289
Pharyngeal                 0.579    665
Pharyngealized Dental      0.168    938
Pharyngealized Alveolar    0.399    751
Glottal                    0.925    160

CHAPTER 7
STATISTICAL CLASSIFICATION OF FRICATIVES

Discriminant Function Analysis (DFA) was used to determine the most

parsimonious way to distinguish among the different places of articulation using the

acoustic cues investigated in our study (descriptive DFA). Furthermore, DFA was

used here to assess the contribution of each selected cue to the overall classification

of fricatives into their places of articulation. Also, to get a more realistic indication

of the use of these cues in distinguishing unknown tokens, a cross-validation

method was used with the obtained discriminant functions (predictive DFA).

All acoustic variables investigated in our study were used in the DFA procedure

with the exception of locus equations since they do not reflect measures of single

tokens, but rather the coefficients of linear regression fits on aggregated data points

representing places of articulation for each speaker.

7.1 Discriminant Function Analysis

Discriminant function analysis is a statistical procedure that classifies tokens

into two or more mutually exclusive a priori groups (i.e., place of articulation)

using a set of predictors (i.e., acoustic cues) (Klecka 1980; Hair, Anderson,

and Tatham 1987; Stevens 2002). A discrimination function consists of a linear

combination of one or more variables that maximizes the distance (i.e., differences)

between the groups being classified. In our study, for both descriptive and

predictive DFA, predictors were entered into the analysis using a step-wise method

in which only the predictor that minimized Wilks’ Lambda (Λ) statistic, also known

as the U-statistic, would be entered at any given step. The criterion for entry was set

at p = 0.05 and for removal at p = 0.10. Also, since the levels of the dependent

variable (i.e., place of articulation) have unequal numbers of cases due to lack of

voicing contrast in some places, the prior probabilities for group membership were

calculated from the group size (Table 7–1).

Table 7–1. Prior probabilities for group membership

Place                      Prior    Cases Used in Analysis
Labiodental                0.077    48
Dental                     0.154    96
Alveolar                   0.154    96
Post-Alveolar              0.077    48
Uvular                     0.154    96
Pharyngeal                 0.154    96
Pharyngealized Dental      0.077    48
Pharyngealized Alveolar    0.077    48
Glottal                    0.077    48
Total                      1        624

The number of discriminant functions obtained by the DFA procedure is the

smaller of g − 1, where g is the number of groups, and k, where k is the number

of predictors. In our study the number of discriminant functions obtained was

eight and all were significant (p < 0.001). Table (7–2) shows the percentage of

variance accounted for by each of the eight functions. Although all functions were

significant, we limited our interpretation to the first three functions since they were

the ones contributing the most to the cumulative variance as inferred from their

eigenvalues and the canonical correlation associated with these functions (Table

7–2).
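The columns of Table 7–2 can be recovered from the eigenvalues alone, which may help in checking the interpretation above. The sketch below is illustrative only: it assumes the usual relations (percent of variance as each eigenvalue's share of their sum, and canonical correlation as the square root of λ / (1 + λ)), the function names are hypothetical, and the predictor count passed to n_functions is a placeholder because k is not stated in this section.

```python
import math

# Eigenvalues of the eight discriminant functions, copied from Table 7-2.
eigenvalues = [5.224, 3.651, 1.894, 0.470, 0.387, 0.244, 0.177, 0.098]


def n_functions(n_groups, n_predictors):
    """Number of discriminant functions: the smaller of g - 1 and k."""
    return min(n_groups - 1, n_predictors)


def percent_variance(eigs):
    """Share of the discriminating variance carried by each function (%)."""
    total = sum(eigs)
    return [100.0 * e / total for e in eigs]


def canonical_correlation(eig):
    """Canonical correlation implied by a single eigenvalue."""
    return math.sqrt(eig / (1.0 + eig))


pct = percent_variance(eigenvalues)
cumulative = [sum(pct[: i + 1]) for i in range(len(pct))]

# Nine places of articulation; 20 is a placeholder predictor count (assumption).
print("functions:", n_functions(9, 20))
for i, eig in enumerate(eigenvalues):
    print(i + 1, eig, round(pct[i], 1), round(cumulative[i], 1),
          round(canonical_correlation(eig), 3))
```

Because the tabled eigenvalues are themselves rounded, recomputed values can differ from the table in the last decimal place.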

Table 7–2. The amount of the variance accounted for by each of the functions calculated by the DFA.

Function    Eigenvalue    % of Variance    Cumulative %    Canonical Correlation
1           5.224         43.0             43.0            0.916
2           3.651         30.1             73.1            0.886
3           1.894         15.6             88.7            0.809
4           0.470         3.9              92.5            0.566
5           0.387         3.2              95.7            0.528
6           0.244         2.0              97.7            0.443
7           0.177         1.5              99.2            0.388
8           0.098         0.8              100.0           0.298

7.2 Classification Accuracy of DFA

Before interpreting the classification results obtained from the DFA procedure,

an assessment of the validity of the current model and its accuracy was carried

out. For any classification method, a certain percentage of correct classifications can be

attributed solely to random chance. Therefore, for the current classification model

derived from DFA to be valid, it needs to classify cases better than

would be expected by chance alone. Since the group sizes are unequal

in our study, chance classification was determined using two

criteria: the proportional chance criterion (Cpro) and maximum chance criterion

(MCC) (Hair et al. 1987). The proportional chance criterion is a measure of the

average probability of classification calculated considering all group sizes, while the

MCC is the percentage of the total sample represented by the largest group. Given

the total number of cases and groups in our study, MCC was estimated to be 15.4%

and Cpro to be 12.4%. However, both measures serve only as subjective reference

points for model accuracy. In fact, there is no general consensus on how high the

classification accuracy should be in relation to chance. However, Hair et al. (1987)

suggest that it should be at least one fourth greater than classification by chance.

Consequently, the current model should achieve an overall classification rate higher

than 19.25% (1.25 × MCC) to be valid. Proportional and maximum chance criteria

were calculated as in Equations (7–1) and (7–2), respectively, where N =

total number of cases, g = number of groups, ni = number of cases in group i, and

gmax = the group with the largest number of cases.

\[ C_{pro} = 100 \times \sum_{i=1}^{g} \left( \frac{n_i}{N} \right)^{2} \tag{7--1} \]

\[ MCC = 100 \times \frac{n_{g_{max}}}{N} \tag{7--2} \]
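As a check on the chance criteria, Equations (7–1) and (7–2) can be evaluated directly from the group sizes in Table 7–1. The following is a minimal sketch; the function and variable names are illustrative and not part of the original analysis.

```python
# Cases used in the analysis for each place of articulation (Table 7-1).
group_sizes = {
    "Labiodental": 48, "Dental": 96, "Alveolar": 96, "Post-Alveolar": 48,
    "Uvular": 96, "Pharyngeal": 96, "Pharyngealized Dental": 48,
    "Pharyngealized Alveolar": 48, "Glottal": 48,
}


def proportional_chance(sizes):
    """Equation (7-1): 100 times the sum of squared group proportions."""
    n_total = sum(sizes)
    return 100.0 * sum((n / n_total) ** 2 for n in sizes)


def maximum_chance(sizes):
    """Equation (7-2): percentage of the sample in the largest group."""
    return 100.0 * max(sizes) / sum(sizes)


n_total = sum(group_sizes.values())
# Prior probabilities computed from group size, as in Table 7-1.
priors = {place: n / n_total for place, n in group_sizes.items()}

sizes = list(group_sizes.values())
c_pro = proportional_chance(sizes)
mcc = maximum_chance(sizes)
# Validity threshold: one fourth above the maximum chance criterion.
threshold = 1.25 * mcc

print(round(c_pro, 1), round(mcc, 1), round(threshold, 2))
```

With the unrounded MCC the threshold comes out marginally below the 19.25% quoted in the text, which is obtained from the rounded value of 15.4%.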

It is important to note that both proportional and maximum chance criteria

are subjective in nature. To circumvent this issue, Press’ Q statistic (Equation 7–3)

was used as an additional measurement of model accuracy. The significance of Press’ Q

statistic is assessed against a chi-square (χ2) distribution with one degree of freedom.

This value will be calculated below for both sets of classification results (descriptive

and predictive DFAs). The value ncorrect in Equation (7–3) denotes the number of

correctly classified cases.

\[ Q = \frac{\left( N - n_{correct} \times g \right)^{2}}{N (g - 1)} \tag{7--3} \]
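The statistic can be illustrated with a short sketch. It assumes the standard form of Press' Q, with N(g − 1) in the denominator, and compares the result with tabled chi-square critical values for one degree of freedom. The function name and the toy counts are hypothetical and are not taken from our data.

```python
def press_q(n_total, n_correct, n_groups):
    """Press' Q = (N - n_correct * g)^2 / (N * (g - 1))."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))


# Critical values of the chi-square distribution with 1 degree of freedom.
CHI2_CRIT_1DF = {0.05: 3.841, 0.01: 6.635, 0.001: 10.828}

# Hypothetical toy case: 100 tokens, 4 groups, 60 classified correctly.
q = press_q(n_total=100, n_correct=60, n_groups=4)
is_significant = q > CHI2_CRIT_1DF[0.001]

# A classifier performing exactly at chance (N / g correct) gives Q = 0.
q_chance = press_q(n_total=100, n_correct=25, n_groups=4)

print(round(q, 2), is_significant, q_chance)
```

The chance-level case shows why the statistic is read as a departure from chance: it is zero when exactly N / g cases are classified correctly and grows with the squared distance from that point.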

7.3 Classification Power of Predictors

The standardized canonical function coefficients indicate the partial

contribution of each variable to the discriminant function(s), controlling for

other independents entered in the equation and are used to assess each independent

variable’s unique contribution to the discriminant function (Klecka 1980; Hair et al.

1987). Based on these coefficients, spectral mean (frication noise onset, middle,

and offset), skewness (onset, offset of frication and transition into the vowel),

second formant at vowel onset, normalized RMS amplitude and spectral peak

location were identified to be the variables contributing the most to the overall

classification.

7.4 Classification Results

As mentioned above, the first goal of DFA implementation in our study was

to find the degree to which the acoustic cues investigated here would successfully

classify fricatives. To that effect, DFA revealed that 83.2% of the original grouped

cases were successfully classified into their respective places of articulation using

discriminant functions derived from the acoustic measurements investigated in our

study. Furthermore, when the data was split into voiced and voiceless subgroups,

the overall classification accuracy was 92.9% for voiced and 93.5% for voiceless

fricatives. These classification rates exceeded both the maximum chance and the

proportional chance criteria. Additionally, Press’ Q statistic (Q = 17.99) was

significant at the 0.0001 level. Therefore, it can be concluded that the model investigated

was valid. In general, three groups can be identified using a two-dimensional

discrimination plane (Figure 7–1 and Figure 7–2).

A leave-one-out (also known as jackknife) classification procedure was also

used to cross-validate the discrimination functions derived above. In this procedure,

the data was split into two sets with discrimination functions obtained from all-

but-one subjects (training set) and then used to classify the cases of the remaining

subject (testing set). The procedure was repeated until each speaker was included

in the testing phase. The overall performance of the discrimination function was

taken to be the averaged score across all speakers. An overall correct classification

ratio of 79.3% was obtained using the cross-validation method outlined above.

When voicing was specified in the model, cross-validated correct classification ratios

of 87.9% and 89.8% were obtained for voiced and voiceless fricatives respectively.

Both procedures satisfy the criteria mentioned in Section (7.2) for model validity

(Cpro, MCC and Press’ Q).
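The speaker-wise cross-validation described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run: a nearest-centroid rule stands in for the discriminant functions, and the speakers, feature values and function names are all hypothetical.

```python
from collections import defaultdict


def centroid_classifier(train):
    """Fit a nearest-centroid classifier (a simple stand-in for DFA).

    train: list of (features, label) pairs. Returns a predict function.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

    def predict(features):
        # Choose the label whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda lab: sum((x - c) ** 2 for x, c in zip(features, centroids[lab])),
        )

    return predict


def leave_one_speaker_out(tokens):
    """tokens: list of (speaker, features, label).

    Trains on all speakers but one, tests on the held-out speaker, and
    returns the percent correct averaged across speakers.
    """
    speakers = sorted({spk for spk, _, _ in tokens})
    scores = []
    for held_out in speakers:
        train = [(f, y) for spk, f, y in tokens if spk != held_out]
        test = [(f, y) for spk, f, y in tokens if spk == held_out]
        predict = centroid_classifier(train)
        correct = sum(predict(f) == y for f, y in test)
        scores.append(100.0 * correct / len(test))
    return sum(scores) / len(scores)


# Hypothetical tokens: (speaker, (spectral mean in kHz, skewness), place).
tokens = [
    ("spk1", (6.0, -0.5), "alveolar"), ("spk1", (2.0, 1.5), "glottal"),
    ("spk2", (6.4, -0.3), "alveolar"), ("spk2", (1.8, 1.2), "glottal"),
    ("spk3", (5.7, -0.6), "alveolar"), ("spk3", (2.3, 1.7), "glottal"),
]
accuracy = leave_one_speaker_out(tokens)
print(accuracy)
```

Holding out whole speakers, rather than single tokens, keeps tokens from the test speaker out of the training set, so the score reflects generalization to an unseen talker.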

The confusion matrices presented in Tables (7–3) to (7–8) show the percentage

of predicted class membership in terms of the fricative place of articulation.

Numbers on the diagonal represent correct classification rates while off-diagonal numbers

represent misclassification rates. Generally speaking, DFA clustered the nine places

of fricative articulation into three groups: nonsibilants (/f, T, D, DQ/), sibilants (/s,

sQ, z, S/) and back-articulated fricatives (/K, X, è, Q, h/), with misclassification rarely

crossing the boundaries of these groups. This observation held even when

fricatives were partitioned according to voicing.

Table 7–3. Overall classification results of all fricatives.

          Predicted Group Membership (%)
Place     /f/   /T, D/   /DQ/   /s, z/   /sQ/   /S/   /X, K/   /è, Q/   /h/
/f/       88    10       0      0        0      2     0        0        0
/T, D/    6     76       7      0        0      0     3        0        7
/DQ/      0     2        88     2        0      0     6        0        2
/s, z/    0     2        0      89       6      2     1        0        0
/sQ/      0     0        0      17       83     0     0        0        0
/S/       0     0        0      0        0      98    2        0        0
/X, K/    0     2        6      0        0      2     72       10       7
/è, Q/    0     0        0      0        0      0     5        87       8
/h/       0     0        2      0        0      0     8        10       79

Table 7–4. Cross-validated classification results of all fricatives.

          Predicted Group Membership (%)
Place     /f/   /T, D/   /DQ/   /s, z/   /sQ/   /S/   /X, K/   /è, Q/   /h/
/f/       79    17       0      0        0      2     2        0        0
/T, D/    8     72       8      0        0      0     3        0        8
/DQ/      0     6        77     2        0      0     10       0        4
/s, z/    0     2        0      84       9      3     1        0        0
/sQ/      2     0        0      15       83     0     0        0        0
/S/       0     0        0      0        0      98    2.1      0        0
/X, K/    0     2        7      0        0      2     70       12       7
/è, Q/    0     0        0      0        0      1     7        82       9
/h/       0     0        2      0        0      0     8        13       77

[Plot not reproduced. Legend (Predicted Group): Labiodental, Dental, Alveolar, Post-Alveolar, Uvular, Pharyngeal, Pharyngealized Dental, Pharyngealized Alveolar, Glottal.]

Figure 7–1. Discrimination plane for all fricatives.

Table 7–5. Overall classification results of voiced fricatives.

          Predicted Group Membership (%)
Place     /D/     /DQ/    /z/    /K/     /Q/
/D/       89.6    8.3     0      2.1     0
/DQ/      8.3     87.5    0      4.2     0
/z/       0       0       100    0       0
/K/       6.3     4.2     0      89.6    0
/Q/       0       0       0      2.1     97.9

Table 7–6. Cross-validated classification results of voiced fricatives.

          Predicted Group Membership (%)
Place     /D/     /DQ/    /z/    /K/     /Q/
/D/       83.3    8.3     2.1    6.3     0
/DQ/      14.6    75      0      10.4    0
/z/       0       0       100    0       0
/K/       6.3     6.3     0      83.3    4.2
/Q/       0       0       0      2.1     97.9
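The overall classification rates quoted in Section 7.4 can be recovered from the confusion matrices as the size-weighted mean of the diagonal. The sketch below does this for the voiced fricatives in Tables 7–5 and 7–6. It assumes 48 tokens per voiced place, which is inferred from Table 7–1 rather than stated for the voiced subset, and the function name is illustrative.

```python
# Row percentages from Tables 7-5 and 7-6.
# Rows: actual place; columns: predicted place, in the order below.
places = ["D", "DQ", "z", "K", "Q"]

overall = [
    [89.6, 8.3, 0, 2.1, 0],
    [8.3, 87.5, 0, 4.2, 0],
    [0, 0, 100, 0, 0],
    [6.3, 4.2, 0, 89.6, 0],
    [0, 0, 0, 2.1, 97.9],
]

cross_validated = [
    [83.3, 8.3, 2.1, 6.3, 0],
    [14.6, 75, 0, 10.4, 0],
    [0, 0, 100, 0, 0],
    [6.3, 6.3, 0, 83.3, 4.2],
    [0, 0, 0, 2.1, 97.9],
]


def overall_rate(matrix, group_sizes):
    """Size-weighted mean of the per-class correct rates (the diagonal)."""
    total = sum(group_sizes)
    weighted = sum(matrix[i][i] * n for i, n in enumerate(group_sizes))
    return weighted / total


# Assumption: 48 tokens for each voiced place of articulation.
sizes = [48] * len(places)

rate_overall = overall_rate(overall, sizes)
rate_cv = overall_rate(cross_validated, sizes)
print(round(rate_overall, 1), round(rate_cv, 1))
```

With equal group sizes the weighting reduces to a plain average of the diagonal; the weights matter for the all-fricative tables, where group sizes are 48 or 96.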

Table 7–7. Overall classification results of voiceless fricatives.

          Predicted Group Membership (%)
Place     /f/     /T/     /s/     /sQ/    /S/    /X/     /è/     /h/
/f/       83.3    12.5    0       0       2.1    2.1     0       0
/T/       6.3     93.8    0       0       0      0       0       0
/s/       0       0       91.7    8.3     0      0       0       0
/sQ/      0       0       10.4    89.6    0      0       0       0
/S/       0       0       0       0       100    0       0       0
/X/       0       0       0       0       0      97.9    2.1     0
/è/       0       0       0       0       0      2.1     97.9    0
/h/       0       0       0       0       0      0       6.3     93.8

Table 7–8. Cross-validated classification results of voiceless fricatives.

          Predicted Group Membership (%)
Place     /f/     /T/     /s/     /sQ/    /S/    /X/     /è/     /h/
/f/       79.2    16.7    0       0       2.1    2.1     0       0
/T/       8.3     91.7    0       0       0      0       0       0
/s/       0       2.1     87.5    8.3     2.1    0       0       0
/sQ/      0       0       18.8    81.3    0      0       0       0
/S/       0       0       0       0       100    0       0       0
/X/       0       2.1     0       0       2.1    91.7    4.2     0
/è/       0       0       0       0       0      6.3     93.8    0
/h/       0       0       0       0       0      0       6.3     93.8

[Plots not reproduced. Panels: A, B. Legend (Predicted Group): Labiodental, Dental, Alveolar, Post-Alveolar, Uvular, Pharyngeal, Pharyngealized Dental, Pharyngealized Alveolar, Glottal.]

Figure 7–2. Discrimination plane for voiced and voiceless fricatives. A) Voiced. B) Voiceless.

CHAPTER 8

GENERAL DISCUSSION

Several acoustic measurements were investigated in our study with the aim of

describing the acoustic characteristics of fricatives as produced by native speakers

of Arabic. The use of Arabic was motivated by three reasons. First, fricative

articulation in Arabic spans most of the places of articulation in the vocal tract,

starting from the lips and ending at the glottis. Second, for certain fricatives

in Arabic, a phonemic distinction exists between plain fricatives (/D/ and /s/)

and their pharyngealized counterparts (/DQ/ and /sQ/); and between short and

long vowels (/i - i:, u - u:, a - a:/). Third, the majority of studies dealing with

the acoustic characteristics of fricatives have been carried out predominantly

with reference to English fricatives. Therefore, our study aimed at describing

the acoustic characteristics of Arabic fricatives utilizing many of the acoustic

measurements investigated in other related studies, with specific interest in finding

cues that would differentiate between plain and pharyngealized fricatives.

The cues investigated in our study were amplitude measurements (relative

and normalized frication noise amplitude), spectral measurements (spectral

peak location and spectral moments), temporal measurements (absolute and

normalized frication noise duration) and formant information at the fricative-vowel

transition (F2 at vowel onset and locus equation). Along with reporting these

cues, an attempt was also made to classify fricatives into their respective places

of articulation using statistical modeling (discriminant function analysis) with an

optimum combination of the measurements mentioned above.

8.1 Temporal Measurement

Findings of the present study agree with previous research on the effect of

place of articulation on frication noise duration (Behrens and Blumstein 1988b;

Jongman 1989; Pirello et al. 1997). Specifically, the overall absolute frication noise

duration of sibilant fricatives (mean 138.09 ms) was longer than that of nonsibilants

(mean 109.34 ms). The longer duration of sibilants can be attributed to the greater

articulatory effort needed to force air through the narrow constriction required for

sibilant articulation. Additionally, frication noise duration of voiceless fricatives

(mean 134.21 ms) was longer on average than that of voiced fricatives (mean 92.05

ms). Such effect of voicing was also found in previous studies of English (Cole and

Cooper 1975; Baum and Blumstein 1987; Crystal and House 1988; Fox, Nissen,

McGory, and Rosenbauer 2001; Nissen 2003) and Spanish fricatives (Manrique and

Massone 1981). The effect of voicing on the reduction of segmental duration can

be attributed in part to the decrease in air flow due to higher glottal impedance

during voicing.

Contrary to what was reported in previous research (Nissen 2003), our study

did not find an effect of vowel context for vowels of the same length. However,

fricative duration was significantly longer when it was followed by long high

vowels (/i:, u:/) than when followed by their short counterparts (/i/ and /u/

respectively). Similar results with regard to sibilant/nonsibilant duration and

effect of voicing were obtained when the duration of the fricatives was normalized

relative to word duration. However, a different pattern of vowel context effect

emerged with normalized frication duration. Specifically, within long vowels, high

vowels (/i:, u:/) induced a longer normalized frication duration than the low vowel

/a:/. Additionally, the normalized frication noise duration of fricatives was longer

preceding the front vowel /i:/ than preceding the back vowel /u:/. Such effects

of vowel context are not surprising if intrinsic differences in vowel duration

are taken into consideration. Vowel duration has been shown to correlate with the

degree of jaw lowering associated with its production, such that the lower the vowel,

the longer its duration (Fant 1960; Lindblom 1967; Beckman 1986).

8.2 Amplitude Measurement

Both normalized frication noise amplitude and relative amplitude were

investigated in our study. Normalized frication RMS amplitude was defined as

the difference between the RMS amplitude of frication noise and the average

RMS amplitude of three consecutive pitch periods at the point of maximum vowel

amplitude. The findings of our study are consistent with findings from previous

research in that such measurements differentiated nonsibilants (/f, T, D, DQ/) as a

class from sibilant fricatives (/s, sQ, z, S/) while failing to distinguish within each

of the two classes. Although Jongman et al.’s (2000) study of English fricatives

found noise amplitude to differentiate within sibilants and within nonsibilants,

other research on frication noise amplitude (Strevens 1960; Heinz and Stevens 1961;

Manrique and Massone 1979; Behrens and Blumstein 1988a) reported that while

frication noise amplitude distinguished between sibilant and nonsibilant fricatives,

it could not distinguish within sibilant or within nonsibilant fricatives.
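The normalization defined at the start of this section can be sketched in a few lines. The sketch assumes that levels are expressed in dB and that the three pitch-period levels are averaged in dB before subtraction; the exact averaging used in the measurements is not restated here, and the waveforms and function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of waveform samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def rms_db(samples):
    """RMS level in dB re 1.0; the reference cancels in the difference."""
    return 20.0 * math.log10(rms(samples))


def normalized_amplitude(frication, vowel_periods):
    """Frication RMS level minus the mean RMS level of the vowel periods (dB)."""
    vowel_db = sum(rms_db(period) for period in vowel_periods) / len(vowel_periods)
    return rms_db(frication) - vowel_db


# Hypothetical signals: frication with RMS 0.1, three vowel periods with RMS 0.5.
frication = [0.1, -0.1] * 50
vowel_periods = [[0.5, -0.5] * 40 for _ in range(3)]

value = normalized_amplitude(frication, vowel_periods)
print(round(value, 2))
```

Because the vowel level is subtracted, a louder vowel lowers the normalized value even when the frication noise itself is unchanged, which is the mechanism behind the vowel-height effect discussed below.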

The decrease in nonsibilant frication noise normalized RMS amplitude as

compared with sibilant fricatives was expected given the intrinsic amplitude

associated with the two classes. Specifically, sibilant articulation, as explained in

Section (8.1), involves a greater articulatory effort to force the air through the

narrow constriction needed for sibilant articulation, giving rise to an increase in

noise amplitude. The same reasoning can be used to explain the lower frication

noise RMS amplitude of voiceless fricatives (mean −14.22 dB) as compared to their

voiced counterparts (mean −18.26 dB). An additional source for this difference

is the presence of two sources of acoustic energy during the production of voiced

fricatives. The energy resulting from glottal vibration during voicing, in addition to

acoustic energy resulting from frication at an oral constriction, results in an overall

increase in the RMS amplitude of voiced fricatives.

Also not surprising was the finding that normalized frication noise RMS

amplitude increased in proportion to the height of the vowel. Recall here that

frication noise RMS amplitude is normalized by subtracting the vowel RMS

amplitude, so when the intrinsic vowel amplitude increases, the overall normalized

frication noise RMS amplitude decreases. Additionally, such intrinsic vowel

amplitude is controlled by the degree of openness/closeness (height) of the

vowel. In the articulation of /a (:)/, the oral cavity is wide open giving rise to

an acoustic waveform of intrinsically higher amplitude (Lehiste and Peterson 1959;

Beckman 1986). The opposite is true with high vowels. Interestingly, intrinsic

vowel amplitude, as well as duration (see above), led to significant differences in the

overall frication noise RMS amplitude only when the comparisons were confined to

long vowels.

Previous research on relative amplitude generally involved the perceptual

effect of this cue on distinguishing places of articulation with Jongman et al.

(2000) as the only notable exception. Our study found relative amplitude to

be a reliable acoustic cue that differentiates among some, but not all, places of

fricative articulation. On the other hand, the trend in our data was parallel to

previously reported values in the literature (Hedrick and Ohde 1993; Jongman

et al. 2000). Specifically, the voiceless post-alveolar fricative (/S/, mean = 0.9 dB)

had the greatest relative amplitude, indicating a stronger concentration of energy

above the F3 region. Furthermore, in line with Jongman et al.’s (2000) findings,

our study found that nonsibilants, especially voiceless ones, have the highest

relative amplitude. More importantly, pharyngealized fricatives /DQ/ and /sQ/ had

significantly lower relative amplitude than their plain counterparts.

The difference in relative amplitude between plain and pharyngealized

fricatives can be attributed to the lowering of the vowel’s F2 frequency caused by

pharyngealization (Stevens 1998) with the increase in amplitude associated with

it. Recall here that for pharyngealized fricatives, relative amplitude was defined as

the difference between the fricative’s and the vowel’s amplitude at the F2 region.

Therefore, an increase in vowel amplitude at such frequency will lead to a lowering

of the relative amplitude value. There was also an effect of vowel context parallel to

that obtained for normalized frication noise RMS amplitude. As before, such effect

of vowel context is related to the vowels’ intrinsic amplitude. Specifically, relative

amplitude measured for fricatives preceding the low vowel /a:/ was significantly lower

than for those preceding the high vowels /i:, u:/, due to

the inherently higher amplitude of /a:/.

8.3 Spectral Measurement

Spectral peak location of fricatives, as was the case in previous studies

(Hughes and Halle 1956; Strevens 1960; Manrique and Massone 1981; Behrens

and Blumstein 1988b; Jongman et al. 2000), tends to decrease as the place of

articulation moves backwards in the oral cavity. Furthermore, the results of the

current study were in line with previous research in that spectral peak location

distinguished nonsibilant from sibilant fricatives, with the only exception being

the similar values obtained for /s/ and voiceless nonsibilants /f, T/. Although

spectral peak location distinguished between post-alveolar /S/ and alveolar

fricatives /s, z/, it failed to distinguish among nonsibilants. Moreover, plain and

pharyngealized fricatives did not differ in terms of the frequency of the amplitude

peak as measured at the midpoint of frication noise.

Of interest here, however, is the fact that three mutually exclusive regions of

fricative place of articulation can be identified based on spectral peak location. For

voiceless fricatives, the first group includes fricatives articulated at or anterior to

the alveolar ridge, the second includes post-alveolar and uvular fricatives, while

the third group consists of pharyngeal and glottal fricatives. For voiced fricatives,

the groups followed the more traditional division of nonsibilants, sibilants and

back-articulated fricatives. Spectral peak location was found not to be affected by

vowel length but rather by the degree of roundedness, such that the rounded vowel /u/

induced a lower spectral peak location than the unrounded vowels /i, a/.

Spectral moments (spectral mean, variance, kurtosis and skewness) were

estimated in our study from four windows centered at frication noise onset,

midpoint, offset and transition into the vowel. Albeit lower due to the male

population from which the data were sampled, the average values for spectral mean

in our study were consistent with those reported for similar fricatives in Jongman

et al. (2000) and Nissen (2003): alveolar fricatives had the highest while the lowest

spectral mean was observed for pharyngeal and glottal fricatives. Furthermore,

spectral mean, averaged across all windows, served to distinguish all places of

voiced fricative articulation, and, as was the case with spectral peak location,

identified three mutually exclusive groups of voiceless fricatives (/f, T, s, sQ/,

/S, X/ and /è, h/). Such classification ability of spectral mean, for both voiced

and voiceless fricatives, was present at the second (frication noise midpoint) and

third (frication noise offset) windows. It was also found that voiceless fricatives had higher

spectral means than voiced fricatives in the first three windows, while the effect was

reversed when the vocalic part (transition window) was used to measure spectral

mean.
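For reference, the four spectral moments can be computed by treating the power spectrum of an analysis window as a probability distribution over frequency. The sketch below assumes that approach and reports kurtosis as excess kurtosis (with 3 subtracted); whether the values reported here follow that convention is an assumption. The toy spectra and the function name are hypothetical.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (mean, variance, skewness, excess kurtosis) of a power spectrum.

    The spectrum is normalized to sum to 1 so that it can be treated as a
    probability distribution over frequency.
    """
    total = sum(power)
    p = [w / total for w in power]
    mean = sum(f * w for f, w in zip(freqs_hz, p))
    variance = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, p))
    sd = math.sqrt(variance)
    skewness = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, p)) / sd ** 3
    kurtosis = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, p)) / sd ** 4 - 3.0
    return mean, variance, skewness, kurtosis


freqs = [1000.0, 2000.0, 3000.0]

# Symmetric toy spectrum: energy centered at 2000 Hz, zero skewness.
sym = spectral_moments(freqs, [1.0, 2.0, 1.0])

# Energy concentrated at low frequencies: positive skewness.
low = spectral_moments(freqs, [4.0, 2.0, 1.0])

print(sym)
print(low[2] > 0)
```

The second toy spectrum shows the sign convention used in the discussion of skewness: a concentration of energy in the lower frequencies yields positive skewness, while energy concentrated in the higher frequencies yields negative skewness.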

Similar to the effects explained above for spectral peak location, vowel context

also influenced the measured spectral mean in all four windows; with rounded vowel

/u(:)/ introducing lower spectral mean for the fricatives. Specifically of interest

here is the fact that it was only when the fricative’s transition into the vowel was

used to derive spectral mean values that a significant difference between plain

and pharyngealized fricatives was observed in part due to pharyngealization effect

on the vocalic part of the window. As mentioned above, the general pattern of

the obtained spectral mean values was parallel to that of Jongman et al. (2000).

Contrary to this similarity, in our study spectral mean was more effective at the

frication midpoint and offset in separating fricatives into their respective places of

articulation, as compared to the onset and transition windows in Jongman et al.

The results obtained for the second statistical moment (variance) were parallel

in nature to that of spectral mean and very similar to values reported by Nissen

(2003). No direct comparison could be made with variance values reported in

Jongman et al. (2000) since in that study values were averaged across voicing.

However, like both studies, our study found the spectral variance of sibilants to be

significantly lower than that of nonsibilants in the first three windows for voiceless fricatives

and at all windows for voiced fricatives. Nevertheless, no differences were found

within nonsibilant fricatives. Jongman et al. (2000) reported similar results for all

but the second window. Another finding consistent with previous research is the

lower variance of voiceless fricatives as compared to voiced fricatives (4.5 MHz and

5.4 MHz respectively) at the middle of frication noise. Although variance served to

distinguish many fricative places of articulation, it failed at all four analysis

windows to statistically distinguish between plain and pharyngealized fricatives, or

between fricatives in the vocalic contexts differing in length.

Skewness measured at all window locations did not differentiate between plain

fricatives and their pharyngealized counterparts. However, skewness measured at

the second and third windows differentiated between all voiced fricatives. With

the exception of alveolar /z/, which had the only negatively skewed distribution

among voiced fricatives, skewness was positive and increased as the

place of articulation moved backwards in the oral cavity. For voiceless fricatives,

skewness distinguished between sibilants and nonsibilants; and within sibilants at

the second analysis window. In general, alveolar fricatives had the lowest skewness

indicating a concentration of energy at higher frequencies, while such concentration

of energy was at lower frequencies for pharyngeal and glottal fricatives. Although

the number of places investigated here is greater than in either Jongman et al.

(2000) or Nissen (2003), our results are in general agreement with both studies for

alveolar and post-alveolar fricatives. Also, our study is in agreement with Jongman

et al. in that skewness increases substantially at the fricative-vowel transition due

to “the predominance of low-frequency over high-frequency energy as the vowel

begins” (Jongman et al. 2000, p. 1257). The effect of the vowel context became

more pronounced at this transition window with rounded vowels /u, u:/ with their

inherently lower frequencies.

Kurtosis was used previously in the literature as a measure of the peakedness

of the spectral distribution. In our study, kurtosis was substantially higher for

pharyngeal fricatives /è, Q/ at the first three windows than all other fricatives.

Furthermore, the peakedness of alveolar fricatives observed elsewhere in the

literature (Tomiak 1990; Jongman et al. 2000; Nissen 2003) was not observed in our

results.

8.4 Transition Information

Formant transitions at the fricative-vowel boundary were investigated in our

study using measures of the second formant at transition and locus equations. For

F2 values, the results obtained were consistent with predictions of the Source-Filter

theory of speech production. Specifically, F2 values of pharyngealized fricatives

were significantly lower than their plain counterparts. As mentioned previously,

such values are expected due to the lowering effect of second formant in pharyngeal

co-articulation (Stevens 1998). Also of interest was the finding that, within the

back-articulated fricatives, only the uvular fricatives had F2 values similar to those of

pharyngealized fricatives (and significantly lower than those of plain sibilants and nonsibilants).

The similar grouping of uvular and pharyngealized fricatives suggests similar

articulatory processes in their production. The reasoning behind this grouping is

twofold: first, values of F2 are inversely related to the height of the tongue; and

second, the secondary constriction involved in the /DQ, sQ/ production is in a higher

position than that of plain pharyngeal fricatives (Al-Ani 1970; McCarthy 1994;

Ladefoged and Maddieson 1996). Therefore, the fact that both pharyngealized

and uvular fricatives shared similar F2 properties that were distinct from all

other fricatives supports McCarthy’s (1994) proposal to name co-articulated

emphatics in Arabic as “uvularized” rather than “pharyngealized”. However, such

a generalization should be taken cautiously since the realization of emphatics as

either uvularized or pharyngealized is dependent on the dialect of Arabic used

(Keating 1988; Zawaydeh 1997; Watson 1999).

Both the slope and y-intercept of locus equations in our study, in general, did

not distinguish between all the different places of fricative articulation. However,

both measurements served to distinguish uvular and glottal fricatives /X, K, h/

as a group having a higher slope and a lower y-intercept than all other fricatives.

More importantly and in contrast to findings reported in Yeou (1997), y-intercept

of pharyngealized fricatives did not differ from their plain counterparts, while only

the slope of /DQ/ was different from /D/.
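A locus equation is a straight-line fit of F2 at vowel onset against F2 in the vowel, computed across vowel contexts for a given consonant; its slope and y-intercept are the two values discussed above. The sketch below fits such a line by ordinary least squares. It assumes F2 at the vowel midpoint as the predictor, and the formant values and function name are hypothetical.

```python
def fit_locus_equation(f2_vowel, f2_onset):
    """Least-squares fit of F2_onset = slope * F2_vowel + intercept (Hz)."""
    n = len(f2_vowel)
    mean_x = sum(f2_vowel) / n
    mean_y = sum(f2_onset) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_vowel, f2_onset))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical F2 values (Hz) for one fricative across four vowel contexts.
# They lie exactly on the line F2_onset = 0.5 * F2_vowel + 900.
f2_vowel = [1000.0, 1500.0, 2000.0, 2500.0]
f2_onset = [1400.0, 1650.0, 1900.0, 2150.0]

slope, intercept = fit_locus_equation(f2_vowel, f2_onset)
print(slope, intercept)
```

A slope near 1 indicates that F2 at onset follows the vowel closely, whereas a slope near 0 indicates an onset F2 that stays fixed regardless of the vowel; the y-intercept shifts in the opposite direction for a given set of data.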

8.5 Discriminant Analysis

The various acoustical cues, except for locus equations, were used in a

discriminant function analysis to identify the cues maximally contributing to the

classification of fricatives into places of articulation. It was found that the spectral

mean (at frication noise onset, middle, and offset), skewness (at onset, offset of

frication and transition into the vowel), second formant at vowel onset, normalized

RMS amplitude and spectral peak location were the variables contributing the

most to the overall classification, with a success rate of 83.2%. When voicing was

specified in the model, the correct classification rate increased to 92.9% for voiced

and 93.5% for voiceless fricatives. It is worth mentioning, however, that if the rate of

misclassification is taken into consideration, then fricatives can be clustered

into three groups, namely nonsibilants, sibilants and gutturals, with pharyngealized

fricatives grouped with their plain counterparts in the same natural class.

8.6 Conclusion

Our study investigated the acoustic characteristics of Arabic fricatives. Results

obtained from most of the cues used were consistent with results obtained in

previous research for fricatives in other languages. Among the cues investigated,

spectral measures were the most efficient in distinguishing among the different

places of fricative articulation. Further research should focus on the perceptual

reality of the acoustic cues investigated in this study and on how changes in these

cues affect the perception of fricative place of articulation.

REFERENCES

Abdelatty Ali, A. M., J. Van der Spiegel, and P. Mueller (2001). Acoustic-phonetic features for the automatic classification of fricatives. J Acoust Soc Am 109 (5 Pt 1), 2217–2235.

Al-Ani, S. H. (1970). Arabic Phonology. Paris: Mouton, The Hague.

Alwan, A. (1989). Perceptual cues for place of articulation for the voiced pharyngeal and uvular consonants. J Acoust Soc Am 86 (2), 549–556.

Anderson, N. (1978). On the calculation of filter coefficients for maximum entropy spectral analysis. In D. G. Childers (Ed.), Modern spectrum analysis, pp. 252–255. New York, NY: IEEE Press.

Baum, S. R. and S. E. Blumstein (1987). Preliminary observations on the use of duration as a cue to syllable-initial fricative consonant voicing in English. J Acoust Soc Am 82 (3), 1073–1077.

Beckman, M. E. (1986). Stress and non-stress accent. Dordrecht, Holland: Foris.

Behrens, S. and S. E. Blumstein (1988a). Acoustic characteristics of English voiceless fricatives: a descriptive analysis. J Phonetics 16, 295–298.

Behrens, S. and S. E. Blumstein (1988b). On the role of the amplitude of the fricative noise in the perception of place of articulation in voiceless fricative consonants. J Acoust Soc Am 84 (3), 861–867.

Boersma, P. and D. Weenink (2004). Praat: a system for doing phonetics by computer. Amsterdam: Institute of Phonetic Sciences of the University of Amsterdam.

Chen, H. and K. N. Stevens (2001). An acoustical study of the fricative /s/ in the speech of individuals with dysarthria. J Speech Lang Hear Res 44 (6), 1300–1314.

Cole, R. A. and W. E. Cooper (1975). Perception of voicing in English affricates and fricatives. J Acoust Soc Am 58 (6), 1280–1287.

Crystal, T. and A. House (1988). Segmental durations in connected-speech signals: Current results. J Acoust Soc Am 83, 1553–1573.

El-Halees, Y. (1985). The role of F1 in the place-of-articulation distinction in Arabic. J Phonetics 13 (3), 287–298.

Fant, G. (1960). Acoustic theory of speech production. The Hague: Mouton.

Ferguson, C. A. (1959). Diglossia. Word 15, 325–340.

Forrest, K., G. Weismer, P. Milenkovic, and R. N. Dougall (1988). Statistical analysis of word-initial voiceless obstruents: preliminary data. J Acoust Soc Am 84 (1), 115–123.

Fowler, C. A. (1994). Invariants, specifiers, cues: An investigation of locus equations as information for place of articulation. Perception & Psychophysics 55, 597–611.

Fox, R. A., S. Nissen, J. McGory, and K. Rosenbauer (2001). Age-related changes in the acoustic characteristics of voiceless English fricative. J Acoust Soc Am 110, 2704.

Govindarajan, K. (1998). Listeners’ perceptual mapping of locus equations and variability. Behav Brain Sci 21 (2), 266–267.

Gurlekian, J. A. (1981). Recognition of the Spanish fricatives /s/ and /f/. J Acoust Soc Am 70 (6), 1624–1627.

Hair, J., R. Anderson, and R. Tatham (1987). Multivariate data analysis with readings. New York, NY: MacMillan.

Harrington, J. and S. Cassidy (1999). Techniques in Speech Acoustics. Norwell, MA: Kluwer Academic Publisher.

Harris, F. J. (1978). On the use of windows for harmonic analysis with the discrete Fourier transform. Proceedings of the IEEE 66, 51–83.

Harris, K. S. (1958). Cues for the discrimination of American English fricatives in spoken syllables. Lang Speech 1, 1–7.

Hedrick, M. (1997). Effect of acoustic cues on labeling fricatives and affricates. J Speech Lang Hear Res 40 (4), 925–938.

Hedrick, M. S. and R. N. Ohde (1993). Effect of relative amplitude of frication on perception of place of articulation. J Acoust Soc Am 94 (4), 2005–2027.

Heinz, J. M. and K. N. Stevens (1961). On the properties of voiceless fricative consonants. J Acoust Soc Am 33, 589–596.

Hughes, G. W. and M. Halle (1956). Spectral properties of fricative consonants. J Acoust Soc Am 28, 303–310.

Jassem, W. (1979). Classification of fricative spectra using statistical discriminant functions. In B. Lindblom and S. Ohman (Eds.), Frontiers of Speech Research. London: Academic Press.

Johnson, K. (1997). Acoustic and Auditory Phonetics. Oxford: Blackwell.

Jongman, A. (1989). Duration of fricative noise required for identification of English fricatives. J Acoust Soc Am 85, 1718–1725.

Jongman, A. (1998). Are locus equations sufficient or necessary for obstruent perception? Behav Brain Sci 21 (2), 271–272.

Jongman, A., R. Wayland, and S. Wong (2000). Acoustic characteristics of English fricatives. J Acoust Soc Am 108 (3 Pt 1), 1252–1263.

Kaye, A. S. (1972). Arabic /z/: A synchronic and diachronic study. Linguistics 79, 31–63.

Keating, P. (1988). A Survey of Phonological Features. Bloomington, IN: Indiana University Linguistics Club.

Kent, R. D. and C. Read (2002). The Acoustic Analysis of Speech. San Diego: Singular Publishing Group.

Klecka, W. (1980). Discriminant Analysis. London: Sage.

Krull, D. (1989). Second formant locus pattern and consonant-vowel coarticulation in spontaneous speech. Perilus 10, 87–108.

Ladefoged, P. and I. Maddieson (1996). The sounds of the world’s languages. Oxford: Blackwell.

LaRiviere, C., H. Winitz, and F. Herriman (1975). The distribution of perceptual cues in English prevocalic fricatives. J Speech Hear Res 18, 613–622.

Lehiste, I. and G. Peterson (1959). Vowel amplitude and phonemic stress in American English. J Acoust Soc Am 31, 428–435.

Liberman, A. M., F. S. Cooper, D. P. Shankweiler, and M. Studdert-Kennedy (1967). Perception of the speech code. Psychol Review 74 (6), 431–461.

Lindblom, B. (1963). A spectrographic study of vowel reduction. J Acoust Soc Am 35, 1773–1781.

Lindblom, B. (1967). Vowel duration and a model of lip mandible coordination. STL-QPSR 8 (4), 1–29.

Mann, V. A. and B. H. Repp (1980). Influence of vocalic context on perception of the [s] - [sh] distinction. Perception & Psychophysics 28, 213–228.

Manrique, A. M. and M. I. Massone (1979). On the identification of Argentine Spanish voiceless fricatives. In Proceedings of the Ninth International Congress of Phonetic Sciences, Volume 1, Copenhagen, Denmark, p. 237.

Manrique, A. M. and M. I. Massone (1981). Acoustic analysis and perception of Spanish fricative consonants. J Acoust Soc Am 69 (4), 1145–1153.

McCarthy, J. (1994). The phonetics and phonology of Semitic pharyngeals. In P. Keating (Ed.), Papers in laboratory phonology 3: Phonological structure and phonetic form, pp. 191–233. Cambridge: Cambridge University Press.

McCasland, G. P. (1979). Noise intensity and spectrum cues for spoken fricatives. J Acoust Soc Am Suppl 1 65, S78–79.

Nissen, S. (2003). An acoustic analysis of voiceless obstruents produced by adults and typically developing children. Ph. D. thesis, Ohio State University, Columbus, OH.

Nittrouer, S. (1995). Children learn separate aspects of speech production at different rates: evidence from spectral moments. J Acoust Soc Am 97 (1), 520–530.

Nittrouer, S., M. Studdert-Kennedy, and R. McGowan (1989). The emergence of phonetic segments: evidence from the spectral structure of fricative-vowel syllables spoken by children and adults. J Speech Hear Res 32, 120–132.

Norlin, K. (1983). Acoustic analysis of fricatives in Cairo Arabic. Working Papers, Phonetics Laboratory, Lund University 25, 113–137.

Pentz, A., H. R. Gilbert, and P. Zawadzki (1979). Spectral properties of fricative consonants in children. J Acoust Soc Am 66 (6), 1891–1893.

Pirello, K., S. E. Blumstein, and K. Kurowski (1997). The characteristics of voicing in syllable-initial fricatives in American English. J Acoust Soc Am 101 (6), 3754–3765.

Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling (1992). Numerical recipes in C: the art of scientific computing. Cambridge: Cambridge University Press.

Shadle, C., S. J. Mair, and J. N. Carter (1996). Acoustic characteristics of the front fricatives [f, v, θ, ð]. In Proceedings of ETRW - 4th Speech Production Seminar, Autrans, France, pp. 193–169.

Shadle, C. H. (1985). The acoustics of fricative consonants. Ph. D. thesis, M.I.T., Cambridge, MA.

Shadle, C. H. (1990). Articulatory-acoustic relationships in fricative consonants. In W. J. Hardcastle and A. Marchal (Eds.), Speech Production and speech modelling, pp. 187–209. Dordrecht, Netherlands: Kluwer Academic Publishers.

Shadle, C. H. and S. J. Mair (1996, October). Quantifying spectral characteristics of fricatives. In Proceedings of the Fourth International Conference on Spoken Language Processing, Volume 3, Philadelphia, PA, pp. 1521–1524.

Soli, S. D. (1981). Second formants in fricatives: acoustic consequences of fricative-vowel coarticulation. J Acoust Soc Am 70 (4), 976–984.

Stevens, J. (2002). Applied multivariate statistics for the social sciences. Mahwah, NJ: Erlbaum.

Stevens, K. N. (1971). Airflow and turbulence noise for fricative and stop consonants: Static considerations. J Acoust Soc Am 50, 1182–1192.

Stevens, K. N. (1985). Evidence for the role of acoustic boundaries in the perception of speech sounds. In V. Fromkin (Ed.), Phonetic Linguistics, pp. 243–256. New York, NY: Academic Press.

Stevens, K. N. (1998). Acoustic Phonetics. Cambridge, MA: MIT Press.

Stevens, K. N. and S. E. Blumstein (1981). The search for invariant acoustic correlates of phonetic features. In P. D. Eimas and J. L. Miller (Eds.), Perspectives on the Study of Speech. Hillsdale, NJ: Erlbaum.

Strevens, P. (1960). Spectra of fricative noise in human speech. Lang Speech 3, 32–49.

Sussman, H. M. (1994). The phonological reality of locus equations across manner class distinctions: Preliminary observations. Phonetica 51, 119–131.

Sussman, H. M., D. Fruchter, J. Hilbert, and J. Sirosh (1998). Linear correlates in the speech signal: the orderly output constraint. Behav Brain Sci 21 (2), 241–299.

Sussman, H. M., K. A. Hoemeke, and F. S. Ahmed (1993). A cross-linguistic investigation of locus equations as a phonetic descriptor for place of articulation. J Acoust Soc Am 94 (3 Pt 1), 1256–1268.

Sussman, H. M., H. A. McCaffrey, and S. A. Matthews (1991). An investigation of locus equations as a source of relational invariance for stop place categorization. J Acoust Soc Am 90, 1309–1325.

Tabain, M. (1998). Non-sibilant fricatives in English: spectral information above 10 kHz. Phonetica 55 (3), 107–130.

Tabain, M. (2001). Variability in fricative production and spectra: implications for the hyper- and hypo- and quantal theories of speech production. Lang Speech 44 (Pt 1), 57–94.

Tabain, M. (2002). Voiceless consonants and locus equations: a comparison with electropalatographic data on coarticulation. Phonetica 59 (1), 20–37.

Tjaden, K. and G. S. Turner (1997). Spectral properties of fricatives in amyotrophic lateral sclerosis. J Speech Lang Hear Res 40 (6), 1358–1372.

Tomiak, G. R. (1990). An acoustic and perceptual analysis of the spectral moments invariant with voiceless fricative obstruents. Ph. D. thesis, State University of New York, Buffalo, NY.

Watson, J. C. (1999). The directionality of emphasis spread in Arabic. Linguistic Inquiry 30, 289–300.

Wilde, L. (1993). Inferring articulatory movements from acoustic properties at fricative-vowel boundaries. J Acoust Soc Am 94, 1881.

Wilde, L. F. and C. B. Huang (1991). Acoustic properties at fricative-vowel boundaries in American English. In Proceedings of the 12th International Congress of Phonetic Sciences, Aix-en-Provence, pp. 394–401.

Yeou, M. (1997). Locus equations and the degree of coarticulation of Arabic consonants. Phonetica 54, 187–202.

Zawaydeh, B. A. (1997). An acoustic analysis of uvularization spread in Ammani-Jordanian Arabic. Studies in the Linguistic Sciences 27 (1), 185–200.

BIOGRAPHICAL SKETCH

Mohamed Ali Al-Khairy was born in Makkah, Saudi Arabia. He went to Umm

Al-Qura University and earned his B.A. in English Literature and Linguistics. At

the University of Florida, he started graduate study in linguistics in Fall 1998. He

completed an M.A. in linguistics in Fall 2000 and then embarked on a Ph.D. degree

in linguistics. During his study, he taught for the Department of African and Asian

Languages and Literature from 1999 to 2004. He received an Alec Courtelis Award

for Exceptional International Students in 2002 and a College of Liberal Arts and

Sciences Award for International Student with Outstanding Academic Achievement

in the same year. He was also awarded a McLaughlin Dissertation Fellowship in

Spring 2005.
