acoustic characteristics of arabic fricatives · 2010-05-07 · problem of variability in the...
ACOUSTIC CHARACTERISTICS OF ARABIC FRICATIVES
By
MOHAMED ALI AL-KHAIRY
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2005
Copyright 2005
by
Mohamed Ali Al-Khairy
To my father who did not live to see the fruit of his work.
ACKNOWLEDGMENTS
After finishing writing this dissertation on a rainy summer night I decided
not to bother with a lengthy acknowledgment section. After all I was the one who
wrote it. Well, leaving ego and false pride aside, this work could not have been
done without the help of many. First and foremost, thanks go to The Almighty
GOD for His guidance and blessings without which graduate school would have
been a worse nightmare. My gratitude goes also to my wonderful supervisor and
mentor Dr. Ratree Wayland whose dedication to her students, teaching, and
research is beyond highest expectations. Without her help, guidelines, constant
encouragement, and support, this work would not have been possible. Members
of my supervisory committee (Dr. Gillian Lord and Dr. Caroline Wiltshire
from Linguistics, and Dr. Rahul Shirvastav from Communication Sciences and
Disorders) were of the utmost help in the process of finishing this work.
My stay in Gainesville introduced me to many people. Most were nice and
cheerful and some one could definitely live without. I will skip the latter
group to save space. However, among such nice and wonderful people I got
to know during this journey are the wonderful students, faculty, and staff of
the Linguistics Department who were of tremendous help both personally and
academically. My special thanks and gratitude go also to Dr. Aida Bamia and Dr.
Haig Der-Houssikian from the Department of African and Asian Languages and
Literature. Their supervision, friendship, and encouragement went far beyond the
responsibilities of mentors to those of parents. For that I will be eternally grateful.
I also would like to thank my study partners, Yousef Al-Dlaigan, who was unjustly
forced to change his career, and AbdulWaheed Al-Saadi, who was brave enough
to finish his Ph.D. I regret to say that I am still unclear about the process of gene
transformation in strawberry and citrus. I hope though you learned from me how
to read a spectrogram. I tried my best.
Now is the fun part: thanking my friends in the phonetics lab. Listed in
chronological order of their liberation from school are Rebecca Hill, Jodi Bray,
Philip Monahan, Sang-Hee Yeon, HeeNam Park, Victor Prieto, and Manjula
Shinge. Yet to feel the wonderful breeze outside Turlington basement are my great
friends Andrea Dallas, Bin Li, and Priyankoo Sarmah. I thank them for all the
cheerful moments and laughs we shared at the University of Florida. Although life
might take us into different routes, our friendship is eternal.
Although they are in a different time zone, I thank my friends on the west
coast and across the Atlantic for their great advice and emotional support, without
which long nights would definitely have been longer. I will send them my phone
bills later. I am sure that I left out some names; for those unintentionally missed I
extend my apologies and sincere thanks.
The acoustic analyses in this dissertation were carried out in a timely manner
thanks to the existence of the wonderful free PRAAT program and the abundant
help and suggestions from its authors and the PRAAT user community. Also, I was
extremely fortunate to escape the nightmare of typesetting using the
popular-but-not-really-friendly commercial software. I thank Ron Smith for making his
ufthesis LaTeX class freely available.
Across oceans and continents, the prayers and encouragement of my parents
and siblings were a driving force and endless motivation to finish and join them
back home. Although God had other plans for my father and older brother, I am
sure they are proud of what their prayers from high above have accomplished.
Finally, words fall short in describing my gratitude and thanks toward my wife,
Nadaa; and kids, Faisal and Farah. They have suffered through this dissertation
almost as much as I have; maybe even more. Through the many nights I spent at
the lab, they have shown endless patience, love, and understanding. I truly cannot
imagine having gone through this process without such amazing love and support.
Parts of this work were supported by a McLaughlin Dissertation Fellowship
from the College of Liberal Arts and Sciences, University of Florida.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION

2 LITERATURE REVIEW
    2.1 Introduction
    2.2 Fricative Production
    2.3 Acoustic Cues to Fricative Place of Articulation
        2.3.1 Amplitude Cues
        2.3.2 Duration Cues
        2.3.3 Spectral Cues
        2.3.4 Formant Transition Cues
    2.4 Studies of Arabic Fricatives

3 METHODOLOGY
    3.1 Data Collection
        3.1.1 Participants
        3.1.2 Materials
        3.1.3 Recording
    3.2 Data Analysis
        3.2.1 Segmentation of Speech
        3.2.2 Acoustic Analyses
    3.3 Statistical Analyses

4 AMPLITUDE AND DURATION
    4.1 Amplitude Measurements
        4.1.1 Normalized Frication Noise RMS Amplitude
        4.1.2 Relative Amplitude of Frication Noise
    4.2 Temporal Measurements
        4.2.1 Absolute Duration of Frication Noise
        4.2.2 Normalized Duration of Frication Noise

5 SPECTRAL MEASUREMENTS
    5.1 Spectral Peak Location
    5.2 Spectral Moments
        5.2.1 Spectral Mean
        5.2.2 Spectral Variance
        5.2.3 Spectral Skewness
        5.2.4 Spectral Kurtosis

6 FORMANT TRANSITION
    6.1 Second Formant (F2) at Transition
    6.2 Locus Equation

7 STATISTICAL CLASSIFICATION OF FRICATIVES
    7.1 Discriminant Function Analysis
    7.2 Classification Accuracy of DFA
    7.3 Classification Power of Predictors
    7.4 Classification Results

8 GENERAL DISCUSSION
    8.1 Temporal Measurement
    8.2 Amplitude Measurement
    8.3 Spectral Measurement
    8.4 Transition Information
    8.5 Discriminant Analysis
    8.6 Conclusion

REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

1–1 Arabic Fricatives
4–1 Relative Amplitude: Vowel Context
4–2 Mean Relative Amplitude
5–1 Spectral Peak Location
5–2 Spectral Moments
5–3 Spectral Skewness: Significant Contrasts for Voiced Fricatives
5–4 Spectral Skewness: Significant Contrasts for Voiceless Fricatives
6–1 Second Formant at Transition
6–2 Locus Equation: Slope and y-intercept
7–1 Prior Probabilities for Group Membership
7–2 Variance Accounted for by DFA Functions
7–3 Overall Voiceless Classification
7–4 Cross-Validated Classification Results
7–5 Overall Voiced Classification
7–6 Cross-Validated Voiced Classification
7–7 Overall Voiceless Classification
7–8 Cross-Validated Voiceless Classification
LIST OF FIGURES

3–1 Example of Segmentation
3–2 Segmentation of /ʕ/
3–3 Hamming vs. Kaiser Window
3–4 Duration
4–1 Frication Noise RMS Amplitude
4–2 Frication Noise RMS Amplitude: Vowel Context
4–3 Frication Noise RMS Amplitude: Place and Voicing
4–4 Relative Amplitude
4–5 Relative Amplitude: Place and Voicing
4–6 Relative Amplitude: Place and Short Vowels
4–7 Relative Amplitude: Place and Long Vowels
4–8 Relative Amplitude: Voicing and Short Vowels
4–9 Relative Amplitude: Voicing and Long Vowels
4–10 Fricative Duration: Place and Voicing
4–11 Fricative Duration: Place and Voicing Interactions
4–12 Fricative Duration: Vowel Context
4–13 Normalized Frication Noise: Place and Voicing
4–14 Normalized Fricative Duration: Place and Voicing Interactions
4–15 Normalized Frication Noise: Vowel Context
5–1 Spectral Peak Location: Place and Voicing
5–2 Spectral Peak Location: Place × Voicing Interaction
5–3 Spectral Peak Location: Place × Vowels
5–4 Spectral Peak Location: Place × Short Vowel Interaction
5–5 Spectral Peak Location: Place × Long Vowel Interaction
5–6 Spectral Mean: Place and Voicing
5–7 Spectral Mean: Voice
5–8 Spectral Mean: Place × Voicing Interaction
5–9 Spectral Mean: Vowel
5–10 Spectral Variance: Place and Voicing
5–11 Spectral Variance: Place × Voicing Interaction
5–12 Spectral Variance: Vowel
5–13 Spectral Skewness: Place and Voicing
5–14 Spectral Skewness: Voice
5–15 Spectral Skewness: Place × Voicing Interaction
5–16 Spectral Skewness: Vowel
5–17 Spectral Kurtosis: Place and Voicing
5–18 Spectral Kurtosis: Voicing
5–19 Spectral Kurtosis: Place × Voice Interaction
5–20 Spectral Kurtosis: Vowel
6–1 Second Formant: Place × Voicing Interaction
6–2 Second Formant: Vowel Context
6–3 Locus Equation
7–1 Discrimination Plane
7–2 Discrimination Plane by Voicing
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
ACOUSTIC CHARACTERISTICS OF ARABIC FRICATIVES
By
Mohamed Ali Al-Khairy
August 2005
Chair: Ratree Wayland
Major Department: Linguistics
The acoustic characteristics of fricatives were investigated with the aim
of finding invariant cues that classify fricatives into their place of articulation.
However, such invariant cues are hard to recognize because of the long-noticed
problem of variability in the acoustic signal. Both intrinsic and extrinsic sources
of variability in the speech signal lead to a defective match between a signal
and its percept. Nevertheless, such variability can be circumvented by using
appropriate analysis methods. The 13 fricatives of Modern Standard Arabic
(/f, θ, ð, ðˤ, s, sˤ, z, ʃ, χ, ʁ, ħ, ʕ, h/) were elicited from 8 male adult speakers
in 6 vowel contexts (/i, i:, a, a:, u, u:/). The acoustic cues investigated included
amplitude measurements (normalized and relative frication noise amplitude),
spectral measurements (spectral peak location and spectral moments), temporal
measurements (absolute and normalized frication noise duration), and formant
information at fricative-vowel transition (F2 at vowel onset and locus equation).
For the most part, fricatives in Arabic had patterns similar to those reported
for similar fricatives in other languages (e.g., English, Spanish, Portuguese). A
discriminant function analysis showed that among all the cues investigated, spectral
mean, skewness, second formant at vowel onset, normalized RMS amplitude,
relative amplitude, and spectral peak location were the variables contributing
the most to overall classification with a success rate of 83.2%. When voicing was
specified in the model, the correct classification rate increased to 92.9% for voiced
and 93.5% for voiceless fricatives.
CHAPTER 1
INTRODUCTION
Since the early years of speech research, studies (using various models and
methods) have focused on finding the properties that distinguish among naturally
produced speech sounds. Many such studies investigated the properties of the
acoustic signal through which sound is transmitted from speaker to hearer.
However, the task is complicated by the long-noticed problem of variability in
the acoustic signal resulting in a defective match between a signal and its percept
(Liberman, Cooper, Shankweiler, and Studdert-Kennedy 1967). The production
mechanism of speech sounds, particularly fricatives, involves intrinsic sources of
variability arising from changes in the shape of the vocal tract and the rate of air
flow (Strevens 1960; Tjaden and Turner 1997). Variability in the speech signal also
arises from extrinsic sources including speaker age (Pentz, Gilbert, and Zawadzki
1979), vocal tract size (Hughes and Halle 1956), speaking rate (Nittrouer 1995),
and linguistic context (Tabain 2001). Variability in speech is also often the result of a
combination of these factors.
Notwithstanding the variability found in the speech signal, numerous studies
(Stevens 1985; Behrens and Blumstein 1988a,b; Forrest, Weismer, Milenkovic, and
Dougall 1988; Sussman, McCaffrey, and Matthews 1991; Hedrick and Ohde 1993;
Jongman, Wayland, and Wong 2000; Abdelatty Ali, Van der Spiegel, and Mueller
2001; Nissen 2003) found invariant cues in the speech signal when the appropriate
analyses are carried out. Along this line of research, our study investigated the
defining properties of fricative sounds as produced in Modern Standard Arabic
(MSA).
We used Arabic fricatives for three equally important reasons. First, the
articulatory space of fricatives in Arabic spans across most of the places of
articulation in the vocal tract, starting from the lips and ending at the glottis.
Second, unlike most of the languages used in acoustic studies of fricatives,
Arabic has two unique features that serve a phonemic distinction: pharyngeal
co-articulation and segment length. Specifically, a phonemic distinction exists
between plain fricatives (/ð/ and /s/) and their pharyngealized counterparts
/ðˤ/ and /sˤ/ in Arabic. Furthermore, although governed by some phonological
distribution rules, consonant and vowel length in Arabic are phonemic. Third, most
studies on the acoustic characteristics of fricatives were conducted predominantly
with reference to English fricatives. Given the phonetic status of Arabic and
the gap in the literature due to the lack of Arabic-related research, our study is
theoretically and empirically important. Our findings will contribute generally
to the way fricative production is viewed and specifically to the way languages
differ in that respect. Further, such findings will aid speech synthesis and parsing
software for the less-understood, yet important, Arabic language.
As mentioned, both consonant and vowel length are phonemic in Arabic.
However, to compare and contrast the performance of cues used in our study with
those reported in the literature for other languages, we examined only vowel-length
variations. The inventory of fricatives in Arabic is shown in Table 1–1. Arabic
has 11 fricatives, with only 4 pairs in voicing contrast. In addition, the voiced dental
and voiceless alveolar fricatives each have a pharyngealized counterpart. The
voiced post-alveolar fricative /ʒ/ was excluded, since it was articulated in most
of the elicited data as an affricate /dʒ/. Studies of Standard Arabic and Arabic
dialectology suggest that /ʒ/ is realized as /ʒ/, /dʒ/, /g/, or /j/ depending on the
geographical region in which Arabic is spoken (Kaye 1972).
Table 1–1. Place of articulation of Arabic fricatives

            Labio-dental  Dental  Alveolar  Post-alveolar  Uvular  Pharyngeal  Glottal
voiceless   f             θ       s         ʃ              χ       ħ           h
voiced                    ð       z                        ʁ       ʕ

/ð/ and /s/ have pharyngealized counterparts /ðˤ/ and /sˤ/.
Both local (static) and global (dynamic) cues have been shown to participate
in the identification of (English) fricatives. Specifically, three main acoustic features
have been examined in research aimed at distinguishing fricatives: the spectral
properties of the frication noise, the relation between the frequency characteristics
of the frication noise and those of the vowel, and the duration of the frication noise. Our study
aimed to describe the acoustic characteristics of Arabic fricatives using many of
the acoustic measurements used in other related studies with specific interest in
finding cues that differentiate between plain and pharyngealized fricatives. Our
study also aimed to see if phonemic differences in vowel length affect the acoustic
cues measured. Our data were elicited from 8 male adult speakers (mean age =
20) who had no history of hearing or speaking impairments and who had limited
experience with English as a second language.
Cues investigated in our study were amplitude measurements (normalized and
relative frication noise amplitude), spectral measurements (spectral peak location
and spectral moments), temporal measurements (absolute and normalized frication
noise duration), and formant information at fricative-vowel transition (F2 at vowel
onset and locus equation). Normalized amplitude is defined here as the difference
between the average RMS amplitude (in dB) of three consecutive pitch periods
at the point of maximum vowel amplitude and the RMS amplitude of the entire
frication noise. Relative amplitude, on the other hand, is defined as the amplitude
of the frication noise relative to the vowel amplitude measured in certain frequency
regions. Spectral peak location relates the fricative place of articulation to the
frequency location of energy maximum in the frication noise. Spectral moments
analysis is a statistical approach that treats FFT spectra as a random probability
distribution from which the first four moments (mean, variance, skewness, and
kurtosis) are calculated. Spectral mean refers to the average energy concentration
and variance to its range. Skewness, on the other hand, is a measure of spectral tilt
that indicates the frequency of most energy concentration. Kurtosis is an indicator
of the distribution peakedness. Formant transitions were assessed using locus
equations that relate second formant frequency at vowel onset (F2onset) to that at
vowel midpoint (F2vowel).
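As an illustration of the spectral moments analysis described above, the following minimal Python sketch treats a power spectrum as a probability distribution over frequency and computes its first four moments. The function name `spectral_moments`, the input format (parallel lists of bin frequencies and power values), and the use of excess kurtosis (subtracting 3) are assumptions made for illustration; they are not taken from the procedure used in this study.

```python
import math


def spectral_moments(freqs_hz, powers):
    """Return (mean, variance, skewness, kurtosis) of a power spectrum.

    freqs_hz and powers are parallel lists: one frequency and one
    non-negative power value per spectral bin (hypothetical input format).
    """
    total = sum(powers)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    # Normalize the power values so they sum to 1, like probabilities.
    probs = [w / total for w in powers]

    mean = sum(f * p for f, p in zip(freqs_hz, probs))
    variance = sum((f - mean) ** 2 * p for f, p in zip(freqs_hz, probs))
    if variance == 0:
        raise ValueError("all energy lies in a single bin")
    sd = math.sqrt(variance)

    # Third and fourth central moments, standardized by the spread.
    skewness = sum((f - mean) ** 3 * p for f, p in zip(freqs_hz, probs)) / sd ** 3
    # Assumption: report excess kurtosis, so a Gaussian-shaped spectrum scores 0.
    kurtosis = sum((f - mean) ** 4 * p for f, p in zip(freqs_hz, probs)) / sd ** 4 - 3
    return mean, variance, skewness, kurtosis
```

Under this convention, a spectrum whose energy is concentrated below its mean, with a tail toward higher frequencies, yields positive skewness, and a spectrum with a flat, broad peak yields low kurtosis.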
Along with reporting how each of the acoustic measures mentioned above
differentiates between different places of fricative articulation, we used a statistical
method (discriminant function analysis) to find the most parsimonious combination
of acoustic cues that distinguish among the different places of fricative articulation
and the contribution of each selected cue to the overall classification of fricatives
into their places of articulation.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
In this chapter we review relevant literature that deals with the acoustic
characteristics that have been shown to be effective in differentiating among
fricative places of articulation and voicing in the world’s languages. Given
that certain fricatives that exist in Standard Arabic (e.g., pharyngealized vs.
non-pharyngealized) do not occur in other languages of the world,
we also discuss whether these acoustic cues will be effective in differentiating
acoustically among Standard Arabic fricatives.
2.2 Fricative Production
Fricative production is best described in terms of the source-filter theory of
speech production (Fant 1960). According to that theory, speech can be modeled
as a result of two independent components: a source signal (which could be the
glottal source, or noise generated at a constriction in the vocal tract); and a
filter (reflecting the resonance in the cavities of the vocal tract downstream from
the glottis, or the constriction).
The basic mechanism for fricative production is that turbulence forms in
the air flow at a point in the oral cavity. To generate such turbulence, a steady
air flow with velocity greater than a critical number¹ passes through a narrow
constriction in the oral cavity and forms a jet that mixes with surrounding air in
the vicinity of the constriction to generate eddies. These eddies, which are random
velocity fluctuations in the air flow, act as the source for frication noise (Stevens
1971). Depending on the nature of the constriction, frication noise can also be
generated at either an obstacle or a wall (Shadle 1990). According to Shadle,
obstacle source refers to fricatives in which sound is generated primarily at a rigid
body perpendicular to the air flow. An example is the production of voiceless
alveolar and voiceless post-alveolar fricatives (/s, ʃ/): the upper and lower teeth,
respectively, act as the spoiler for the airflow. Such sources are characterized by
a maximum source amplitude for a given velocity. On the other hand, wall source
occurs when sound is generated primarily along a rigid body parallel to the air
flow. Spectra of sounds generated by a wall source, like voiced and voiceless
velar fricatives (/x, ɣ/), are characterized by a flat broad peak with lower amplitude
than sounds of obstacle sources (Shadle 1990). Vibration of the vocal folds also
adds to the sources responsible for voiced fricative production.

¹ This number is the Reynolds number (Re), a dimensionless quantity that
relates the constriction size to the volume velocity needed to produce turbulence in
the air. For speech, Re > 1800 (Kent and Read 2002).
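The turbulence condition given in footnote 1 can be illustrated with a small Python sketch. It assumes the standard fluid-dynamics definition Re = v·h/ν, with the particle velocity v taken as volume velocity divided by constriction area. The kinematic viscosity of air (about 0.15 cm²/s), the choice of characteristic dimension h, and all function names and example values are assumptions for illustration, not figures from the sources cited.

```python
# Approximate kinematic viscosity of air near room temperature (assumed value).
AIR_KINEMATIC_VISCOSITY_CM2_S = 0.15
# Critical Reynolds number for speech, as cited in the footnote.
CRITICAL_REYNOLDS = 1800


def reynolds_number(volume_velocity_cm3_s, area_cm2, char_dim_cm):
    """Re = v * h / nu, where v = U / A is the particle velocity."""
    particle_velocity = volume_velocity_cm3_s / area_cm2
    return particle_velocity * char_dim_cm / AIR_KINEMATIC_VISCOSITY_CM2_S


def is_turbulent(volume_velocity_cm3_s, area_cm2, char_dim_cm):
    """True when the flow exceeds the critical Reynolds number."""
    re = reynolds_number(volume_velocity_cm3_s, area_cm2, char_dim_cm)
    return re > CRITICAL_REYNOLDS
```

With these hypothetical values, a flow of 300 cm³/s through a 0.1 cm² constriction with a 0.3 cm characteristic dimension is well above the threshold, whereas 30 cm³/s through a 1 cm² opening is not.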
Whatever the source, the resulting turbulence is then modified by the
resonance characteristics of the vocal tract (filter). The spectrum of the output
of such a filter reflects the transfer function of the vocal tract, which
in turn depends on 1) the natural frequencies of the cavities anterior to the
constriction (poles), 2) the radiation characteristics of the sound leaving the mouth,
and 3) the resonant frequency of the posterior cavity (zeros). For fricatives, the
vocal tract is tightly constricted and hence the coupling between the front and back
cavities is small (Johnson 1997). Therefore, the transfer function of the vocal tract
for fricatives is largely dependent on the resonances of the front cavity. The nth
resonance can be calculated using Equation (2–1) where c is the speed of sound and
l is the length of the vocal tract. In case a strong coupling occurs between the front
and back cavities, such as when the “constriction is gradually tapered” (Kent and
Read 2002, p. 43), the resonances of the back cavity are calculated using Equation
(2–2). Resonances of the back and front cavities sharing the same frequency and
bandwidth cancel each other out.
$$f_{n,\text{front}} = \frac{(2n - 1)\,c}{4l} \tag{2–1}$$

$$f_{n,\text{back}} = \frac{n\,c}{2l} \tag{2–2}$$
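Equations (2–1) and (2–2) can be evaluated with a short Python sketch. The speed of sound used here (35,000 cm/s) and the example cavity lengths are assumptions for illustration, and l is taken to be the length of the cavity whose resonances are being computed.

```python
# Approximate speed of sound in the vocal tract, in cm/s (assumed value).
SPEED_OF_SOUND_CM_S = 35000.0


def front_cavity_resonance(n, length_cm, c=SPEED_OF_SOUND_CM_S):
    """Equation (2-1): nth resonance, (2n - 1) * c / (4 * l), in Hz."""
    return (2 * n - 1) * c / (4 * length_cm)


def back_cavity_resonance(n, length_cm, c=SPEED_OF_SOUND_CM_S):
    """Equation (2-2): nth resonance, n * c / (2 * l), in Hz."""
    return n * c / (2 * length_cm)
```

For a hypothetical front cavity of 2 cm, the first resonance from Equation (2–1) is 4375 Hz; shortening the cavity raises it, which is the general reason more anterior constrictions are associated with higher-frequency energy.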
2.3 Acoustic Cues to Fricative Place of Articulation
Both local (static) and global (dynamic) cues have been shown to participate
to different degrees in the identification of (English) fricatives. The three main
acoustic cues that have been of most interest in the literature on fricatives are the
amplitude and spectral properties of the frication noise, the relationship between
the frequency characteristics of frication noise and those of the vowel, and the role
of duration of frication noise in distinguishing fricative place and voicing.
2.3.1 Amplitude Cues
2.3.1.1 Frication amplitude
Most studies of frication noise amplitude have focused on (English) voiceless
fricatives, and found similar results: sibilants (/s, z, ʃ, ʒ/) have higher amplitude
than nonsibilants (/f, v, θ, ð/) with no differences within each class. This difference
in amplitude between sibilants and nonsibilants is predictable if one looks into the
aerodynamics of producing these fricatives. For example, to examine fricative
production mechanisms, Shadle (1985) used a mechanical model in which
constriction area, length, and location can vary, and the presence or absence of an
obstacle can be manipulated. Based on results from spectra produced using such a
model, Shadle (1985) concluded that the lower teeth act as an obstacle at some 3
cm downstream from the noise source of sibilant constriction. Such configuration
results in an increase in turbulence of the airflow, which in turn causes an increase
in the sibilant amplitude. Nonsibilant fricatives, on the other hand, have no such
obstacle, resulting in very low energy levels. The difference between the sibilant
and nonsibilant fricatives with regard to frication amplitude was also found to have
auditory salience. McCasland (1979) studied the role of amplitude as a perceptual
cue to fricative place of articulation. He cross-spliced naturally spoken syllables
of English /f, θ, s, ʃ/ and /i/ such that the fricative part in /si/ and /ʃi/ was
cross-spliced to the vocalic part of both /fi/ and /θi/. The overall amplitude of the
spliced-in frication noise was adjusted to the same level of intensity as that of the
original nonsibilant fricative by reducing /s, ʃ/ amplitude to that of /f/ and /θ/.
The resulting fricative-vowel syllables sounded like /fi/ and /θi/ when the vocalic
part of the utterance came from an original /fi, θi/, respectively. These
findings led McCasland to conclude that the low amplitude of nonsibilant fricatives
was used as a perceptual cue to distinguish them from the sibilants /s, ʃ/. However,
because of the cross-splicing method used, it is not clear whether the results can
be attributed solely to the reduction of /s, ʃ/ amplitude. In fact, Behrens and
Blumstein (1988a) pointed out that the results of McCasland’s method are not
conclusive since the method involves mismatching information from frication noise
and vocalic transition. Specifically, it is not clear whether listeners were using the
reduced noise amplitude of sibilants as a cue for nonsibilants, or they were using
transitional information in the original vocalic part of the nonsibilant to judge the
token to be /f, θ/. Listeners might be using either one of those cues, or both; and
there was no way of telling which, using the cross-splicing methodology.
One way to remedy the shortcomings of the cross-splicing method is to use
synthetic speech. Gurlekian (1981) used synthetic /sa, fa/ syllables in which the
frequency and the amplitude of the vowel were kept constant in order to test
whether the distinction between sibilant and nonsibilant fricatives could be based
solely on differences in their noise amplitude. For fricatives, the center frequency of
the noise was kept fixed at 4500 Hz, while its amplitude was manipulated to vary
relative to the fixed vowel amplitude. The center frequency used was similar to the
range at which /s/ was correctly identified 90% of the time by Argentine Spanish
listeners (Manrique and Massone 1979), and within the range described for English
/s/ (Heinz and Stevens 1961). An identification test with 6 Argentine Spanish and
6 English listeners showed that both groups assigned a /fa/ percept to the tokens
with low noise amplitude and a /sa/ percept to those with high noise amplitude.
Also, Behrens and Blumstein (1988a) investigated the role of fricative noise
amplitude in distinguishing place of articulation among fricatives. Basically,
Behrens and Blumstein altered the amplitude of the frication part of CV syllables,
with the C being one of /f, θ, s, ʃ/, while preserving the vocalic part of the
utterance. This matching was done by raising the noise amplitude of /f, θ/ to
that of /s, ʃ/ and conversely, lowering the noise amplitude of /s, ʃ/ to that of
/f, θ/ without substituting or changing the vocalic part of the utterance. They
found, contrary to previous studies, that the overall amplitude of the fricative noise
relative to the amplitude of the following vowel does not constitute the primary cue
for the sibilant/nonsibilant distinction. Therefore, Behrens and Blumstein called for
an integration of spectral properties and amplitude characteristics of fricatives in
order to successfully discriminate among their places of articulation.
Another way to capture classification information found in frication noise
amplitude is to measure the Root-Mean-Square (RMS) amplitude of the fricative
noise normalized relative to the vowel. Jongman et al. (2000) used this method
in their large-scale study of English fricatives. Among the many measures used to
characterize fricatives, Jongman et al. measured the difference between the average
RMS amplitude (in dB) of three consecutive pitch periods at the point of maximum
vowel amplitude and the RMS amplitude of the entire frication noise. Results were
derived from 20 native speakers of American English (10 females and 10 males).
The speakers produced all 8 English fricatives in the onset of CVC syllables with
the rhyme consisting of each of six vowels /i, e, æ, A, o, u/ and /p/. The authors
found that this “normalized RMS amplitude” can differentiate among all four
places of fricatives in English with voiced fricatives having a smaller amplitude
than their voiceless counterparts.
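As a rough illustration of the normalized RMS amplitude measure described above, the following Python sketch computes the RMS level (in dB) of the entire frication noise and subtracts the mean RMS level of three vowel pitch periods. The function names, the use of NumPy, and the sign convention (fricative minus vowel) are assumptions made for illustration; they are not taken from Jongman et al. (2000).

```python
import numpy as np

def rms_db(samples):
    """RMS amplitude of a segment in dB (relative to an amplitude of 1.0)."""
    x = np.asarray(samples, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    return 20.0 * np.log10(rms)

def normalized_rms_amplitude(frication, vowel_periods):
    """Fricative RMS (dB) minus the mean RMS (dB) of the vowel pitch periods.

    frication     : samples of the entire frication noise
    vowel_periods : three sample arrays, one per consecutive pitch period,
                    taken at the point of maximum vowel amplitude
    The sign convention (fricative minus vowel) is an assumption.
    """
    vowel_db = np.mean([rms_db(p) for p in vowel_periods])
    return rms_db(frication) - vowel_db
```

Under this convention a weak fricative before a strong vowel yields a large negative value, so a lower-amplitude fricative would show a more negative difference than a higher-amplitude one.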
The integration of fricative and vowel amplitude as a way of normalization
was also used for automatic recognition of continuous speech. Abdelatty Ali et al.
(2001) used Maximum Normalized Spectral Slope (MNSS), which relates the
spectral slope of the frication noise spectrum to the maximum total energy in the
utterance, thus capturing the spectral shape of the fricative and its amplitude in
addition to the vowel amplitude features in one quantity. It differs, however, from
Jongman and colleagues’ normalized amplitude in two ways: first it uses peak
amplitude instead of RMS amplitude for the vowel and the fricative; and second, it
uses only the strongest peak of the fricative (as opposed to whole frication noise)
and normalizes that in relation to the strongest peak of the vowel (as opposed
to the average of the strongest three pitch periods). For MNSS, a statistically
determined threshold (0.01 for voiced and 0.02 for voiceless fricatives) is used
to classify the fricative as nonsibilant if MNSS falls below the threshold, and as
sibilant if it is above it. Using such criteria, Abdelatty Ali et al. obtained a 94%
recognition accuracy of sibilant vs. nonsibilant fricatives. No further information
was given on using MNSS to classify fricatives within these classes.
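The threshold rule just described can be sketched as a small decision function. Only the two threshold values come from the text; the function name, the dictionary layout, and the treatment of a value exactly equal to the threshold (here counted as nonsibilant) are assumptions. The computation of MNSS itself is not shown, since the text does not give enough detail to reproduce it.

```python
# Thresholds reported in the text for the sibilant/nonsibilant decision.
MNSS_THRESHOLDS = {"voiced": 0.01, "voiceless": 0.02}

def classify_sibilance(mnss, voicing):
    """Label a fricative from its MNSS value and its voicing category.

    mnss    : a precomputed MNSS value (its computation is not sketched here)
    voicing : "voiced" or "voiceless"
    Values exactly at the threshold are treated as nonsibilant (assumption).
    """
    threshold = MNSS_THRESHOLDS[voicing]
    return "sibilant" if mnss > threshold else "nonsibilant"
```

Because the threshold depends on voicing, the same MNSS value can fall on different sides of the boundary for a voiced and a voiceless fricative.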
2.3.1.2 Relative amplitude
Since amplitude cues from the frication noise and spectral cues of the vocalic
part in a syllable depend on each other (Behrens and Blumstein 1988a; Jongman
et al. 2000), changes in amplitude might carry more perceptual weight if the
frequency range over which such changes occur is taken into consideration. Such
integration was presented by Stevens and Blumstein (1981) as an invariant
property of speech production. They demonstrated theoretically that different
amplitude changes that occur at the consonant-vowel boundary in certain frequency
ranges are related to articulatory mechanisms associated with certain places in the
vocal tract. Therefore, listeners might be using these relational values as a cue for
the place of a consonant production. To test this claim, Stevens (1985) synthesized
sibilant/nonsibilant and anterior/nonanterior continua such that the frication noise
amplitude at certain frequency ranges on the continuum was gradually changed
from one stimulus to the next. Listeners’ judgments shifted abruptly from /T/ to
/s/ when the amplitude of frication noise in the fifth and sixth formant frequency
regions (F5 and F6) was increased relative to the amplitude in the same frequency
regions at vowel onset. On the other hand, listeners identified the consonant as
/s/ rather than /S/ when the frication noise amplitude in the F3 region, relative
to the F3 amplitude of the vowel, rose at the transition, and as /S/ if it fell. These
findings led Stevens to hypothesize that the vowel is used as an “anchor against
which the spectrum of the fricative noise is judged or evaluated” (Stevens 1985, p.
249).
Other researchers tried to test the robustness of this feature in different
contexts. Hedrick and Ohde (1993) looked into the effect of frication duration
and vowel context on the relative amplitude and whether such changes would
affect perception of fricative place of articulation. This was done by varying the
amplitude of the fricative relative to vowel onset amplitude at F3 and F5 for the
contrast /s/-/S/ and /s/-/T/ respectively. Frication duration and vowel context
also varied. Ten adult listeners with no history of speech or hearing disorders who
successfully perceived (with 70% accuracy) the end points of /s - S/ and /s - T/
continua were asked to identify each stimulus as one member of the contrastive
pairs above. In the /s/-/S/ contrast, listeners chose more /s/ responses when
presented with lower relative amplitude and more /S/’s when presented with higher
relative amplitude. These findings held constant across the different vowel and
duration conditions and were in agreement with those obtained by Stevens (1985).
Furthermore, the additional post-fricative vowel contexts in Hedrick and Ohde’s
study influenced only the magnitude of the relative amplitude effect for a given
contrast. Hedrick and Ohde claim that relative amplitude is used as a primary
invariant cue since listeners used relative amplitude information more effectively
than the context-dependent formant transitions. To further test this assumption,
Hedrick and Ohde (1993) also varied along a continuum the appropriate formant
transitions of the contrasts presented above while keeping the relative amplitude
fixed across all stimuli. The hypothesis was that if relative amplitude was indeed
a primary cue, then variation in formant transition would not affect identification
of members of the contrasting pair. Their findings indicate that for the /s/-/S/
contrast, formant transition did affect the identification of at least the end points of
the continua. For the /s/-/T/ contrast, formant transitions had a negligible effect
on the identification of the two fricatives even at boundary points.
Taken together, all these findings indicate that relative amplitude is part of
a primary cue to fricative place of articulation. Such a role becomes more salient
when the contrast involves sibilant vs. nonsibilant fricatives. Additionally, Hedrick
and Ohde’s (1993) findings suggest that formant transitions do influence the
perception of fricative place of articulation, at least among sibilants.
However, a trading relationship seems to exist between the use of the two
cues in the presence of factors obstructing an effective use of a given cue. Hedrick
(1997) found that listeners with sensorineural hearing loss relied less on formant
transition information than on relative amplitude in discriminating between English
/s/ and /f/. On the other hand, listeners with normal hearing showed the opposite
preference. This was the case even when the formant transition information was
presented at a level audible to listeners with sensorineural hearing loss.
So far, relative amplitude has been shown only to differentiate between
sibilants and nonsibilants as a class, with the exception of the Jongman et al. (2000)
study, which found that relative amplitude, as defined by Hedrick and Ohde
(1993), also differentiates among all four places of fricative articulation in English.
2.3.2 Duration Cues
Fricative duration measures were used in previous research mainly to
differentiate between sibilants and nonsibilants, and to assess the voicing of
fricatives. One such study was conducted by Behrens and Blumstein (1988b)
who recorded three native speakers of English producing each of the 4 English
voiceless fricatives /f, T, s, S/ followed by one of the five vowels /i, e, a, o, u/. They
found that sibilants /s, S/ were longer than nonsibilants /f, T/ with an average
difference of 33 ms. Also, they found no significant differences between the duration
of members of the same class. The vowel effect was found to be minimal and
only among the nonsibilant fricatives. Similar results were obtained by Pirello,
Blumstein, and Kurowski (1997). The researchers also found that alveolar fricatives
were longer on average than labiodental fricatives in English.
Jongman (1989) questioned the importance of frication noise duration as a cue
for fricative identification. He found that listeners can identify fricatives based on a
fraction of their frication noise duration. In a perception test, listeners needed as
little as 50 ms of the initial frication noise of a naturally produced fricative-vowel
syllable to successfully classify fricatives. Although cues like amplitude or spectral
properties localized at the initial parts of the frication noise may have been used
here, it is important to note that such results undermine the significance of an
absolute duration value in classifying fricatives. Temporal features of speech can
vary as a function of speaking rate. In fact, when frication noise duration was
normalized by taking the ratio of fricative duration over word duration, Jongman
et al. (2000) found a significant difference among all places of fricative articulation
with the exception of the labiodental and interdental contrast.
Frication noise duration has also been used to assess the voicing distinction
between fricatives of the same place of articulation. Cole and Cooper (1975)
examined the role of frication noise duration on the perception of voicing in
fricatives. They found that decreasing the length of frication noise of voiceless
fricatives in syllable-initial position resulted in a shift in their perception toward
their voiced counterparts. They noted also that in syllable-final position, duration
of the frication noise relative to that of the preceding vowel becomes the cue for
fricative voicing (voiced fricatives being shorter than voiceless). Similar findings
were also obtained by Manrique and Massone (1981) for Spanish fricatives /B, f,
D, s, S, Z, x, G/ in three conditions: isolated, in CV syllables, and CVCV words.
Noise duration was significantly shorter for voiced fricatives than for voiceless
fricatives in all three conditions. However, of these fricatives, only /S, Z/ and
/x, G/ are homorganic; while the other two pairs do not share the same place
of articulation (Baum and Blumstein 1987). Therefore, the reported temporal
differences in Manrique and Massone’s study might have been due to factors other
than fricative voicing since, as mentioned previously, durational differences existed
between fricatives sharing the same voicing but belonging to different places of
articulation (Behrens and Blumstein 1988b). Nevertheless, Baum and Blumstein’s
own experiments showed that syllable-initial voiceless English fricatives in citation
forms are longer than their voiced counterparts. However, they noted considerable
overlap in duration distributions of voiced and voiceless fricatives at all places
studied.
Using connected speech, Crystal and House (1988) also found that, on average,
voiceless fricatives in word-initial position are longer than voiced fricatives. Like
Baum and Blumstein’s results, there was a considerable amount of overlap between
the duration distributions of the voiced and voiceless fricatives in connected speech.
Again, the use of duration per se as the sole cue for fricative voicing was questioned
by Jongman (1989), who found that identification of fricative voicing was accurate
(83%) even if only 20 ms of frication noise was used. However, Jongman et al. (2000)
used a relative measure of duration to quantify its use as a cue for fricative voicing.
Normalized fricative noise duration (defined as the ratio of fricative duration over
that of the carrier word) was significantly longer for voiceless than for voiced fricatives.
They also found that such differences are more apparent in nonsibilant than in
sibilant fricatives.
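The normalized duration measure used in this section is a simple ratio, sketched below. The function name and the durations in the usage note are hypothetical and are not data from any of the cited studies.

```python
def normalized_duration(fricative_ms, word_ms):
    """Ratio of fricative duration to the duration of the carrier word."""
    if word_ms <= 0:
        raise ValueError("word duration must be positive")
    return fricative_ms / word_ms
```

With made-up values, a 150 ms fricative in a 500 ms word and a 120 ms fricative in a 400 ms word (a faster rendition) both give a ratio of 0.3, even though their absolute durations differ. This is the sense in which the ratio reduces the dependence on speaking rate.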
2.3.3 Spectral Cues
In addition to amplitude and duration, spectral properties of the frication
noise have been investigated to find cues that identify fricative place of articulation.
Among the spectral properties previously studied are spectral peak location and
spectral moments measurements.
2.3.3.1 Spectral peak location
One of the early attempts to relate the fricative place of articulation to the
frequency location of energy maximum in the frication noise was the study by
Hughes and Halle (1956). In this study, gated 50 ms windows of the frication noise
were used to produce spectra of English fricatives /f, v, s, z, S, Z/. An investigation
of the fricative spectra revealed that for some speakers a strong energy component
was located at the frequency region below 700 Hz for the spectrum of voiced
fricatives. Such energy concentration was absent at the same region for voiceless
fricatives. However, these findings were not consistent among all speakers. Based
on this inconsistency, in addition to the similarities found between the spectra
of homorganic voiced and voiceless fricatives above 1 kHz, Hughes and Halle
ruled out the use of spectral prominence as a basis for voicing distinction among
fricatives. On the other hand, the distinction of place was found to be related,
to a certain extent, to the location of the most prominent spectral peak. Hughes
and Halle found that /f, v/ had a relatively flat spectrum below 10 kHz, whereas
spectral prominence was observed for /S, Z/ at the region of 2-4 kHz, and for /s,
z/ at the region above 4 kHz. Also, they found that the exact location of the
peak for each fricative was lower for males and higher for females. Based on these
observations, Hughes and Halle concluded that the size and shape of the resonance
chamber in front of the fricative’s point of constriction determine the place of
energy maximum in frication noise spectra. Specifically, they reported that the
length of the vocal tract from the point of constriction to the lips was inversely
related to the frequency of the peak in the spectrum. Thus, the spectral peak
increases as the point of articulation becomes closer to the lips. Such observations
are consistent with predictions made by the source-filter theory of speech
production presented in section 2.2.
Strevens (1960) also looked into the use of spectral prominence to differentiate
between fricatives through examining the front (/F, f, T/), mid (/s, S, ç/) and back
(/x, X, h/) voiceless fricatives as produced by subjects with professional training in
phonetics. Based on average line spectra, Strevens found that the front fricatives
were characterized by unpatterned low intensity and smooth spectra, the mid
fricatives by high intensity with significant peaks on the spectra around 3.5 kHz
and the back fricatives by medium intensity and a marked formant-like structure
with peaks around 1.5 kHz.
The results reported above for front and mid fricatives were also shown to
be perceptually valid (Heinz and Stevens 1961). Using a synthesized continuum
of white noise with spectral peaks in ranges representative of those found in /S, ç,
s, f, T/, Heinz and Stevens found that participants were consistently shifting the
identification of the fricative from /S/ to /ç/ to /s/ to /f, T/ as the peak of the
resonance frequency increased, with no distinction that could be made between /f,
T/.
Similar properties were also found for fricatives in Spanish. In their study of
Spanish fricatives, Manrique and Massone (1981) found that /s/, /f/ and /T/ have
spectral peak values comparable to the English fricatives as reported by Hughes
and Halle (1956). Furthermore, they reported finding that spectral energy in /x/
is concentrated in a low narrow frequency band continuous with the F2 of the
following vowel and that /ç/ spectral frequency is concentrated at a low band
continuous with F3 of the following vowel. Manrique and Massone (1981) also
examined the identification of a subset of Spanish fricatives to see whether changes
in spectral peak location would change the way fricatives are perceived by Spanish
speakers. They synthesized 9 cascade stimuli of the middle 500 ms of each of a
deliberately lengthened /f, s, S, x/ using a set of low- and high-pass filters so that
only certain spectral zones were present for each stimulus. The unfiltered fricatives
had recognition scores ranging from 95% for /f/ and /s/, to 100% for /S/ and /x/.
For the filtered fricatives, they found that the spectral peak location carries the
perceptual load for the identification of /s/, /S/, and /x/. However, the diffused
spectrum of /f/ was believed to be the characterizing factor of its identifiability.
Other studies of English fricatives confirmed that spectral peak location
can distinguish sibilants from nonsibilants as a class, and can separate places of
articulation only among sibilants.
For example, Behrens and Blumstein (1988b) found that for English voiceless
fricatives, major spectral peaks in ranges within 3.5-5 kHz were apparent for /s/
and within 2.5-3.5 kHz for /S/. On the other hand, /f/ and /T/ appeared flat with
a diffused spread of energy from 1.8-8.5 kHz with a good deal of variability in their
spectral shape. The same pattern was also observed across age groups. Pentz et al.
(1979), for example, compared the spectral properties of English fricatives (/f,
v, s, z, S, Z/) produced by preadolescent children to those reported for adults. As
reported for adults elsewhere, they found the same pattern of energy localization
and constriction point. However, the values obtained from children in their study
were higher than those obtained for male and female adult speakers in the studies
mentioned above. This difference was attributed in large part to the differences
in vocal tract lengths. Male adult speakers have the longest vocal tract and the
lowest vocal tract resonance, while children have the shortest vocal tract and the
highest vocal tract resonance; female adult speakers fall between the two groups. In
another study, Nissen (2003) investigated, among other metrics, the spectral peak
location of voiceless English obstruents as produced by male and female speakers
of four different age groups. For the fricatives in the study, he found that “the
spectral peak decreased as a function of increased speaker age” (Nissen 2003, p.
139). Besides being age and gender dependent, spectral peak location has also been
found to be vowel dependent (Mann and Repp 1980; Soli 1981) and highly variable
for speakers with neuromotor dysfunction (Chen and Stevens 2001) due to their lack
of control over articulatory muscles.
However, in contrast to all the studies mentioned above, Jongman et al.
(2000) found that across all (male and female) speakers and vowel contexts, all
four places of fricative articulation in English were significantly different from
each other in terms of spectral peak location. Further, they found spectral peak
location to reliably differentiate between /T/ and /D/ and between /f/ and /v/.
The researchers justified the use of the larger analysis window they adopted in their
study, as compared to other studies, as a way to obtain better resolution in the
frequency domain at the expense of temporal domain resolution. They argue that
such a compromise is advantageous due to the stationary nature of frication noise.
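The trade-off between frequency and temporal resolution mentioned here can be made concrete with a minimal sketch of spectral peak extraction. It assumes NumPy, a Hamming window, and a 40 ms default window length; the function name and these parameter choices are illustrative assumptions, not the procedure of any cited study. For a window of T ms, the spacing between FFT bins is 1000/T Hz, so a 40 ms window gives 25 Hz bins while a 20 ms window gives 50 Hz bins.

```python
import numpy as np

def spectral_peak_hz(samples, sample_rate, window_ms=40.0):
    """Frequency (Hz) of the highest-energy FFT bin in the first window.

    samples     : frication noise samples
    sample_rate : sampling rate in Hz
    window_ms   : analysis window length in ms (40 ms default is an assumption)
    """
    x = np.asarray(samples, dtype=float)
    n = int(round(sample_rate * window_ms / 1000.0))
    frame = x[:n]
    frame = frame * np.hamming(len(frame))       # taper to reduce leakage
    power = np.abs(np.fft.rfft(frame)) ** 2      # power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs[np.argmax(power)]
```

A longer window narrows the bin spacing but averages over more of the signal in time, which is acceptable only to the extent that the frication noise is stationary, as the authors argue.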
In summary, spectral peak location for the fricatives increases as the
constriction becomes closer to the open end of the vocal tract. Also, spectral peak
for back fricatives shows a formant-like structure similar to the following vowel.
Both of these generalizations can be accounted for by the source-filter theory of
speech production. Fricatives are characterized by turbulent airflow through a
narrow constriction in the oral cavity, with the portion of the vocal tract in the
front of the constriction effectively becoming the resonating chamber. For long
and narrow constrictions, like fricatives, the acoustic theory of speech production
predicts that the only present resonance components in the spectrum are those
related to the area in front of the constriction due to lack of acoustic coupling
from the cavity behind the constriction (Heinz and Stevens 1961). The size of the
resonating cavity, therefore, can be inversely correlated with the frequency of the
most prominent peak in the spectrum (Hughes and Halle 1956). As a result of this
correlation, fricatives produced at or behind the alveolar region are characterized
by a well-defined spectrum with peaks around 2.5-3.5 kHz for /S, Z/ and at 3.5-5
kHz for /s, z/. However, due to the very small area in front of the constriction,
fricatives produced at the labial or labiodental area are characterized by a
flat spectrum and a diffused spread of energy between 1.5 and 8.5 kHz. Since
nonsibilant production creates a cavity in close proximity to the open end of the
vocal tract, different degrees of lip rounding (Shadle, Mair, and Carter 1996), and
the additional turbulence produced by the air stream hitting the teeth (Strevens
1960; Behrens and Blumstein 1988a) will introduce a great amount of variability
in the location of the energy concentration. On the other hand, sibilants usually
have a clearly defined spectral peak location. However, for speakers with limited
precision over the placement of the constriction (Chen and Stevens 2001), such
variability also exists for sibilants.
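The inverse relation between front-cavity size and peak frequency can be illustrated with a common textbook idealization that is not part of the studies reviewed here: the front cavity is treated as a uniform tube closed at the constriction and open at the lips, whose lowest resonance is c/(4L). The function name, the speed of sound value (35000 cm/s), and the cavity lengths in the usage note are all assumptions chosen for illustration.

```python
def front_cavity_resonance_hz(length_cm, speed_of_sound_cm_s=35000.0):
    """Lowest resonance of an idealized quarter-wavelength tube, f = c / (4 L).

    length_cm : length of the cavity in front of the constriction (hypothetical)
    This is a simplification; real front cavities are not uniform tubes.
    """
    if length_cm <= 0:
        raise ValueError("cavity length must be positive")
    return speed_of_sound_cm_s / (4.0 * length_cm)
```

Under these assumptions a hypothetical 2.0 cm front cavity resonates at 4375 Hz and a 2.5 cm cavity at 3500 Hz, so lengthening the cavity lowers the peak, in line with the relation described above.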
2.3.3.2 Spectral moments
Spectral moments analysis is another metric that has been used for fricative
identification. Unlike spectral peak location analysis, this statistical approach
captures both local (mean frequency and variance) and global (skewness and
kurtosis) aspects of fricative spectra. Spectral mean refers to the average energy
concentration and variance to its range. Skewness, on the other hand, is a measure
of spectral tilt that indicates the frequency region in which energy is most concentrated.
Skewness with a positive value indicates a negative spectral tilt with energy
concentration at the lower frequencies, while negative skewness is an indication of
positive tilt with energy concentration at higher frequencies (Jongman et al. 2000).
Kurtosis is an indicator of the distribution’s peakedness.
One of the early applications of spectral moments to classify speech sounds
was the study by Forrest et al. (1988) on English obstruents. For the fricatives
in that study, Forrest et al. generated a series of Fast Fourier Transforms (FFT)
using a 20 ms analysis window with a step-size of 10 ms that started at the
obstruent onset through three pitch periods into the vowel. The FFT-generated
spectra were then treated as a random probability distribution from which the
first four moments (mean, variance, skewness, and kurtosis) were calculated.
The spectral moments obtained from both linear and Bark scales were entered
into a discriminant function analysis in an attempt to classify voiceless fricatives
according to their place of articulation. Classification scores, on both scales, were
good for the sibilants /s/ and /S/ with 85% and 95% respectively. The nonsibilants,
on the other hand, were not as accurately classified using any moment on either of
the two scales (58% for /T/ and 75% for /f/). Subsequent implementations of the
spectral moment analysis tried to extend or replicate Forrest et al.’s approach with
some modifications. The study by Tomiak (1990) of English voiceless fricatives,
for example, used a different analysis window (100 ms) at different locations of
the frication noise. As in previous research, spectral moments
were successful in classifying sibilants and /h/ data. In the case of nonsibilants, it
was found that the most useful spectral information is contained in the transition
portion of the frication. Additionally, in contrast to Forrest et al., Tomiak found an
advantage for the linearly derived moment profiles over the Bark-scaled ones.
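The idea of treating a spectrum as a probability distribution and taking its first four moments can be sketched as follows. This is a minimal linear-frequency illustration assuming NumPy, a Hamming window, a power (rather than amplitude) spectrum, and excess kurtosis (3 subtracted); these choices and the function names are assumptions, and the Bark-scale variant is not shown.

```python
import numpy as np

def moments_from_spectrum(freqs, power):
    """Mean, variance, skewness and excess kurtosis of a spectrum.

    The power values are normalized to sum to 1 so that they can be
    treated as a discrete probability distribution over frequency.
    """
    freqs = np.asarray(freqs, dtype=float)
    p = np.asarray(power, dtype=float)
    p = p / p.sum()
    mean = np.sum(freqs * p)
    variance = np.sum((freqs - mean) ** 2 * p)
    sd = np.sqrt(variance)
    skewness = np.sum((freqs - mean) ** 3 * p) / sd ** 3
    kurtosis = np.sum((freqs - mean) ** 4 * p) / sd ** 4 - 3.0  # excess kurtosis
    return mean, variance, skewness, kurtosis

def spectral_moments(samples, sample_rate):
    """Moments of the power spectrum of one Hamming-windowed frame."""
    x = np.asarray(samples, dtype=float)
    frame = x * np.hamming(len(x))
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return moments_from_spectrum(freqs, power)
```

A spectrum with most of its energy at low frequencies yields positive skewness under this definition, matching the interpretation of skewness given earlier in this section.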
Spectral moments were also used by Shadle et al. (1996) to classify voiced
and voiceless English fricatives. The study involved spectral moments measured
from discrete Fourier transform (DFT) analyses performed at different locations
within the frication noise and at different frequency ranges. They found that
spectral moments provided some information about fricative production but did not
discriminate reliably between their different places of articulation. Furthermore,
their results indicated that spectral moments are sensitive to the frequency range
of the analysis. However, the moments were not sensitive to the analysis position
within the fricative. Similar results were also obtained for children (Nittrouer,
Studdert-Kennedy, and McGowan 1989; Nittrouer 1995). The use of spectral
moments as a tool to distinguish between /s/ and /S/ was also extended to atypical
speech and found to be reliable. Tjaden and Turner (1997), for example, compared
spectral moments obtained from speakers with amyotrophic lateral sclerosis (ALS)
and healthy controls matched for age and gender and found that the first moment
was significantly lower for the ALS group. Tjaden and Turner suggested that the
low mean values found among ALS speakers can be attributed to difficulties they
face in making the appropriate degree of constriction required to produce frication,
or to a weaker subglottal sound source due to the weak respiratory muscles that are
common among ALS speakers.
The studies mentioned so far demonstrate the ability of spectral moments
to distinguish sibilants from nonsibilants as a class and that they can reliably
distinguish only among sibilants. However, contrary to the studies mentioned
above, Jongman et al. (2000) found that spectral moments were successful in
capturing the differences between all four places of fricative articulation in English.
The Jongman et al. study, however, differs from other studies in that it calculated
moments from a 40 ms FFT analysis window placed at four different places in
the frication noise (onset, mid, end, and transition into vowel) and that it used a
larger and more representative number of speakers and tokens (2880 tokens from
20 speakers) as compared to a smaller population in other studies. Across moments
and window locations, variance and skewness at onset and transition were found
to be the most robust classifiers of all four places. Also, on average, variance was
shown to effectively distinguish between voiced and voiceless fricatives with the
former having greater variance.
2.3.4 Formant Transition Cues
2.3.4.1 Second formant at transition
Early research on formant transition focused on perceptual usefulness of such
information in classifying speech sounds. For example, Harris (1958) recorded the
English fricatives /f, v, T, D, s, z, S, Z/ followed by one of each of the vowels /i,
e, o, u/. Then she spliced and recombined vocalic and frication partitions of all
CV combinations. Listeners correctly identified sibilant fricatives regardless of
the source of the cross-spliced vocalic part. Frication noise alone was sufficient for
correct identification of sibilant fricatives. On the other hand, among nonsibilant
fricatives, a correct identification as /f, v/ occurred only when the vocalic part was
matching (i.e. coming from a /f, v/ syllable), and as /T, D/ with mismatching
vocalic parts. Based on these identification patterns, Harris suggested that
the perception of fricatives occurs at two consecutive stages. In the first stage,
cues from frication noise alone determine whether the fricative is a sibilant or
nonsibilant. If sibilant is the determined class, then cues from the frication
noise alone will differentiate among the sibilant fricatives. However, if the class
is determined to be nonsibilant at the first stage, then the formant transition
information is used for the within-class classification. As was the case with the
cross-splicing methods previously mentioned (section 2.3.1.1), this method also does
not eliminate the possibility that dynamic coarticulatory information is carried over
into the precut vowel and/or fricative. It is not clear, therefore, that the results
obtained can be attributed solely to the mismatching vocalic part of the cross-
spliced signal. To overcome this problem, Heinz and Stevens (1961) synthesized
stimuli consisting of white noise of varying frequency peaks, similar to peaks found
in English fricatives, followed by four synthetic formant transition values. Listeners
were instructed to label these stimuli as one of the four voiceless English fricatives
/f, T, s, S/. Based on identification scores, the researchers concluded that /f/ is
distinguished from /T/ on the basis of the F2 transition in the following vowel.
There was no apparent effect of formant transition on the distinction between /s/
and /S/. These findings support those of Harris (1958), while using more controlled
stimuli.
The role of formant transition, however, was not found to be as crucial in other
studies. LaRiviere, Winitz, and Herriman (1975) used the fricative noise in its
entirety in a perceptual test and obtained high recognition scores for /s, S/, lower
scores for /f/ and poor scores for /T/. More importantly, when vocalic information
was included for the /f, T/ tokens, no significant increase in their recognition was
obtained. Other studies (Manrique and Massone 1981; Jongman 1989) also found
similar results using different methods.
The perceptual experiments thus far mentioned used a forced-choice technique
that might have biased participants’ responses. For that reason Manrique and
Massone (1981) used a tape splicing paradigm to study the effect of formant
transition on the perception of Spanish fricatives by Spanish listeners. They
constructed their stimuli by splicing CV syllables into their respective frication
and vowel parts. Listeners were asked to choose the fricative when presented with
the frication noise alone and to freely guess the sound that preceded the vowel
when presented with the vocalic part. In the latter case, most tokens were judged
(85% of the responses) to have been preceded by a stop sharing the same place
of articulation as the spliced fricative. Spanish fricatives with no stops sharing
the same place of articulation were perceived as /t/, with the exception of /f/
which was perceived as /p/ 50% of the time. The same listeners were able to
identify the fricative accurately from only the frication part in all cases except
for /x/ and /G/. However, another study found that formant transition was not
crucial for correct identification of fricatives (Jongman 1989). Based only on the
frication noise part of fricative-vowel syllables, Jongman (1989) achieved correct
(92%) fricative identification in a perceptual experiment of English fricatives. More
importantly, there was no significant increase in identification accuracy when the
entire fricative-vowel syllable was presented.
As with results obtained from synthetic speech, measures of formant transition
from naturally produced fricatives are also conflicting. Wilde and Huang (1991), for
example, measured the F2 at the vowel onset for fricatives of only one male speaker
and found that the F2 value did not differentiate systematically between /f/ and
/T/. However, in another study, Wilde (1993) found that transitional information
as measured by F2 value at the fricative-vowel boundary can be used to identify
fricative place of articulation. The measurement she obtained from two speakers
showed that as the place of constriction moves back in the vocal tract, the value of
F2 systematically increases and its range becomes smaller.
2.3.4.2 Locus equations
Locus equations provide a method to quantify the role of formant transition
in the identification of fricative place of articulation by relating second formant
frequency at vowel onset (F2onset) to that at vowel midpoint (F2vowel). Locus
equations are straight line regression fits to data points formed by plotting onsets
of F2 transitions along the y axis and their corresponding vowel nuclei F2 along
the x axis in order to obtain the value of the slope and y-intercept. This metric
has been used primarily to classify English stops (Lindblom 1963; Sussman et al.
1991). It was only recently that this measure was applied to fricatives. Fowler
(1994) investigated the use of locus equations as cues to place of articulation across
different manners of articulation including the fricatives /v, D, z, Z/ as spoken
by five male and five female speakers of English. In this study, Fowler found
that locus equations (in terms of slope and y-intercept) of a homorganic stop and
fricative were significantly different, while those of a stop and a fricative of different
place of articulation were significantly similar. Nevertheless, locus equations were
able to differentiate between members that share the same manner of articulation.
Slopes for fricatives /v, D, z, Z/, for example, were significantly different (slopes
of 0.73, 0.50, 0.42, and 0.34 respectively). In another study, Sussman (1994)
investigated the use of locus equations to classify consonants across manners of
articulation (approximants, fricatives, and nasals). In contrast to Fowler (1994),
he found that fricatives were not distinguishable based on the slope of their locus
equations. Only /v/ had a distinctive slope.
Results of other studies of English fricatives were similar to those of Sussman
(1994). For example, in their large-scale study of English fricatives, Jongman et al.
(2000) calculated the slope and y-intercept for all English fricatives in six vowel
environments. Specifically, Jongman and colleagues measured F2onset and F2vowel
from a 23.3 ms full Hamming window placed at the onset and midpoint of the
vowel respectively. This was the same method used by the previously mentioned
studies. Similar to Sussman (1994), Jongman et al. (2000) found that only the
slope value for /f, v/ was significantly different and that the y-intercepts were
distinct only for /f, v/ and /S, Z/. Locus equations are particularly of interest
here since they have been shown to work across languages (Sussman, Hoemeke,
and Ahmed 1993), gender (Sussman et al. 1991), speaking style (Krull 1989), and
speaking rate (Sussman, Fruchter, Hilbert, and Sirosh 1998).
2.4 Studies of Arabic Fricatives
The use of acoustic cues to distinguish between the different fricatives in
Arabic has been underinvestigated in the literature. Furthermore, the very few
studies dealing with the acoustic characteristics of Arabic fricatives (see below) have
been predominantly concerned with a single acoustic feature and not with the
way multiple cues can be integrated in order to distinguish among fricative
places of articulation. While some of the cues mentioned above seem to distinguish
with a relatively good accuracy between English fricatives, the same cues when
used to classify Arabic fricatives need to take into account acoustic characteristics
particular to Arabic. For example, unlike English, Arabic utilizes durational
differences of both vowels and consonants for phonemic distinctions. It is of
interest, therefore, to see how such durational properties would affect voicing and
place classification of Arabic fricatives. Another interesting feature of Arabic is the
existence of co-articulated (pharyngealized) fricatives that are phonemically distinct
from their plain counterparts. Due to their double articulation mechanism, it is
expected in our study that pharyngealized fricatives will have two patterns of peaks
emerging at the middle and near the end of frication. Therefore, it seems necessary
to use a second analysis window at the end of frication noise such that its right
shoulder is aligned with the end of frication noise. Additionally, the two window
locations are suggested because studies of spectral peak location have demonstrated
that high frequency peaks are more likely to emerge at the middle and end of
frication noise (Behrens and Blumstein 1988b). Also, the frequency of the most
prominent peak for the pharyngealized fricatives is expected to be lower than their
plain counterparts because of acoustic coupling resulting from co-articulation.
Spectral moments seem to be another promising technique in classifying
Arabic fricatives if the proper size and location of the analysis windows are used.
In fact, in a study of fricatives in Cairo Arabic, Norlin (1983) found that /s,
sQ, z, zQ/ are characterized by a sharp peak in higher frequencies, and that the
peaks of /sQ, zQ/ are broader than those of /s, z/. Norlin used Center of Gravity (COG)
and dispersion to quantify the location of the peak and its spread,
respectively. Therefore, it seems that a combination of spectral
mean and variance along with skewness measures would differentiate between
pharyngealized and plain fricatives.
The use of formant transition information was investigated in the literature
in relation to the fricatives articulated at the back of the oral cavity. For example,
El-Halees (1985) found that the F1 value at the transition differentiates between
uvular and pharyngeal fricatives with the former being lower. Also, he found
that listeners can differentiate between the two classes based only on this single
feature. The perceptual salience of F1onset was also demonstrated by Alwan (1989),
who used synthetic speech to test the discrimination between voiced pharyngeal
fricative /Q/ and voiced uvular fricative /X/. She found that the higher F1onset
for the pharyngeal was essential to make the distinction, while F2onset was not.
The relation between back articulation and high F1 was also attested for vowels
following such sounds. Zawaydeh (1997) found that F1 at the middle of the vowel
was raised when preceded by one of the gutturals /sQ, è/ or the glottal /h/ as
compared to non-gutturals.
In addition to first and second formant at transition, locus equations were
also used as a classification metric for Arabic. The first attempt was part of a
cross-linguistic study of locus equations as a cue to stop place of articulation.
Sussman et al. (1993) recorded the voiced stops /b, d, dQ, g/ as produced by
three speakers of the Cairene dialect of Arabic. They found that both slope and
y-intercept for almost all comparisons were significantly different except for the
slope of /d/ and /dQ/, and the y-intercept for /b/ and /g/. The second study
was conducted by Yeou (1997) who elicited both stops and fricatives from nine
Moroccan subjects. Yeou found that y-intercept and slope distinguished between
most fricative comparisons. However, neither slope nor y-intercept distinguished
/S/ from /è/ or /f/ from /X/. More importantly, locus equations grouped the
pharyngealized fricatives (/DQ, sQ/) together: their distinctly low y-intercepts
and flat slopes set them apart from their non-pharyngealized counterparts and
from the other fricatives. Yeou argued that unlike their plain counterparts,
pharyngealized fricatives resist the articulatory effects of the following vowel due
to their double articulation. Instead they induce their coarticulatory effect on
the following vowel by raising its F1 and lowering its F2. This change in F2, as
compared to plain fricatives, causes the slope to be flatter and the intercept to be
lower.
To summarize, several acoustic cues related to spectral, temporal and
amplitude information found in the speech signal were used in different languages
to classify fricatives into their places of articulation. Such cues, alone and
collectively, served to distinguish between different places/classes of fricatives
in English. However, the use of these cues to classify Arabic fricatives has not
received much attention. In our study we examine how each of the
spectral, temporal and amplitude characteristics mentioned in Section 2.3
serves, alone and collectively, to distinguish between places of articulation of
Arabic fricatives. Of particular importance to our study is whether
the acoustic cues found to be effective in fricative classification in other languages
are affected by the vowel length differences present in Arabic, and whether such cues
distinguish between plain and pharyngealized fricatives. In the following
chapter, we discuss how such cues were investigated and the modifications,
if any, made to the measurement techniques.
CHAPTER 3
METHODOLOGY
Several spectral, amplitude, and temporal measurements have been used
in previous research to describe the acoustic cues that characterize fricatives in
different languages. The current study investigated Arabic fricatives to find such
acoustic cues. This chapter describes the way in which the speech samples were
elicited, recorded and analyzed. For most of the acoustic analyses, this research
followed the procedures commonly used to study fricatives in English as illustrated
in Jongman et al. (2000). Certain modifications were applied to further investigate
characteristics particular to Arabic. All coding and data analysis was carried
out using the PRAAT software (Boersma and Weenink 2004) and a set of scripts
developed at the phonetics lab of the University of Florida by the author.
3.1 Data Collection
3.1.1 Participants
A group of eight adult male speakers of Modern Standard Arabic (MSA)
were recruited to participate in our study from the general undergraduate student
population of King Saud University1. The mean age of participants was 20 years.
They had no history of hearing or speech impairments, and all had
very limited experience with English as a second language. Participants were given
class credit by their instructors for participating in the study.
1 King Saud University, Riyadh, Saudi Arabia
3.1.2 Materials
A gap exists in Arabic between MSA and its vernacular varieties.
Arabic is a classic example of diglossia, in which two varieties
of the language are used to fulfill different communicative functions (Ferguson
1959). Although participants were all fluent speakers of MSA, additional care
was taken in eliciting speech material in order to ensure that the participants
would stay within the target MSA register. Therefore fricatives were elicited
using screen prompted speech in conjunction with prerecorded audio prompts. A
trained phonetician, who is also a fluent speaker of MSA, produced CVC syllables
where the initial consonant was an MSA fricative /f, T, D, DQ, s, sQ, z, S, X, K, è, Q,
h/ followed by each of the six vowels /i, i:, a, a:, u, u:/. The final consonant was
always /t/. Each resulting word was repeated three times to yield a total of 234
audio prompts (13 fricatives × 6 vowels × 3 repetitions). The recorded prompts
were then edited to be of equal length (≈ 1 second) by adding silence to the end
if needed. The written prompts were constructed using fully vowelled Arabic
orthography on a white background. The participants were instructed to repeat
the word presented in the carrier phrase “qul marratajn” (say twice); with
the audio prompt functioning only as a reference. The prompts were presented
randomly in blocks of 39 words with breaks between blocks. Before the actual
recording of any participant, a practice session with 10 words presented in two
blocks was conducted to familiarize the participants with the task.
3.1.3 Recording
The recording was carried out using the facilities of the Computer &
Electronics Research Institute at KACST2. Two adjacent sound-attenuated
booths with a monitoring window between them hosted the data collection process.
2 King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia.
In one booth a PC running Microsoft PowerPoint was used to present
the synchronized audio and written production prompts via an LCD screen affixed to
the outside of the monitoring window of the other booth. The text was shown on
the LCD screen while the synchronized audio prompt was fed through headphones
(Sennheiser Noisegard mobile HDC 451). A Kay Elemetrics CSL (Computer
Speech Lab) model 4300B connected to another PC was
used for in-line recording of the participants’ utterances. Anti-aliasing was
carried out automatically during data capture by the CSL
external module. All recordings were made at a 22.05 kHz sampling rate with 16-bit
quantization. The participant’s production of the word in the carrier phrase was
captured using a low-impedance, unidirectional head-worn dynamic microphone
(SHURE SM10A) positioned about 20 mm to the left of the participant’s mouth in
order to prevent direct air flow turbulence from impinging on the microphone.
Each word lasted 4 seconds on the screen and then the following word was
shown. In case a participant did not produce the word in the allocated time or
a mispronunciation occurred, the recording was stopped by the author and that
particular word was presented again.
Each block was saved to a separate sound file for easy manipulation. The
resulting sound files were then transferred into PRAAT for segmentation and further
analyses.
3.2 Data Analysis
3.2.1 Segmentation of Speech
Both a wide-band spectrogram and a waveform display were used in the
segmentation of the recorded material into the monosyllabic words containing
the test fricatives. For each token, four points were identified on the waveforms:
the beginning of frication, the offset of fricative/beginning of the vowel, the end
of the vowel, and the end of word. For all these points the nearest zero-crossing
was always used. Fricative onset was taken to be the point in time at which high-
frequency energy appeared on the spectrogram and/or a significant increase in
zero-crossings rate occurred. The offset of the voiceless fricative was taken to be
the point of minimum intensity preceding the periodicity of the vowel. For the
voiced fricatives, the offset was taken to be the zero-crossing of the pulse preceding
the earliest pitch period exhibiting a change in the waveform from that seen
throughout the initial frication (Jongman et al. 2000). The vowel offset was taken
to be the end of periodicity while the end of the segmented token was taken to be
the onset of stop burst release. Figure 3–1 shows an example of these points. The
time indices of the segmentation points were written to a PRAAT TextGrid file. Such
files make it easier to handle the signal independently from the segmentation data
and labels.
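The zero-crossing rule used for all segmentation points can be illustrated with a short sketch. This is not the author's PRAAT script: the function name `nearest_zero_crossing` and the use of linear interpolation between samples are assumptions made only for illustration.

```python
import numpy as np

def nearest_zero_crossing(signal, sample_rate, time_s):
    """Return the time (s) of the zero crossing closest to time_s.

    Hypothetical helper: crossings are located by linear interpolation
    between the two samples where the waveform changes sign.
    """
    signal = np.asarray(signal, dtype=float)
    nonneg = signal >= 0
    # A crossing lies between samples i and i + 1 when the sign flips.
    idx = np.nonzero(nonneg[:-1] != nonneg[1:])[0]
    if idx.size == 0:
        return time_s  # no crossing found; leave the boundary unchanged
    x0, x1 = signal[idx], signal[idx + 1]
    crossing_times = (idx + x0 / (x0 - x1)) / sample_rate
    nearest = np.argmin(np.abs(crossing_times - time_s))
    return float(crossing_times[nearest])
```

A hand-placed boundary would be passed in as `time_s`, and the returned time would be the value written to the segmentation file.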
Figure 3–1. Example of segmentation. [Marked points: fricative onset, fricative offset, vowel offset, stop release.]
The only exception to the above-mentioned general rules was the voiced
pharyngeal fricative /Q/, for which it was difficult to visually localize the fricative-vowel
boundary. The pharyngeal fricative /Q/ is known to have a formant-like structure
continuous with that of the following vowel, with the lowest frequency of the
fricative matching that of the second formant of the following vowel (Johnson
1997). Therefore, the frication offset for /Q/ was taken to be the point at which
an upward intensity shift occurred with reference to the intensity at the fricative
onset. Such a point indicates the shift from the low intensity found in the frication
noise towards the higher intensity of the vocalic part. Figure 3–2 shows an example
of the segmentation of /Q/. Because voicing is absent during frication, such a
modification of the segmentation criteria was not necessary for either /è/ or /h/.
Figure 3–2. Segmentation of /Q/. The dotted line shows the intensity level. [Marked points: fricative onset, fricative offset, vowel offset, stop release.]
3.2.2 Acoustic Analyses
All measurements described below were obtained using scripts written by the
author for the PRAAT program. All measurements were then entered into a MySQL
database for later querying and statistical analyses. For spectral analyses based
on fast Fourier transform (FFT), a double-Kaiser window was used. A window
is a weighting function applied to the time-domain data to reduce
the spectral leakage associated with finite-duration signals. The
function peaks in the middle of the analysis frame and decreases to near zero
at its edges, thus reducing the effects of the discontinuities that result from
finite duration. In the frequency domain, the spectrum of the window consists of
a main lobe flanked by side lobes. The
ideal window is one that has a narrow main lobe and low sidelobes (Harris 1978).
However, there is a tradeoff between these two characteristics:
narrowing the main lobe raises the sidelobes, and vice versa.
Traditionally, in speech research, Hamming and Hann windows have been used
for spectral analyses. In our study, however, the Kaiser window, which has much
lower sidelobes, was used. The Kaiser window is the best approximation to a Gaussian window given
a certain ratio between physical length and effective length. More precisely, when
weighting is used, a Kaiser window of double physical length is applied to the
signal (Boersma and Weenink 2004). Such a windowing function produces a
bandwidth similar to that of a Hamming window with comparable effective width.
However, a Hamming window has sidelobes of about −42 dB on
each side of the main lobe, whereas such windowing artifacts are at a level of −190 dB
for the Kaiser window (Figure 3–3). Most speech analysis software uses a Hamming
(or Hann) window because evaluating a Kaiser window as explained above is slower
by a factor of two, since the analysis is performed on twice as many samples per
frame. With modern computers, the cost of this speed/performance tradeoff is minimal,
hence the adoption of this weighting function for our study.
Figure 3–3. Two window functions. A) The 0.1-second Hamming window. B) The 0.2-second Kaiser window. [Both panels: frequency (Hz), 980 to 1020, against sound pressure level (dB/Hz); the main lobe and side lobes are labeled.]
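The sidelobe comparison between the two window types can be checked numerically. The sketch below uses NumPy's `hamming` and `kaiser` functions rather than the PRAAT window itself; the window length (256 samples) and the Kaiser shape parameter (beta = 14) are illustrative assumptions, so the Kaiser figure it produces will not equal the −190 dB quoted for PRAAT.

```python
import numpy as np

def peak_sidelobe_db(window, n_fft=1 << 16):
    """Highest side-lobe level in dB relative to the main-lobe peak."""
    # Zero-padding the FFT samples the window spectrum densely.
    spectrum = np.abs(np.fft.rfft(window, n_fft))
    level_db = 20.0 * np.log10(spectrum / spectrum.max() + 1e-300)
    # Walk down the main lobe until the level first starts to rise again;
    # that point is the first null, and everything beyond it is side lobe.
    first_null = np.nonzero(np.diff(level_db) > 0)[0][0]
    return float(level_db[first_null:].max())

n = 256                                            # illustrative length
hamming_db = peak_sidelobe_db(np.hamming(n))
kaiser_db = peak_sidelobe_db(np.kaiser(n, 14.0))   # beta = 14 is an assumption
print(f"Hamming: {hamming_db:.1f} dB, Kaiser (beta=14): {kaiser_db:.1f} dB")
```

The Hamming result should fall close to the value of about −42 dB cited above, while the Kaiser sidelobes lie far lower, at the price of a wider main lobe for the same physical length.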
Pre-emphasis of each spectral analysis interval was carried out in order to
correct for the −6 dB per octave falloff in production of voiced speech. This falloff
is a result of the 12 dB per octave decrease due to excitation source and 6 dB per
octave increase due to the radiation compensation at the lips. With pre-emphasis
applied, the flattened spectrum would be a function of the vocal tract alone. Pre-emphasis
was applied as described in the PRAAT manual as a filter changing each
sample $x_j$ of the sound (except for $x_1$), starting from the last sample, according
to Equation (3–1), where $\Delta t$ is the sampling period of the sound and $F$ is the
frequency above which the change is applied. In our study $\alpha$ was set to 0.98 and $F$
to 50 Hz. The pre-emphasis filter was applied to the signal before windowing.

$$
\begin{aligned}
\alpha &= \exp(-2 \pi F \, \Delta t) \\
x_j &= x_j - \alpha \, x_{j-1}
\end{aligned}
\tag{3–1}
$$
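A minimal sketch of the pre-emphasis filter of Equation (3–1) is given below. It is not the PRAAT implementation; the function name `pre_emphasize` and its arguments are hypothetical. With a 22.05 kHz sampling rate and F = 50 Hz, the formula yields a coefficient of about 0.986.

```python
import numpy as np

def pre_emphasize(samples, sample_rate, from_hz=50.0):
    """Apply x[j] = x[j] - alpha * x[j-1] with alpha = exp(-2*pi*F*dt)."""
    samples = np.asarray(samples, dtype=float)
    alpha = np.exp(-2.0 * np.pi * from_hz / sample_rate)  # dt = 1 / sample_rate
    out = samples.copy()
    # Each output uses the ORIGINAL previous sample, which is what an
    # in-place update running from the last sample backwards achieves.
    out[1:] = samples[1:] - alpha * samples[:-1]
    return out, alpha
```

The first sample is left unchanged, as in the description above.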
3.2.2.1 Duration
Three temporal measurements were extracted based on the segmentation
criteria mentioned above: fricative, vowel and word duration. Since different tokens
of the same fricative included different stop burst durations, word duration was
measured from fricative onset to the point where the release of stop burst is visible
on the spectrogram (Figure 3–4).
Figure 3–4. Duration. [Labeled intervals: fricative, vowel, word.]
3.2.2.2 Spectral Moments
Spectral Moments measurements were modeled after those of Forrest et al.
(1988) with the window length modification employed by Jongman et al. (2000).
After pre-emphasis was applied to the signal, FFT spectra were calculated from
four different locations in the fricative with a 40 ms double-Kaiser window. The
first three windows were aligned so that the first covered the initial 40 ms of the
fricative, the second the middle 40 ms and the third the final 40 ms of frication
noise. The fourth window was centered over the fricative-vowel boundary so that
it covered 20 ms of each, capturing any transitional information. The analysis
windows may or may not overlap based on the length of the frication noise.
Following Forrest et al. (1988), each FFT was treated as a random probability
distribution from which the first four moments (mean, variance, skewness, and
kurtosis) were calculated. Only moments from linear spectra were calculated since
previous research on fricatives (Jongman et al. 2000) reported that there was no
substantial difference between the linear and bark-transformed spectra. The PRAAT
program measures the first moment (center of gravity) as in Equation (3–2) where
S(f) is the complex spectrum, f is the frequency and the denominator is the
energy. The quantity p was set to 2 in order to weigh the average frequency by the
power spectrum (not by the absolute spectrum).

$$
f_c = \frac{\int_0^\infty f \, |S(f)|^p \, df}{\int_0^\infty |S(f)|^p \, df}
\tag{3–2}
$$
The other three moments were first calculated using Equation (3–3), where n
denotes the nth moment and $f_c$ is the center of gravity. To normalize skewness
with regard to different levels of variance, the result of Equation (3–3) with
n = 3 was divided by the second moment raised to the power 1.5. Likewise, to
normalize kurtosis, the result of Equation (3–3) with n = 4 was divided by the
square of the second moment, and then a value of 3
was subtracted (Forrest et al. 1988).

$$
\frac{\int_0^\infty (f - f_c)^n \, |S(f)|^p \, df}{\int_0^\infty |S(f)|^p \, df}
\tag{3–3}
$$
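The moment calculations of Equations (3–2) and (3–3) can be sketched in discrete form as follows. This is an illustration only: the integrals are replaced by sums over FFT bins, and a Hann window stands in for the double-Kaiser window used in the study. The function name `spectral_moments` is hypothetical.

```python
import numpy as np

def spectral_moments(frame, sample_rate, p=2.0):
    """Return (center of gravity, variance, skewness, excess kurtosis)."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))   # stand-in for the Kaiser window
    spectrum = np.fft.rfft(windowed)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    weights = np.abs(spectrum) ** p             # p = 2 weights by power
    total = weights.sum()

    cog = (freqs * weights).sum() / total       # Equation (3-2)

    def central_moment(n):                      # Equation (3-3)
        return ((freqs - cog) ** n * weights).sum() / total

    variance = central_moment(2)
    skewness = central_moment(3) / variance ** 1.5
    kurtosis = central_moment(4) / variance ** 2 - 3.0
    return cog, variance, skewness, kurtosis
```

For a 40 ms frame at 22.05 kHz (882 samples), the frequency bins are 25 Hz apart.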
3.2.2.3 RMS Amplitude
Root-Mean-Square (RMS) amplitude in dB was measured from the entire
frication noise. Since different speakers and recording sessions may result in
different intensities, direct measures of amplitude cannot be compared across
speakers. Therefore, fricative amplitude was normalized using the method
described by Behrens and Blumstein (1988b). Basically, the average RMS
amplitude (in dB) of three consecutive pitch periods at the point of maximum
vowel amplitude was subtracted from the RMS amplitude of the entire frication
noise. In PRAAT, RMS amplitude is given in Pascal; it was
converted into dB following Equation (3–4).

$$
\text{RMS Amplitude}_{\text{dB}} = 20 \times \log_{10} \left\{ \frac{\text{Amplitude}_{\text{pascal}}}{2 \times 10^{-5}} \right\}
\tag{3–4}
$$
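The normalization can be sketched as below. The function names are hypothetical, and one detail is an assumption: the three pitch periods are each converted to dB and then averaged, which is one possible reading of "average RMS amplitude (in dB)".

```python
import numpy as np

REFERENCE_PA = 2e-5  # reference pressure from Equation (3-4)

def rms_db(samples_pa):
    """RMS amplitude of a stretch of samples (in Pascal), expressed in dB."""
    samples_pa = np.asarray(samples_pa, dtype=float)
    rms = np.sqrt(np.mean(np.square(samples_pa)))
    return 20.0 * np.log10(rms / REFERENCE_PA)

def normalized_rms_db(frication_pa, vowel_periods_pa):
    """Frication RMS (dB) minus the mean RMS (dB) of the vowel pitch periods.

    vowel_periods_pa: sequence of sample arrays, one per pitch period
    (assumed here to be averaged in the dB domain).
    """
    vowel_db = np.mean([rms_db(period) for period in vowel_periods_pa])
    return float(rms_db(frication_pa) - vowel_db)
```

Because frication noise is weaker than the vowel peak, the normalized values are negative.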
3.2.2.4 Spectral Peak Location
Spectral Peak Location of the fricative was estimated using a 40 ms double-
Kaiser window positioned over the middle of the frication noise. The analysis
window was set this large in order to gain better frequency resolution (Jongman
et al. 2000). Another window was placed at the end of the frication noise such
that its right shoulder was aligned with the end of frication noise. The two window
locations were used because studies of spectral peak location have demonstrated
that high frequency peaks are more likely to emerge at the middle and end of
frication noise (Behrens and Blumstein 1988a). Further, as explained in Section
2.3.3.1, two patterns of peaks were anticipated for the pharyngealized
fricatives because of their double articulation: one at the middle of the frication noise
and the other at its end. After applying pre-emphasis and
windowing, an FFT spectrum was derived. A script written for PRAAT searched
each spectrum to find the highest amplitude peak and its associated frequency. As
before, the amplitude was converted into dB using Equation (3–4).
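The peak search itself amounts to locating the largest value in the magnitude spectrum. The sketch below is a stand-in for the PRAAT script, which is not reproduced in the text; it uses a Hann window instead of the double-Kaiser window and returns the peak magnitude in linear units, since conversion to dB with Equation (3–4) requires calibrated pressure values.

```python
import numpy as np

def spectral_peak(frame, sample_rate):
    """Return (frequency in Hz, linear magnitude) of the highest FFT peak."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))   # stand-in analysis window
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    k = int(np.argmax(magnitude))               # index of the strongest bin
    return float(freqs[k]), float(magnitude[k])
```

Applied once to the mid-frication window and once to the window aligned with the end of frication, this would give the two peak locations per token.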
3.2.2.5 Relative Amplitude
Relative Amplitude was measured as described in Hedrick and Ohde (1993)
and later in Jongman et al. (2000) with one more modification. An FFT spectrum
was derived at vowel onset with a 23.3 ms double-Kaiser window. The mean values
of the first six formants in the windowed selection were estimated based on the
FFT spectrum. Each spectrum was then filtered using a Hann band-pass filter to
isolate the regions of the second, third and fifth formants based on the mean values
obtained above. Each region spanned from the mean frequency of the target
formant to half the distance to each of the two adjacent formants. The
upper and lower limits of such a region are given in Equation (3–5).

$$
\begin{aligned}
\max F_i &= \operatorname{mean} F_i + \left[ (\operatorname{mean} F_i - \operatorname{mean} F_{i-1}) / 2 \right] \\
\min F_i &= \operatorname{mean} F_i - \left[ (\operatorname{mean} F_{i+1} - \operatorname{mean} F_i) / 2 \right]
\end{aligned}
\tag{3–5}
$$
A script written for PRAAT searched each frequency region of the spectrum
to find its spectral peak and associated amplitude as mentioned in Section 3.2.2.4
above. Similar to previous research with (English) fricatives, spectral peak at the
F5 region was used for non-sibilant fricatives /f, T, D/ and spectral peak at F3
region for sibilant fricatives /s, z, S/. However, for the remaining fricatives (/X, K,
Q, h, sQ, DQ/), spectral peak of the F2 region was used.
Another FFT spectrum was derived at the middle of frication noise and
subsequently filtered into frequency regions based on the frequency of amplitude
peaks of F2, F3 and F5 regions of the vowel. Each region spanned 128 Hz on
each of the two sides around the vowel’s frequency regions. The amplitude of the
spectral peak in the said regions was measured using the same procedure outlined
above for the vowel. Relative amplitude was then defined for each frequency region
as the ratio between fricative amplitude and vowel amplitude at that frequency
range. Ratios in log scale are expressed as the difference between the two values.
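The last two steps can be sketched as follows, assuming that both spectra are already available as arrays of levels in dB on a common frequency axis. The function names and the way the formant region is passed in (`region_low_hz`, `region_high_hz`) are hypothetical; only the 128 Hz half-width comes from the description above.

```python
import numpy as np

def peak_in_band(freqs_hz, level_db, low_hz, high_hz):
    """Return (frequency, level in dB) of the highest peak inside a band."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    level_db = np.asarray(level_db, dtype=float)
    in_band = np.nonzero((freqs_hz >= low_hz) & (freqs_hz <= high_hz))[0]
    if in_band.size == 0:
        raise ValueError("no spectral bins inside the requested band")
    k = in_band[np.argmax(level_db[in_band])]
    return float(freqs_hz[k]), float(level_db[k])

def relative_amplitude_db(freqs_hz, fric_db, vowel_db,
                          region_low_hz, region_high_hz, half_width_hz=128.0):
    """Fricative peak level minus vowel peak level (dB) for one formant region."""
    vowel_freq, vowel_peak = peak_in_band(
        freqs_hz, vowel_db, region_low_hz, region_high_hz)
    # The fricative is searched within +/- 128 Hz of the vowel's peak frequency.
    _, fric_peak = peak_in_band(
        freqs_hz, fric_db, vowel_freq - half_width_hz, vowel_freq + half_width_hz)
    return fric_peak - vowel_peak
```

Since both levels are in dB, the ratio reduces to a subtraction, as noted above.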
3.2.2.6 Locus Equations
Following previous research on locus equations (for example Sussman et al.
1991, 1993; Fowler 1994; Sussman 1994; Yeou 1997; Govindarajan 1998; Jongman
1998; Jongman et al. 2000; Tabain 2002), coefficients of locus equations were
derived from scatterplots of F2 values measured at vowel onset and vowel nucleus
for each speaker and place of articulation combination. Specifically, the second
formant at vowel onset as well as at the middle of the vowel were estimated using
the formant tracking procedure implemented in PRAAT. First, the sound was
resampled to 10 kHz and then pre-emphasized using the algorithm given
above in Equation (3–1). After a Gaussian-like window of 25 ms length was applied to
the signal, the LPC coefficients were calculated for each analysis window using the
algorithm by Burg, as given in Anderson (1978) and Press, Flannery, Teukolsky,
and Vetterling (1992). For each speaker and place combination, linear regression
fits were applied to scatterplots with F2 averaged across all vowel contexts. Each
scatterplot had F2 measured at the onset of the vowel on the y-axis
and F2 measured at the midpoint of the vowel on the x-axis. The
coefficients of each regression line (the slope ‘k’ and the y-intercept ‘c’) were taken
to be the terms of locus equations.
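In other words, a locus equation has the form F2onset = k × F2vowel + c. A minimal sketch of the regression step is shown below, using an ordinary least-squares fit from NumPy; the function name `locus_equation` is hypothetical, and the F2 values would come from the formant tracker.

```python
import numpy as np

def locus_equation(f2_vowel_hz, f2_onset_hz):
    """Fit F2onset = k * F2vowel + c and return (k, c)."""
    x = np.asarray(f2_vowel_hz, dtype=float)   # F2 at vowel midpoint (x-axis)
    y = np.asarray(f2_onset_hz, dtype=float)   # F2 at vowel onset (y-axis)
    slope, intercept = np.polyfit(x, y, 1)     # first-degree least-squares fit
    return float(slope), float(intercept)
```

One such fit would be computed for each speaker and place of articulation, with one data point per vowel context.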
3.2.2.7 F2 at Transition
Second Formant at the transition was also measured from the first window (at
vowel onset) used to derive F2 for the locus equations above.
3.3 Statistical Analyses
Along with reporting the descriptive statistics for the acoustic measures
mentioned above, measures of significant differences between different places
of articulation for these measures were obtained using appropriate Analysis of
Variance (ANOVA) methods. All reported statistics were calculated from data
points aggregated across the three repetitions for each speaker.
Discriminant function analysis (DFA) was used to measure the contribution
of different cues towards the classification of fricatives into their respective classes.
The DFA procedure reduces the physical space, built by extracted cues, into
subspaces corresponding to the sound classes under consideration (Jassem 1979).
This classification method works first by forming vectors of the metrics mentioned
above. Recall that each cue mentioned above, except for locus equations, represents
a value of some single feature at a given point in time. Therefore, each token can
be represented as a combination of values (a vector) from all these cues. All the
tokens, then, are represented as points defined by their respective vectors in a
multidimensional space. The dimensions of such space depend on the number of
parameters in use.
The goal of DFA is to find the combination of parameters that provides the
best classification accuracy of tokens into their pre-defined classes. This process
involves calculating three types of probabilities: the probability of observing a
particular parameter value p for a token of class t ($P[p \mid t]$), the probability of observing a token
of class t in the data ($P[t]$), and finally the probability of observing a specific value for
a parameter ($P[p]$). All these probabilities are calculated from training data to
predict the membership of an unknown token in testing data using Bayes'
theorem (Equation 3–6). The value $P[t \mid p]$ is the probability that an unknown token belongs
to class t given a value for parameter p (Harrington and Cassidy 1999).

$$
P[t \mid p] = \frac{P[p \mid t] \, P[t]}{P[p]}
\tag{3–6}
$$

The unknown token is then classified as belonging to class A ($t_a$) rather than class B
($t_b$) if the condition $P[p \mid t_a] \, P[t_a] > P[p \mid t_b] \, P[t_b]$ is satisfied (Harrington and Cassidy
1999). The traditional way of applying this method to fricative classification (see
for example Shadle and Mair 1996; Tabain 1998; Jongman et al. 2000; Nissen 2003)
involves all-but-one speakers as the training data and tokens from the remaining
speaker as the testing data. The process is repeated so that each speaker will be
in the testing data at a given time. The DFA procedure produces a classification
accuracy score along with a set of coefficients that represent the contribution of the
parameters in the classification.
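The decision rule and the leave-one-speaker-out scheme can be illustrated with a simplified sketch. It is not a full discriminant function analysis: it uses a single parameter and assumes a Gaussian likelihood for each class, and all names and the synthetic data are hypothetical. The term P[p] is omitted because it is the same for every class and does not affect the comparison.

```python
import numpy as np

def gaussian_log_likelihood(x, mean, var):
    """log P[p | t] under a Gaussian model of the parameter within a class."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def classify(train_x, train_y, test_x):
    """Assign each test value to the class maximising P[p | t] * P[t]."""
    classes = np.unique(train_y)
    scores = []
    for c in classes:
        values = train_x[train_y == c]
        prior = values.size / train_x.size                    # P[t]
        loglik = gaussian_log_likelihood(
            test_x, values.mean(), values.var() + 1e-9)       # log P[p | t]
        scores.append(loglik + np.log(prior))
    return classes[np.argmax(np.vstack(scores), axis=0)]

def leave_one_speaker_out(x, y, speakers):
    """Train on all-but-one speakers, test on the remaining one, and repeat."""
    x, y, speakers = np.asarray(x, dtype=float), np.asarray(y), np.asarray(speakers)
    correct = 0
    for s in np.unique(speakers):
        test = speakers == s
        predicted = classify(x[~test], y[~test], x[test])
        correct += int(np.sum(predicted == y[test]))
    return correct / x.size

# Synthetic illustration: 4 speakers, two classes separated on one parameter.
rng = np.random.default_rng(0)
speakers = np.repeat(np.arange(4), 10)
labels = np.tile(np.array(["s"] * 5 + ["S"] * 5), 4)
values = np.where(labels == "s", 6000.0, 3000.0) + rng.normal(0, 200, labels.size)
accuracy = leave_one_speaker_out(values, labels, speakers)
```

With several parameters per token, the scalar likelihood would be replaced by a multivariate one, which is what a DFA procedure in a statistics package provides.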
CHAPTER 4
AMPLITUDE AND DURATION
This chapter reports results of the amplitude and duration measurements.
These results were derived from a three-way ANOVA with place of articulation,
voicing, and vowel context as between-subject factors. Post hoc tests of significant
effects were adjusted for multiple comparisons using the Bonferroni method. All
data were aggregated across the three repetitions of each speaker prior to any
statistical analysis.
4.1 Amplitude Measurements
4.1.1 Normalized Frication Noise RMS Amplitude
Normalized frication RMS amplitude was calculated as the difference
between frication noise RMS amplitude and the average RMS amplitude of
three consecutive pitch periods at the point of maximum vowel amplitude.
A three-way Analysis of Variance (ANOVA) with normalized frication noise
RMS as the dependent variable and place of articulation, voicing, and vowel
context as between-subject factors revealed a significant main effect of Place
[F(8, 561) = 75.241, p < 0.001; η² = 0.518]. Due to a lack of voicing contrast
at some places of fricative articulation in Arabic (Labiodental, Post-Alveolar, and
Glottal), differences within voiceless fricatives and within voiced fricatives will
be interpreted separately. For both voiced and voiceless fricatives, subsequent
Bonferroni post hoc tests showed that plain fricatives and their pharyngealized
counterparts (/D - DQ/ and /s - sQ/) did not differ in normalized RMS amplitude
(mean normalized RMS values are reported in Figure 4–1). However, with the
exception of the contrast between voiced alveolar and uvular fricatives (/z -
K/), normalized RMS amplitude significantly (p < 0.0001) distinguished all
places of voiced fricative articulation. Additionally, within voiceless fricatives,
nonsibilant fricatives /f, T/ had the lowest normalized RMS amplitude (−23.94
and −22.50 dB respectively). While such RMS amplitude values for /f/ and /T/
were not statistically different from each other, normalized RMS amplitude values
of both /f/ and /T/ were significantly lower than all other voiceless fricatives.
Additionally, no differences were obtained between /s, S, h/ or between /X, è/. All
other contrasts were significant (Figure 4–1).
Figure 4–1. Mean frication noise normalized RMS amplitude (dB) by place of articulation and voice. [Axes: place of articulation (labiodental, dental, pharyngealized dental, alveolar, pharyngealized alveolar, post-alveolar, uvular, pharyngeal, glottal) against normalized RMS amplitude (dB); series: voiced, voiceless. Bar labels as extracted, with their assignment to bars not recoverable: −17.26, −14.53, −16.55, −13.66, −7.52, −18.15, −14.40, −20.17, −19.09, −14.01, −15.38, −22.50, −23.94.]
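The effect sizes reported with each F ratio appear to be partial eta squared, which can be recovered from the F value and its degrees of freedom. This reading is an assumption, since the text does not name the effect-size variant. The helper below is a hypothetical illustration of that relationship.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of Place on normalized RMS amplitude: F(8, 561) = 75.241
place_effect = partial_eta_squared(75.241, 8, 561)
print(round(place_effect, 3))
```

Under this assumption, the value computed for the Place effect agrees with the reported effect size to three decimal places.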
There was also a significant main effect of Vowel context [F (5, 561) =
16.185, p < 0.001; η2 = 0.126]. For short vowels, normalized frication RMS
amplitude tended to be lower as the vowel context changed from /i/ to /u/ to
44
/a/ with means of −16.51 dB, −17.03 dB, and −17.81 dB respectively. The same
pattern was also observed with long vowels (/i:/ to /u:/ to /a:/ with means of
−14.30 dB, −16 dB, and −18.58 dB respectively). However, statistically significant
differences in terms of vowel context effect, as suggested by post hoc tests, were
observed with long vowels only with p = 0.004 for the /i: -u:/ contrast and
p < 0.001 for all other contrasts. Additionally, as can be seen from Figure 4–2,
when comparing a short vowel to its long variant, we find that only the front
long vowel /i:/ resulted in a significantly (p < 0.001) higher value for normalized
frication RMS amplitude than its short counterpart /i/ (−14.30 dB versus −16.51 dB).
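The effect sizes reported with each F ratio in this chapter can be checked from the F value and its degrees of freedom. The sketch below assumes that η2 here is partial eta squared, which is an inference from the reported numbers rather than something stated in the text; the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    Assumption: the reported effect size is partial eta squared, i.e.
    SS_effect / (SS_effect + SS_error) = (F * df_effect) / (F * df_effect + df_error).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Vowel context effect on normalized RMS amplitude: F(5, 561) = 16.185
vowel_effect = round(partial_eta_squared(16.185, 5, 561), 3)
```

Under that assumption the vowel context effect comes out at 0.126, which agrees with the value reported above.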
Figure 4–2. Mean frication noise normalized RMS amplitude (dB) by vowel context.
Finally, a significant main effect of Voicing [F (1, 561) = 315.204, p <
0.001; η2 = 0.36] was also found. Normalized RMS amplitude of voiced fricatives
(mean = −14.22 dB) was greater than that of voiceless fricatives (mean = −18.26
dB). In addition to this main effect, there was a significant Place by Voicing
interaction [F (3, 561) = 41.9, p < 0.001; η2 = 0.183]. As can be seen in Figure
4–3, Bonferroni post hoc tests showed that the significant difference in normalized
frication RMS amplitude between voiced and voiceless fricatives noted above was
not present for alveolar fricatives /s, z/.
Figure 4–3. Mean frication noise normalized RMS amplitude (dB) as a function of place of articulation and voicing.
4.1.2 Relative Amplitude of Frication Noise
Relative amplitude is defined here as the ratio between the amplitude of
a specific frequency (F3 for /f, T, D/, F5 for /s, z, S/, and F2 for /X, K, sQ, DQ,
è, Q, h/) measured at the frication noise midpoint and the amplitude of the
corresponding frequency measured at vowel onset. Results of a three-way ANOVA
(place × voice × vowel) with relative amplitude as the dependent variable showed a
significant main effect of Place [F (8, 561) = 104.525, p < 0.001; η2 = 0.598].
In general, relative amplitude of a fricative becomes greater as the place of
articulation advances towards the lips (Figure 4–4). The only notable exception
was the post-alveolar fricative (/S/). It was the only fricative in which the frication
amplitude measured at the region of F3 was greater than the amplitude of the
same frequency region at the following vowel onset (i.e., giving a value for relative
amplitude above zero). Collapsed across voicing, differences in relative amplitude
between all places of fricative articulation were significant with the exception of all
possible pairwise comparisons between the following three places: alveolar /s, z/,
pharyngeal /è, Q/, and glottal /h/ fricatives. However, since voicing contrast
is not present at all places, Bonferroni post hoc tests carried out on voiced and
voiceless fricatives showed a different pattern. Within voiced fricatives, relative
amplitude of pharyngealized dental fricative /DQ/ was significantly lower than those
of all other voiced fricatives, while those of alveolar /z/, dental /D/, and uvular
/K/ fricatives were not statistically different from one another. Furthermore, the
difference in relative amplitude between /D/ and /Q/ was not significant. All other
contrasts between voiced fricatives were significant (Figure 4–4). Within voiceless
fricatives, relative amplitude differentiated /f/ (−5.22 dB) and /T/ (−5.45 dB)
from all other fricatives; however, no significant difference was observed between
these two nonsibilant fricatives. Additionally, relative amplitude differentiated
between all other voiceless fricatives with the exception of the contrasts between
/s/–/è/, /s/–/h/, and /è/–/h/.
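The relative amplitude measure defined at the start of this section can be sketched as a level difference in decibels. The sketch assumes that both amplitudes are linear magnitudes taken from the same frequency region, so that the ratio becomes 20 log10 of the quotient; the function name and the example magnitudes are hypothetical.

```python
import math


def relative_amplitude_db(fricative_amp: float, vowel_amp: float) -> float:
    """Fricative-to-vowel amplitude ratio expressed in dB.

    Both arguments are assumed to be positive linear amplitudes measured in
    the same frequency region (fricative midpoint vs. vowel onset).
    """
    if fricative_amp <= 0 or vowel_amp <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(fricative_amp / vowel_amp)


# Hypothetical magnitudes: frication weaker than the vowel gives a negative value.
weaker = relative_amplitude_db(0.1, 1.0)
# Frication stronger than the vowel gives a positive value.
stronger = relative_amplitude_db(2.0, 1.0)
```

On this reading, a value above zero, as reported for /S/, means that the frication noise carries more energy in the measured region than the following vowel onset does.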
There was also a significant main effect for Vowel context [F (5, 561) =
11.642, p < 0.001; η2 = 0.094]. However, the source of this main effect as revealed
by Bonferroni post hoc tests can be solely attributed to differences in the context of
long vowels. Specifically, relative amplitude of fricatives followed by the high back
vowel /u:/ (mean = −11.31 dB) was significantly higher (p < 0.0001) than relative
amplitude of fricatives preceding any other vowel except /i:/, which has similar
height and length to /u:/. Another source of the main effect obtained above was
the significantly lower (p < 0.016) relative amplitude of fricatives preceding the low
vowel /a:/ (mean = −17.02 dB) in relation to other long vowels. Furthermore,
there was a general trend such that a short vowel resulted in a lower relative
amplitude than its long counterpart, with only the /u, u:/ contrast reaching
significance (p < 0.05). Mean values for relative amplitude of fricatives in different vowel
contexts are presented in Table 4–1, where significant differences are marked
with an asterisk.
Figure 4–4. Mean relative amplitude of fricatives.
Values shown (dB): labiodental, voiceless −5.22; dental, voiced −14.95, voiceless −5.45; pharyngealized dental, voiced −28.03; alveolar, voiced −16.28, voiceless −15.76; pharyngealized alveolar, voiceless −31.23; post-alveolar, voiceless 0.90; uvular, voiced −20.05, voiceless −22.66; pharyngeal, voiced −11.78, voiceless −17.32; glottal, voiceless −14.27.
Table 4–1. Relative amplitude in different vowel contexts. Means are arranged in descending order.
        Mean (dB)   /i/   /u/   /a/   /i:/  /u:/  /a:/
/u:/    -11.31       ∗     ∗     ∗                 ∗
/i:/    -13.85                   ∗                 ∗
/i/     -16.17                               ∗
/u/     -16.33                               ∗
/a:/    -17.02                         ∗     ∗
/a/     -18.61                         ∗     ∗
∗ significant difference at p < 0.05
The ANOVA also revealed a significant Place by Voicing interaction
[F (3, 561) = 20.834, p < 0.001; η2 = 0.10]. Bonferroni post hoc tests showed
that only the differences between voiceless and voiced dental fricatives /T, D/ (9.5
dB) and between voiceless and voiced pharyngeal fricatives /è, Q/ (−5.5 dB) were
significant (Figure 4–5). However, no main effect of voicing was obtained.
Figure 4–5. Relative amplitude as a function of Place and Voicing.
A Place by Vowel context interaction was also significant [F (40, 561) =
4.101, p < 0.001; η2 = 0.226]. Multiple one-way ANOVAs, with Bonferroni post
hoc tests corrected for multiple comparisons, were conducted for each place of
articulation, with vowel context separated into long and short vowels. The
results of these ANOVAs showed that for long vowels, the significant increase
of relative amplitude in front of /u:/ mentioned above was present only within
labiodental (/f/) (mean = 5.34 dB) and alveolar (/s, z/) (mean = −6.37 dB)
fricatives. In addition, relative amplitude of the pharyngealized alveolar (/sQ/) in
the context of the low vowel /a:/ was significantly lower (mean = −38.21 dB) than in
the context of the high vowels /i:/ (mean = −21.36 dB) and /u:/ (mean = −22.54 dB).
Finally, unlike the absence of differences between long vowels of the same height
observed above, the relative amplitude of the glottal fricative (/h/) in the context
of the front vowel /i:/ (mean = −10.21 dB) was significantly higher than in the
context of the back vowel /u:/ (mean = −20 dB) (Figure 4–7). As for short vowels,
a similar pattern of significant differences was obtained. Specifically, the relative
amplitude of labiodental (/f/) and alveolar (/s, z/) fricatives was significantly
higher in the context of /u/ (mean = −1.31 and −10.64 dB respectively) than
either /i/ (mean = −9.77 and −21.58 dB respectively) or /a/ (mean = −9.83
and −20.79 dB respectively). Moreover, the relative amplitude of the pharyngealized
alveolar (/sQ/) in the context of the low vowel /a/ (mean = −39.07 dB) was
significantly lower only than in the context of the high vowel /i/ (mean = −28.02 dB)
(Figure 4–6). Mean values for relative amplitude of fricatives in different vowel
contexts are also presented in Table (4–2).
Finally, a Vowel context by Voicing interaction was also found to be significant
[F (5, 561) = 4.574, p < 0.001; η2 = 0.039]. Bonferroni post hoc tests were carried out
on long and short vowels separately. In general the relative amplitude of voiceless
fricatives in a given vowel context is higher than that of voiced fricatives in the
same context (Figure 4–8 and Figure 4–9); however, this difference was significant
only with /i:/ (mean = −10.80 dB for voiceless and −18.71 dB for voiced).
Figure 4–6. Relative amplitude (dB) as a function of place of articulation and short vowels.
Figure 4–7. Relative amplitude (dB) as a function of place of articulation and long vowels.
Table 4–2. Mean relative amplitude (dB) of frication noise.
Place of Articulation     Voicing           /i/               /u/               /a/
                                       short    long     short    long     short    long
Labiodental               Voiceless    -9.77   -7.12     -1.31    5.34     -9.83   -8.64
Dental                    Voiced      -18.88  -15.22    -14.49   -9.36    -15.85  -15.88
                          Voiceless    -7.13   -5.26     -6.51    0.87     -7.55   -7.12
Alveolar                  Voiced      -21.54  -18.67     -9.83   -6.91    -22.28  -18.44
                          Voiceless   -21.62  -17.87    -11.46   -5.84    -19.30  -18.49
Post-Alveolar             Voiceless    -2.09   -1.05      3.73    7.96     -3.16    0.01
Uvular                    Voiced      -21.31  -22.71    -18.45  -15.10    -22.58  -20.15
                          Voiceless   -16.67  -16.52    -29.88  -22.51    -27.48  -22.90
Pharyngeal                Voiced      -12.98  -12.05    -14.65  -10.58    -10.78   -9.66
                          Voiceless   -10.60   -7.04    -24.63  -21.55    -19.76  -20.35
Pharyngealized Dental     Voiced      -26.30  -24.91    -28.53  -26.76    -32.02  -29.67
Pharyngealized Alveolar   Voiceless   -28.02  -21.36    -38.21  -22.54    -39.07  -38.21
Glottal                   Voiceless   -13.30  -10.21    -18.09  -20.00    -12.20  -11.80
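The place means plotted in Figure 4–4 appear to be the cell means of Table 4–2 averaged over the six vowel contexts. A short check, using the labiodental and post-alveolar rows of the table; the variable names are illustrative.

```python
from statistics import mean

# Cell means (dB) from Table 4-2, ordered /i/, /i:/, /u/, /u:/, /a/, /a:/.
table_4_2 = {
    "labiodental (voiceless)": [-9.77, -7.12, -1.31, 5.34, -9.83, -8.64],
    "post-alveolar (voiceless)": [-2.09, -1.05, 3.73, 7.96, -3.16, 0.01],
}

# Average each row over the six vowel contexts and round to two decimals.
place_means = {place: round(mean(cells), 2) for place, cells in table_4_2.items()}
```

The two results, −5.22 dB and 0.90 dB, agree with the means given for /f/ and /S/.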
Figure 4–8. Relative amplitude (dB) as a function of voicing and vowel context (short vowels).
Figure 4–9. Relative amplitude (dB) as a function of voicing and vowel context (long vowels).
4.2 Temporal Measurements
Two measures of fricative noise duration are reported here: absolute fricative
duration and normalized fricative duration. For the latter, the ratio of fricative
duration to word duration was calculated to account for differences in
speaking rate that might have occurred. For each measure, a three-way ANOVA
(place × voice × vowel context) was carried out. Subsequent post hoc tests were
corrected for multiple comparisons using the Bonferroni method.
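For reference, the Bonferroni method divides the nominal significance level by the number of comparisons in a family. The sketch below counts all pairwise comparisons among the six vowel contexts; treating every pair as one family is an assumption, since the exact families used for each post hoc test are not spelled out here.

```python
from itertools import combinations


def bonferroni_alpha(levels, alpha=0.05):
    """Return all pairwise comparisons and the Bonferroni-adjusted alpha."""
    pairs = list(combinations(levels, 2))
    if not pairs:
        raise ValueError("need at least two levels to compare")
    return pairs, alpha / len(pairs)


vowels = ["i", "u", "a", "i:", "u:", "a:"]
pairs, adjusted_alpha = bonferroni_alpha(vowels)
```

With six vowel contexts there are 15 pairwise comparisons, so each one would be tested against 0.05 / 15 under this assumption.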
4.2.1 Absolute Duration of Frication Noise
A three-way ANOVA (place × voice × vowel context) with the duration
of the frication noise as the dependent variable revealed a main effect of Place
[F (8, 561) = 50.092, p < 0.001; η2 = 0.417] with a mean frication noise duration
of 117.99 ms. Mean duration of frication noise as a function of place of articulation
and voicing is presented in Figure 4–10. Averaged across voicing and vowel
context, pharyngealized dental /DQ/ and glottal fricative /h/ had the shortest
durations, with means of 86.47 and 98.55 ms respectively. Due to the well-known
effect of voicing on segmental duration (Cole and Cooper 1975; Manrique and
Massone 1981; Baum and Blumstein 1987; Behrens and Blumstein 1988b; Crystal
and House 1988; Pirello et al. 1997, among others), two sets of comparisons were
made, one for voiced and the other for voiceless fricatives. Among voiced fricatives,
alveolar fricative /z/ was significantly longer than all other voiced fricatives with a
mean duration of 110.12 ms. No other differences among voiced fricatives reached
the significance level of p < 0.05.
On the other hand, contrasts within voiceless fricatives revealed that glottal
fricative /h/, with a mean duration of 98.55 ms, was significantly shorter than all
other voiceless fricatives. Although no significant difference between nonsibilants
was observed, each of the nonsibilants /f/ and /T/ (127.86 and 131.68 ms
respectively) was significantly shorter than each of the sibilants /s/, /sQ/, and
/S/. Additionally, alveolar /s/ and its pharyngealized counterpart /sQ/ (mean =
149.86 and 149.70 ms) were significantly longer than all other voiceless fricatives
excluding /S/. As in the case of voiced fricatives, no significant differences were
found among voiceless labiodental, dental, uvular, and pharyngeal fricatives or
between pharyngealized fricatives and their plain counterparts (/sQ-s/).
Figure 4–10. Absolute frication noise duration as a function of place and voice, averaged across all vowel contexts and speakers.
Also, as expected, a main effect of Voicing was found [F (1, 561) = 721.75, p <
0.001; η2 = 0.563], with voiceless fricatives (mean 134.21 ms) being significantly
longer than voiced fricatives (mean 92.05 ms). A Place by Voice interaction was
also significant [F (3, 561) = 3.327, p < 0.05; η2 = 0.017]. Subsequent Bonferroni post
hoc tests showed that this difference was significant across all places of articulation
with a voicing contrast (Figure 4–11). The interaction is probably
due to variation in the magnitude of duration differences between a voiced and
voiceless fricative in a given place. As is apparent from Figure 4–11, the difference
between voiced and voiceless fricatives was greater for uvular and pharyngeal than
for dental and alveolar fricatives.
Figure 4–11. Mean absolute frication noise duration for places with a voicing contrast.
Finally, a main effect of Vowel context [F (5, 561) = 4.708, p < 0.001; η2 = 0.04]
was significant. However, post hoc tests showed that frication
noise durations measured in the context of vowels of the same length were not
significantly different from each other. Rather, the main effect was
due to the significantly increased duration of fricatives measured in the context of
/i:/ (mean 123.25 ms) as compared to all short vowels, and the significantly longer
duration of frication noise in the context of /u:/ (mean 122.80 ms) when compared
to /a, u/ (Figure 4–12).
Figure 4–12. Mean absolute frication noise duration in different vowel contexts.
4.2.2 Normalized Duration of Frication Noise
Normalized frication noise duration is defined here as the ratio between
fricative duration and word duration. As can be seen from Figure 4–13, normalized
frication noise followed a pattern similar to the one observed with absolute
frication noise duration. Specifically, averaged across voicing and vowel context,
pharyngealized dental /DQ/ and glottal fricative /h/ had the shortest normalized
duration with means of 0.27 and 0.31 respectively. The results of the three-way
ANOVA revealed a main effect of Place [F (8, 561) = 49.82, p < 0.001; η2 = 0.415].
Separated according to voicing, Bonferroni post hoc tests showed, as was the case
with absolute duration, that /z/ (mean 0.34) was significantly longer than all other
voiced fricatives. No significant differences were observed among voiced dental,
uvular, and pharyngeal fricatives or between pharyngealized dental and their plain
counterparts (i.e., /DQ - D/).
As for contrasts within voiceless fricatives, glottal fricative /h/, with a mean
normalized duration of 0.307, was significantly shorter than all other voiceless fricatives.
Moreover, voiceless alveolar /s/ was significantly longer than all other voiceless
fricatives excluding the post-alveolar and pharyngealized alveolar fricatives /S, sQ/,
which in themselves were significantly longer than labiodental, pharyngeal, and
glottal fricatives /f, è, h/. No other difference among voiceless fricatives reached the
significance level of p < 0.05.
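Normalized duration as defined at the start of this section is a simple proportion. In the sketch below, the function name and the example durations are hypothetical and are not taken from the data.

```python
def normalized_duration(fricative_ms: float, word_ms: float) -> float:
    """Fricative duration as a proportion of the duration of its word."""
    if word_ms <= 0 or not 0 <= fricative_ms <= word_ms:
        raise ValueError("expected 0 <= fricative_ms <= word_ms and word_ms > 0")
    return fricative_ms / word_ms


# Hypothetical token: a 120 ms fricative in a 400 ms word.
example = normalized_duration(120.0, 400.0)
```

Because the measure is a proportion, it stays roughly constant across speaking rates as long as the fricative and the word lengthen or shorten together.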
Figure 4–13. Mean normalized frication noise duration as a function of place and voice, averaged across all vowel contexts and speakers.
The effect of Voicing on normalized fricative duration was also significant
[F (1, 561) = 724.74, p < 0.001; η2 = 0.564]. Averaged across other conditions,
voiced fricatives had significantly shorter normalized durations (mean = 0.29)
than voiceless fricatives (mean = 0.38). In addition, a significant Place by Voicing
interaction [F (3, 561) = 7.079, p < 0.001; η2 = 0.036] and subsequent Bonferroni
post hoc tests showed that this difference was greater for uvular and pharyngeal
than for dental and alveolar fricatives (Figure 4–14).
Figure 4–14. Mean of normalized frication noise duration for places with a voicing contrast.
Finally, as shown in Figure 4–15, normalized frication noise duration was
significantly affected by the Vowel context [F (5, 561) = 8.862, p < 0.001; η2 =
0.073]. However, this effect, as suggested by Bonferroni post hoc tests, was confined
to contrasts involving long vowels. Specifically, while no
significant differences were observed within short vowels, normalized frication noise
duration was significantly shorter (mean = 0.32) in the context of /a:/ than all
other vowels. On the other hand, fricatives preceding /i:/ had significantly longer
normalized duration (mean 0.35) than in the context of other long vowels.
Figure 4–15. Mean normalized frication noise duration in different vowel contexts.
CHAPTER 5
SPECTRAL MEASUREMENTS
5.1 Spectral Peak Location
This chapter reports on results of the spectral measurements, which include
spectral peak location (frequency region of energy maximum in frication noise)
and spectral moments (mean, variance, skewness, and kurtosis). As mentioned in
Section (3.2.2.4), spectral peak frequencies were measured at the center as well as
the end of frication noise. First, mean spectral peak location obtained from the two
locations was used in a one-way ANOVA as the dependent variable to test for the effect
of the analysis window location. The ANOVA showed a main effect for Window
Location [F (1, 1246) = 1022.9, p < 0.001; η2 = 0.451]. Mean spectral peak location
when measured at the middle of the frication noise (4323 Hz) was higher than when
measured at the end of frication noise. However, a three-way ANOVA (place ×
vowel × voicing) with spectral peak measured at the end of the frication noise as
the dependent variable showed no significant effect for place. Therefore only the
results of measurements derived from the middle of frication noise will be reported
in detail below.
Table 5–1 presents the mean frequency of spectral peak location obtained
from a 40-ms Kaiser window placed at the middle of frication noise of all fricatives
in different vowel contexts averaged across speakers and repetitions. Results of
a three-way ANOVA (place × vowel × voicing) with spectral peak measured at
the middle of frication noise as the dependent variable revealed a main effect for
Place [F (8, 561) = 143.402, p < 0.001; η2 = 0.672]. The observed general trend
of spectral peak location is that, when averaged across speakers and vowel context,
the frequency of the peak tends to decrease as the place of articulation moves
backwards in the oral cavity.
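The peak-picking step can be sketched as follows: taper a 40-ms frame at the midpoint of the frication noise with a Kaiser window, compute its magnitude spectrum, and take the frequency of the largest component. The Kaiser shape parameter (beta), the plain DFT, and all names are assumptions for illustration; the analysis settings actually used are those described in Section 3.2.2.4.

```python
import cmath
import math


def bessel_i0(x, terms=30):
    """Zeroth-order modified Bessel function, summed from its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kaiser_window(n, beta):
    """Kaiser window of length n; beta (assumed value) controls the taper."""
    if n < 2:
        return [1.0] * n
    scale = bessel_i0(beta)
    window = []
    for i in range(n):
        ratio = 2.0 * i / (n - 1) - 1.0
        arg = beta * math.sqrt(max(0.0, 1.0 - ratio * ratio))
        window.append(bessel_i0(arg) / scale)
    return window


def spectral_peak_hz(samples, sample_rate, center_s, win_s=0.040, beta=8.6):
    """Frequency (Hz) of the strongest DFT bin in a Kaiser-windowed frame."""
    n = int(round(win_s * sample_rate))
    start = int(round(center_s * sample_rate)) - n // 2
    if start < 0 or start + n > len(samples):
        raise ValueError("analysis window falls outside the signal")
    taper = kaiser_window(n, beta)
    frame = [s * w for s, w in zip(samples[start:start + n], taper)]
    best_bin, best_power = 1, -1.0
    for k in range(1, n // 2):  # skip DC, stop below Nyquist
        step = cmath.exp(-2j * math.pi * k / n)
        phasor, acc = 1.0 + 0j, 0j
        for x in frame:
            acc += x * phasor
            phasor *= step
        power = abs(acc) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate / n


# Synthetic check: a 4 kHz tone sampled at 16 kHz, analysed at its midpoint.
rate = 16000
tone = [math.sin(2 * math.pi * 4000 * t / rate) for t in range(rate // 5)]
peak = spectral_peak_hz(tone, rate, center_s=0.1)
```

With a 40-ms frame the frequency resolution of this sketch is 25 Hz per bin, so the peak of real frication noise would be located only to within that spacing.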
Since voicing contrast is not present for some places of fricative articulation
in Arabic, Bonferroni post hoc tests for the simple main effect
of place were conducted separately for voiced and voiceless fricatives. That is,
differences within voiceless fricatives and within voiced fricatives are interpreted
separately. Mean frequencies of spectral peak of fricatives separated by place
and voicing are presented in Figure (5–1). Among voiceless fricatives, three
homogeneous groups of fricatives articulated at adjacent places emerged, with
differences in spectral peak location significant only for contrasts between members
of different groups. The first group included labiodental, dental, and alveolar
fricatives (/f, T, s/); the second included post-alveolar and uvular fricatives (/S,
X/); and finally the third group consisted of pharyngeal and glottal fricatives (/è,
h/). As for voiced fricatives, only the difference between /K/ and /Q/ was not
significant. Moreover, no significant difference was observed between plain fricatives
and their pharyngealized counterpart (/D - DQ/ or /s - sQ/).
Another main effect was observed for Voicing [F (1, 561) = 152.388, p <
0.001; η2 = 0.214], in which the frequency of spectral peak location for
voiceless fricatives (mean = 4957 Hz) was significantly greater than that of voiced
fricatives (mean = 3279 Hz). However, a significant Place by Voicing interaction
[F (3, 562) = 26.48, p < 0.001; η2 = 0.124] and subsequent Bonferroni post hoc
comparisons within places that have a voicing contrast showed that the difference
between voiceless and voiced fricatives was not significant for alveolar fricatives (/s,
z/). Also, as apparent from Figure (5–2), the difference was most prominent for the
nonsibilant dental fricatives (/T, D/).
Table 5–1. Mean frequency (Hz) of amplitude peak as measured at the middle of frication noise.
Place of Articulation     Voicing          /i/             /u/             /a/
                                       short   long    short   long    short   long
Labiodental               Voiceless    8144    7210    7031    6241    7613    7940
Dental                    Voiced       4115    5838    2559    3823    2942    1788
                          Voiceless    7686    8275    7426    7513    7879    7248
Alveolar                  Voiced       6720    8079    5228    5283    7124    7237
                          Voiceless    8016    7686    5583    5801    7389    7270
Post-Alveolar             Voiceless    3486    3690    3327    3668    3348    3495
Uvular                    Voiced       1872    2153    1414    1368    2186    2104
                          Voiceless    3206    3238    3927    3398    3323    3767
Pharyngeal                Voiced        763    1139     640     641     900    1162
                          Voiceless    2493    2545    2651    2414    2203    2298
Pharyngealized Dental     Voiced       3413    4249    3101    2767    3702    4047
Pharyngealized Alveolar   Voiceless    7135    6875    4738    6147    6972    7137
Glottal                   Voiceless    2243    2363     935    1149    1776    2042
Figure 5–1. Mean spectral peak location as a function of place and voicing
Values shown (Hz): labiodental, voiceless 7363; dental, voiced 3511, voiceless 7671; pharyngealized dental, voiced 3547; alveolar, voiced 6612, voiceless 6958; pharyngealized alveolar, voiceless 6501; post-alveolar, voiceless 3502; uvular, voiced 1850, voiceless 3476; pharyngeal, voiced 874, voiceless 2434; glottal, voiceless 1751.
Figure 5–2. Place of articulation and voicing interaction for spectral peak location
A main effect for Vowel context was also significant [F (5, 561) = 8.473, p <
0.001; η2 = 0.07]. While no significant differences between vowels differing only in
length were present (Figure 5–3), subsequent post hoc tests adjusted for multiple
comparisons using the Bonferroni method showed that frequency of spectral peak
location measured in the context of either /u/ or /u:/ was significantly lower than
spectral peak location measured in the context of either /i/ or /i:/. Moreover,
spectral peak location of fricatives preceding /u/ had significantly lower frequencies
than in the context of all other vowels except as noted above for the /u-u:/
contrast.
Figure 5–3. Frequency of spectral peak location in different vowel contexts
A significant Place by Vowel context interaction [F (40, 561) = 1.441, p < 0.05; η2 = 0.093],
with subsequent Bonferroni post hoc tests, showed that the effect of
vowel context mentioned above was confined to alveolar and glottal fricatives.
As is apparent from Figure (5–4) and Figure (5–5), both /u/ and /u:/ resulted in a
significantly lower frequency of spectral peak location in alveolar fricatives than
all other vowels. In the case of glottal fricative /h/, the short high back vowel /u/
(mean = 935 Hz) introduced a significantly lower spectral peak frequency only when
compared to /i/ and /i:/ (mean = 2243 Hz and 2363 Hz respectively). Although
the frequency of the spectral peak location of /sQ/ in the context of /u/ was about
2396 Hz lower than that of /a, i/, such a difference was only marginally significant
(p = 0.051).
Figure 5–4. Mean frequency of spectral peak location as a function of place and short vowels
Figure 5–5. Mean frequency of spectral peak location as a function of place and long vowels
5.2 Spectral Moments
The first four statistical moments were computed from three 40 ms windows
located at the onset, middle, and offset of the frication and from a 40 ms window
centered at the fricative offset to capture any transitional information into the
vowel. In this section, two analyses are presented for each moment. Specifically, to
capture the general trend of spectral moments, separate one-way ANOVAs were
conducted for place and voice with moments across window locations as dependent
variables. Additionally, a preliminary one-way ANOVA test of differences between
moments computed at different windows showed a main effect for window location
for all moments. Therefore, separate three-way ANOVAs (place × vowel × voicing)
with subsequent Bonferroni post hoc tests were conducted for each moment and
window location combination. A summary of the spectral moments collapsed across
speakers, vowel context, and window locations is presented in Table (5–2).
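The four moments can be sketched by treating a normalized power spectrum as a probability distribution over frequency. This is a common way of computing spectral moments and is assumed here rather than taken from the text; reporting kurtosis as excess kurtosis (minus 3) is likewise an assumption, as are all names and the toy spectrum.

```python
import math


def spectral_moments(freqs_hz, power):
    """Mean (Hz), variance (Hz^2), skewness and excess kurtosis of a spectrum."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    p = [v / total for v in power]  # normalize so the weights sum to 1
    mean = sum(f * w for f, w in zip(freqs_hz, p))
    variance = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, p))
    if variance == 0:
        # All energy in one bin: shape measures are undefined, report 0.
        return mean, 0.0, 0.0, 0.0
    sd = math.sqrt(variance)
    skewness = sum(((f - mean) / sd) ** 3 * w for f, w in zip(freqs_hz, p))
    kurtosis = sum(((f - mean) / sd) ** 4 * w for f, w in zip(freqs_hz, p)) - 3.0
    return mean, variance, skewness, kurtosis


# Toy spectrum with more energy at low frequencies (positive skewness expected).
freqs = [1000.0, 2000.0, 3000.0, 4000.0]
power = [4.0, 3.0, 2.0, 1.0]
mean_hz, var_hz2, skew, kurt = spectral_moments(freqs, power)
```

In this convention, positive skewness indicates that energy is concentrated below the spectral mean with a tail toward higher frequencies.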
5.2.1 Spectral Mean
One-way ANOVAs for place and voicing were carried out utilizing spectral
mean measurements across the four window locations as the dependent
variable. The ANOVA revealed a main effect for Place of articulation
[F (8, 2487) = 210.567, p < 0.001; η2 = 0.403]. Subsequent Bonferroni post
hoc tests were conducted for voiceless and voiced fricatives separately. For voiced
fricatives, spectral mean was highest for alveolar /z/ (5935 Hz) and lowest for
pharyngeal /Q/ (1547 Hz). Differences in spectral means for all contrasts within
voiced fricatives were significant, with the exception of the contrast between plain
dental /D/ and its pharyngealized counterpart /DQ/. As for voiceless fricatives,
alveolar /s/ had the highest spectral mean (5546 Hz), while glottal /h/ had the
lowest (2513 Hz). Also, with the exception of the nonsibilants (/f, T/), spectral
mean tended to decrease as the fricative articulation moved towards the back
of the mouth. Additionally, as was the case in spectral peak location (Section
5.1), three categories containing fricatives articulated in adjacent places (/f, T,
s, sQ/, /S, X/ and /è, h/) were observed to have no within-group differences that
were statistically significant. Only comparisons involving members of different
groups were significant. The only exception to this general observation was with
the first group in which the contrast between labiodental /f/ (4802 Hz) and
alveolar /s/ (5546 Hz) was significant. A main effect was also obtained for Voicing
[F (1, 2494) = 59.025, p < 0.001; η2 = 0.023]. Collapsed across all speakers, place
and vowel contexts, voiceless fricatives had higher values for spectral mean (4181
Hz) than voiced fricatives (3557 Hz).
Table 5–2. Spectral moments for place and voice averaged across all window locations.
Place of Articulation     Voicing      Spectral Mean (Hz)   Variance (MHz)   Skewness   Kurtosis
Labiodental               Voiceless    4802                 5.97              0.70       2.96
Dental                    Voiced       3999                 6.91              0.65       1.15
                          Voiceless    5266                 5.99              0.25       0.72
                          Average      4633                 6.45              0.45       0.93
Alveolar                  Voiced       5935                 5.26             -0.06       0.74
                          Voiceless    5546                 4.39              0.44       1.05
                          Average      5740                 4.83              0.19       0.89
Post-Alveolar             Voiceless    3888                 3.61              1.33       2.38
Uvular                    Voiced       2396                 4.38              1.79       6.48
                          Voiceless    3652                 4.40              1.36       3.97
                          Average      3024                 4.39              1.57       5.23
Pharyngeal                Voiced       1547                 1.46              2.25      13.69
                          Voiceless    2522                 2.45              2.42       9.79
                          Average      2034                 1.96              2.34      11.74
Pharyngealized Dental     Voiced       3910                 7.45              0.84       2.10
Pharyngealized Alveolar   Voiceless    5257                 4.39              0.69       1.51
Glottal                   Voiceless    2513                 4.43              1.76       4.56
As mentioned above, values for spectral mean measured at different window
locations were statistically different [F (3, 2492) = 326.978, p < 0.001; η2 = 0.28].
Therefore, separate three-way ANOVAs (place × vowel × voicing) were carried
out for spectral mean at each window location. There was a main effect for place
of articulation for all window locations with η2 values of 0.736 (window 1), 0.830
(window 2), 0.790 (window 3) and 0.602 (window 4). The range of η2 indicates
that spectral information measured at these windows contributed with varying
degrees to the separation of fricatives according to their place of articulation. This
observation was confirmed by post hoc tests for differences performed on voiced
and voiceless fricatives separately. For voiced fricatives, across all windows, alveolar
fricative /z/ had the highest spectral mean while pharyngeal /Q/ had the lowest.
Additionally, spectral mean distinguished between all places of voiced fricatives in
all windows, with the exception of the contrasts between (/D/ and /DQ/) in the first
three windows and between any combination of (/K/, /Q/ and /DQ/) in the fourth
window (Figure 5–6). On the other hand, differences between voiceless fricatives
in terms of spectral mean measured at different windows were not as categorically
distinguishing as in the case of voiced fricatives. Nevertheless, as noted above,
three clusters containing fricatives articulated in adjacent places (/f, T, s, sQ/, /S, X/
and /è, h/) emerged as distinct groups for which no within-group differences were
significant with regard to spectral mean measured at the second (middle) and third
(offset) windows. However, all comparisons between members of different groups
were significant with spectral mean decreasing as the articulation moved backwards
in the mouth (Figure 5–6). Furthermore, spectral mean as measured at the first
(onset) window significantly differentiated between all places with the exception
of all possible contrasts involving (/T, s, sQ/) and the contrast between (/è - h/).
Only alveolar /s/ was significantly different from all other voiceless fricatives at
the fourth (transitional) window. Moreover, at the onset and transitional windows,
differences observed elsewhere between /f/ and /T/ were not significant (Figure
5–6).
There was also a main effect for Voicing in all four windows. As can be seen
from Figure (5–7), spectral mean for voiceless fricatives was significantly higher
than voiced fricatives in the first three windows and significantly lower at the last
(transitional) window. Additionally, a significant Place by Voicing interaction
(Figure 5–8) revealed that alveolar fricatives /s, z/ were not significantly different
from each other in terms of spectral mean in all but the fourth window at which
the /s - z/ contrast was the only one reaching significance level (p < 0.05).
Finally, there was a main effect for Vowel context at all four windows. Spectral
mean was highest for fricatives preceding /i/ and /i:/, and lowest for fricatives
preceding either /u/ or /u:/. Pairwise comparisons for the different vowel contexts
at each window showed that the difference between any of the high front vowels (/i,
i:/) and either of /u/ and /u:/ was significant at all window locations. Additionally,
spectral mean of fricatives in the context of both /i, i:/ was significantly higher
than that in the context of either /a, a:/ at the fourth (transitional) window
(Figure 5–9).
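The four spectral moments analyzed in this section (mean, variance, skewness and kurtosis) can be illustrated with a minimal sketch. It assumes the conventional definition in which the power spectrum is normalized and treated as a probability distribution over frequency. The function name, the toy spectra, and the use of excess kurtosis (subtracting 3) are assumptions made for illustration and are not taken from the analysis reported here.

```python
def spectral_moments(freqs_hz, power):
    """Return (mean, variance, skewness, excess kurtosis) of a power spectrum.

    The spectrum is normalized to sum to 1 and treated as a probability
    distribution over frequency (a conventional definition, assumed here).
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("power spectrum must have positive total energy")
    weights = [value / total for value in power]

    # First moment: the energy-weighted mean frequency (Hz).
    mean = sum(f * w for f, w in zip(freqs_hz, weights))

    # Central moments of order 2, 3 and 4 around the mean.
    m2 = sum(w * (f - mean) ** 2 for f, w in zip(freqs_hz, weights))
    m3 = sum(w * (f - mean) ** 3 for f, w in zip(freqs_hz, weights))
    m4 = sum(w * (f - mean) ** 4 for f, w in zip(freqs_hz, weights))
    if m2 == 0:
        raise ValueError("spectrum has zero variance; higher moments undefined")

    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis (assumed convention)
    return mean, m2, skewness, kurtosis


# Hypothetical toy spectra, not measured data.
symmetric = spectral_moments([1000, 2000, 3000], [1, 2, 1])
low_heavy = spectral_moments([1000, 2000, 3000, 4000], [8, 4, 2, 1])
```

In this sketch the variance comes out in Hz squared; dividing it by 10^6 would give numbers on the scale of the values reported here as MHz, although that reading of the unit is an assumption. A spectrum whose energy is concentrated at low frequencies, as in `low_heavy`, yields positive skewness, which matches the interpretation of skewness used in this chapter.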
5.2.2 Spectral Variance
One-way ANOVAs for Place and Voice were conducted with spectral variance
averaged across all window locations. A main effect for Place of articulation was
obtained [F (8, 2487) = 206.936, p < 0.001; η2 = 0.399], with the lowest variance
observed for sibilants and back articulated fricatives while the highest variance
was observed for nonsibilants. Table (5–2) shows mean variance values for all
fricatives measured in Megahertz (MHz). Bonferroni post hoc tests showed that
Figure 5–6. Spectral mean (Hz) averaged across vowel contexts for each window as a function of place of articulation. A) voiced. B) voiceless. [Axes: spectral mean (Hz) by window location (onset, middle, offset, transition); one line per place of articulation.]
Figure 5–7. Spectral mean (Hz) averaged across place and vowel contexts for each window as a function of voicing. [Axes: spectral mean (Hz) by window location (onset, middle, offset, transition); series: voiced, voiceless.]
Figure 5–8. Place of articulation and voicing interaction for spectral mean at four window locations. A) onset, B) middle, C) offset, and D) transition. [Axes: spectral mean (Hz) by place (dental, alveolar, uvular, pharyngeal); series: voiced, voiceless.]
Figure 5–9. Spectral mean as a function of vowel context at four window locations. A) onset, B) middle, C) offset, and D) transition. [Axes: spectral mean (Hz) by vowel (/i/, /u/, /a/); series: short, long.]
within voiced fricatives, spectral variance did not differentiate between plain dental
(/D/) and its pharyngealized counterpart (/DQ/). However, all other comparisons
within voiced fricatives were significant (p < 0.001). As for voiceless fricatives,
spectral variance for the nonsibilants /f, T/ was significantly higher than those of
all other places. However, spectral variance for the /f/ and /T/ themselves was not
significantly different. Moreover, spectral variance for /S/ and /è/ was significantly
lower than that of all other places. Another main effect was observed for Voicing
[F (1, 2494) = 39.778, p < 0.001; η2 = 0.016] with voiced fricatives having higher
variance (5.09 MHz) than voiceless fricatives (4.45 MHz).
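The effect sizes quoted with each F statistic can be cross-checked from the F value and its degrees of freedom. The sketch below uses the standard identity for a one-way ANOVA, η2 = (F × df_effect) / (F × df_effect + df_error). The function name is hypothetical. For the factorial ANOVAs reported per window, the same identity yields partial η2; the per-window values are consistent with it, but whether they were computed as partial η2 is an assumption.

```python
def eta_squared_from_f(f_value, df_effect, df_error):
    """Recover eta squared from an F statistic and its degrees of freedom.

    Exact for a one-way ANOVA; for factorial designs the same expression
    gives partial eta squared.
    """
    effect_term = f_value * df_effect
    return effect_term / (effect_term + df_error)


# F statistics reported for overall spectral variance in this section.
place_effect = eta_squared_from_f(206.936, 8, 2487)
voicing_effect = eta_squared_from_f(39.778, 1, 2494)
```

Rounded to three decimals, the two values agree with the η2 figures of 0.399 and 0.016 given in the text.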
Since a one-way ANOVA showed that overall spectral variance differed
significantly as a function of Window Location [F (3, 2492) = 33.742, p <
0.001; η2 = 0.04], multiple three-way ANOVAs (place × vowel × voicing) were
carried out for spectral variance at each window location. The ANOVAs revealed a
main effect for Place of Articulation [F (8, 561) = 104.502 (onset), 98.597 (middle),
137.024 (offset), 55.05 (transition); p < 0.001; η2 = 0.6 (onset), 0.58 (middle),
0.66 (offset), 0.44 (transition)]. As apparent from Figure (5–10), for both voiced
and voiceless fricatives, nonsibilants (/f, T, D, DQ/) had the highest variance while
pharyngeal fricatives (/è, Q/) had the lowest variance. Pairwise comparisons
within voiced fricatives showed that only the difference between /D - DQ/ was not
significant at all windows. With the exception of the /D - DQ/ contrast, spectral
variance differentiated between all places of articulation within voiced fricatives
at all window locations. On the other hand, spectral variance did not differentiate
between voiceless fricatives in the same manner as it did with voiced fricatives.
Specifically, spectral variance was able to distinguish between any combination
of voiceless fricatives either at the second or the third window (Figure 5–10).
The only exceptions are the expected lack of difference between /s, sQ/ and the
insignificant difference between /h, sQ/ at all windows. Additionally, as with voiced
fricatives, nonsibilant fricatives (/f, T/) had significantly higher variance than all
other voiceless fricatives in at least three of the four analysis windows.
As mentioned previously, a main effect of Voicing was observed with the
overall spectral variance. However, ANOVAs conducted for individual windows
revealed that this effect was only present at the second (middle) window
[F (1, 561) = 9.973, p < 0.001; η2 = 0.017] with the expected increase in variance
for voiced fricatives (5.4 MHz compared to 4.5 MHz for voiceless fricatives).
Nevertheless, a significant Place by Voicing interaction was present at all analysis
windows. Bonferroni post hoc tests showed that the increase in spectral variance for
voiced fricatives as compared to voiceless fricatives was significant only for dentals
(/T, D/) at the second window; and for alveolars (/s, z/) at the fourth window. Another
source of the interaction, as can be seen from Figure (5–11), is an increase
in spectral variance for voiceless, rather than voiced, pharyngeal fricatives. Such an
increase, and subsequent shift in the voicing effect, was present at all windows but
significant only at the fricative-vowel boundary (windows three and four).
There was also a main effect for Vowel context (p < 0.0001) in all but the first
analysis window. The source for this effect as revealed by post hoc tests is twofold:
first, there was a significant increase in spectral variance for fricatives preceding
either /u/ or /u:/ as compared to all other vowels in the second (middle) and third
(offset) windows (Figures 5–12A and B); and second, the variance of fricatives
preceding /i/ and /i:/ was significantly higher than that of either /a/ or /a:/ in the
fourth window (Figure 5–12C).
5.2.3 Spectral Skewness
A one-way ANOVA for spectral skewness across all window locations showed a
significant main effect for Place [F (8, 2487) = 137.975, p < 0.001; η2 = 0.31], with
skewness ranging from 2.34 for pharyngeal (/è, Q/) to 0.19 for alveolar fricatives
(/s, z/). Subsequent Bonferroni post hoc tests indicated that for both voiced and
Figure 5–10. Spectral variance (MHz) averaged across vowel contexts for each window as a function of place of articulation. A) voiced. B) voiceless. [Axes: spectral variance (MHz) by window location (onset, middle, offset, transition); one line per place of articulation.]
Figure 5–11. Place of articulation and voicing interaction for spectral variance at four window locations. A) onset, B) middle, C) offset, and D) transition. [Axes: spectral variance (MHz) by place (dental, alveolar, uvular, pharyngeal); series: voiced, voiceless.]
Figure 5–12. Spectral variance as a function of vowel context at three window locations. A) middle, B) offset, and C) transition. [Axes: spectral variance (MHz) by vowel (/i/, /u/, /a/); series: short, long.]
voiceless fricatives, skewness did not differentiate between plain fricatives and
their pharyngealized counterparts (/D - DQ, s - sQ/). However, besides the exception
noted above, all voiced fricatives were significantly different from each other in
terms of skewness (means are reported in Table 5–2). Within voiceless fricatives,
skewness significantly differentiated between the nonsibilants /f/ and /T/ (0.7 and 0.25,
respectively). However, skewness did not distinguish nonsibilants from either /s/
or /sQ/, nor /S/ from /X/. All other voiceless fricatives were significantly
different from each other in terms of spectral skewness. The effect of voicing on
spectral skewness was not significant (p = 0.67).
Due to the previously mentioned significant differences between skewness
measured at different windows [F (3, 2492) = 145.382, p < 0.001; η2 = 0.15], a
three-way ANOVA (place × vowel × voicing) was conducted for spectral skewness
at each window location. A main effect for Place was obtained at all window
locations. With the exception of /D - DQ/ contrast, pairwise comparisons showed
that all voiced fricatives were significantly different from each other in terms of
spectral skewness at the second (middle) and third (offset) windows (Figure
5–13). Pharyngeal /Q/ had the highest skewness, indicating a concentration of
energy at frequencies lower than for all other voiced fricatives, while the negative
skewness obtained for /z/ indicates a concentration of energy at higher frequencies.
Interestingly, the difference in skewness between dental and pharyngealized dental
(/D - DQ/) reached significance (p = 0.008) only at the fourth window located at
fricative-vowel transition (Table 5–3). The lack of a significant difference between
plain fricatives and their pharyngealized counterparts was also present for voiceless
fricatives /s - sQ/ at all window locations. As can be seen in Table (5–4), skewness
differentiated between all voiceless fricatives in at least two windows with the
notable exception of the /S - h/ contrast, which was significant only at the fourth
window (transition). If the number of places distinguished in terms of skewness
Figure 5–13. Spectral skewness averaged across vowel contexts for each window as a function of place of articulation. A) voiced. B) voiceless. [Axes: spectral skewness by window location (onset, middle, offset, transition); one line per place of articulation.]
differences at a given window is used as an indicator of that window’s distinctive
spectral information, windows placed at the middle and offset of frication noise
were more successful in distinguishing between voiceless fricatives than others
(Tables 5–3 and 5–4).
Table 5–3. Window locations at which a difference between voiced fricatives in terms of spectral skewness is significant.
        /D/       /z/       /K/       /Q/
/z/     1 2 3 4
/K/     1 2 3 4   1 2 3 4
/Q/     1 2 3 4   1 2 3 4   φ 2 3 φ
/DQ/    φ φ φ 4   1 2 3 4   1 2 3 φ   1 2 3 φ
φ indicates absence of significant differences
Table 5–4. Window locations at which a difference between voiceless fricatives in terms of spectral skewness is significant.
        /f/       /T/       /s/       /S/       /X/       /è/       /sQ/
/T/     1 φ φ 4
/s/     φ 2 φ 4   1 φ φ 4
/S/     1 2 3 4   1 2 3 φ   1 2 3 φ
/X/     1 2 3 φ   1 2 3 4   1 2 3 4   φ φ 3 4
/è/     1 2 3 φ   1 2 3 4   1 2 3 4   1 2 3 4   1 2 3 φ
/sQ/    φ 2 φ 4   1 2 3 φ   φ φ φ φ   1 2 3 φ   φ 2 φ 4   1 2 3 4
/h/     1 2 3 φ   1 2 3 4   1 2 3 4   φ φ φ 4   φ 2 3 φ   1 2 3 φ   1 2 3 4
φ indicates absence of significant differences
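Tables 5–3 and 5–4 contain 10 and 28 cells, which is the number of pairwise contrasts among five voiced and eight voiceless fricatives. The sketch below shows how the number of contrasts in a family determines a Bonferroni correction. The function names and the family-wise alpha of 0.05 are assumptions for illustration; the text does not state how the statistical software grouped comparisons into families.

```python
from math import comb


def bonferroni_alpha(n_groups, family_alpha=0.05):
    """Return (number of pairwise contrasts, per-contrast alpha)."""
    n_contrasts = comb(n_groups, 2)
    return n_contrasts, family_alpha / n_contrasts


def bonferroni_adjust(p_value, n_contrasts):
    """Equivalent view: inflate a raw p-value and cap it at 1."""
    return min(1.0, p_value * n_contrasts)


voiced_contrasts, voiced_alpha = bonferroni_alpha(5)        # five voiced fricatives
voiceless_contrasts, voiceless_alpha = bonferroni_alpha(8)  # eight voiceless fricatives
```

Either view leads to the same decision: comparing a raw p-value with the reduced alpha is equivalent to comparing the adjusted p-value with the family-wise alpha.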
Although the effect of voicing was not significant for the overall skewness, a
main effect for Voicing was obtained at all but the third (offset) window. At both
frication onset and middle windows, voiceless fricatives had significantly (p < 0.001)
lower skewness than voiced fricatives; while skewness measured at the fricative-
vowel transition was significantly (p < 0.0001) higher for voiceless fricatives than
voiced ones (Figure 5–14). Also, a Place by Voicing interaction was significant
at all but the last (transition) window. In general, the reduction in skewness for
voiceless fricatives when compared to voiced fricatives as noted in the main effect
above was reversed for alveolar and pharyngeal fricatives in the first three windows;
and for all fricatives in the fourth window (Figure 5–15). However, this increase in
skewness for voiceless fricatives was only significant (p < 0.05) for alveolar fricatives
at the fourth (transition) window.
Figure 5–14. Spectral skewness averaged across place and vowel contexts for each window as a function of voicing. [Axes: spectral skewness by window location (onset, middle, offset, transition); series: voiced, voiceless.]
The ANOVAs also revealed a main effect of Vowel context at all window
locations. The magnitude of the effect became larger as the window moved closer
to the vowel (η2 = 0.028 at frication mid-point, 0.037 at frication offset and 0.31 at
fricative-vowel transition). The source of this effect, as illustrated in Figure (5–16)
and the associated Bonferroni post hoc tests, is the significant decrease
in fricative skewness in the context of short /i/ and long /i:/. Specifically, long
/i:/ resulted in significantly lower skewness than long /u:/ in all but the second
window, while short /i/ resulted in significantly lower skewness than short /u/
in the first and fourth windows. Additionally, differences between high front and
Figure 5–15. Place of articulation and voicing interaction for spectral skewness at four window locations. A) onset, B) middle, C) offset, and D) transition. [Axes: spectral skewness by place (dental, alveolar, uvular, pharyngeal); series: voiced, voiceless.]
low front vowels (/i, i:/ and /a, a:/) were significant only at the transition window
(Figure 5–16D).
Figure 5–16. Spectral skewness as a function of vowel context at four window locations. A) onset, B) middle, C) offset, and D) transition. [Axes: spectral skewness by vowel (/i/, /u/, /a/); series: short, long.]
5.2.4 Spectral Kurtosis
One-way ANOVAs testing for effects of place and voice with spectral kurtosis
measurements across the four windows as the dependent variable revealed a main
effect of Place [F (8, 2487) = 99.567, p < 0.001; η2 = 0.24]. Bonferroni post
hoc tests conducted on voiced fricatives showed that only kurtosis of uvular /K/
(6.5) and pharyngeal /Q/ (13.7) were significantly higher than all other voiced
fricatives. Within voiceless fricatives, kurtosis significantly differentiated
between the nonsibilants /f/ and /T/, with means of 2.96 and 0.72, respectively.
Moreover, pharyngeal /è/ with kurtosis of 9.8 was significantly higher than all
other voiceless fricatives. The ANOVA also revealed a main effect of Voicing
[F (1, 2494) = 22.922, p < 0.001; η2 = 0.01] in which voiceless fricatives
had significantly lower kurtosis than voiced fricatives (mean of 3.376 and 4.83
respectively).
A one-way ANOVA showed that kurtosis differed significantly as a function
of Window location [F (3, 2492) = 67.968, p < 0.001; η2 = 0.076], with the
fourth (transition) window registering the highest values for kurtosis. Therefore, a
three-way ANOVA (place × vowel × voicing) was conducted for spectral kurtosis
at each window location. The results of the three-way ANOVAs showed a main
effect of Place at all window locations. With the exception of the fourth window,
the magnitude of the effect becomes larger as the window advances towards the
fricative-vowel boundary (η2 of the first three windows was 0.34, 0.46 and 0.51
respectively). Subsequent Bonferroni post hoc tests at each window were carried
out for voiced and voiceless fricatives separately (Figure 5–17). Within voiced
fricatives, no significant differences were observed for any of the possible contrasts
among /D, DQ, z/ at any window, with the exception of the /DQ - z/ contrast,
which reached significance (p < 0.05) at the fourth window only. Moreover,
while kurtosis of pharyngeal /Q/ was significantly higher than uvular /K/ in
all but the last (transition) window, each of the two fricatives had significantly
higher (p < 0.01) kurtosis than all other voiced fricatives in the first and third
windows. A similar pattern was also observed with voiceless fricatives. Specifically,
voiceless pharyngeal fricative /è/ had significantly higher kurtosis than all other
voiceless fricatives in the second (mean = 11.6) and third analysis windows (mean
= 10.8). Also, as was the case with the /D - DQ/ contrast, no difference was obtained
Figure 5–17. Spectral kurtosis averaged across vowel contexts for each window as a function of place of articulation. A) voiced. B) voiceless. [Axes: spectral kurtosis by window location (onset, middle, offset, transition); one line per place of articulation.]
between plain alveolar /s/ and its pharyngealized counterpart /sQ/ at all windows.
Additionally, while kurtosis of glottal /h/ was significantly lower than that of
pharyngeal /è/ at all windows, it was significantly higher than kurtosis of /S/
in the fourth window and significantly higher than all other remaining voiceless
fricatives in the second and third windows (Figure 5–17).
A main effect of Voicing was also obtained at all but the fourth window.
Similar to the effect observed with the overall kurtosis, voiceless fricatives in the
aforementioned windows had significantly lower kurtosis than voiced fricatives
(Figure 5–18). The size of this effect was rather small and generally decreased
in the middle window (η2 of the first three windows was 0.05, 0.03 and 0.06
respectively). Moreover, a Place by Voicing interaction was also significant at the
first three windows. As suggested by the corresponding post hoc tests
shown in Figure (5–19), the effect of voicing was significant (p < 0.05) for uvulars
/K, X/ at frication onset, for pharyngeals /è, Q/ at the middle of frication noise and
for both uvular and pharyngeal places of articulation at the frication offset.
Finally, the effect of vowel context was observed only at the edges of the
frication noise: frication onset [F (5, 561) = 3.068, p < 0.001; η2 = 0.03]; and
transition into the vowel [F (5, 561) = 17.406, p < 0.001; η2 = 0.134]. Subsequent
Bonferroni post hoc tests carried out at these windows showed that the
main effect stems from a significant decrease in kurtosis for fricatives preceding
/i:/ as compared only to /u/ at the onset window (Figure 5–20A), and from
the greater decrease in kurtosis for fricatives preceding short /i/ and long /i:/
as compared to all other vowels at the transition window (Figure 5–20B). The
difference between long /i:/ and long /u:/ was marginally significant (p = 0.056) at
the onset window.
Figure 5–18. Spectral kurtosis averaged across place and vowel contexts for each window as a function of voicing. [Axes: spectral kurtosis by window location (onset, middle, offset, transition); series: voiced, voiceless.]
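Figure 5–19. Place of articulation and voicing interaction for spectral kurtosis at three window locations. A) onset, B) middle, and C) offset. [Axes: spectral kurtosis by place (dental, alveolar, uvular, pharyngeal); series: voiced, voiceless.]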
Figure 5–20. Spectral kurtosis as a function of vowel context at two window locations: A) onset and B) transition. [Axes: spectral kurtosis by vowel (/i/, /u/, /a/); series: short, long.]
CHAPTER 6
FORMANT TRANSITION
This chapter reports on acoustic measurements related to spectral information
at the fricative-vowel transition that might help distinguish between the different
places of fricative articulation. The first measurement reported is the frequency of
the second formant (F2) measured in Hertz from a 25-ms Kaiser window placed at
the vowel onset. The second measurement is the coefficients of regression lines fitted
to scatterplots of F2 at the vowel’s onset (y-axis) and mid-point (x-axis), derived
for each place and speaker and averaged across voicing and vowel context.
6.1 Second Formant (F2) at Transition
Table (6–1) presents the F2 values at the onset of the vowel for each place of
articulation and voicing, averaged across speakers and vowel context. The results of
a three-way ANOVA (place × voicing × vowel) showed a significant main effect for
Place of articulation [F (8, 561) = 97.988, p < 0.0001; η2 = 0.58]. Subsequent post
hoc tests were carried out separately on voiced and voiceless fricatives. For both
voiced and voiceless fricatives, pharyngealized fricatives (/DQ/: 1164 Hz and /sQ/:
1288 Hz) had significantly lower F2 frequencies than their plain counterparts (/D/:
1603 Hz and /s/: 1636 Hz). In fact, within voiced fricatives /DQ/ had a significantly
lower frequency than all voiced fricatives with the exception of uvular /K/. While
upholding the lack of significance between /DQ - K/, voiced uvular /K/ also had
a significantly lower F2 frequency (1171 Hz) than all other voiced fricatives. No
other contrasts within voiced fricatives were statistically significant.
A similar pattern was also observed within voiceless fricatives. Specifically, as
was the case for voiced fricatives, there was a lack of significant difference between
pharyngealized and uvular fricatives (/sQ - X/ in this case), and between dental and
alveolar fricatives (/T - s/). Moreover, the F2 frequencies of both pharyngeal /è/
and glottal /h/ were statistically similar to /f/, /T/ and /s/ (means are reported in
Table (6–1)). Additionally, no significant difference was obtained between uvular
and pharyngeal (/X - è/). All other contrasts between voiceless fricatives were
significant (p < 0.05 for contrasts within non-sibilants and p < 0.0001 for other contrasts).
Table 6–1. Mean values of F2 (Hz) at transition averaged across speakers and vowel context as a function of place and voicing.
Place of Articulation      Voicing      F2 at transition (Hz)   Mean
Labiodental                Voiceless    1496
Dental                     Voiced       1603                    1602
                           Voiceless    1602
Alveolar                   Voiced       1633                    1634
                           Voiceless    1636
Post-Alveolar              Voiceless    1742
Uvular                     Voiced       1171                    1248
                           Voiceless    1325
Pharyngeal                 Voiced       1555                    1572
                           Voiceless    1589
Pharyngealized Dental      Voiced       1164
Pharyngealized Alveolar    Voiceless    1288
Glottal                    Voiceless    1565
The ANOVA also revealed a main effect of Voicing [F (1, 561) = 9.145, p <
0.005; η2 = 0.016], with voiceless fricatives registering higher F2 frequencies than
voiced fricatives (mean 1530 and 1425 respectively). However, a significant Place
by Voicing interaction [F (3, 561) = 5.337, p < 0.002; η2 = 0.028] and subsequent
Bonferroni post hoc tests (Figure 6–1) showed that such effect was limited to uvular
fricatives.
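The marginal voicing means quoted above can be reproduced from the cell means in Table (6–1). The sketch below takes the unweighted mean of the cell means for each voicing category, which equals the marginal mean only if every place-by-voicing cell contains the same number of tokens; that equal-cell assumption, and the variable names, are introduced here for illustration.

```python
from statistics import mean

# F2 at transition (Hz) by place of articulation, taken from Table 6-1.
F2_VOICED = {
    "dental": 1603,
    "alveolar": 1633,
    "uvular": 1171,
    "pharyngeal": 1555,
    "pharyngealized dental": 1164,
}
F2_VOICELESS = {
    "labiodental": 1496,
    "dental": 1602,
    "alveolar": 1636,
    "post-alveolar": 1742,
    "uvular": 1325,
    "pharyngeal": 1589,
    "pharyngealized alveolar": 1288,
    "glottal": 1565,
}

# Unweighted means of the cell means (assumes equal tokens per cell).
voiced_mean = mean(F2_VOICED.values())
voiceless_mean = mean(F2_VOICELESS.values())

# Voicing difference within the uvular place, where the effect was significant.
uvular_difference = F2_VOICELESS["uvular"] - F2_VOICED["uvular"]
```

Rounded to the nearest Hz, the two means match the reported values of 1425 Hz for voiced and 1530 Hz for voiceless fricatives.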
Figure 6–1. Place of articulation and voicing interaction for F2 (Hz) measured at vowel onset. [Axes: F2 at vowel onset (Hz) by place (dental, alveolar, uvular, pharyngeal); series: voiced, voiceless.]
There was also a main effect of Vowel context [F (5, 561) = 221.237, p <
0.0001; η2 = 0.66]. As expected, F2 frequencies measured at the onset of the high front vowels (/i,
i:/, with means of 1708 and 1919 Hz, respectively) were significantly higher
than those of all other vowels (p < 0.0001). Also, the F2 frequencies of the back vowels (/u, u:/,
with means of 1209 and 1259 Hz, respectively) were significantly lower than those of
all other vowel contexts (p < 0.0001). The mean F2 frequency at vowel onset was
1435 Hz for /a/ and 1409 Hz for /a:/. The effect of vowel length on F2 frequency was not
significant except for the /i - i:/ contrast, for which the long vowel showed a higher
F2 frequency.
Figure 6–2. F2 (Hz) measured at vowel onset as a function of vowel context. [Axes: F2 at vowel onset (Hz) by vowel (/i/, /a/, /u/); series: short, long.]
6.2 Locus Equation
Locus equation coefficients for every place of articulation were obtained for
each of the eight speakers in our study (8 speakers × 9 places of articulation).
Specifically, a linear regression fit was applied on scatterplots with F2 values
averaged across all vowel contexts. Each scatterplot had F2 measured at the onset
of the vowel represented on the y-axis and F2 measured at the mid-point of the
vowel represented on the x-axis. The coefficients of each regression line (the slope
‘k’ and the y-intercept ‘c’) were taken to be the terms of locus equations. An
example plot is presented in Figure (6–3).
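The derivation of locus equation coefficients described above amounts to an ordinary least-squares line fit of F2 at vowel onset on F2 at vowel mid-point. A minimal sketch follows. The function name is hypothetical, and the data points are synthetic values generated from the example line y = 0.5837x + 666.25 shown in Figure (6–3); they are not measurements from this study.

```python
def fit_locus_equation(f2_mid_hz, f2_onset_hz):
    """Least-squares fit of onset F2 (y) on mid-point F2 (x).

    Returns (k, c), the slope and y-intercept of y = k * x + c.
    """
    n = len(f2_mid_hz)
    if n < 2 or n != len(f2_onset_hz):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(f2_mid_hz) / n
    mean_y = sum(f2_onset_hz) / n
    # Sum of squares of x and sum of cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in f2_mid_hz)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_mid_hz, f2_onset_hz))
    if sxx == 0:
        raise ValueError("mid-point F2 values must not all be identical")
    k = sxy / sxx
    c = mean_y - k * mean_x
    return k, c


# Synthetic mid-point values (Hz) and onsets lying exactly on the example line.
mid_points = [800.0, 1200.0, 1600.0, 2000.0, 2400.0]
onsets = [0.5837 * x + 666.25 for x in mid_points]
slope, intercept = fit_locus_equation(mid_points, onsets)
```

With real tokens the points scatter around the line, and the fit would be repeated once for each speaker and place of articulation to obtain the coefficients analyzed below.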
Figure 6–3. An example of a scatterplot to derive coefficients of locus equation. [Axes: F2 frequency (Hz) at vowel onset by F2 frequency (Hz) at vowel mid-point; regression line y = kx + c, fitted in this example as y = 0.5837x + 666.25.]
Table (6–2) presents mean slope and y-intercept values for each place of
articulation averaged across vowel contexts. A one-way ANOVA for slope showed
a main effect for Place of Articulation [F (8, 63) = 15.092, p < 0.001; η2 = 0.66].
Pharyngealized fricatives had the lowest slope (0.168 for /DQ/ and 0.399 for /sQ/),
while glottal /h/ had the highest (mean slope of 0.924). However, post hoc tests
revealed that the slope for pharyngealized dental /DQ/ was significantly different
from all plain (non-pharyngealized) fricatives. Furthermore, the high slope
of /h/ was significantly different from all other fricatives with the exception
of uvular fricatives /X, K/. The slope of pharyngealized alveolar /sQ/ was only
significantly different from uvular fricatives. No other contrasts were significant.
On the other hand, a one-way ANOVA for y-intercept revealed a main effect for
place [F (8, 63) = 10.313, p < 0.001; η2 = 0.57]. Glottal /h/ and uvular fricatives
/X, K/ had the lowest y-intercept values (160 and 289 Hz respectively), while the
highest y-intercept value was observed for post-alveolar fricative /S/ (956 Hz).
Although no significant difference between the y-intercepts of /h/ and /X, K/ was
observed, Bonferroni post hoc tests showed that the y-intercept for /h/ was significantly
lower than that of all other places of articulation. Additionally, the y-intercept values for
uvular fricatives were significantly lower than all other places of articulation with
the exception of labiodental and pharyngeal fricatives (/f/ and /Q, è/). No other
significant differences were obtained.
Table 6–2. Mean slope and y-intercept values for each place of articulation averaged across vowel contexts.
Place of Articulation      slope    y-intercept
Labiodental                0.565    652
Dental                     0.507    825
Alveolar                   0.451    930
Post-Alveolar              0.502    956
Uvular                     0.692    289
Pharyngeal                 0.579    665
Pharyngealized Dental      0.168    938
Pharyngealized Alveolar    0.399    751
Glottal                    0.925    160
CHAPTER 7
STATISTICAL CLASSIFICATION OF FRICATIVES
Discriminant Function Analysis (DFA) was used to determine the most
parsimonious way to distinguish among the different places of articulation using the
acoustic cues investigated in our study (descriptive DFA). Furthermore, DFA was
used here to assess the contribution of each selected cue to the overall classification
of fricatives into their places of articulation. Also, to get a more realistic indication
of the use of these cues in distinguishing unknown tokens, a cross-validation
method was used with the obtained discriminant functions (predictive DFA).
All acoustic variables investigated in our study were used in the DFA procedure
with the exception of locus equations since they do not reflect measures of single
tokens, but rather the coefficients of linear regression fits on aggregated data points
representing places of articulation for each speaker.
7.1 Discriminant Function Analysis
Discriminant function analysis is a statistical procedure that classifies tokens
into two or more mutually exclusive a priori groups (i.e., place of articulation)
using a set of predictors (i.e., acoustic cues) (Klecka 1980; Hair, Anderson,
and Tatham 1987; Stevens 2002). A discriminant function consists of a linear
combination of one or more variables that maximizes the distance (i.e., differences)
between the groups being classified. In our study, for both descriptive and
predictive DFA, predictors were entered into the analysis using a step-wise method
in which only the predictor that minimized the Wilks’ Lambda (Λ) statistic, also known
as the U-statistic, would be entered at any given step. The criterion for entry was set
at p = 0.05 and the criterion for removal at p = 0.10. Also, since the levels of the dependent
variable (i.e., places of articulation) have unequal numbers of cases due to lack of
voicing contrast in some places, the prior probabilities for group membership were
calculated from the group size (Table 7–1).
Table 7–1. Prior probabilities for group membership
Place                      Prior    Cases Used in Analysis
Labiodental                0.077    48
Dental                     0.154    96
Alveolar                   0.154    96
Post-Alveolar              0.077    48
Uvular                     0.154    96
Pharyngeal                 0.154    96
Pharyngealized Dental      0.077    48
Pharyngealized Alveolar    0.077    48
Glottal                    0.077    48
Total                      1        624
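The priors in Table 7–1 follow directly from the group sizes: each prior is the number of cases in a group divided by the total number of cases. A minimal sketch, with hypothetical variable and function names:

```python
# Number of cases per place of articulation, as listed in Table 7-1.
CASES = {
    "Labiodental": 48,
    "Dental": 96,
    "Alveolar": 96,
    "Post-Alveolar": 48,
    "Uvular": 96,
    "Pharyngeal": 96,
    "Pharyngealized Dental": 48,
    "Pharyngealized Alveolar": 48,
    "Glottal": 48,
}


def priors_from_group_sizes(cases):
    """Prior probability of each group = group size / total sample size."""
    total = sum(cases.values())
    return {place: n / total for place, n in cases.items()}


PRIORS = priors_from_group_sizes(CASES)
```

Places with both a voiced and a voiceless fricative contribute 96 cases and receive a prior of about 0.154, while places with a single fricative contribute 48 cases and receive about 0.077.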
The number of discriminant functions obtained by the DFA procedure is the
smaller of g − 1, where g is the number of groups, and k, where k is the number
of predictors. In our study the number of discriminant functions obtained was
eight and all were significant (p < 0.001). Table (7–2) shows the percentage of
variance accounted for by each of the eight functions. Although all functions were
significant, we limited our interpretation to the first three functions since they were
the ones contributing the most to the cumulative variance as inferred from their
eigenvalues and the canonical correlation associated with these functions (Table
7–2).
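The columns of Table 7–2 are related to one another, and the relationship can be sketched in a few lines of Python. The eigenvalues are copied from the table; the formulas are the standard ones (share of variance as eigenvalue over the sum of eigenvalues, canonical correlation as the square root of λ/(1 + λ)), and the function names are hypothetical.

```python
import math

# Eigenvalues of the eight discriminant functions, copied from Table 7-2.
eigenvalues = [5.224, 3.651, 1.894, 0.470, 0.387, 0.244, 0.177, 0.098]


def number_of_functions(n_groups, n_predictors):
    """Number of discriminant functions: the smaller of g - 1 and k."""
    return min(n_groups - 1, n_predictors)


def percent_of_variance(eigs):
    """Share of between-group variance carried by each function."""
    total = sum(eigs)
    return [100.0 * e / total for e in eigs]


def canonical_correlation(eigenvalue):
    """Canonical correlation implied by an eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


shares = percent_of_variance(eigenvalues)
cumulative_first_three = sum(shares[:3])
for i, (e, s) in enumerate(zip(eigenvalues, shares), start=1):
    print(f"Function {i}: {s:5.1f}% of variance, Rc = {canonical_correlation(e):.3f}")
```

With nine places of articulation, any predictor count of eight or more yields eight functions, which matches the number reported.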
Table 7–2. The amount of variance accounted for by each of the functions calculated by the DFA.

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          5.224        43.0             43.0          0.916
2          3.651        30.1             73.1          0.886
3          1.894        15.6             88.7          0.809
4          0.470         3.9             92.5          0.566
5          0.387         3.2             95.7          0.528
6          0.244         2.0             97.7          0.443
7          0.177         1.5             99.2          0.388
8          0.098         0.8            100.0          0.298

7.2 Classification Accuracy of DFA
Before interpreting the classification results obtained from the DFA procedure,
an assessment of the validity of the current model and its accuracy was carried
out. For any classification method, a certain percentage of correct classifications can be
attributed solely to random chance. Therefore, for the current classification model
derived from DFA to be valid, it needs to classify cases better than would be
expected by chance. Since the group sizes are unequal
in our study, chance classification was determined using two
criteria: the proportional chance criterion (Cpro) and maximum chance criterion
(MCC) (Hair et al. 1987). The proportional chance criterion is a measure of the
average probability of classification calculated considering all group sizes, while the
MCC is the percentage of the total sample represented by the largest group. Given
the total number of cases and groups in our study, MCC was estimated to be 15.4%
and Cpro to be 12.4%. Both measures, however, serve only as subjective reference
points for model accuracy, and there is no general consensus on how far
classification accuracy should exceed chance. Hair et al. (1987)
suggest that it should be at least one fourth greater than classification by chance.
Consequently, the current model should achieve an overall classification rate higher
than 19.25% (1.25 × MCC) to be valid. Proportional and maximum chance criteria
were calculated as in Equations (7–1) and (7–2), respectively, where N =
total number of cases, g = number of groups, ni = number of cases in group i, and
gmax = group with the largest number of cases.
\[
C_{pro} = 100 \times \sum_{i=1}^{g} \left( \frac{n_i}{N} \right)^{2} \tag{7–1}
\]
\[
MCC = 100 \times \frac{n_{g_{max}}}{N} \tag{7–2}
\]
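The two chance criteria can be reproduced from the group sizes in Table 7–1. The following Python sketch implements Equations (7–1) and (7–2) directly; the function names are hypothetical.

```python
# Cases per place of articulation, in the order listed in Table 7-1.
sizes = [48, 96, 96, 48, 96, 96, 48, 48, 48]


def proportional_chance(group_sizes):
    """Cpro: 100 times the sum of squared group proportions (Equation 7-1)."""
    n_total = sum(group_sizes)
    return 100.0 * sum((n / n_total) ** 2 for n in group_sizes)


def maximum_chance(group_sizes):
    """MCC: percentage of the sample in the largest group (Equation 7-2)."""
    return 100.0 * max(group_sizes) / sum(group_sizes)


c_pro = proportional_chance(sizes)
mcc = maximum_chance(sizes)
threshold = 1.25 * mcc  # "one fourth greater than chance"

print(f"Cpro = {c_pro:.1f}%, MCC = {mcc:.1f}%, threshold = {threshold:.2f}%")
```

The threshold computed from the unrounded MCC is about 19.23%; the 19.25% quoted in the text follows from rounding MCC to 15.4% first.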
It is important to note that both proportional and maximum chance criteria
are subjective in nature. To circumvent this issue, Press’ Q statistic (Equation 7–3)
was used as an additional measure of model accuracy. The significance of Press’ Q
is assessed against a chi-square (χ²) distribution with one degree of freedom.
This value will be calculated below for both sets of classification results (descriptive
and predictive DFAs). The value ncorrect in Equation (7–3) denotes the number of
correctly classified cases.
\[
Q = \frac{\left( N - (n_{correct} \times g) \right)^{2}}{N (g - 1)} \tag{7–3}
\]
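A minimal Python sketch of Press' Q follows, assuming the standard form of the statistic, Q = (N − ncorrect × g)² / (N(g − 1)). The counts in the example are hypothetical and chosen only to show the behaviour of the statistic; they are not taken from the study. The critical values are the usual chi-square cut-offs for one degree of freedom.

```python
# Chi-square critical values for 1 degree of freedom.
CHI2_CRITICAL_1DF = {0.05: 3.841, 0.01: 6.635, 0.001: 10.828}


def press_q(n_total, n_correct, n_groups):
    """Press' Q, assuming the standard form (N - n*g)^2 / (N * (g - 1))."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))


def is_significant(q_value, alpha=0.01):
    """Compare Q against the chi-square critical value with 1 df."""
    return q_value > CHI2_CRITICAL_1DF[alpha]


# Hypothetical example: 100 cases, 4 groups.
q_at_chance = press_q(100, 25, 4)   # exactly chance-level accuracy
q_above_chance = press_q(100, 60, 4)

print(q_at_chance, is_significant(q_at_chance))
print(round(q_above_chance, 2), is_significant(q_above_chance))
```

When the number of correct classifications equals N/g, the numerator vanishes and Q is zero; the statistic grows as accuracy moves away from that chance level.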
7.3 Classification Power of Predictors
The standardized canonical function coefficients indicate the partial
contribution of each variable to the discriminant function(s), controlling for
the other independent variables entered in the equation, and are used to assess each
independent variable’s unique contribution to the discriminant function (Klecka 1980; Hair et al.
1987). Based on these coefficients, spectral mean (frication noise onset, middle,
and offset), skewness (onset, offset of frication and transition into the vowel),
second formant at vowel onset, normalized RMS amplitude and spectral peak
location were identified to be the variables contributing the most to the overall
classification.
7.4 Classification Results
As mentioned above, the first goal of DFA implementation in our study was
to find the degree to which the acoustic cues investigated here would successfully
classify fricatives. To that effect, DFA revealed that 83.2% of the original grouped
cases were successfully classified into their respective places of articulation using
discriminant functions derived from the acoustic measurements investigated in our
study. Furthermore, when the data was split into voiced and voiceless subgroups,
the overall classification accuracy was 92.9% for voiced and 93.5% for voiceless
fricatives. These classification rates exceeded both the maximum chance and the
proportional chance criteria. Additionally, Press’ Q statistic (Q = 17.99) was
significant at the 0.0001 level. Therefore, it can be concluded that the model investigated
was valid. In general, three groups can be identified using a two-dimensional
discrimination plane (Figure 7–1 and Figure 7–2).
A leave-one-out (also known as jackknife) classification procedure was also
used to cross-validate the discriminant functions derived above. In this procedure,
the data were split into two sets: discriminant functions were obtained from the data of
all but one speaker (training set) and then used to classify the cases of the remaining
speaker (testing set). The procedure was repeated until each speaker had been included
in the testing phase. The overall performance of the discriminant functions was
taken to be the average score across all speakers. An overall correct classification
rate of 79.3% was obtained using the cross-validation method outlined above.
When voicing was specified in the model, cross-validated correct classification rates
of 87.9% and 89.8% were obtained for voiced and voiceless fricatives, respectively.
Both procedures satisfy the criteria mentioned in Section (7.2) for model validity
(Cpro, MCC and Press’ Q).
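The speaker-wise cross-validation described above can be sketched in Python with NumPy. This is not the statistical package used in the study: the classifier is a plain linear discriminant with a pooled covariance matrix and priors taken from group size, the data are synthetic, and all function names are hypothetical.

```python
import numpy as np


def fit_lda(X, y):
    """Estimate class means, pooled covariance inverse and log priors."""
    classes, counts = np.unique(y, return_counts=True)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    centered = X - means[np.searchsorted(classes, y)]
    pooled = centered.T @ centered / (len(X) - len(classes))
    log_priors = np.log(counts / counts.sum())  # priors from group size
    return classes, means, np.linalg.pinv(pooled), log_priors


def predict_lda(model, X):
    """Assign each row of X to the class with the highest linear score."""
    classes, means, inv_cov, log_priors = model
    # Score: x' S^-1 m - 0.5 m' S^-1 m + log(prior)
    quad = 0.5 * np.sum((means @ inv_cov) * means, axis=1)
    scores = X @ inv_cov @ means.T - quad + log_priors
    return classes[np.argmax(scores, axis=1)]


def leave_one_speaker_out(X, y, speakers):
    """Train on all speakers but one, test on the held-out speaker, average."""
    accuracies = []
    for held_out in np.unique(speakers):
        test = speakers == held_out
        model = fit_lda(X[~test], y[~test])
        accuracies.append(np.mean(predict_lda(model, X[test]) == y[test]))
    return float(np.mean(accuracies))


# Synthetic data: 4 speakers, 3 well-separated classes, 2 features.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X_parts, y_parts, s_parts = [], [], []
for speaker in range(4):
    for label, center in enumerate(centers):
        X_parts.append(center + rng.normal(size=(20, 2)))
        y_parts.append(np.full(20, label))
        s_parts.append(np.full(20, speaker))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)
speakers = np.concatenate(s_parts)

cv_accuracy = leave_one_speaker_out(X, y, speakers)
print(f"cross-validated accuracy: {cv_accuracy:.3f}")
```

Holding out a whole speaker, rather than a single token, keeps tokens from the test speaker out of the training set, so the score reflects generalization to an unseen talker.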
The confusion matrices presented in Tables (7–3) to (7–8) show the percentage
of predicted class membership in terms of the fricative place of articulation.
Numbers on the diagonal represent correct classification rates while off-diagonal
numbers represent misclassification rates. Generally speaking, DFA clustered the nine places
of fricative articulation into three groups: non-sibilants (/f, T, D, DQ/), sibilants (/s,
sQ, z, S/) and back-articulated fricatives (/K, X, è, Q, h/), with misclassification rarely
crossing the boundaries of these groups. This observation held even when
fricatives were partitioned according to voicing.
Table 7–3. Overall classification results of all fricatives.

                     Predicted Group Membership
Place     /f/  /T, D/  /DQ/  /s, z/  /sQ/  /S/  /X, K/  /è, Q/  /h/
/f/        88      10     0       0     0    2       0       0    0
/T, D/      6      76     7       0     0    0       3       0    7
/DQ/        0       2    88       2     0    0       6       0    2
/s, z/      0       2     0      89     6    2       1       0    0
/sQ/        0       0     0      17    83    0       0       0    0
/S/         0       0     0       0     0   98       2       0    0
/X, K/      0       2     6       0     0    2      72      10    7
/è, Q/      0       0     0       0     0    0       5      87    8
/h/         0       0     2       0     0    0       8      10   79

Table 7–4. Cross-validated classification results of all fricatives.

                     Predicted Group Membership
Place     /f/  /T, D/  /DQ/  /s, z/  /sQ/  /S/  /X, K/  /è, Q/  /h/
/f/        79      17     0       0     0    2       2       0    0
/T, D/      8      72     8       0     0    0       3       0    8
/DQ/        0       6    77       2     0    0      10       0    4
/s, z/      0       2     0      84     9    3       1       0    0
/sQ/        2       0     0      15    83    0       0       0    0
/S/         0       0     0       0     0   98     2.1       0    0
/X, K/      0       2     7       0     0    2      70      12    7
/è, Q/      0       0     0       0     0    1       7      82    9
/h/         0       0     2       0     0    0       8      13   77
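The overall rate and the three-way grouping can both be read off the confusion matrix. The Python sketch below uses the row percentages of Table 7–3 and the group sizes of Table 7–1; because the table entries are rounded, the recomputed overall rate only approximates the reported 83.2%. Function names are hypothetical.

```python
labels = ["f", "T,D", "DQ", "s,z", "sQ", "S", "X,K", "è,Q", "h"]
cases = [48, 96, 48, 96, 48, 48, 96, 96, 48]  # cases per row (Table 7-1)

# Row percentages copied from Table 7-3 (rows = actual, columns = predicted).
table_7_3 = [
    [88, 10, 0, 0, 0, 2, 0, 0, 0],
    [6, 76, 7, 0, 0, 0, 3, 0, 7],
    [0, 2, 88, 2, 0, 0, 6, 0, 2],
    [0, 2, 0, 89, 6, 2, 1, 0, 0],
    [0, 0, 0, 17, 83, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 98, 2, 0, 0],
    [0, 2, 6, 0, 0, 2, 72, 10, 7],
    [0, 0, 0, 0, 0, 0, 5, 87, 8],
    [0, 0, 2, 0, 0, 0, 8, 10, 79],
]

# Index groups: non-sibilants, sibilants, back-articulated fricatives.
groups = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]


def overall_accuracy(matrix, n_cases):
    """Case-weighted mean of the diagonal percentages."""
    correct = sum(matrix[i][i] / 100.0 * n for i, n in enumerate(n_cases))
    return 100.0 * correct / sum(n_cases)


def within_group_rate(matrix, n_cases, index_groups):
    """Percentage of cases predicted inside their own broad group."""
    kept = 0.0
    for group in index_groups:
        for i in group:
            kept += sum(matrix[i][j] for j in group) / 100.0 * n_cases[i]
    return 100.0 * kept / sum(n_cases)


overall = overall_accuracy(table_7_3, cases)
within = within_group_rate(table_7_3, cases, groups)
print(f"overall: {overall:.1f}%, within broad group: {within:.1f}%")
```

The gap between the two figures is the share of errors that stay inside a broad group, which is what the three-way clustering describes.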
[Figure 7–1 plot omitted. Legend (Predicted Group): Labiodental, Dental, Alveolar, Post-Alveolar, Uvular, Pharyngeal, Pharyngealized Dental, Pharyngealized Alveolar, Glottal.]
Figure 7–1. Discrimination plane for all fricatives.
Table 7–5. Overall classification results of voiced fricatives.

            Predicted Group Membership
Place       /D/   /DQ/    /z/    /K/    /Q/
/D/        89.6    8.3      0    2.1      0
/DQ/        8.3   87.5      0    4.2      0
/z/           0      0    100      0      0
/K/         6.3    4.2      0   89.6      0
/Q/           0      0      0    2.1   97.9

Table 7–6. Cross-validated classification results of voiced fricatives.

            Predicted Group Membership
Place       /D/   /DQ/    /z/    /K/    /Q/
/D/        83.3    8.3    2.1    6.3      0
/DQ/       14.6     75      0   10.4      0
/z/           0      0    100      0      0
/K/         6.3    6.3      0   83.3    4.2
/Q/           0      0      0    2.1   97.9
Table 7–7. Overall classification results of voiceless fricatives.

            Predicted Group Membership
Place       /f/    /T/    /s/   /sQ/    /S/    /X/    /è/    /h/
/f/        83.3   12.5      0      0    2.1    2.1      0      0
/T/         6.3   93.8      0      0      0      0      0      0
/s/           0      0   91.7    8.3      0      0      0      0
/sQ/          0      0   10.4   89.6      0      0      0      0
/S/           0      0      0      0    100      0      0      0
/X/           0      0      0      0      0   97.9    2.1      0
/è/           0      0      0      0      0    2.1   97.9      0
/h/           0      0      0      0      0      0    6.3   93.8

Table 7–8. Cross-validated classification results of voiceless fricatives.

            Predicted Group Membership
Place       /f/    /T/    /s/   /sQ/    /S/    /X/    /è/    /h/
/f/        79.2   16.7      0      0    2.1    2.1      0      0
/T/         8.3   91.7      0      0      0      0      0      0
/s/           0    2.1   87.5    8.3    2.1      0      0      0
/sQ/          0      0   18.8   81.3      0      0      0      0
/S/           0      0      0      0    100      0      0      0
/X/           0    2.1      0      0    2.1   91.7    4.2      0
/è/           0      0      0      0      0    6.3   93.8      0
/h/           0      0      0      0      0      0    6.3   93.8
[Figure 7–2 plots omitted: panels A and B. Legend (Predicted Group): Labiodental, Dental, Alveolar, Post-Alveolar, Uvular, Pharyngeal, Pharyngealized Dental, Pharyngealized Alveolar, Glottal.]
Figure 7–2. Discrimination plane for voiced and voiceless fricatives. A) voiced. B) voiceless.
CHAPTER 8
GENERAL DISCUSSION
Several acoustic measurements were investigated in our study with the aim of
describing the acoustic characteristics of fricatives as produced by native speakers
of Arabic. The use of Arabic was motivated by three reasons. First, fricative
articulation in Arabic spans most of the places of articulation in the vocal tract,
starting from the lips and ending at the glottis. Second, for certain fricatives
in Arabic, a phonemic distinction exists between plain fricatives (/D/ and /s/)
and their pharyngealized counterparts (/DQ/ and /sQ/); and between short and
long vowels (/i - i:, u - u:, a - a:/). Third, the majority of studies dealing with
the acoustic characteristics of fricatives have been carried out predominantly
with reference to English fricatives. Therefore, our study aimed at describing
the acoustic characteristics of Arabic fricatives utilizing many of the acoustic
measurements investigated in other related studies, with specific interest in finding
cues that would differentiate between plain and pharyngealized fricatives.
The cues investigated in our study were amplitude measurements (relative
and normalized frication noise amplitude), spectral measurements (spectral
peak location and spectral moments), temporal measurements (absolute and
normalized frication noise duration) and formant information at the fricative-vowel
transition (F2 at vowel onset and locus equation). Along with reporting these
cues, an attempt was also made to classify fricatives into their respective places
of articulation using statistical modeling (discriminant function analysis) with an
optimum combination of the measurements mentioned above.
8.1 Temporal Measurement
Findings of the present study were in agreement with previous research dealing
with the effect of place of articulation on frication noise duration. Specifically,
as in earlier studies (Behrens and Blumstein 1988b; Jongman
1989; Pirello et al. 1997), our study found that the overall absolute frication noise
duration of sibilant fricatives (mean 138.09 ms) was longer than that of nonsibilants (mean
109.34 ms). The longer duration of sibilants can be attributed to the greater
articulatory effort needed to force air through the narrow constriction required for
sibilant articulation. Additionally, the frication noise duration of voiceless fricatives
(mean 134.21 ms) was longer on average than that of voiced fricatives (mean 92.05
ms). Such an effect of voicing was also found in previous studies of English (Cole and
Cooper 1975; Baum and Blumstein 1987; Crystal and House 1988; Fox, Nissen,
McGory, and Rosenbauer 2001; Nissen 2003) and Spanish fricatives (Manrique and
Massone 1981). The shorter duration of voiced segments can
be attributed in part to the decrease in air flow due to higher glottal impedance
during voicing.
Contrary to what was reported in previous research (Nissen 2003), our study
did not find an effect of vowel context for vowels of the same length. However,
fricative duration was significantly longer when it was followed by long high
vowels (/i:, u:/) than when followed by their short counterparts (/i/ and /u/
respectively). Similar results with regard to sibilant/nonsibilant duration and
effect of voicing were obtained when the duration of the fricatives was normalized
relative to word duration. However, a different pattern of vowel context effect
emerged with normalized frication duration. Specifically, within long vowels, high
vowels (/i:, u:/) induced a longer normalized frication duration than the low vowel
/a:/. Additionally, the normalized frication noise duration of fricatives was longer
preceding the front vowel /i:/ than preceding the back vowel /u:/. Such effects
of vowel context are not surprising if intrinsic differences in vowel duration
are taken into consideration. Vowel duration has been shown to correlate with the
degree of jaw lowering associated with its production, such that the lower the vowel,
the longer its duration (Fant 1960; Lindblom 1967; Beckman 1986).
8.2 Amplitude Measurement
Both normalized frication noise amplitude and relative amplitude were
investigated in our study. Normalized frication RMS amplitude was defined as
the difference between the RMS amplitude of frication noise and the average
RMS amplitude of three consecutive pitch periods at the point of maximum vowel
amplitude. The findings of our study are consistent with findings from previous
research in that such measurements differentiated nonsibilants (/f, T, D, DQ/) as a
class from sibilant fricatives (/s, sQ, z, S/) while failing to distinguish within each
of the two classes. Although the Jongman et al. (2000) study of English fricatives
found noise amplitude to differentiate within sibilants and within nonsibilants,
other research on frication noise amplitude (Strevens 1960; Heinz and Stevens 1961;
Manrique and Massone 1979; Behrens and Blumstein 1988a) reported that while
frication noise amplitude distinguished between sibilant and nonsibilant fricatives,
it could not distinguish within sibilant or within nonsibilant fricatives.
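The normalization defined at the start of this section can be sketched as follows in Python. The sketch assumes that amplitudes are expressed in dB and that the three pitch periods are averaged in the dB domain; neither detail is restated here, so both are assumptions, as are the function names and the toy signals.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def rms_db(samples):
    """RMS amplitude in dB relative to an amplitude of 1.0."""
    return 20.0 * math.log10(rms(samples))


def normalized_rms_amplitude(frication, vowel_periods):
    """Frication RMS (dB) minus the mean RMS (dB) of the vowel pitch periods."""
    vowel_db = sum(rms_db(p) for p in vowel_periods) / len(vowel_periods)
    return rms_db(frication) - vowel_db


# Toy signals: frication at one tenth of the vowel amplitude.
frication = [0.1, -0.1] * 50
vowel_periods = [[1.0, -1.0] * 40 for _ in range(3)]

print(round(normalized_rms_amplitude(frication, vowel_periods), 2))
```

Because the vowel level is subtracted, a louder vowel lowers the normalized value even when the frication noise itself is unchanged.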
The decrease in nonsibilant frication noise normalized RMS amplitude as
compared with sibilant fricatives was expected given the intrinsic amplitude
associated with the two classes. Specifically, sibilant articulation, as explained in
Section (8.1), involves a greater articulatory effort to force the air through the
narrow constriction needed for sibilant articulation, giving rise to an increase in
noise amplitude. The same reasoning can be used to explain the lower frication
noise RMS amplitude of voiceless fricatives (mean −14.22 dB) as compared to their
voiced counterparts (mean −18.26 dB). An additional source for this difference
is the presence of two sources of acoustic energy during the production of voiced
fricatives. The energy resulting from glottal vibration during voicing, in addition to
acoustic energy resulting from frication at an oral constriction, results in an overall
increase in the RMS amplitude of voiced fricatives.
Also not surprising was the finding that normalized frication noise RMS
amplitude increased in proportion to the height of the vowel. Recall here that
frication noise RMS amplitude is normalized by subtracting the vowel RMS
amplitude, so when the intrinsic vowel amplitude increases, the overall normalized
frication noise RMS amplitude decreases. Additionally, such intrinsic vowel
amplitude is controlled by the degree of openness/closeness (height) of the
vowel. In the articulation of /a(:)/, the oral cavity is wide open, giving rise to
an acoustic waveform of intrinsically higher amplitude (Lehiste and Peterson 1959;
Beckman 1986). The opposite is true of high vowels. Interestingly, intrinsic
vowel amplitude, as well as duration (see above), led to significant differences in the
overall frication noise RMS amplitude only when the comparisons were confined to
long vowels.
Previous research on relative amplitude generally involved the perceptual
effect of this cue on distinguishing places of articulation, with Jongman et al.
(2000) as the only notable exception. Our study found relative amplitude to
be a reliable acoustic cue that differentiates among some, but not all, places of
fricative articulation. Moreover, the trend in our data paralleled
previously reported values in the literature (Hedrick and Ohde 1993; Jongman
et al. 2000). Specifically, the voiceless post-alveolar fricative (/S/, mean = 0.9 dB)
had the greatest relative amplitude, indicating a stronger concentration of energy
above the F3 region. Furthermore, in line with the findings of Jongman et al. (2000),
our study found that nonsibilants, especially voiceless ones, have the highest
relative amplitude. More importantly, pharyngealized fricatives /DQ/ and /sQ/ had
significantly lower relative amplitude than their plain counterparts.
The difference in relative amplitude between plain and pharyngealized
fricatives can be attributed to the lowering of the vowel’s F2 frequency caused by
pharyngealization (Stevens 1998), together with the increase in amplitude associated with
it. Recall here that for pharyngealized fricatives, relative amplitude was defined as
the difference between the fricative’s and the vowel’s amplitude at the F2 region.
Therefore, an increase in vowel amplitude at such frequency will lead to a lowering
of the relative amplitude value. There was also an effect of vowel context parallel to
that obtained for normalized frication noise RMS amplitude. As before, this effect
of vowel context is related to the vowels’ intrinsic amplitude. Our study revealed
that relative amplitude measured for fricatives preceding the low
vowel /a:/ was significantly lower than that for fricatives preceding the high vowels /i:, u:/, due to
the inherently higher amplitude of /a:/.
8.3 Spectral Measurement
Spectral peak location of fricatives, as was the case in previous studies
(Hughes and Halle 1956; Strevens 1960; Manrique and Massone 1981; Behrens
and Blumstein 1988b; Jongman et al. 2000), tends to decrease as the place of
articulation moves backwards in the oral cavity. Furthermore, the results of the
current study were in line with previous research in that spectral peak location
distinguished nonsibilant from sibilant fricatives, with the only exception being
the similar values obtained for /s/ and voiceless nonsibilants /f, T/. Although
spectral peak location distinguished between post-alveolar /S/ and alveolar
fricatives /s, z/, it failed to distinguish among nonsibilants. Moreover, plain and
pharyngealized fricatives did not differ in terms of the frequency of the amplitude
peak as measured at the midpoint of frication noise.
Of interest here, however, is the fact that three mutually exclusive regions of
fricative place of articulation can be identified based on spectral peak location. For
voiceless fricatives, the first group includes fricatives articulated at or anterior to
the alveolar ridge, the second includes post-alveolar and uvular fricatives, while
the third group consists of pharyngeal and glottal fricatives. For voiced fricatives,
the groups followed the more traditional division of nonsibilants, sibilants and
back-articulated fricatives. Spectral peak location was found not to be affected by
vowel length but rather by vowel rounding, such that the rounded vowel /u/
introduced a lower spectral peak location than the unrounded vowels /i, a/.
Spectral moments (spectral mean, variance, skewness and kurtosis) were
estimated in our study from four windows centered at frication noise onset,
midpoint, offset and transition into the vowel. Albeit lower due to the male
population from which the data were sampled, the average values for spectral mean
in our study were consistent with those reported for similar fricatives in Jongman
et al. (2000) and Nissen (2003): alveolar fricatives had the highest spectral mean, while the
lowest was observed for pharyngeal and glottal fricatives. Furthermore,
spectral mean, averaged across all windows, served to distinguish all places of
voiced fricative articulation and, as was the case with spectral peak location,
identified three mutually exclusive groups of voiceless fricatives (/f, T, s, sQ/,
/S, X/ and /è, h/). This classification ability of spectral mean, for both voiced
and voiceless fricatives, was present at the second (frication noise midpoint) and
third (frication noise offset) windows. It was also found that voiceless fricatives had higher
spectral means than voiced fricatives in the first three windows, while the effect was
reversed when the vocalic part (transition window) was used to measure spectral
mean.
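For readers unfamiliar with spectral moments, the Python sketch below shows one common formulation, in which the power spectrum of an analysis window is normalized to sum to one and treated as a probability distribution over frequency. The windowing and normalization details of the study are not restated here, so this formulation, the use of excess kurtosis (subtracting 3), the function name and the toy spectra are all assumptions.

```python
def spectral_moments(freqs_hz, power):
    """Return (mean, variance, skewness, excess kurtosis) of a power spectrum."""
    total = sum(power)
    weights = [p / total for p in power]  # spectrum as a probability distribution
    mean = sum(f * w for f, w in zip(freqs_hz, weights))
    variance = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, weights))
    third = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, weights))
    fourth = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, weights))
    skewness = third / variance ** 1.5
    kurtosis = fourth / variance ** 2 - 3.0
    return mean, variance, skewness, kurtosis


freqs = [1000, 2000, 3000, 4000, 5000, 6000, 7000]

# Toy spectrum 1: symmetric peak at 4 kHz.
symmetric = [1, 2, 4, 8, 4, 2, 1]
# Toy spectrum 2: energy concentrated at high frequencies, low-frequency tail.
high_energy = [1, 1, 1, 2, 4, 8, 4]

mean_sym, var_sym, skew_sym, kurt_sym = spectral_moments(freqs, symmetric)
mean_high, var_high, skew_high, kurt_high = spectral_moments(freqs, high_energy)
print(f"symmetric: mean {mean_sym:.0f} Hz, skewness {skew_sym:.2f}")
print(f"high-frequency: mean {mean_high:.0f} Hz, skewness {skew_high:.2f}")
```

In this formulation a concentration of energy at high frequencies raises the spectral mean and pushes skewness negative, while energy concentrated at low frequencies does the opposite.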
Similar to the effects explained above for spectral peak location, vowel context
also influenced the measured spectral mean in all four windows; with rounded vowel
/u(:)/ introducing lower spectral mean for the fricatives. Specifically of interest
here is the fact that it was only when the fricative’s transition into the vowel was
used to derive spectral mean values that a significant difference between plain
and pharyngealized fricatives was observed, in part due to the effect of pharyngealization
on the vocalic part of the window. As mentioned above, the general pattern of
the obtained spectral mean values was parallel to that of Jongman et al. (2000).
Despite this similarity, in our study spectral mean was more effective at the
frication midpoint and offset in separating fricatives into their respective places of
articulation, as compared to the onset and transition windows in Jongman et al.
The results obtained for the second statistical moment (variance) were parallel
in nature to those for spectral mean and very similar to values reported by Nissen
(2003). No direct comparison could be made with variance values reported in
Jongman et al. (2000) since in that study values were averaged across voicing.
However, like both studies, our study found spectral variance of sibilants to be
significantly lower than that of nonsibilants in the first three windows for voiceless fricatives
and at all windows for voiced fricatives. Nevertheless, no differences were found
within nonsibilant fricatives. Jongman et al. (2000) reported similar results for all
but the second window. Another finding consistent with previous research is the
lower variance of voiceless fricatives as compared to voiced fricatives (4.5 MHz and
5.4 MHz respectively) at the middle of frication noise. Although variance served to
distinguish many fricative places of articulation, it failed at all four analysis
windows to statistically distinguish between plain and pharyngealized fricatives, or
between fricatives in vocalic contexts differing in length.
Skewness measured at all window locations did not differentiate between plain
fricatives and their pharyngealized counterparts. However, skewness measured at
the second and third windows differentiated between all voiced fricatives. With
the exception of alveolar /z/, which had the only negatively skewed distribution
among voiced fricatives, skewness was positive and increased as the
place of articulation moved backwards in the oral cavity. For voiceless fricatives,
skewness distinguished between sibilants and nonsibilants, and within sibilants, at
the second analysis window. In general, alveolar fricatives had the lowest skewness
indicating a concentration of energy at higher frequencies, while such concentration
of energy was at lower frequencies for pharyngeal and glottal fricatives. Although
the number of places investigated here is greater than in either Jongman et al.
(2000) or Nissen (2003), our results are in general agreement with both studies for
alveolar and post-alveolar fricatives. Also, our study is in agreement with Jongman
et al. in that skewness increases substantially at the fricative-vowel transition due
to “the predominance of low-frequency over high-frequency energy as the vowel
begins” (Jongman et al. 2000, p. 1257). The effect of vowel context became
more pronounced at this transition window in the context of the rounded vowels /u, u:/, with their
inherently lower frequencies.
Kurtosis was used previously in the literature as a measure of the peakedness
of the spectral distribution. In our study, kurtosis was substantially higher for
pharyngeal fricatives /è, Q/ at the first three windows than for all other fricatives.
Furthermore, the peakedness of alveolar fricatives observed elsewhere in the
literature (Tomiak 1990; Jongman et al. 2000; Nissen 2003) was not observed in our
results.
8.4 Transition Information
Formant transitions at the fricative-vowel boundary were investigated in our
study using measures of the second formant at transition and locus equations. For
F2 values, the results obtained were consistent with predictions of the source-filter
theory of speech production. Specifically, F2 values of pharyngealized fricatives
were significantly lower than those of their plain counterparts. As mentioned previously,
such values are expected due to the lowering of the second formant under pharyngeal
co-articulation (Stevens 1998). Also of interest was the finding that, within the
back-articulated fricatives, only the uvular fricatives had F2 values similar to those of
pharyngealized fricatives and significantly lower than those of plain sibilants and nonsibilants.
The similar grouping of uvular and pharyngealized fricatives suggests similar
articulatory processes in their production. The reasoning behind this grouping is
twofold: first, values of F2 are inversely related to the height of the tongue; and
second, the secondary constriction involved in the /DQ, sQ/ production is in a higher
position than that of plain pharyngeal fricatives (Al-Ani 1970; McCarthy 1994;
Ladefoged and Maddieson 1996). Therefore, the fact that both pharyngealized
and uvular fricatives shared similar F2 properties that were distinct from all
other fricatives supports McCarthy’s (1994) proposal to name co-articulated
emphatics in Arabic “uvularized” rather than “pharyngealized”. However, such
a generalization should be taken cautiously since the realization of emphatics as
either uvularized or pharyngealized is dependent on the dialect of Arabic used
(Keating 1988; Zawaydeh 1997; Watson 1999).
Both the slope and y-intercept of locus equations in our study, in general, did
not distinguish among all the different places of fricative articulation. However,
both measurements served to distinguish uvular and glottal fricatives /X, K, h/
as a group having a higher slope and a lower y-intercept than all other fricatives.
More importantly, and in contrast to findings reported in Yeou (1997), the y-intercept
of pharyngealized fricatives did not differ from that of their plain counterparts, while only
the slope of /DQ/ was different from that of /D/.
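For reference, a locus equation is conventionally obtained by regressing F2 at vowel onset on F2 at the vowel midpoint across vowel contexts for a given consonant; the slope and y-intercept discussed above are the coefficients of that line. The Python sketch below fits such a line by ordinary least squares. The formant values are hypothetical and generated from a known line so that the fit can be checked; the function name is also hypothetical.

```python
def fit_locus_equation(f2_vowel_hz, f2_onset_hz):
    """Least-squares fit of F2 onset = slope * F2 vowel + intercept."""
    n = len(f2_vowel_hz)
    mean_x = sum(f2_vowel_hz) / n
    mean_y = sum(f2_onset_hz) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel_hz)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(f2_vowel_hz, f2_onset_hz))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical F2 values (Hz) for one fricative across six vowel contexts,
# generated from the line F2 onset = 0.6 * F2 vowel + 500.
f2_vowel = [900.0, 1200.0, 1500.0, 1800.0, 2100.0, 2300.0]
f2_onset = [0.6 * v + 500.0 for v in f2_vowel]

slope, intercept = fit_locus_equation(f2_vowel, f2_onset)
print(f"slope = {slope:.2f}, y-intercept = {intercept:.0f} Hz")
```

A slope near 1 means that F2 at onset tracks the vowel closely, while a slope near 0 means that F2 at onset stays near a fixed value regardless of the vowel.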
8.5 Discriminant Analysis
The various acoustical cues, except for locus equations, were used in a
discriminant function analysis to identify the cues maximally contributing to the
classification of fricatives into places of articulation. It was found that the spectral
mean (at frication noise onset, middle, and offset), skewness (at onset, offset of
frication and transition into the vowel), second formant at vowel onset, normalized
RMS amplitude and spectral peak location were the variables contributing the
most to the overall classification, with a success rate of 83.2%. When voicing was
specified in the model, the correct classification rate increased to 92.9% for voiced
and 93.5% for voiceless fricatives. It is worth mentioning, however, that if the rate of
misclassification is taken into consideration, then fricatives can be clustered
into three groups, namely nonsibilants, sibilants and gutturals, with pharyngealized
fricatives grouped with their plain counterparts in the same natural class.
8.6 Conclusion
Our study investigated the acoustic characteristics of Arabic fricatives. Results
obtained from most of the cues used were consistent with results obtained in
previous research for fricatives in other languages. Among the cues investigated,
spectral measures were the most efficient in distinguishing among the different
places of fricative articulation. Further research should focus on the perceptual
reality of the acoustic cues investigated in this study and on how changes in these
cues affect the perception of fricative place of articulation.
REFERENCES
Abdelatty Ali, A. M., J. Van der Spiegel, and P. Mueller (2001). Acoustic-phonetic features for the automatic classification of fricatives. J Acoust Soc Am 109 (5 Pt 1), 2217–2235.
Al-Ani, S. H. (1970). Arabic Phonology. The Hague: Mouton.
Alwan, A. (1989). Perceptual cues for place of articulation for the voiced pharyngeal and uvular consonants. J Acoust Soc Am 86 (2), 549–556.
Anderson, N. (1978). On the calculation of filter coefficients for maximum entropy spectral analysis. In D. G. Childers (Ed.), Modern spectrum analysis, pp. 252–255. New York, NY: IEEE Press.
Baum, S. R. and S. E. Blumstein (1987). Preliminary observations on the use of duration as a cue to syllable-initial fricative consonant voicing in English. J Acoust Soc Am 82 (3), 1073–1077.
Beckman, M. E. (1986). Stress and non-stress accent. Dordrecht, Holland: Foris.
Behrens, S. and S. E. Blumstein (1988a). Acoustic characteristics of English voiceless fricatives: a descriptive analysis. J Phonetics 16, 295–298.
Behrens, S. and S. E. Blumstein (1988b). On the role of the amplitude of the fricative noise in the perception of place of articulation in voiceless fricative consonants. J Acoust Soc Am 84 (3), 861–867.
Boersma, P. and D. Weenink (2004). Praat: a system for doing phonetics by computer. Amsterdam: Institute of Phonetic Sciences of the University of Amsterdam.
Chen, H. and K. N. Stevens (2001). An acoustical study of the fricative /s/ in the speech of individuals with dysarthria. J Speech Lang Hear Res 44 (6), 1300–1314.
Cole, R. A. and W. E. Cooper (1975). Perception of voicing in English affricates and fricatives. J Acoust Soc Am 58 (6), 1280–1287.
Crystal, T. and A. House (1988). Segmental durations in connected-speech signals: Current results. J Acoust Soc Am 83, 1553–1573.
El-Halees, Y. (1985). The role of F1 in the place-of-articulation distinction in Arabic. J Phonetics 13 (3), 287–298.
Fant, G. (1960). Acoustic theory of speech production. The Hague: Mouton.
Ferguson, C. A. (1959). Diglossia. Word 15, 325–340.
Forrest, K., G. Weismer, P. Milenkovic, and R. N. Dougall (1988). Statistical analysis of word-initial voiceless obstruents: preliminary data. J Acoust Soc Am 84 (1), 115–123.
Fowler, C. A. (1994). Invariants, specifiers, cues: An investigation of locus equations as information for place of articulation. Perception & Psychophysics 55, 597–611.
Fox, R. A., S. Nissen, J. McGory, and K. Rosenbauer (2001). Age-related changes in the acoustic characteristics of voiceless English fricative. J Acoust Soc Am 110, 2704.
Govindarajan, K. (1998). Listeners’ perceptual mapping of locus equations and variability. Behav Brain Sci 21 (2), 266–267.
Gurlekian, J. A. (1981). Recognition of the Spanish fricatives /s/ and /f/. J Acoust Soc Am 70 (6), 1624–1627.
Hair, J., R. Anderson, and R. Tatham (1987). Multivariate data analysis with readings. New York, NY: MacMillan.
Harrington, J. and S. Cassidy (1999). Techniques in Speech Acoustics. Norwell, MA: Kluwer Academic Publisher.
Harris, F. J. (1978). On the use of windows for harmonic analysis with the discrete Fourier transform. Proceedings of IEEE 66, 51–83.
Harris, K. S. (1958). Cues for the discrimination of American English fricatives in spoken syllables. Lang Speech 1, 1–7.
Hedrick, M. (1997). Effect of acoustic cues on labeling fricatives and affricates. J Speech Lang Hear Res 40 (4), 925–938.
Hedrick, M. S. and R. N. Ohde (1993). Effect of relative amplitude of frication on perception of place of articulation. J Acoust Soc Am 94 (4), 2005–2027.
Heinz, J. M. and K. N. Stevens (1961). On the properties of voiceless fricative consonants. J Acoust Soc Am 33, 589–596.
Hughes, G. W. and M. Halle (1956). Spectral properties of fricative consonants. J Acoust Soc Am 28, 303–310.
Jassem, W. (1979). Classification of fricative spectra using statistical discriminant functions. In B. Lindblom and S. Ohman (Eds.), Frontiers of Speech Research. London: Academic Press.
Johnson, K. (1997). Acoustic and Auditory Phonetics. Oxford: Blackwell.
Jongman, A. (1989). Duration of fricative noise required for identification of English fricatives. J Acoust Soc Am 85, 1718–1725.
Jongman, A. (1998). Are locus equations sufficient or necessary for obstruent perception? Behav Brain Sci 21 (2), 271–272.
Jongman, A., R. Wayland, and S. Wong (2000). Acoustic characteristics of English fricatives. J Acoust Soc Am 108 (3 Pt 1), 1252–1263.
Kaye, A. S. (1972). Arabic /z/: A synchronic and diachronic study. Linguistics 79, 31–63.
Keating, P. (1988). A Survey of Phonological Features. Bloomington, IN: Indiana University Linguistics Club.
Kent, R. D. and C. Read (2002). The Acoustic Analysis of Speech. San Diego: Singular Publishing Group.
Klecka, W. (1980). Discriminant Analysis. London: Sage.
Krull, D. (1989). Second formant locus pattern and consonant-vowel coarticulation in spontaneous speech. Perilus 10, 87–108.
Ladefoged, P. and I. Maddieson (1996). The sounds of the world’s languages. Oxford: Blackwell.
LaRiviere, C., H. Winitz, and F. Herriman (1975). The distribution of perceptual cues in English prevocalic fricatives. J Speech Hear Res 18, 613–622.
Lehiste, I. and G. Peterson (1959). Vowel amplitude and phonemic stress in American English. J Acoust Soc Am 31, 428–435.
Liberman, A. M., F. S. Cooper, D. P. Shankweiler, and M. Studdert-Kennedy (1967). Perception of the speech code. Psychol Review 74 (6), 431–461.
Lindblom, B. (1963). A spectrographic study of vowel reduction. J Acoust Soc Am 35, 1773–1781.
Lindblom, B. (1967). Vowel duration and a model of lip mandible coordination. STL-QPSR 8 (4), 1–29.
Mann, V. A. and B. H. Repp (1980). Influence of vocalic context on perception of the [s] - [sh] distinction. Perception & Psychophysics 28, 213–228.
Manrique, A. M. and M. I. Massone (1979). On the identification of Argentine Spanish voiceless fricatives. In Proceedings of the Ninth International Congress of Phonetic Sciences, Volume 1, Copenhagen, Denmark, p. 237.
Manrique, A. M. and M. I. Massone (1981). Acoustic analysis and perception of Spanish fricative consonants. J Acoust Soc Am 69 (4), 1145–1153.
McCarthy, J. (1994). The phonetics and phonology of Semitic pharyngeals. In P. Keating (Ed.), Papers in laboratory phonology 3: Phonological structure and phonetic form, pp. 191–233. Cambridge: Cambridge University Press.
McCasland, G. P. (1979). Noise intensity and spectrum cues for spoken fricatives. J Acoust Soc Am Suppl 165, S78–79.
Nissen, S. (2003). An acoustic analysis of voiceless obstruents produced by adults and typically developing children. Ph. D. thesis, Ohio State University, Columbus, OH.
Nittrouer, S. (1995). Children learn separate aspects of speech production at different rates: evidence from spectral moments. J Acoust Soc Am 97 (1), 520–530.
Nittrouer, S., M. Studdert-Kennedy, and R. McGowan (1989). The emergence of phonetic segments: evidence from the spectral structure of fricative-vowel syllables spoken by children and adults. J Speech Hear Res 32, 120–132.
Norlin, K. (1983). Acoustic analysis of fricatives in Cairo Arabic. Working Papers, Phonetics Laboratory, Lund University 25, 113–137.
Pentz, A., H. R. Gilbert, and P. Zawadzki (1979). Spectral properties of fricative consonants in children. J Acoust Soc Am 66 (6), 1891–1893.
Pirello, K., S. E. Blumstein, and K. Kurowski (1997). The characteristics of voicing in syllable-initial fricatives in American English. J Acoust Soc Am 101 (6), 3754–3765.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling (1992). Numerical recipes in C: the art of scientific computing. Cambridge: Cambridge University Press.
Shadle, C., S. J. Mair, and J. N. Carter (1996). Acoustic characteristics of the front fricatives [f, v, θ, ð]. In Proceedings of ETRW - 4th Speech Production Seminar, Autrans, France, pp. 193–196.
Shadle, C. H. (1985). The acoustics of fricative consonants. Ph. D. thesis, M.I.T., Cambridge, MA.
Shadle, C. H. (1990). Articulatory-acoustic relationships in fricative consonants. In W. J. Hardcastle and A. Marchal (Eds.), Speech Production and speech modelling, pp. 187–209. Dordrecht, Netherlands: Kluwer Academic Publishers.
Shadle, C. H. and S. J. Mair (1996, October). Quantifying spectral characteristics of fricatives. In Proceedings of the Fourth International Conference on Spoken Language Processing, Volume 3, Philadelphia, PA, pp. 1521–1524.
Soli, S. D. (1981). Second formants in fricatives: acoustic consequences of fricative-vowel coarticulation. J Acoust Soc Am 70 (4), 976–984.
Stevens, J. (2002). Applied multivariate statistics for the social sciences. Mahwah, NJ: Erlbaum.
Stevens, K. N. (1971). Airflow and turbulence noise for fricative and stop consonants: Static considerations. J Acoust Soc Am 50, 1182–1192.
Stevens, K. N. (1985). Evidence for the role of acoustic boundaries in the perception of speech sounds. In V. Fromkin (Ed.), Phonetic Linguistics, pp. 243–256. New York, NY: Academic Press.
Stevens, K. N. (1998). Acoustic Phonetics. Cambridge, MA: MIT Press.
Stevens, K. N. and S. E. Blumstein (1981). The search for invariant acoustic correlates of phonetic features. In P. D. Eimas and J. L. Miller (Eds.), Perspectives on the Study of Speech. Hillsdale, NJ: Erlbaum.
Strevens, P. (1960). Spectra of fricative noise in human speech. Lang Speech 3, 32–49.
Sussman, H. M. (1994). The phonological reality of locus equations across manner class distinctions: Preliminary observations. Phonetica 51, 119–131.
Sussman, H. M., D. Fruchter, J. Hilbert, and J. Sirosh (1998). Linear correlates in the speech signal: the orderly output constraint. Behav Brain Sci 21 (2), 241–299.
Sussman, H. M., K. A. Hoemeke, and F. S. Ahmed (1993). A cross-linguistic investigation of locus equations as a phonetic descriptor for place of articulation. J Acoust Soc Am 94 (3 Pt 1), 1256–1268.
Sussman, H. M., H. A. McCaffrey, and S. A. Matthews (1991). An investigation of locus equations as a source of relational invariance for stop place categorization. J Acoust Soc Am 90, 1309–1325.
Tabain, M. (1998). Non-sibilant fricatives in English: spectral information above 10 kHz. Phonetica 55 (3), 107–130.
Tabain, M. (2001). Variability in fricative production and spectra: implications for the hyper- and hypo- and quantal theories of speech production. Lang Speech 44 (Pt 1), 57–94.
Tabain, M. (2002). Voiceless consonants and locus equations: a comparison with electropalatographic data on coarticulation. Phonetica 59 (1), 20–37.
Tjaden, K. and G. S. Turner (1997). Spectral properties of fricatives in amyotrophic lateral sclerosis. J Speech Lang Hear Res 40 (6), 1358–1372.
Tomiak, G. R. (1990). An acoustic and perceptual analysis of the spectral moments invariant with voiceless fricative obstruents. Ph. D. thesis, State University of New York, Buffalo, NY.
Watson, J. C. (1999). The directionality of emphasis spread in Arabic. Linguistic Inquiry 30, 289–300.
Wilde, L. (1993). Inferring articulatory movements from acoustic properties at fricative-vowel boundaries. J Acoust Soc Am 94, 1881.
Wilde, L. F. and C. B. Huang (1991). Acoustic properties at fricative-vowel boundaries in American English. In Proceedings of the 12th International Congress of Phonetic Sciences, Aix-en-Provence, pp. 394–401.
Yeou, M. (1997). Locus equations and the degree of coarticulation of Arabic consonants. Phonetica 54, 187–202.
Zawaydeh, B. A. (1997). An acoustic analysis of uvularization spread in Ammani-Jordanian Arabic. Studies in the Linguistic Sciences 27 (1), 185–200.
BIOGRAPHICAL SKETCH
Mohamed Ali Al-Khairy was born in Makkah, Saudi Arabia. He went to Umm
Al-Qura University and earned his B.A. in English Literature and Linguistics. At
the University of Florida, he started graduate study in linguistics in Fall 1998. He
completed an M.A. in linguistics in Fall 2000 and then embarked on a Ph.D. degree
in linguistics. During his study, he taught for the Department of African and Asian
Languages and Literature from 1999 to 2004. He received an Alec Courtelis Award
for Exceptional International Students in 2002 and a College of Liberal Arts and
Sciences Award for International Student with Outstanding Academic Achievement
in the same year. He was also awarded a McLaughlin Dissertation Fellowship in
Spring 2005.