
Journal of Communication Disorders 46 (2013) 449–464


Perception of vowels and prosody by cochlear implant recipients in noise

Marianne Van Zyl, Johan J. Hanekom *

Department of Electrical, Electronic and Computer Engineering, University of Pretoria, Lynnwood Road, Pretoria 0002, South Africa

ARTICLE INFO

Article history:

Received 2 October 2012

Received in revised form 13 September 2013

Accepted 16 September 2013

Available online 21 September 2013

Keywords:

Cochlear implants

Prosody

Suprasegmental cues

Vowel recognition

Noise

ABSTRACT

The aim of the present study was to compare the ability of cochlear implant (CI) recipients to recognise speech prosody in the presence of speech-weighted noise to their ability to recognise vowels in the same test paradigm and listening condition. All test materials were recorded from four different speakers (two male, two female). Two prosody recognition tasks were developed, both using single words as stimuli. The first task involved a question/statement distinction, while the second task required listeners to make a judgement about the speaker's attitude. Vowel recognition tests were conducted using vowel pairs selected on the basis of specific acoustic cues (frequencies of the first two formants and duration). Ten CI users and ten normal-hearing controls were tested in both quiet and an adaptive noise condition, using a two-alternative forced-choice test paradigm for all the tests. Results indicated that vowel recognition was significantly better than prosody recognition in both listener groups in both quiet and noise, and that question/statement discrimination was the most difficult task for CI listeners in noise. Data from acoustic analyses were used to interpret differences in performance on different tasks and with different speakers.

Learning outcomes: As a result of this activity, readers will be able to (1) describe suitable methods for comparing vowel and prosody perception in noise, (2) compare performance on vowel and prosody perception tasks in quiet in normal-hearing listeners and cochlear implant recipients, (3) compare performance on vowel and prosody perception tasks in noise in normal-hearing listeners and cochlear implant recipients and (4) relate performance on prosody tasks in quiet to performance on these tasks in noise.

© 2013 Elsevier Inc. All rights reserved.

Abbreviations: CI, cochlear implant; NH, normal-hearing; SNR, signal-to-noise ratio; F1, first formant; F2, second formant; F0, fundamental (voice) frequency; 2AFC, two-alternative forced-choice; DL, difference limen; FS1, female speaker 1; FS2, female speaker 2; MS1, male speaker 1; MS2, male speaker 2.

* Corresponding author. Tel.: +27 12 420 2461. E-mail address: [email protected] (J.J. Hanekom).

0021-9924/$ – see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jcomdis.2013.09.002

1. Introduction

Present-day cochlear implants (CIs) provide listeners who had very little or no residual hearing with access to sufficient acoustic cues for successful perception of many auditory stimuli. While many listeners with CIs display remarkable success with open set speech recognition in quiet (see for example Caposecco, Hickson, & Pedley, 2012), a number of problem areas remain, one being the recognition of speech in the presence of background noise. The majority of everyday communication occurs in at least some degree of background noise and causes great difficulty for listeners with CIs, who require a much more

favourable signal-to-noise ratio (SNR) than normal-hearing (NH) listeners to obtain the same degree of success with speech recognition in noise (Gifford & Revit, 2010).

To better understand the difficulties posed by background noise, much research has focused on identifying the cues that are important for speech recognition and examining how these cues are affected by noise. The success with which a listener perceives speech in noise will depend on which cues in the signal remain available to the listener once the signal has been degraded by noise, and, in the case of CIs, to what extent the important cues are conveyed by the speech processor. Extensive work has been done on the cues underlying the recognition of segmental speech features (vowels and consonants) by NH listeners in quiet and the availability of these cues in noise. For example, a number of studies have investigated the acoustic cues that enable NH listeners to identify vowels even in severe background noise (see for example Ferguson & Kewley-Port, 2002; Ferguson, 2004; Swanepoel, Oosthuizen, & Hanekom, 2012). Formant frequencies, especially of the first two formants (F1 and F2) (Liu & Kewley-Port, 2004; Nearey, 1989; Peterson & Barney, 1952), the properties of the spectral shape as a whole (Parikh & Loizou, 2005), as well as formant movement and duration (Iverson, Smith, & Evans, 2006) have all been shown to contribute to successful vowel perception. Other studies have explored the underlying cues of consonants and how these cues are affected by noise. For instance, a classic study by Miller and Nicely (1955) reported that voicing, nasality, affrication, duration, and place of articulation are all important distinctive features of consonants and that some of these features (voicing and nasality) are more resistant to the effects of white noise than others. These results have since been reproduced using computerised measures (Phatak, Lovitt, & Allen, 2008), while others have compared the effects of different noise types on consonant perception (Broersma & Scharenborg, 2010).

A set of speech cues that has not been as thoroughly investigated in noise is prosodic cues. Prosody of speech consists of elements such as intonation, loudness, stress, rhythm, pauses, and speech rate (Borden, Harris, & Raphael, 2007; Cruttenden, 1997; Monrad-Krohn, 1947). Different approaches to defining prosody exist (see Cutler & Ladd, 1983; Ladd, 1988 for a discussion of the different views). While some researchers view prosody as a structural attribute of language that organises speech sounds (e.g. Brown, 1983), others focus on the measurement of particular acoustic parameters and matching these to specific meanings or communicative functions, for example matching specific acoustic characteristics to particular emotions signalled by a speaker (Williams & Stevens, 1972). Since the objective of the present research was to determine acoustic correlates of particular speech features and to establish how well these cues are perceived by CI recipients, the latter approach was selected. The acoustic parameter associated with intonation is variation in the fundamental voice frequency, perceived by listeners as variation in pitch (Borden et al., 2007). The perception of loudness is related to the intensity of the sound (Lehiste, 1970), while the perception of stress is related to a combination of changes in frequency, intensity and duration, and speech rhythm is the result of the combination of stressed and unstressed syllables in connected speech (Cruttenden, 1997).

Research related to prosody and speech recognition in noise so far has demonstrated that abnormal prosody, particularly an abnormal intonation pattern, has an adverse effect on speech recognition in noise (e.g. Laures & Bunton, 2003). Other findings seem to indicate that prosodic cues are quite robust and might be more immune to noise than segmental cues, at least for NH listeners. Investigations of the spectral content of prosodic cues have revealed that some prosodic cues are remarkably redundant and spread out across the frequency spectrum, enabling listeners to discern these cues even when the signal is spectrally degraded (Grant & Walden, 1996; Lakshminarayanan et al., 2003). Earlier research has also shown that the prosodic cues to stress rhythm (the rhythm of strong and weak syllables in the speech signal) as well as the location of word boundaries remain useful to listeners even when vowels and consonants are masked by noise (Smith, Cutler, Butterfield, & Nimmo-Smith, 1989). However, it is not clear from existing literature whether prosodic cues are available to CI listeners in noise, and whether these cues show the same redundancy and immunity to noise as seen in NH listeners.

The signal received by CI users contains a reduced set of speech cues compared to the cues available to NH listeners, as typical CI processors do not convey some of the cues that make the speech signal redundant and resilient to noise in NH listeners (Xu, Thompson, & Pfingst, 2005). Spectral information, for example, is degraded in CIs (Chatterjee & Peng, 2008), with CI users having a limited number of spectral channels available when compared to NH listeners (Friesen, Shannon, Baskent, & Wang, 2001). As a result, CI recipients reportedly have difficulty with the recognition of some prosodic cues, especially those features closely related to fundamental voice frequency (F0). Voice F0 plays an important role in many important prosodic functions, such as conveying normal intonation patterns, which helps with speech recognition in noise (Laures & Bunton, 2003), marking the differences between questions and statements (Grant & Walden, 1996; Lakshminarayanan et al., 2003), conveying the emotion or attitude of a speaker (Breitenstein, Van Lancker, & Daum, 2001; Cheang & Pell, 2008; Dmitrieva, Gel'man, Zaitseva, & Orlov, 2008; Murray & Arnott, 1993), and marking accented words in a sentence (Breen, Fedorenko, Wagner, & Gibson, 2010; Pell, 2001). CI recipients derive less benefit than NH listeners from natural intonation patterns in noise (Meister, Landwehr, Pyschny, & Grugel, 2011), and perform significantly worse than NH listeners on question/statement distinctions and sentence accent perception (Meister, Landwehr, Pyschny, Walger, & Wedel, 2009). Also, CI recipients perform poorly in the recognition of vocal emotions (Hopyan-Misakyan, Gordon, Dennis, & Papsin, 2009; Luo, Fu, & Galvin, 2007).

However, it is not only prosodic cues that CI recipients have difficulty with. Vowels, which have been shown to be a particularly important segmental cue in speech recognition (Kewley-Port, Burkle, & Lee, 2007), also pose a challenge to these listeners. Many CI recipients are unable to attain 100% recognition of vowels even in quiet listening conditions (e.g. Munson, Donaldson, Allen, Collison, & Nelson, 2003). Introducing background noise makes vowel recognition even harder for these listeners (Xu & Zheng, 2007), but the noise might not affect all important speech features equally and performance in noise


can therefore not necessarily be predicted from performance in quiet. Munson and Nelson (2005), for example, demonstrated that CI users performed similarly to NH listeners on the discrimination of /i/ and /u/ in quiet and in noise, while on /w/–/j/ discrimination, CI listeners performed similarly to NH listeners in quiet, but significantly worse in noise. These findings show that rapidly changing spectral cues (such as the formant transitions in /w/ and /j/) are more vulnerable to the effects of noise than other, more steady-state speech cues, at least in the case of CI users.

From the research mentioned it is clear that CI recipients generally experience difficulty with the recognition of both prosody and vowels. However, most existing reports do not directly compare perception of the two cue types. A direct comparison between prosody and vowel perception could provide deeper insight into the difficulty that CI listeners experience with speech perception in noise by showing which cues are worst affected by noise. Given the reported redundancy and noise robustness of prosodic cues, it is possible that NH listeners use these cues to augment speech perception in noise when segmental cues such as vowels are degraded. It is not clear from existing data whether the cues that are most immune to noise effects for NH listeners also remain useful to CI listeners in noise. One study that compared vowel and prosody perception is that of Luo, Fu, Wu, and Hsu (2009), who investigated the perception of Mandarin Chinese tones and vowels in CI users using their clinically assigned speech processors. Four vowels were each produced with four different tones (which correspond to changes in voice F0). Listeners responded in a 16-alternative forced-choice paradigm, and results were analysed to determine the number of correctly identified syllables, tones and vowels. Findings indicated that CI users performed better on vowel recognition than tone recognition, but were still able to score above 60% on average on tone recognition in quiet. This finding agrees with the findings of Wei, Cao, and Zeng (2004), who also found an average tone recognition score of above 60% for the CI users in their study. However, it is still unclear whether the F0 cues that are available to CI listeners in quiet remain available in background noise (Brown & Bacon, 2010), and how the perception of other prosodic cues compares to vowel recognition in CI listeners.

Therefore, two research questions are considered in the present study, namely (i) how well do CI recipients perceive prosodic cues in background noise, and (ii) how does the perception of prosody by CI recipients compare to their perception of important segmental cues (specifically vowels) in quiet and in noise. The present study was designed to address these questions systematically through acoustic analyses and listening experiments in CI recipients and NH controls. The hypothesis was that perception of prosody would be better than vowel perception in noise in both NH and CI listeners. Although existing literature reports that CI listeners have difficulty with prosody perception related to changes in voice F0, it was hypothesised that durational and intensity cues in combination with available F0 cues would present enough redundancy in prosodic cues to provide an advantage over vowel cues.

2. Approach

The perception of prosodic cues is frequently evaluated in a two-alternative forced-choice (2AFC) test paradigm, often because the very nature of these contrasts in everyday speech involves a choice between two alternatives. Examples of this include question/statement distinctions (e.g. Chatterjee & Peng, 2008; Most, Gaon-Sivan, Shpak, & Luntz, 2012), the discrimination of attitude as sarcastic or sincere (e.g. Cullington & Zeng, 2011), identification of phrase boundaries (Marslen-Wilson, Tyler, Warren, Grenier, & Lee, 1992), and the resolution of sentence ambiguity based on prosodic cues (Price, Ostendorf, Shattuck-Hufnagel, & Fong, 1991). In contrast, vowel recognition tasks often involve a larger set of alternatives for listeners to choose from. This means that the two tasks (vowel and prosody perception) cannot be fairly compared, because the difficulty of the test paradigm is not the same. Because many prosody perception tasks call for the use of a 2AFC paradigm, the present study adopted this paradigm and cast the vowel recognition task, which served as a segmental perception task, into the same paradigm to provide a fair comparison between the two task types.

Stimuli were single words for both the prosody and vowel recognition tasks. The motivation was that longer utterances (phrases or sentences) necessarily contain prosodic cues such as stress and rhythm. To isolate prosodic cues from vowel cues, the former were eliminated from the vowel recognition task by using single word stimuli. Additionally, because most previous work on the effects of noise on prosodic cues was done on longer utterances, the use of single words offered the opportunity to determine whether the noise robustness of these cues is also observed at the single word level. This could provide deeper insight into the overall noise immunity and importance of prosodic cues in noise. Furthermore, prosody may be particularly important in single-word utterances to differentiate speaker intent when contextual cues (i.e. additional words in the sentence, or word order cues) are not available. This can be seen, for example, in the case of the word "okay", which is frequently used as a single-word utterance to fulfil a wide variety of functions (Gaines, 2011). Prosody plays an important role in differentiating the meaning of this word in different contexts (Gravano, Hirschberg, & Benus, 2012). Single-word utterances can also be used as either a statement or a question, with no inversion of word order to help the listener distinguish between the two possibilities, and prosody in this case is the only acoustic cue that can aid the listener in differentiating these (Chatterjee & Peng, 2008).

The main objective of the present study was to compare the perception of vowels and prosody in CI recipients in quiet and in speech-weighted noise. Age- and gender-matched NH control listeners were included to provide a baseline of performance for both cue types and listening conditions. To accomplish the main objective, materials suitable for the evaluation of vowels and prosody were developed and acoustically analysed, and listening experiments were carried out with CI recipients and NH listeners. The objective of the acoustic analyses was to understand the acoustic characteristics of the test materials. The two prosodic contrasts used were a linguistic contrast, namely a question/statement distinction, and an attitudinal contrast,


namely a certain/hesitant distinction in the context of giving permission. The question/statement contrast was used as it represents an acoustic difference that is to a large extent related to F0 perception, a skill which is problematic for CI users (Brown & Bacon, 2010; Chatterjee & Peng, 2008), while the certain/hesitant contrast was selected because it represents an acoustic difference strongly related to duration (Van Zyl & Hanekom, 2013), and the perception of durational differences appears to be an easier task for CI users (Moore & Glasberg, 1988). Using these two types of prosodic contrasts therefore ensured representation of two acoustic dimensions that play an important role in prosody perception. The acoustic differences of these contrasts have been documented elsewhere (Fujie, Ejiri, Kikuchi, & Kobayashi, 2006; Grant & Walden, 1996; Peng, Lu, & Chatterjee, 2009; Van Heuven & Van Zanten, 2005; Van Zyl & Hanekom, 2013), but it was still important to determine the specific characteristics of the language, speakers and materials recorded for this study.

The acoustic characteristics of the prosody materials that were examined included the intensity of each syllable, the average F0 of each syllable (which roughly correlates with the perception of intonation), the range of the voice F0 across both syllables, and the duration of the entire word. Intensity can play a role in indicating emphasis on a particular syllable (Fry, 1955; Lieberman, 1960; Morton & Jassem, 1965). The height of the average F0 of each syllable could also indicate emphasis (Fry, 1958; Morton & Jassem, 1965), while the F0 range across the entire utterance is often associated with the acoustic differences between different prosodic expressions (Breitenstein et al., 2001; Hammerschmidt & Jurgens, 2007; Murray & Arnott, 1993). Duration or speech rate is often mentioned as an acoustic correlate of some forms of prosodic expression (Fujie et al., 2006; Murray & Arnott, 1993; Williams & Stevens, 1972).

For the vowel perception task, a collection of fifteen vowels was recorded and acoustically analysed, and three vowel pairs were then purposefully selected to represent specific acoustic differences, as described in Section 3. The vowel characteristics analysed were F1 and F2 frequencies and vowel duration. The importance of F1 and F2 frequencies for vowel discrimination has long been established (Assmann, Nearey, & Hogan, 1982; Klatt, 1982; Miller, 1989; Nearey, 1989; Peterson & Barney, 1952). A number of studies also support the role of duration in vowel discrimination, especially for vowels lying close together in the F1–F2 vowel space (Ainsworth, 1972; Hillenbrand, Getty, Clark, & Wheeler, 1995; Tartter, Hellman, & Chute, 1992). Vowel pairs were selected so that each pair would represent a specific acoustic difference (F1, F2 or duration). The results of the acoustic analyses below provide further details on the choice of vowel pairs.

3. Development and acoustic analysis of test materials

The language of the test materials was Afrikaans, a West Germanic language native to South Africa and closely related to Dutch (Gooskens, 2007). No existing pre-recorded materials suitable for the present research were available, so new materials had to be recorded. The developed materials were subjected to acoustic analyses to make a final selection of test materials and to assist in the interpretation of the listening experiment results.

3.1. Methods

3.1.1. Recording of speech material

Two main types of speech materials were recorded for this study, namely single words denoting a prosodic contrast, and consonant-vowel-consonant utterances where only the vowel differed. All materials were recorded from four different speakers (two female speakers, denoted FS1 and FS2, and two male speakers, denoted MS1 and MS2), all native speakers of Afrikaans with normal hearing and speech. Two types of prosodic differences were recorded from each speaker, one being a question/statement contrast (a linguistic function of prosody) and the other marking an attitudinal contrast (a non-linguistic function of prosody), namely a certain/hesitant contrast.

To ensure that the contrasts were purely prosodic and not related to the content of the utterance, the same word was used to express both versions of each contrast. For the question/statement contrast, the word "coffee" ("koffie" in Afrikaans, with pronunciation very similar to English) was used. Fifteen interrogative (question) and fifteen declarative (statement) versions were recorded from each speaker. The interrogative versions of the utterance were elicited by asking speakers to produce the word "koffie" as if asking someone whether they would like a cup of coffee. The declarative version of the word was elicited by asking the talker a question (such as "what would you like to drink?") and instructing them to produce the word "koffie" each time as a response to the question. The recorded materials were validated in a sample of NH listeners (n = 5) to ensure recognition accuracy above 90%. The certain/hesitant contrast (development and validation described in Van Zyl & Hanekom, 2013) was represented by the word "okay", with half of the utterances produced with certainty and the other half with hesitation or reluctance. These materials had a recognition accuracy of >90% in a group of NH listeners (n = 12).

For the vowel discrimination tasks, a complete set of fifteen Afrikaans vowels was recorded from each speaker in a /pVOWELt/ format and analysed in order to enable the selection of a representative subset of vowel pairs for the present study, in light of the acoustic characteristics of all the vowels. The first and last consonants (/p/ and /t/) were selected as they are both voiceless plosives, which enabled accurate isolation of the vowel segment for analysis. For each /pVOWELt/ combination containing a different vowel, fifteen versions were recorded from each speaker. This enabled acoustic analysis of the vowels using average values across several utterances of the same vowel, thereby including in the analyses and listening experiments the natural variations that occur when a speaker repeats the same utterance (Peterson & Barney, 1952).

Digital recording of the speech materials was conducted in a double-walled sound booth, using an M-Audio Fast Track Pro external sound card (sampled at 44.1 kHz with 24-bit resolution) and a Sennheiser ME62 microphone placed on a


microphone stand 20 cm from the speaker's mouth. Recorded waveforms were edited using Praat software (Boersma & Weenink, 2010) by removing unwanted silences (leaving silences of 100 ms before and after the utterance) and re-scaling the intensity of each utterance to 70 dB SPL before saving the material to hard disc in .wav format. Re-scaling intensities preserved relative intensity changes and cues within utterances, while eliminating any accidental intensity differences between utterances which may have occurred during recording and ensuring accurate SNRs in the noise experiment.
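As a rough illustration of this step, the sketch below rescales a waveform so that its overall RMS level matches a target value while preserving intensity variation within the word. It assumes the soundfile Python library for WAV input/output, a hypothetical calibration constant relating digital full scale to dB SPL, and hypothetical file names; the study itself performed this scaling in Praat.

```python
import numpy as np
import soundfile as sf  # assumed WAV I/O library; any reader/writer would do

CALIBRATION_DB = 94.0   # hypothetical: dB SPL corresponding to an RMS of 1.0

def scale_to_level(wave, target_db_spl):
    """Apply a single gain so that the utterance's overall RMS corresponds to
    target_db_spl, preserving relative intensity changes within the word."""
    rms = np.sqrt(np.mean(wave.astype(float) ** 2))
    current_db = CALIBRATION_DB + 20.0 * np.log10(rms)
    gain = 10.0 ** ((target_db_spl - current_db) / 20.0)
    return wave * gain

wave, fs = sf.read("koffie_question_01.wav")          # hypothetical file name
sf.write("koffie_question_01_70dB.wav", scale_to_level(wave, 70.0), fs)
```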

3.1.2. Acoustic analysis

Acoustic analyses of the prosody materials examined the intensity of each syllable, the average F0 of each syllable, the range of the voice F0 across both syllables, and the duration of the entire utterance. Intensity was measured in dB SPL, and reflected the RMS (root-mean-square) value of the intensity of each syllable separately. Intensity analysis included only the vowels, since all the consonants were voiceless and therefore did not reflect voice intensity. The intensities of the two syllables were analysed separately, since the average intensity of the entire utterance was controlled through digital manipulation and the characteristic of interest was the relative difference in intensity of the two syllables as an indication of syllable stress. The average and range of voice F0 were determined using Praat, and were expressed in Hz.

The vowels contained in the /pVOWELt/ utterances were analysed to determine the F1 and F2 frequencies and the duration of the vowel, as motivated in the introduction. Formant frequencies were extracted from the time frame spanning the middle 80% of the vowel using the LPC-based formant extraction algorithm of Praat.
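The following sketch shows how such measurements could be scripted with parselmouth, a Python interface to Praat (the study used Praat directly). File names and the hand-labelled vowel boundaries are hypothetical, and syllable-level intensity measures are omitted for brevity.

```python
import numpy as np
import parselmouth  # Python interface to Praat (assumed available)

def prosody_measures(path):
    """Mean F0 (Hz), F0 range (Hz) and total duration (s) of one utterance."""
    snd = parselmouth.Sound(path)
    f0 = snd.to_pitch().selected_array["frequency"]
    f0 = f0[f0 > 0]                                   # drop unvoiced frames
    duration = snd.values.shape[-1] / snd.sampling_frequency
    return float(np.mean(f0)), float(np.max(f0) - np.min(f0)), duration

def vowel_formants(path, vowel_start, vowel_end, n_frames=20):
    """Average F1 and F2 (Hz) over the middle 80% of a hand-labelled vowel."""
    formant = parselmouth.Sound(path).to_formant_burg()
    margin = 0.1 * (vowel_end - vowel_start)
    times = np.linspace(vowel_start + margin, vowel_end - margin, n_frames)
    f1 = np.mean([formant.get_value_at_time(1, t) for t in times])
    f2 = np.mean([formant.get_value_at_time(2, t) for t in times])
    return f1, f2

print(prosody_measures("koffie_question_01.wav"))     # hypothetical file
print(vowel_formants("pat_01.wav", 0.18, 0.35))       # hypothetical boundaries
```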

Each of the recorded utterances was analysed individually, and the average of each acoustic cue for each type of utterance was subsequently calculated across all utterances of the same type. This enabled statistical analysis of the acoustic cues, determining the size and significance of the differences in acoustic cues. Kolmogorov-Smirnov tests were conducted on the data to determine distribution normality. Since not all the acoustic cue data for all the different utterance types were normally distributed, Student's t-tests were used to determine the significance of the differences in acoustic cues.

3.2. Results of acoustic analyses

The results of the acoustic analyses of the two prosodic contrasts (question/statement and certain/hesitant) are summarised in Table 1, showing data for each speaker separately. The data in Table 1 indicate that the question and statement utterances differed in both F0 and intensity characteristics, with differences that exceed the difference limens (DLs) reported for NH, and in some cases for CI, listeners. The average duration difference between question and statement utterances was, however, smaller than the DLs reported for both NH and CI listeners (Moore & Glasberg, 1988; Small & Campbell, 1962). F0 and intensity cues therefore seemed to be the most prominent cues for this contrast.

Certain/hesitant utterances differed in intensity, F0, and duration for most speakers. The intensity differences produced by speaker MS1, however, were below the DLs reported in existing literature, even for NH listeners, and the difference in F0 of the second syllable was only above the NH DL, and not above the DL reported for CI users (Rogers, Healy, & Montgomery, 2006). Van Zyl and Hanekom (2013) showed that, while cue trading relationships existed between other cues, the most consistent cue was the duration difference between certain and hesitant utterances.

The results of the acoustic analyses of the recorded vowels are depicted in Fig. 1, which shows the vowels on an F1–F2 vowel plane. These findings were used to select vowel pairs that could be used for the listening experiment in a 2AFC test paradigm based on their acoustic characteristics. Since only a limited number of vowel pairs could be used, the selection had to represent specific acoustic differences and similarities. A number of acoustic characteristics of the vowel contribute in combination to vowel identification accuracy. These include the availability of formant frequency information, vowel duration, formant movement over time, and distance between vowels (or dispersion of vowels) in the F1–F2 vowel space (Neel, 2008). In the present study, vowels were selected to differ in the three most prominent steady-state cues reported in the literature (F1, F2 and duration, as discussed in the introduction). The first vowel pair (/pɔt/ and /pɛt/) was selected to differ primarily in terms of average F2 frequency, while having similar average F1 frequencies and durations for all four speakers. Despite the relatively large difference in F2, this vowel pair posed a difficult task to listeners due to the highly similar F1 frequencies, a cue that has been shown to be particularly important for vowel recognition in noise (Parikh & Loizou, 2005). Similarly, Swanepoel et al. (2012) showed that while F2 is more important than F1 in quiet and low-noise conditions, listeners increase reliance on F1 as noise levels increase. The second vowel pair (/pat/ and /put/) differed primarily in F1 frequency, while having similar F2 frequencies and durations. As only either F1 or F2 differed within a vowel pair, the F2–F1 difference between the two vowels of a vowel pair was relatively large. Although this difference may have had an influence on the degree of difficulty in vowel comparisons, the work of Neel (2008) suggests that distinctiveness of vowels based on formant frequencies, duration and formant movement over time may more strongly influence vowel identification than dispersion in vowel space. The third pair (/pɛt/ and /pɛ:t/) differed mainly in duration, while being closely spaced in the F1–F2 plane. This pair was selected in order to examine the noise immunity of duration as a cue to vowel identity in cases where formants are very similar and could not be used as a cue to distinguish the vowels. This was particularly important to consider since CI users are reported to have relatively good temporal resolution (close to that of NH listeners) (Moore & Glasberg, 1988), and durational cues were particularly prominent in the certain/hesitant prosodic contrast (Van Zyl & Hanekom, 2013). As can be seen in Fig. 1, a number of other possible vowel pair selections would have constituted an easier listening task, due to differences in more than one important acoustic characteristic (e.g. /a/ versus /e, i, y, ɔ/). Vowel pairs that may have constituted a more difficult listening task also appear (e.g. /y/ versus /i/, or /3/ versus /œ/), but due to the low


Table 1
Results of acoustic analyses on prosody materials. Values indicate means; standard deviations are shown in brackets. Syll1 and Syll2 denote the first and second syllables respectively. Intensity and frequency DLs are from Rogers et al. (2006), and duration DLs from Small and Campbell (1962).a

Statement/question contrast
                                       FS1              FS2              MS1              MS2
Intensity (dB SPL)  Syll1  Statement   77.11 (0.6)      75.75 (1.04)     76.53 (0.36)     77.43 (0.38)
                           Question    73.84 (0.72)     74.68 (1.28)     71.74 (1.61)     75.14 (1.11)
                           Difference  -3.27c           -1.07            -4.79c           -2.29b
                    Syll2  Statement   67.70 (2.68)     68.28 (3.01)     67.95 (1.65)     59.77 (2.86)
                           Question    72.60 (0.75)     71.75 (1.58)     73.61 (0.57)     70.76 (1.63)
                           Difference  4.90c            3.47c            5.66c            10.99c
Mean F0 (Hz)        Syll1  Statement   207.93 (8.35)    168.42 (7.22)    96.97 (7.42)     103.71 (8.10)
                           Question    220.87 (5.47)    173.28 (6.02)    104.00 (6.63)    87.22 (11.45)
                           Difference  12.94b           4.87b            7.02b            -16.49b
                    Syll2  Statement   188.19 (9.81)    163.87 (7.80)    78.58 (7.49)     71.16 (4.72)
                           Question    381.65 (9.25)    232.91 (12.23)   151.23 (6.80)    121.39 (22.10)
                           Difference  193.46c          69.04c           72.65c           50.23c
F0 range (Hz)              Statement   54.42 (10.21)    47.21 (19.60)    33.31 (14.20)    40.80 (30.76)
                           Question    257.61 (17.72)   105.82 (20.76)   63.74 (10.41)    57.06 (17.90)
                           Difference  203.19           58.61            30.42            16.25
Duration (s)               Statement   0.39 (0.02)      0.36 (0.02)      0.40 (0.02)      0.27 (0.02)
                           Question    0.45 (0.02)      0.35 (0.02)      0.42 (0.02)      0.34 (0.02)
                           Difference  0.06             -0.01            0.02             0.06

Certain/hesitant contrast
                                       FS1              FS2              MS1              MS2
Intensity (dB SPL)  Syll1  Certain     72.52 (1.54)     72.04 (2.23)     68.68 (6.82)     72.81 (1.90)
                           Hesitant    69.12 (3.65)     68.42 (1.73)     68.85 (2.27)     67.96 (2.06)
                           Difference  -3.40c           -3.62c           0.17             -4.85c
                    Syll2  Certain     72.74 (0.58)     72.86 (0.80)     73.08 (1.58)     73.64 (1.30)
                           Hesitant    72.33 (0.41)     73.49 (0.54)     72.81 (0.24)     74.71 (0.49)
                           Difference  -0.41            0.63             -0.27            1.07
Mean F0 (Hz)        Syll1  Certain     279.65 (75.69)   209.56 (19.64)   97.43 (4.02)     133.92 (13.86)
                           Hesitant    215.21 (25.19)   164.09 (6.84)    97.64 (4.88)     83.71 (16.35)
                           Difference  -64.44c          -45.47c          0.21             -50.21c
                    Syll2  Certain     263.63 (16.40)   209.41 (10.50)   119.48 (27.61)   121.47 (11.7)
                           Hesitant    225.27 (13.56)   200.17 (13.61)   113.65 (18.82)   111.66 (6.69)
                           Difference  -38.36c          -9.24b           -5.83b           -9.81b
F0 range (Hz)              Certain     193.54 (80.43)   84.54 (46.00)    70.51 (32.17)    53.67 (26.02)
                           Hesitant    93.73 (41.73)    126.79 (22.89)   73.57 (17.80)    65.13 (10.11)
                           Difference  -99.81           42.25            3.06             11.46
Duration (s)               Certain     0.42 (0.03)      0.37 (0.05)      0.47 (0.05)      0.30 (0.04)
                           Hesitant    0.65 (0.10)      0.54 (0.07)      0.83 (0.10)      0.59 (0.08)
                           Difference  0.23c            0.17c            0.36c            0.29c

a The difference limens reported here are for NH listeners (Small & Campbell, 1962); no report of such difference limens measured in CI users could be found in existing literature. However, evidence from existing literature indicates that the temporal resolution of CI users is close to that of NH listeners (Moore & Glasberg, 1988).
b Indicates values greater than the difference limens (DLs) for NH listeners.
c Indicates values greater than both the NH and CI DLs.


frequency of occurrence of /y/ and /œ/ in Afrikaans (Van Heerden, 1999), plus the fact that these vowels are often reduced to /i/ and /3/ in conversational speech, these vowel pairs were not used. The vowel pairs selected for the study are representative of specific differences observed within the complete collection of vowels, and were balanced in terms of their difficulty level.

The average difference in duration between the recorded /pɔt/ and /pɛt/ utterances ranged between 0 and 0.01 s for the different speakers, and duration differences for /pat/ and /put/ ranged between 0.01 and 0.03 s across speakers. Average duration differences for /pɛt/ and /pɛ:t/ were between 0.12 and 0.22 s for the different speakers.

4. Listening experiments

4.1. Method

4.1.1. Listeners

Ten cochlear implant recipients (aged 21–70) participated in the study. All participants used Cochlear devices, and years of implant use ranged from five to nineteen years. Nine participants had unilateral cochlear implants. One recipient used a


Fig. 1. F1–F2 vowel space of the 15 Afrikaans vowels recorded from the four speakers (FS1 and FS2 are the female speakers, MS1 and MS2 are the male speakers). Values indicate average frequencies calculated from fifteen distinct utterances of each vowel.


hearing aid in the non-implanted ear, and one had bilateral implants. The bilateral recipient was requested to switch off the processor on the ear that she considered weaker, while the hearing aid user was asked to switch the hearing aid off during testing, so that all recipients were evaluated with only one implant. All CI recipients were tested with their processors set to the programme and settings that they used most frequently. A control group of listeners matched to the CI group in gender and age also participated in the study. All control subjects had normal hearing (pure tone thresholds ≤ 20 dB HL at octave frequencies from 250 to 8000 Hz). All participants (NH and CI) were native speakers of Afrikaans. Ethics clearance was obtained from the relevant ethics committee at the institution where the research was conducted, and participants provided informed consent prior to testing. Table 2 provides information on the CI recipients who participated in the study.

4.1.2. Procedures

4.1.2.1. Test setup. Listeners were seated in a double-walled sound booth with the test administrator. Speech materials were presented through an M-Audio EX66 Reference Monitor (−3 dB bandwidth from 37 Hz to 22 kHz, with a flat frequency response in between allowing a maximum variation of ±1 dB). All test materials were presented in a 2AFC paradigm through a graphical user interface developed in Matlab, which showed the two alternatives on the screen. Participants had to click on a start button, and subsequently had to click on the alternative they heard to prompt the presentation of the next item.

4.1.2.2. Listening tasks. Each listener had to complete five listening tasks for each of the four speakers, in two listening conditions. The five listening tasks included two prosody discrimination tasks (question/statement and certain/hesitant) and


Table 2
Details of CI recipients who participated in the listening experiments. Speech recognition scores reflect the percentage of words in pre-recorded sentences that were identified correctly. Speech recognition data for CI7 was not available. C and CA refer to the Contour and Contour Advance electrodes respectively. Details of S15's second implant and processor are indicated in brackets, as this processor was switched off during testing.

Subject  Gender  Age  Processor         Implant               Strategy     Post-/pre-lingual  No. of years  Ear(s)        Speech
number                                                                     deafness           implanted     implanted     recognition %
S15      F       23   Freedom (CP810)   CI22M (CI24RE (CA))   SPEAK (ACE)  Post               19 (7)        Left (Right)  96 (-)
S24      F       21   Freedom           CI24RE (CA)           ACE          Post               5             Right         100
S22      M       41   CP810             CI24RE (CA)           ACE          Post               5             Right         100
S23      M       21   ESPrit 3G         CI22M                 SPEAK        Pre                15            Right         87
S28      F       58   Freedom           CI24RE (CA)           ACE          Post               5             Right         100
S26      M       22   CP810             CI24R (C)             ACE          Pre                9             Left          96
S27      F       70   Freedom           CI24RE (CA)           ACE          Post               5             Left          -
S14      M       30   CP810             CI24R (C)             ACE          Post               9             Left          92
S5       F       44   Freedom           CI24M                 SPEAK        Post               12            Right         92
S19      F       43   CP810             CI24RE (CA)           ACE          Post               6             Right         75


three vowel discrimination tasks (/pɔt/ versus /pɛt/, /pat/ versus /put/, and /pɛt/ versus /pɛ:t/). The two listening conditions were quiet and an adaptive noise condition, using a speech-weighted noise specific to each speaker. In quiet, a total of 36 stimuli were presented in each task. The first six items in each task were practice items, purposefully selected to include three items of each of the two alternatives. The remaining 30 stimuli were presented in random order, and performance was scored as the percentage of correct responses. No feedback was given to listeners on the correctness of their responses.

To test recognition in noise, an adaptive procedure was used to prevent the floor and ceiling effects that could occur when using a fixed noise condition, especially in the CI population where there is great inter-individual variability. The SNR was changed adaptively via a transformed two-down, one-up staircase procedure, for which equilibrium occurs at the signal-to-noise ratio corresponding to the 71% correct point on the psychometric function (Levitt, 1971). The 71% correct point is an attractive equilibrium point in a 2AFC paradigm where chance level is 50%, since it is approximately halfway between guessing and perfect (100%) recognition (Hartmann, 1998). To minimise practice or learning effects, each listener first completed all five listening tasks in both quiet and noise with recordings from an additional female speaker, to ensure familiarity with the tasks and procedures. Furthermore, the order of the different tasks and speakers was counterbalanced across listeners. Total testing time was around seven hours per listener.1

The initial step size of the SNR adjustment (until the first reversal) was 2 dB, and the subsequent step size (following the listener's first error) was 1 dB. A pilot experiment was conducted to determine the test procedure that would result in a minimum amount of variance without extending testing time unnecessarily. Ten reversals in the adaptive procedure and four to six repetitions of the procedure resulted in the smallest attainable standard deviation (approximately 2 dB). The adaptive procedure was therefore terminated after ten reversals, of which the last six reversal points were used to calculate the 71% point. The procedure was repeated a minimum of four times for each task, and if any of the four results differed by more than 4 dB (allowing for a ±2 dB deviation from the mean), two additional repetitions were carried out. No feedback was provided on the correctness of individual items, but at the end of each completed test listeners were informed about their performance, which helped to keep listeners motivated.
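For illustration, a minimal Python sketch of such a transformed two-down, one-up track is shown below. The simulated listener and its psychometric parameters are hypothetical; only the step sizes, the ten-reversal stopping rule and the averaging of the last six reversals follow the description above. The 71% convergence point arises because the track is in equilibrium when two consecutive correct responses are as likely as a single error, i.e. p² = 0.5, so p ≈ 0.707 (Levitt, 1971).

```python
import numpy as np

def two_down_one_up(respond, start_snr=0.0, step_initial=2.0, step_final=1.0,
                    max_reversals=10, n_for_threshold=6):
    """Estimate the ~71% correct SNR with a transformed 2-down-1-up staircase.

    respond(snr) must return True for a correct response at the given SNR (dB).
    Two consecutive correct responses make the task harder (SNR decreases);
    a single error makes it easier (SNR increases). The step is 2 dB until the
    first reversal and 1 dB thereafter; the threshold is the mean SNR of the
    last six of ten reversals.
    """
    snr, step = start_snr, step_initial
    n_correct, last_direction, reversals = 0, 0, []
    while len(reversals) < max_reversals:
        if respond(snr):
            n_correct += 1
            direction = -1 if n_correct == 2 else 0   # harder after 2 correct
            if n_correct == 2:
                n_correct = 0
        else:
            n_correct, direction = 0, +1              # easier after an error
        if direction != 0:
            if last_direction != 0 and direction != last_direction:
                reversals.append(snr)                 # direction change
                step = step_final
            last_direction = direction
            snr += direction * step
    return float(np.mean(reversals[-n_for_threshold:]))

def simulated_listener(snr, midpoint=-6.0, slope=1.0):
    """Hypothetical 2AFC psychometric function: 50% floor rising to 100%."""
    p_correct = 0.5 + 0.5 / (1.0 + np.exp(-slope * (snr - midpoint)))
    return np.random.rand() < p_correct

# Repeat the track several times and average the results, as in the study.
runs = [two_down_one_up(simulated_listener) for _ in range(4)]
print(f"estimated 71% point: {np.mean(runs):.1f} dB SNR")
```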

4.1.2.3. Signal intensity. Speech and noise were combined adaptively to attain the desired SNR, in such a way that the combined stimulus had an intensity of 60 dB SPL. That is, both the speech and noise levels were adapted after each response so that the desired SNR was obtained, while maintaining the stimulus level at 60 dB SPL. The latter was measured with a sound level meter at the approximate location of the listeners' ears. Speech was always presented above the threshold of a particular listener. Across CI listeners, the SNR varied from −15 dB to 10 dB during the adaptive procedure, so that noise levels varied between 47.6 and 58.6 dB SPL and speech levels between 43.6 and 57.6 dB SPL. The 60 dB level was selected as the combined intensity for two reasons. Firstly, 60 dB is considered to be the average level at which most conversational speech occurs (Firszt et al., 2004; Pearsons, Bennett, & Fidell, 1977). Secondly, the Cochlear Freedom and CP810 processors used by the majority of CI recipients in this study, when set at the default microphone sensitivity, stimulate at C-levels for sounds at or above 65 dB SPL. For this default setting, input sounds at levels above 65 dB SPL are infinitely compressed by the Automatic Gain Control (Davidson et al., 2009), so an average presentation level of 60 dB ensured that stimuli were presented at a level below the kneepoint of the Automatic Gain Control.
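The text does not state the exact rule used to split the 60 dB SPL combined level between speech and noise. One convention consistent with the levels quoted above (noise 58.6 / speech 43.6 dB SPL at −15 dB SNR, and noise 47.6 / speech 57.6 dB SPL at +10 dB SNR) is to let the speech and noise amplitudes sum to the amplitude of the combined level; the sketch below, with a hypothetical function name, illustrates this assumption.

```python
import numpy as np

def speech_noise_levels(snr_db, combined_db=60.0):
    """Split combined_db (dB SPL) into speech and noise levels at snr_db,
    assuming the two amplitudes add to the combined amplitude (20*log10 sum)."""
    noise_db = combined_db - 20.0 * np.log10(1.0 + 10.0 ** (snr_db / 20.0))
    return noise_db + snr_db, noise_db   # (speech level, noise level)

for snr in (-15, 10):
    speech_db, noise_db = speech_noise_levels(snr)
    print(f"SNR {snr:+d} dB: speech {speech_db:.1f} dB SPL, "
          f"noise {noise_db:.1f} dB SPL")
# SNR -15 dB: speech 43.6 dB SPL, noise 58.6 dB SPL
# SNR +10 dB: speech 57.6 dB SPL, noise 47.6 dB SPL
```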

1 One adaptive procedure took approximately three minutes to complete, and had to be repeated 4–6 times for each task to ensure accurate determination of recognition thresholds. This resulted in 3 min × an average of 5 repetitions × 5 tasks (two prosody and three vowel tasks) × 4 speakers = 6 h, excluding the time required for training and testing recognition in quiet.


Fig. 2. (a) Percentage recognition scores obtained in quiet. (b) SNR levels at which 71% recognition was obtained for each task type and listener group. Q/S denotes question/statement discrimination, C/H denotes certain/hesitant discrimination, NH denotes normal-hearing listeners and CI denotes CI recipients. Error bars indicate one standard deviation from the mean.


4.2. Results

Results for the listening experiments are depicted in Fig. 2(a) and (b), which show the average scores for each of the five listening tasks (question/statement, certain/hesitant, pɛt/pɔt, pat/put, and pɛt/pɛ:t discrimination) across all four speakers, for NH and CI listeners separately.

The results depicted in Fig. 2(a) show that on all tasks in quiet, CI listeners obtained a poorer average score than NH listeners, and showed a larger variance. The difference between the two listener groups across tasks was analysed using Mann–Whitney's U test (owing to the small sample size) and indicated that CI listeners performed significantly more poorly than NH listeners (U = 4.0, z = −3.48, p < 0.001). Pairwise comparisons between listener groups on the question/statement and certain/hesitant tasks indicated that CI listeners performed significantly worse than NH listeners on the question/statement task (U = 9.0, z = −3.1, p < 0.001), but not on the certain/hesitant task (U = 31.0, z = −1.44, p = 0.165). The CI listeners' results for the vowel discrimination tasks were compared to the NH listeners' results using a one-sample t-test, since all NH listeners scored 100% on all the vowel tasks. Scores did not differ significantly from 100% for any of the three vowel tasks.

Both listener groups performed best on the vowel recognition tasks. Friedman's ANOVA revealed significant differences (p < 0.05) between the five tasks for both groups of listeners, and post hoc Wilcoxon signed-rank tests revealed that performance on the two prosodic tasks did not differ significantly from each other in either listener group, and neither did the three vowel tasks. In the NH group, the two prosody tasks both differed significantly from each of the three vowel tasks.


Table 3
Differences between the SNRs required by each listener group to obtain 71% recognition for each listening task (averaged across speakers).

                 Question/statement  Certain/hesitant  pɛt/pɔt  pat/put  pɛt/pɛ:t
NH average SNR   -11.21              -10.34            -13.02   -14.58   -13.37
CI average SNR   2.27                -3.21             -7.00    -11.38   -6.09
Difference       13.48               7.12              6.03     3.20     7.28


In the CI group, the question/statement task differed significantly from each of the vowel tasks, while the certain/hesitant task differed only from the pat/put vowel task. A Bonferroni correction was applied to correct for the number of pairwise comparisons (significance reported at a level of p = 0.005).
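As an illustration of this analysis pipeline, the sketch below runs a Friedman test across the five related tasks followed by pairwise Wilcoxon signed-rank tests with a Bonferroni correction, using scipy. The data layout and the placeholder scores are hypothetical; the study's actual per-listener scores are not reproduced here.

```python
from itertools import combinations

import numpy as np
from scipy import stats

tasks = ["Q/S", "C/H", "pɛt/pɔt", "pat/put", "pɛt/pɛ:t"]
rng = np.random.default_rng(0)
# Placeholder: one row per listener (n = 10), one column per task.
scores = rng.uniform(60, 100, size=(10, len(tasks)))

# Omnibus test over the five related samples (tasks within listeners).
chi2, p_omnibus = stats.friedmanchisquare(*scores.T)
print(f"Friedman: chi2 = {chi2:.2f}, p = {p_omnibus:.4f}")

# Post hoc pairwise Wilcoxon signed-rank tests, Bonferroni-corrected:
# 10 comparisons, so significance is reported at alpha = 0.05 / 10 = 0.005.
pairs = list(combinations(range(len(tasks)), 2))
alpha = 0.05 / len(pairs)
for i, j in pairs:
    _, p = stats.wilcoxon(scores[:, i], scores[:, j])
    flag = "significant" if p < alpha else "n.s."
    print(f"{tasks[i]} vs {tasks[j]}: p = {p:.4f} ({flag} at alpha = {alpha})")
```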

Fig. 2(b) (listening in noise) shows a similar pattern to that of Fig. 2(a) (listening in quiet), with NH listeners performing better than CI listeners on all tasks (as demonstrated by the lower SNR at which 71% correct was achieved) and displaying smaller variance across listeners. A mixed design ANOVA was performed on the data measured in noise, with within-subject factors defined as task (three levels, i.e. question/statement discrimination, certain/hesitant discrimination, and vowel discrimination) and speaker (four levels, namely FS1, FS2, MS1 and MS2). Listener group (NH or CI) was the only between-subject variable. Tests of between-subjects effects showed a significant overall effect of listener group, F(1, 18) = 62.03, p < 0.001. Within-subject measures showed a significant overall effect of task, F(4, 72) = 62.46, p < 0.001, as well as a significant interaction between task and listener group (NH or CI), F(4, 72) = 23.74, p < 0.001. Post hoc pairwise comparisons with Bonferroni corrections indicated that across all speakers and listeners, each of the five listening tasks differed significantly from the other four tasks (p < 0.001). Friedman's ANOVAs were conducted on the average results across speakers for the two listener groups separately, and indicated that there were significant differences between the five listening tasks in both groups (p < 0.001). Wilcoxon pairwise comparisons in the NH group indicated that none of the vowel tasks differed significantly from each other, and the two prosody tasks also did not differ significantly. The question/statement task differed significantly from the pat/put vowel task, and the certain/hesitant task differed from all three vowel tasks (p < 0.005). In the CI group, the two prosody tasks also did not differ significantly from each other, and neither did the three vowel tasks. In this group, the question/statement task was significantly more difficult than all three of the vowel tasks, and the certain/hesitant task was significantly more difficult than the pat/put task, but not than the other two vowel tasks. Differences between CI and NH performance on the different tasks are reflected by the differences between the SNR required by CI listeners to obtain 71% correct and that required by NH listeners to achieve the same level of accuracy on each task (averaged across speakers). Table 3 documents the average SNR improvement required for CI listeners to perform at the same level as NH listeners on each of the tasks. The average values show that performance of NH listeners varied by 4.24 dB between the easiest (pat/put) and most difficult (certain/hesitant) task, while CI listeners showed a variation of 13.65 dB between best (pat/put) and worst (question/statement) performance. The question/statement task yielded the largest difference between NH and CI listeners.

The effects of different speakers on discrimination performance in noise are shown in Fig. 3(a) and (b) for NH and CI listeners, respectively.

The overall effect of speaker was found to be significant using a mixed design ANOVA, F(3, 54) = 15.18, p < 0.001. Post hoc pairwise comparisons using Bonferroni corrections showed that across all five tasks and both listener groups, results obtained with speaker FS1 differed significantly from those obtained with MS2, while outcomes from MS1 differed significantly from FS2 and MS2 (all significant at a level of p < 0.008). There was a significant interaction between speaker and hearing, F(3, 54) = 3.44, p < 0.05. This interaction can be seen, for example, in the question/statement discrimination task, where speaker MS2 elicited the poorest performance from NH listeners, while speaker FS2 elicited the poorest performance from CI listeners. Mauchly's test of sphericity indicated that the speaker by task interaction violated the assumption of sphericity, and Greenhouse–Geisser estimates of sphericity were therefore used to correct the degrees of freedom for this interaction (ε = 0.46). The speaker by task interaction was significant at the level of p < 0.001, F(5.53, 99.51) = 13.80. The speaker by task interaction was particularly salient for the question/statement task, where speaker FS1 elicited the best recognition performance from both NH and CI listeners. A significant three-way interaction of speaker by task by hearing was found, F(5.53, 99.51) = 4.78, p < 0.001. This can be seen, for example, in the results for FS2, who yielded the poorest performance for CI recipients in both prosody tasks but not in the vowel tasks, while the same speaker yielded performance close to the average across all speakers on all five tasks for NH listeners. Friedman's ANOVAs were used to compare results from each speaker for each task and listener group separately. Results indicated significant differences between speakers for each of the five tasks in the NH group, and for all but one of the tasks (pat/put discrimination) in the CI group.

Correlations between performance in quiet and performance in noise were analysed and compared between listener groups. Fig. 4 shows the linear regression lines for the two prosody tasks, with results grouped across speakers but separated for the two listener groups. Vowel tasks were not included due to the ceiling effect and lack of variance in the results obtained in quiet. Spearman's rho was used to determine the strength and significance of the correlations, as the data were not normally distributed in all instances.


Fig. 3. (a) Average SNR at 71% recognition attained by NH listeners (n = 10). (b) Average SNR at 71% recognition attained by CI recipients (n = 10). Error bars indicate one standard deviation from the mean. Q/S denotes question/statement recognition and C/H denotes certain/hesitant recognition. Female speakers are FS1 and FS2; male speakers are MS1 and MS2. Significant differences (p < 0.05) were found between speakers on all tasks in the NH group, and on all but one of the tasks (pat/put) in the CI group.


In the NH group, performance in noise was significantly related to performance in quiet for the certain/hesitant discrimination task, rs = −0.67, p (one-tailed) < 0.05, but not for the question/statement discrimination task, rs = −0.44, p = 0.10. In the CI listener group, question/statement discrimination results in noise were significantly related to results obtained in quiet, rs = −0.70, p (one-tailed) < 0.05, as were certain/hesitant discrimination results, rs = −0.74, p (one-tailed) < 0.01.
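A minimal sketch of this correlation analysis is given below, assuming per-listener percent-correct scores in quiet and SNRs at 71% correct in noise for one task and one listener group. The values shown are placeholders, and the one-tailed p-value is obtained by halving scipy's two-tailed result when the correlation has the predicted (negative) sign.

```python
# Illustrative sketch: Spearman correlation between performance in quiet and
# the SNR at 71% correct in noise for one task and one listener group.
import numpy as np
from scipy.stats import spearmanr

percent_correct_quiet = np.array([94, 88, 90, 75, 98, 83, 80, 92, 86, 79])      # placeholder
snr_at_71_noise = np.array([4.0, 6.5, 5.0, 9.0, 2.5, 7.0, 8.0, 3.5, 6.0, 8.5])  # placeholder

rho, p_two_tailed = spearmanr(percent_correct_quiet, snr_at_71_noise)
# A negative rho is expected: higher scores in quiet should go with lower
# (better) SNRs in noise. Halve the p-value for a directional (one-tailed) test.
p_one_tailed = p_two_tailed / 2 if rho < 0 else 1 - p_two_tailed / 2
print(f"rho = {rho:.2f}, one-tailed p = {p_one_tailed:.3f}")
```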

5. Discussion

5.1. Prosody versus vowel perception in quiet

Results from the quiet listening condition showed that NH listeners performed significantly better on all three vowel tasks than on prosody discrimination tasks. However, in the CI group, while the question/statement task was significantly more difficult than the vowel tasks, the difficulty of the certain/hesitant task did not differ significantly from two of the vowel tasks, specifically those whose primary underlying cues were F2 and duration. This observation suggests that durational differences (underlying the certain/hesitant contrast) available to CI listeners in quiet are more salient than changes in voice F0 and intensity (required for accurate perception of question/statement differences). Results from the question/statement task are comparable to existing reports on CI performance on listening tasks that also involve F0 perception.


Fig. 4. Recognition in noise (SNR at 71% recognition) as a function of recognition in quiet (% recognition in quiet), with linear regression lines for the question/statement and certain/hesitant tasks. NH indicates normal-hearing listeners, CI indicates CI recipients. Results reflect the average scores obtained across four speakers.


Results in quiet agree with the report by Luo et al. (2009) on tone and vowel perception in Mandarin-speaking CI users (n = 8), which demonstrated that CI users scored better on vowel recognition (90%) than on tone recognition (63%) in quiet. A second similarity with these data was the difference between NH and CI performance, which was smaller for vowel recognition (non-significant in the present study) than for tasks that involved tone or intonation recognition (the question/statement discrimination task yielded a significant difference between listener groups in the present work). The difference between NH and CI performance on the certain/hesitant task in the present study was not significant in quiet, supporting the suggestion that the underlying cues of this task were more readily available to CI users than the cues underlying the question/statement distinction.

5.2. Prosody versus vowel perception in noise

In speech-weighted noise, NH listeners showed the poorest performance for the certain/hesitant contrast, while CI listeners performed poorest on the question/statement task. Although the differences between prosody tasks were not significant in either group, differences between these two tasks and the three vowel tasks showed some interesting effects. In NH listeners, the certain/hesitant task was significantly more difficult than all three vowel tasks, while the question/statement distinction was only more difficult than the pat/put task (the easiest vowel task for both groups). In the CI group, a reverse pattern was seen, with the question/statement task being significantly more difficult than all three vowel tasks, while the certain/hesitant task was only more difficult than the easiest vowel task (pat/put). These results confirm the difficulty that CI listeners had with the question/statement task, as demonstrated by results from the quiet listening condition. However, their perception of a prosodic contrast that was heavily dependent on duration differences did not differ significantly from two of the vowel discrimination tasks – one that depended especially on perception of duration, and another which depended more on F2 perception. This finding suggests that, for these listeners, both F2 and durational cues on a vowel level showed resistance to noise similar to that of the durational cue on a prosody level. It seems therefore that CI listeners do have access to some prosodic cues even in background noise, but these cues are not more immune to noise effects than vowel cues. In fact, the prosodic cues required to make the question/statement distinction (F0 and intensity) were severely affected by noise in this listener group.

The outcome that prosody perception tasks yielded poorer performance in noise than vowel tasks was somewhat unexpected, given evidence in the literature that some prosodic cues are quite redundant and immune to noise effects (Smith et al., 1989; Grant & Walden, 1996; Lakshminarayanan et al., 2003; Dmitrieva et al., 2008). A possible explanation for this finding is that most of the studies demonstrating the redundancy of prosodic cues used longer utterances than the single words used in the present work. It is conceivable that prosodic cues are less redundant and noise-resistant on a single-word level than in longer utterances such as phrases or sentences. Among others, the stress rhythm and word boundary cues reported in Smith et al. (1989) are not available in single-word prosody. The present work therefore contributes to our understanding of the relative robustness of prosody on a single-word level, and suggests that the resilience of prosody that has previously been reported seems to function mainly on the level of longer utterances such as phrases or sentences.

5.3. Relative performance of CI and NH listeners in noise

Differences between CI users' and NH listeners' performance in noise, expressed as the difference in SNR required to obtain 71% recognition (Table 3), showed the smallest difference in the pat/put discrimination task (around 3 dB). This task relied heavily on F1 perception, and the small difference in performance between CI and NH listeners suggests that the noise immunity of F1 cues was similar in the two groups. The difference between the two groups was similar for the certain/hesitant prosody task (which was strongly connected to duration perception) and two of the vowel tasks, one relying mostly on F2 perception and the other on duration perception (6–7 dB). The largest difference in performance in noise between NH and CI listeners (13.48 dB) was found in the question/statement task. Acoustic analyses showed that the question/statement contrast was marked by large differences in average F0 of the second syllable and in F0 range for most of the speakers, as well as some intensity differences. F0 perception is a known problem area for CI recipients even in quiet (Rogers et al., 2006). Although NH listeners also performed significantly poorer on this task than on vowel discrimination tasks, the large difference between CI and NH performance on this task in noise demonstrates that the addition of background noise can highlight differences between listener groups and different tasks that cannot be detected when testing in quiet, agreeing with the finding of Luo et al. (2009) that the introduction of a competing talker resulted in a larger difference between NH and CI performance. Furthermore, the average F0 differences between questions and statements were above DLs reported for CI users in the literature (Rogers et al., 2006) for all four speakers, but the findings of the present work suggest that the F0 differences successfully perceived by these listeners in quiet are more vulnerable to the effects of noise than the duration differences which marked the certain/hesitant contrast. This finding is an important first step in answering the question raised by Brown and Bacon (2010) about whether F0 cues perceived by CI listeners in quiet remain available in noise.
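The 71%-correct point used throughout is the level at which a two-down, one-up adaptive rule converges (Levitt, 1971). The following sketch is illustrative rather than the authors' exact procedure: the step size, reversal count and simulated listener are assumptions, included only to show how such a track estimates the SNR at approximately 71% correct in a 2AFC task.

```python
# Illustrative two-down, one-up staircase converging on the ~71%-correct SNR.
import numpy as np

def simulated_listener(snr_db, midpoint=2.0, slope=0.5, rng=np.random.default_rng(1)):
    """Return True for a correct 2AFC response from an assumed psychometric
    function rising from 50% (chance) to 100% correct with increasing SNR."""
    p_correct = 0.5 + 0.5 / (1.0 + np.exp(-slope * (snr_db - midpoint)))
    return rng.random() < p_correct

def two_down_one_up(start_snr=15.0, step_db=2.0, n_reversals=8):
    """Track the SNR and return the mean of the later reversal points."""
    snr, consecutive_correct, direction = start_snr, 0, -1
    reversals = []
    while len(reversals) < n_reversals:
        if simulated_listener(snr):
            consecutive_correct += 1
            if consecutive_correct == 2:      # two correct in a row: make it harder
                consecutive_correct = 0
                if direction == +1:
                    reversals.append(snr)     # direction change counts as a reversal
                direction = -1
                snr -= step_db
        else:                                 # any error: make it easier
            consecutive_correct = 0
            if direction == -1:
                reversals.append(snr)
            direction = +1
            snr += step_db
    return np.mean(reversals[2:])             # discard the first two reversals

print(f"Estimated SNR at ~71% correct: {two_down_one_up():.1f} dB")
```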

5.4. Relationship between performance in quiet and performance in noise

Although correlations do exist between discrimination abilities in quiet and in noise for CI listeners, these should be interpreted with care. Specifically, an individual listener's performance in noise cannot necessarily be predicted from their performance in quiet. A good example of this is CI listener S28, who attained the second highest score in question/statement discrimination in quiet (94%), but was in the bottom half of performers on this task in noise. This finding indicates the importance of directly evaluating the perception of important speech cues in noise, instead of assuming a pattern of recognition behaviour in noise similar to that observed in quiet.

5.5. Speaker-dependent differences in performance

Different speakers yielded largely different results in both listener groups, especially with the prosody perception tasks (as indicated by larger standard deviations than for the vowel perception tasks). In light of the acoustic analyses, this may be because different speakers used different acoustic cues to indicate the specific prosodic version (e.g. question or statement), and the size of acoustic differences between the two prosodic types varied greatly between speakers. For example, MS2 yielded the poorest performance on the question/statement task for NH listeners, and was also the speaker with the smallest differences in average and range of F0. FS1 yielded the best question/statement discrimination scores in both listener groups, and was also the speaker with the largest differences in F0 average and range between the two versions. This was the only speaker for whom CI recipients were able to obtain 71% discrimination at a negative SNR, suggesting that the large F0 differences produced by this speaker were more immune to the effects of noise than the smaller F0 differences produced by the other speakers, and that this cue remained useful to CI recipients even in strong background noise. As with the results reported by Meister et al. (2009), even large F0 differences (as produced by FS1) yielded only a small difference in performance in quiet but could result in a distinct advantage in noise, especially for CI users.

The certain/hesitant discrimination task showed smaller differences between scores obtained with the different speakers in both listener groups. MS1 yielded the best performance on this task from NH listeners and also had the largest duration difference between certain and hesitant utterances (0.35 s on average), but MS2 produced duration differences that were very similar (0.31 s) and yet elicited the poorest performance from NH listeners. In quiet, however, listeners performed slightly better with recordings from MS2 (99.1 ± 1.45%) than with those from MS1 (96.1 ± 3.84%). This suggests that the performance difference in noise was probably not due to duration cues. Regression analyses in which the acoustic cues in the certain/hesitant contrast were considered have shown a degree of cue trading involving duration, frequency and intensity cues (Van Zyl & Hanekom, 2013). It is possible that differently weighted acoustic cues supported the prosodic contrast in each of these two speakers, and that the cue set supporting perception of the contrast in speaker MS1 was more immune to noise than that of MS2. This underscores the importance of measuring perception of specific cues in noise, since performance in noise cannot necessarily be predicted from performance in quiet.



5.6. Comments on the experimental design

Interestingly, results for the question/statement distinction obtained using single words in the present study (NH listeners 96 ± 5.35%, CI listeners 85 ± 11.99%) were similar to the results of Meister et al. (2009), who measured question/statement discrimination using sentence materials (NH listeners 99 ± 2.0%, CI listeners 82 ± 10.7%). This supports the use of single-word prosody tasks in the present study.

For practical reasons, the vowel tasks in the present work did not include all the vowels of the test language; testing each vowel against every other vowel in the 2AFC test paradigm (required to allow a fair comparison between prosody and vowel recognition) would have resulted in 105 distinct vowel discrimination tasks. Rather than attempting this, three vowel pairs were carefully selected to represent specific acoustic differences. This provided insight into the perception of specific acoustic cues by the two listener groups and allowed comparison between duration perception on a vowel level and duration perception on a prosody level. It is conceivable that some other vowel pair selections may result in a different outcome in terms of the noise immunity of specific vowels, but the findings regarding the availability of different cues are expected to remain the same. Future work could explore this; the present study provides a baseline for investigations into the relative difficulty of vowel and prosody recognition in noise.
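For reference, 105 tasks is exactly the number of unordered pairs that can be formed from 15 vowels, which is the inventory size implied by that count:

$$\binom{15}{2} = \frac{15 \times 14}{2} = 105.$$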

6. Conclusions

First, although it was expected that prosodic cues may aid listeners in noise, so that performance on prosody recognition would decline less than performance on vowel recognition tasks in noise, the opposite was found for both NH and CI listeners. This may be because prosodic cues on a single-word level contain less redundancy and are therefore less noise-resistant than those contained in longer utterances. Second, the two prosody tasks yielded similar performance in both listener groups in both quiet and noise. However, while NH listeners performed poorest on the certain/hesitant distinction in noise, CI listeners performed poorest on the question/statement task, suggesting that these listeners may not have received the F0 and intensity cues used by speakers to mark the question/statement distinction, while the certain/hesitant task contained more duration and intensity cues that were better preserved for these listeners in noise. Furthermore, the two types of tasks (prosody recognition and vowel recognition) yielded significantly different performance, and these differences were amplified by adding interfering noise. Evaluations of different CI speech processors (e.g. Skinner et al., 2002), preprocessing strategies (e.g. Gifford & Revit, 2010) and processor settings (Davidson et al., 2009) typically include only phoneme and/or sentence recognition tests. Differences between vowel perception and prosody perception found in the present work suggest that such assessments should also include tasks specifically aimed at evaluating prosody perception, especially if the effects of background noise on speech perception need to be determined. Finally, it should be noted that an individual's performance on a particular task in noise is not necessarily predictable from their performance in quiet. In conclusion, CI users performed better on vowel recognition than prosody recognition in both quiet and in an adaptive noise paradigm, but some prosodic cues remained more useful to these listeners than others in noise.

Acknowledgements

We are grateful to both the cochlear implant recipients and normal-hearing listeners who participated in this study. This research was supported in part by the National Research Foundation of South Africa.

Conflict of interest statement

The authors received financial support from the National Research Foundation of South Africa. There are no nonfinancial relationships to disclose.

Appendix A. Continuing education questions

1. The acoustic characteristics of the vowels that were analysed included

a. The spectral shape as a whole
b. Formant movements
c. Frequencies of formants one and two
d. Duration
e. Both c and d

2. The test paradigm of the perception tasks was

a. An open set speech recognition paradigm
b. A two-alternative forced-choice paradigm
c. A four-alternative forced-choice paradigm
d. A sixteen-alternative forced-choice paradigm


3. Perception in noise was evaluated using

a. One fixed signal-to-noise ratio
b. Three fixed signal-to-noise ratios
c. An adaptive procedure determining the signal-to-noise ratio at the 71% correct recognition point
d. An adaptive procedure determining the signal-to-noise ratio at the 50% correct recognition point

4. Results obtained in quiet indicated that

a. Both normal-hearing (NH) and cochlear implant (CI) listeners performed better on vowel recognition than on prosody recognition
b. NH listeners performed better on prosody recognition, while CI listeners performed better on vowel recognition
c. NH listeners performed better on vowel recognition, while CI listeners performed better on prosody recognition
d. Both NH and CI listeners performed better on prosody recognition than on vowel recognition

5. Results obtained in speech-weighted noise indicated that

a. Prosody perception was significantly better than vowel perception in NH listeners
b. Prosody perception was significantly better than vowel perception in both listener groups
c. Question/statement discrimination was significantly worse than certain/hesitant discrimination in both listener groups
d. Question/statement discrimination was the most difficult task for CI listeners, but not for NH listeners.

References

Ainsworth, W. A. (1972). Duration as a cue in the recognition of synthetic vowels. Journal of the Acoustical Society of America, 51, 648–651.
Assmann, P. F., Nearey, T. M., & Hogan, J. T. (1982). Vowel identification: Orthographic, perceptual, and acoustic aspects. Journal of the Acoustical Society of America, 71, 975–989.
Boersma, P., & Weenink, D. (2010). Praat: Doing phonetics by computer (Version 5.1.32) [Computer software]. http://www.praat.org/.
Borden, G. J., Harris, K. S., & Raphael, L. J. (2007). Speech science primer: Physiology, acoustics, and perception of speech (5th ed.). Philadelphia: Lippincott Williams & Wilkins.
Breen, M., Fedorenko, E., Wagner, M., & Gibson, E. (2010). Acoustic correlates of information structure. Language and Cognitive Processes, 25, 1044–1098.
Breitenstein, C., Van Lancker, D., & Daum, I. (2001). The contribution of speech rate and pitch variation to the perception of vocal emotions in a German and an American sample. Cognition and Emotion, 15, 57–79.
Broersma, M., & Scharenborg, O. (2010). Native and non-native listeners' perception of English consonants in different types of noise. Speech Communication, 52, 980–995.
Brown, C. A., & Bacon, S. P. (2010). Fundamental frequency and speech intelligibility in background noise. Hearing Research, 266, 52–59.
Brown, G. (1983). Prosodic structure and the given/new distinction. In A. Cutler & D. R. Ladd (Eds.), Prosody, models and measurements (pp. 67–77). Berlin: Springer-Verlag.
Caposecco, A., Hickson, L., & Pedley, K. (2012). Cochlear implant outcomes in adults and adolescents with early-onset hearing loss. Ear and Hearing, 33, 209–220.
Chatterjee, M., & Peng, S. C. (2008). Processing F0 with cochlear implants: Modulation frequency discrimination and speech intonation recognition. Hearing Research, 235, 143–156.
Cheang, H. S., & Pell, M. (2008). The sound of sarcasm. Speech Communication, 50, 366–381.
Cruttenden, A. (1997). Intonation (2nd ed.). Cambridge: Cambridge University Press.
Cullington, H. E., & Zeng, F. G. (2011). Comparison of bimodal and bilateral cochlear implant users on speech recognition with competing talker, music perception, affective prosody discrimination, and talker identification. Ear and Hearing, 32, 16–30.
Cutler, A., & Ladd, D. R. (1983). Prosody, models and measurements. Berlin: Springer.
Davidson, L. S., Skinner, M. W., Holstad, B. A., Fears, B. T., Richter, M. K., Matusofsky, M., et al. (2009). The effect of instantaneous input dynamic range setting on the speech perception of children with the Nucleus 24 implant. Ear and Hearing, 30, 340–349.
Dmitrieva, E. S., Gel'man, V. Y., Zaitseva, K. A., & Orlov, A. M. (2008). Dependence of the perception of emotional information of speech on the acoustic parameters of the stimulus in children of various ages. Human Physiology, 34, 149–153.
Ferguson, S. H. (2004). Talker differences in clear and conversational speech: Vowel intelligibility for normal-hearing listeners. Journal of the Acoustical Society of America, 116, 2365–2373.
Ferguson, S. H., & Kewley-Port, D. (2002). Vowel intelligibility in clear and conversational speech for normal-hearing and hearing-impaired listeners. Journal of the Acoustical Society of America, 112, 259–271.
Firszt, J. B., Holden, L. K., Skinner, M. W., Tobey, E. A., Peterson, A., Gaggl, W., et al. (2004). Recognition of speech presented at soft to loud levels by adult cochlear implant recipients of three cochlear implant systems. Ear and Hearing, 25, 375–387.
Friesen, L. M., Shannon, R. V., Baskent, D., & Wang, X. (2001). Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants. Journal of the Acoustical Society of America, 110, 1150–1163.
Fry, D. B. (1958). Experiments in the perception of stress. Language and Speech, 1, 126–152.
Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. Journal of the Acoustical Society of America, 27, 765–768.
Fujie, S., Ejiri, Y., Kikuchi, H., & Kobayashi, T. (2006). Recognition of positive/negative attitude and its application to a spoken dialogue system. Systems and Computers in Japan, 37, 45–55.
Gaines, P. (2011). The multifunctionality of discourse operator okay: Evidence from a police interview. Journal of Pragmatics, 43, 3291–3315.
Gifford, R. H., & Revit, L. J. (2010). Speech perception for adult cochlear implant recipients in a realistic background noise: Effectiveness of preprocessing strategies and external options for improving speech recognition in noise. Journal of the American Academy of Audiology, 21, 441–451.
Gooskens, C. (2007). The contribution of linguistic factors to the intelligibility of closely related languages. Journal of Multilingual and Multicultural Development, 28, 445–467.
Grant, K. W., & Walden, B. E. (1996). Spectral distribution of prosodic information. Journal of Speech, Language, and Hearing Research, 39, 228–238.
Gravano, A., Hirschberg, J., & Benus, S. (2012). Affirmative cue words in task-oriented dialogue. Computational Linguistics, 38, 1–39.
Hammerschmidt, K., & Jurgens, U. (2007). Acoustical correlates of affective prosody. Journal of Voice, 21, 531–540.
Hartmann, W. M. (1998). Signals, sound, and sensation. New York: Springer Science+Business Media Inc.
Hillenbrand, J., Getty, L. A., Clark, M. J., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97, 3099–3111.
Hopyan-Misakyan, T. M., Gordon, K. A., Dennis, M., & Papsin, B. C. (2009). Recognition of affective speech prosody and facial affect in deaf children with unilateral right cochlear implants. Child Neuropsychology, 15, 136–146.
Iverson, P., Smith, C. A., & Evans, B. G. (2006). Vowel recognition via cochlear implants and noise vocoders: Effects of formant movement and duration. Journal of the Acoustical Society of America, 120, 3998–4006.
Kewley-Port, D., Burkle, T. Z., & Lee, J. H. (2007). Contribution of consonant versus vowel information to sentence intelligibility for young normal-hearing and elderly hearing-impaired listeners. Journal of the Acoustical Society of America, 122, 2365–2375.
Klatt, D. H. (1982). Prediction of perceived phonetic distance from critical-band spectra: A first step. Acoustics, speech, and signal processing. IEEE International Conference on ICASSP, 7, 1278–1281.
Ladd, D. R. (1988). Declination 'reset' and the hierarchical organization of utterances. Journal of the Acoustical Society of America, 84, 530–544.
Lakshminarayanan, K., Ben Shalom, D., Van Wassenhowe, V., Orbelo, D., Houde, J., & Poeppel, D. (2003). The effect of spectral manipulations on the identification of affective and linguistic prosody. Brain and Language, 84, 250–263.
Laures, J. S., & Bunton, K. (2003). Perceptual effects of a flattened fundamental frequency at the sentence level under different listening conditions. Journal of Communication Disorders, 36, 449–464.
Lehiste, I. (1970). Suprasegmentals. Cambridge, MA: MIT Press.
Levitt, H. (1971). Transformed up-down methods in psychoacoustics. Journal of the Acoustical Society of America, 49, 467–477.
Lieberman, P. (1960). Some acoustic correlates of word stress in American English. Journal of the Acoustical Society of America, 32, 451–454.
Liu, C., & Kewley-Port, D. (2004). Formant discrimination in noise for isolated vowels. Journal of the Acoustical Society of America, 116, 3119–3129.
Luo, X., Fu, Q. J., & Galvin, J. J., III (2007). Vocal emotion recognition by normal-hearing listeners and cochlear implant users. Trends in Amplification, 11, 301–315.
Luo, X., Fu, Q. J., Wu, H. P., & Hsu, C. J. (2009). Concurrent-vowel and tone recognition by Mandarin-speaking cochlear implant users. Hearing Research, 256, 75–84.
Marslen-Wilson, W. D., Tyler, L. K., Warren, P., Grenier, P., & Lee, C. S. (1992). Prosodic effects in minimal attachment. Quarterly Journal of Experimental Psychology, 45A, 73–87.
Meister, H., Landwehr, M., Pyschny, V., & Grugel, L. (2011). Use of intonation contours for speech recognition in noise by cochlear implant recipients. Journal of the Acoustical Society of America, 129, EL204–EL209.
Meister, H., Landwehr, M., Pyschny, V., Walger, M., & Wedel, H. V. (2009). The perception of prosody and speaker gender in normal-hearing listeners and cochlear implant recipients. International Journal of Audiology, 48, 38–48.
Miller, G. A., & Nicely, P. E. (1955). An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America, 27, 338–352.
Miller, J. D. (1989). Auditory-perceptual interpretation of the vowel. Journal of the Association for Research in Otolaryngology, 85, 2114–2134.
Monrad-Krohn, G. H. (1947). Dysprosody or altered melody of language. Brain, 70, 405–415.
Moore, B. C. J., & Glasberg, B. R. (1988). Gap detection with sinusoids and noise in normal, impaired, and electrically stimulated ears. Journal of the Acoustical Society of America, 83, 1093–1101.
Morton, J., & Jassem, W. (1965). Acoustic correlates of stress. Language and Speech, 8, 159–181.
Most, T., Gaon-Sivan, G., Shpak, T., & Luntz, M. (2012). Contribution of a contralateral hearing aid to perception of consonant voicing, intonation, and emotional state in adult cochlear implantees. Journal of Deaf Studies and Deaf Education, 17, 244–258.
Munson, B., Donaldson, G. S., Allen, S. L., Collison, E. A., & Nelson, D. A. (2003). Patterns of phoneme perception errors by listeners with cochlear implants as a function of overall speech perception ability. Journal of the Acoustical Society of America, 113, 925–935.
Munson, B., & Nelson, P. B. (2005). Phonetic identification in quiet and in noise by listeners with cochlear implants. Journal of the Acoustical Society of America, 118, 2607–2617.
Murray, I. R., & Arnott, J. L. (1993). Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. Journal of the Acoustical Society of America, 93, 1097–1108.
Nearey, T. M. (1989). Static, dynamic and relational properties in vowel perception. Journal of the Acoustical Society of America, 85, 2088–2113.
Neel, A. T. (2008). Vowel space characteristics and vowel identification accuracy. Journal of Speech, Language, and Hearing Research, 51, 574–585.
Parikh, G., & Loizou, P. C. (2005). The influence of noise on vowel and consonant cues. Journal of the Acoustical Society of America, 118, 3874–3888.
Pearsons, K. S., Bennett, R. L., & Fidell, S. (1977). Speech levels in various noise environments (Rep. No. EPA-600/1-77-025). Canoga Park, CA: Bolt Beranek and Newman Inc.
Pell, M. D. (2001). Influence of emotion and focus location on prosody in matched statements and questions. Journal of the Acoustical Society of America, 109, 1668–1680.
Peng, S. C., Lu, N., & Chatterjee, M. (2009). Effects of cooperating and conflicting cues on speech intonation recognition by cochlear implant users and normal hearing listeners. Audiology and Neurotology, 14, 327–337.
Peterson, G. E., & Barney, H. L. (1952). Control methods used in a study of the vowels. Journal of the Acoustical Society of America, 24, 175–184.
Phatak, S. A., Lovitt, A., & Allen, J. B. (2008). Consonant confusions in white noise. Journal of the Acoustical Society of America, 124, 1220–1233.
Price, P. J., Ostendorf, M., Shattuck-Hufnagel, S., & Fong, C. (1991). The use of prosody in syntactic disambiguation. Journal of the Acoustical Society of America, 90, 2956–2970.
Rogers, C. F., Healy, E. W., & Montgomery, A. A. (2006). Sensitivity to isolated and concurrent intensity and fundamental frequency increments by cochlear implant users under natural listening conditions. Journal of the Acoustical Society of America, 119, 2276–2287.
Skinner, M. W., Holden, L. K., Whitford, L. A., Plant, K. L., Psarros, C., & Holden, T. A. (2002). Speech recognition with the Nucleus 24 SPEAK, ACE, and CIS speech coding strategies in newly implanted adults. Ear and Hearing, 23, 207–223.
Small, A. M., Jr., & Campbell, R. A. (1962). Temporal differential sensitivity for auditory stimuli. The American Journal of Psychology, 75, 401–410.
Smith, M. R., Cutler, A., Butterfield, S., & Nimmo-Smith, I. (1989). The perception of rhythm and word boundaries in noise-masked speech. Journal of Speech and Hearing Research, 32, 912–920.
Swanepoel, R., Oosthuizen, D. J. J., & Hanekom, J. J. (2012). The relative importance of spectral cues for vowel recognition in severe noise. Journal of the Acoustical Society of America, 132, 2652–2662.
Tartter, V. C., Hellman, S. A., & Chute, P. M. (1992). Vowel perception strategies of normal-hearing subjects and patients using Nucleus multichannel and 3M/House cochlear implants. Journal of the Acoustical Society of America, 92, 1269–1283.
Van Heerden, R. (1999). Die voorkomsfrekwensie van die spraakklanke van Afrikaans met die oog op fonetiese balansering van oudiologie woordelyste [The frequency of occurrence of the speech sounds of Afrikaans with a view to phonetic balancing of audiology word lists]. Department of Communication Pathology, University of Pretoria (B Communication Pathology dissertation).
Van Heuven, V. J., & Van Zanten, E. (2005). Speech rate as a secondary prosodic characteristic of polarity questions in three languages. Speech Communication, 47, 87–99.
Van Zyl, M., & Hanekom, J. J. (2013). When okay is not okay: Acoustic characteristics of single-word prosody conveying reluctance. Journal of the Acoustical Society of America, 133, EL13–EL19.
Wei, C., Cao, K., & Zeng, F. G. (2004). Mandarin tone recognition in cochlear-implant subjects. Hearing Research, 197, 87–95.
Williams, C. E., & Stevens, K. N. (1972). Emotions and speech: Some acoustical correlates. Journal of the Acoustical Society of America, 52, 1238–1250.
Xu, L., Thompson, C. S., & Pfingst, B. E. (2005). Relative contributions of spectral and temporal cues for phoneme recognition. Journal of the Acoustical Society of America, 117, 3255–3267.
Xu, L., & Zheng, Y. (2007). Spectral and temporal cues for phoneme recognition in noise. Journal of the Acoustical Society of America, 122, 1758–1764.