Talker Discrimination, Emotion Identification, and Melody Recognition by Young Children with Bilateral Cochlear Implants
by
Anna Volkova
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Psychology University of Toronto
© Copyright by Anna Volkova 2012
Abstract
Users of cochlear implants typically have difficulty differentiating talkers, identifying vocal
expressions of emotion, and recognizing familiar melodies because of the degraded spectral cues
provided by conventional implants. This thesis examined these abilities in a small, relatively
privileged sample of young bilateral implant users. In Study 1 child implant users and a control
sample of hearing children were required to judge whether various utterances were produced by
a man, woman, or girl (Experiment 1) and to identify the voices of cartoon characters from
familiar television programs (Experiment 2). Child implant users’ performance on talker
classification was comparable to that of hearing children. Their identification of cartoon
characters’ voices was less accurate than that of hearing children but well above chance levels.
These findings challenge conventional wisdom about the talker identification difficulties of
implant users. In Study 2 the children were required to indicate whether semantically neutral
utterances (Experiment 1) or classical piano excerpts (Experiment 2) sounded “happy” or “sad”.
In both cases, implant users performed less accurately than hearing children but well above
chance levels. Although the findings on emotion recognition in music are in line with those of
previous research, the findings on emotion in speech are at odds with claims that young implant
users are insensitive to vocal affect. In Study 3 the children were required to identify the theme
songs from familiar television programs on the basis of combined timing and pitch cues as well
as timing or pitch cues alone. Implant users’ performance was comparable to that of hearing
children except when the cues were restricted to pitch relations, which resulted in performance at
chance levels. The findings suggest that the musical representations of young implanted listeners
include precise information about timing and coarser information about pitch. They also
demonstrate, for the first time, that children, both implant users and those with normal hearing,
can identify familiar music on the basis of timing cues alone. Overall, the findings highlight the
importance of timing cues for implant users, the range of individual differences, and habilitation
possibilities for the recognition of talkers, emotion, and music.
Acknowledgements
I am indebted to participants and their families whose cooperation, enthusiasm, and
commitment made this research possible. I thank my supervisors, Dr. Sandra E. Trehub and Dr.
E. Glenn Schellenberg, for generous advice, unfailing support, and encouragement. I also thank
Dr. Karen A. Gordon for valuable suggestions and research accommodations.
I am grateful to Deanna Feltracco, Judy Plantinga, Sasha Poon, and Lily Zhou for
assistance in data collection; to Marieke van Heugten, Stephen Feltracco, Rebekah Prince, and
Laura Prince for help with stimuli creation; and to Steve Hong for computer programming and
technical support. I gratefully acknowledge the assistance of Vicky Papaioannou, Gina Goulding,
Jerome Valero, and Stephanie Jewell. I am also grateful to Ann Lang for invaluable help in
administrative matters.
Last, but not least, I thank my daughter, Maria Komech, my parents, Margarita Petrosyan
and Dr. Leonid Volkov, and my friend, Dr. Denis Kosygin, for patience and unconditional
support.
Table of Contents
General Abstract: Talker Discrimination, Emotion Identification, and Melody Recognition by Young Children with Bilateral Cochlear Implants
Acknowledgements
Table of Contents
List of Tables
List of Figures
Introductory Comments
Study 1: Children with Bilateral Implants Differentiate Familiar and Unfamiliar Talkers
  Introduction
  Experiment 1
    Method
    Results and Discussion
  Experiment 2
    Method
    Results and Discussion
  General Discussion
Study 2: Children with Bilateral Cochlear Implants Identify Emotion in Speech and Music
  Introduction
  Experiment 1
    Method
    Results and Discussion
  Experiment 2
    Method
    Results and Discussion
  General Discussion
Study 3: Pitch and Timing Cues in Child Implant Users’ Recognition of Familiar Melodies
  Introduction
  Method
  Results
  Discussion
Supplementary Comments
References
List of Tables
Table 1. CI participants: Background information (Study 1)
Table 2. Utterances spoken by a man, woman, and girl
Table 3. Mean speaking rate values (syllables per second) of the three talkers
Table 4. TV characters and their utterances
Table 5. F0 and speaking rate of TV characters selected by CI children
Table 6. CI participants: Background information (Study 2). Participant codes are preserved across studies
Table 7. Happy- and sad-sounding speech stimuli
Table 8. CI participants: Background information (Study 3). Participant codes are preserved across studies
Table 9. Key, pitch range and tempo of melodies extracted from the TV-show theme songs
Table 10. Cumulative scores and demographic profiles of the 11 CI users who completed four or more tasks comprising the present investigation
List of Figures
Figure 1. A spectrogram of the word “elephant” spoken by a woman. The smooth line represents the intensity contour and the dotted line represents the F0 contour
Figure 2. Performance of CI and NH children as a function of utterance type (Experiment 1). Error bars represent standard errors
Figure 3. Performance of CI children as a function of talker and utterance type (Experiment 1)
Figure 4. Recognition of cartoon voices by CI and NH children (Experiment 2). Error bars represent standard errors
Figure 5. Performance of individual CI children in Experiments 1 and 2
Figure 6. Prosodic contours of “The chair has four legs” produced in a happy and a sad manner by male and female talkers (Experiment 1)
Figure 7. Performance of child CI users and NH children on happy and sad speech as a function of block order (Experiment 1). Error bars represent standard errors
Figure 8. Performance of individual child CI users on happy and sad speech (Experiment 1) ordered by scores on Block 1 from best to worst. Original participant codes are preserved
Figure 9. Performance of child CI users (Block 1) and NH children on happy and sad music (Experiment 2). Error bars represent standard errors
Figure 10. Performance of individual child CI users on happy and sad music (Experiment 2). Performance is averaged across the 2 blocks (20 trials) and ordered from best to worst. Original participant codes are preserved
Figure 11. Performance on emotion identification in speech and music for the 12 CI users who participated in both tasks. Performance on each task is averaged across the 2 blocks (36 and 20 trials, respectively)
Figure 12. Accuracy of emotion identification in speech (Experiment 1) as a function of years of implant use. Performance is averaged across the 2 blocks (36 trials)
Figure 13. Accuracy of emotion identification in music (Experiment 2) as a function of years of implant use. Performance is averaged across the 2 blocks (20 trials)
Figure 14. Examples of the melodic, timing-only and pitch-only conditions for two TV-show theme songs: “Backyardigans” and “Diego”
Figure 15. Performance of child CI users and NH listeners. Error bars represent standard errors
Figure 16. Performance of individual CI children. Original participant codes are preserved
Figure 17. Performance of individual NH children in the timing-only and pitch-only conditions
Introductory Comments
Cochlear implants (CIs) were developed to make spoken language accessible to
individuals with profound sensorineural hearing loss. The prostheses elicit auditory sensations by
direct stimulation of the auditory nerve. They restore partial hearing to postlingually deaf adults,
enabling many to resume oral conversational interactions in person and over the telephone
(Loizou, 1998). CIs also provide auditory sensations to congenitally or prelingually deafened
children, many of whom have been able to acquire good oral language skills, attend regular
schools, and function successfully in the general community (Geers, 2004; Svirsky, Robbins,
Kirk, Pisoni, & Miyamoto, 2000). Remarkably, the speech perception and production skills of
some child implant users equal those of their hearing peers (Geers, 2006; Nicholas & Geers,
2007).
The signal-processing techniques of contemporary CIs are designed to mimic the
function of the normal cochlea as closely as possible (Gates & Miyamoto, 2003; Loizou, 1998),
but they are inadequate for transmitting some aspects of the auditory signal. More specifically,
CIs deliver temporal envelope cues that are sufficient for speech intelligibility under favorable
(quiet) conditions (Loizou, 1998), but they largely discard fine-structure cues (Smith, Delgutte,
& Oxenham, 2002), limiting listeners’ access to spectral information (Wilson et al., 2005). These
limitations interfere with CI users’ ability to perceive speech in noise (Cullington & Zeng, 2008;
Friesen, Shannon, Baskent, & Wang, 2001), talker identity (Cleary, Pisoni, & Kirk, 2005; Fu,
Chinchilla, & Galvin, 2004; Vongpaisal, Trehub, Schellenberg, Van Lieshout, & Papsin, 2010),
lexical tones (Barry et al., 2002), speech prosody (Chatterjee & Peng, 2007; Luo, Fu, & Galvin,
2007; Peng, Tomblin, & Turner, 2008), and music (McDermott, 2004).
This thesis focuses on perceptual skills that rely, to a considerable extent, on spectral
features in the auditory signal: talker discrimination (Study 1), identification of emotion in
speech and music (Study 2), and melody recognition (Study 3). Individuals with normal hearing
(NH) use voice pitch and voice quality to determine the gender, age, and identity of talkers (Van
Lancker, Kreiman, & Emmorey, 1985). On the basis of intonation patterns or paralinguistic cues,
they differentiate various vocal emotions (Bachorowski, 1999; Scherer, 2003). NH listeners can
also recognize a familiar melody based on relations between successive pitches (e.g., Nimmons
et al., 2007), which enables them to recognize a familiar tune played at different pitch levels and
on different instruments. By contrast, CI users, whether children, adolescents, or adults, have
difficulty differentiating talkers (Cleary et al., 2005; Fu et al., 2004; Fu, Chinchilla, Nogaki, &
Galvin, 2005; Kovačić & Balaban, 2009, 2010), identifying vocal expressions of emotion
(Hopyan-Misakyan, Gordon, Dennis, & Papsin, 2009; Luo et al., 2007; Most & Aviner, 2009),
and identifying familiar melodies (Leal et al., 2003; Nimmons et al., 2007; Olszewski, Gfeller,
Froman, Stordahl, & Tomblin, 2005; Stordahl, 2002; Vongpaisal, Trehub, & Schellenberg, 2006,
2009).
Many studies with CI users have used stimuli that were not ecologically valid and tasks that
were not particularly engaging. The information provided by those studies is of unquestionable
importance, but it may underestimate the abilities of CI users in everyday contexts in which
additional cues may be accessible and useful. In the case of talker identification, for example,
studies with one-syllable utterances (Fu et al., 2005) have revealed extremely poor performance
by adult CI users, yet we know that for NH listeners, longer utterances can lead to increased
accuracy of talker identification (Orchard & Yarmey, 1995). When Cleary et al. (2005) examined
talker identification in child CI users, they used sentence-length utterances but they simulated
variations among talkers by electronically manipulating the pitch and spectral features of a single
talker. Those manipulations eliminated temporal cues to talker identity, which are useful to NH
listeners (Lander, Hill, Kamachi, & Vatikiotis-Bateson, 2007; Remez, Fellowes, & Rubin, 1997).
As a result, child CI users had to rely exclusively on spectral cues, which may account for the
very poor outcomes. Vongpaisal et al. (2010) demonstrated that child CI users can make
effective use of temporal cues, such as speaking rate and individual variations in articulation.
Studies of emotional prosody have had relatively poor outcomes with CI users of various
ages (Hopyan-Misakyan et al., 2009; Luo et al., 2007; Most & Peled, 2007), leading many to
conclude that emotion in speech is largely inaccessible to this population. However, these studies
used four or more emotional categories, some of which had considerable overlap in acoustic cues
(Bachorowski, 1999; Scherer, 2003), raising the possibility of better differentiation of emotional
categories that are more acoustically contrastive. Moreover, studies of child CI users’ recognition
of “familiar” melodies have often used melodies that are familiar to the general population
(Stordahl, 2002; Olszewski et al., 2005), but those melodies may be much less familiar to
children who are prelingually deaf than they are to NH children. Child CI users’ performance
has been more successful in studies that have used theme music from television programs that
the children watch regularly (Mitani et al., 2007; Vongpaisal et al., 2009).
The overall strategy in the present thesis was to optimize young CI users’ performance by
using ecologically valid stimuli as much as possible. Accordingly, Studies 1 and 2 used
utterances spoken in a child-directed manner, and talker identification was studied with stimuli
that included the voices of familiar and much-loved TV characters. The same strategy led to the
use of theme songs from children’s favorite TV programs, in line with Mitani et al. (2007) and
Vongpaisal et al. (2009). Every effort was made to reduce the cognitive demands of the tasks so
that variations in performance would reflect children’s relative ease or difficulty with the stimuli.
Accordingly, all tasks featured forced-choice responses with two or three alternatives, with
feedback in some cases to help children focus on the relevant cues. We also optimized children’s
engagement by embedding all tasks in an interactive game-like environment on a computer.
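The chance levels implied by these forced-choice designs can be illustrated with a short calculation. The sketch below is not part of the reported analyses: it assumes equally likely guessing among the alternatives, and the trial count and score are hypothetical values chosen only to show how far a score can lie from chance.

```python
from math import comb


def chance_level(n_alternatives: int) -> float:
    """Expected proportion correct when guessing among equally likely alternatives."""
    return 1.0 / n_alternatives


def p_at_least(k: int, n: int, p: float) -> float:
    """Binomial probability of k or more correct responses in n trials of pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical illustration only: these numbers are assumptions, not thesis results.
N_TRIALS = 20    # assumed number of two-alternative trials
N_CORRECT = 16   # assumed number of correct responses

p_guess = p_at_least(N_CORRECT, N_TRIALS, chance_level(2))
print(f"two-alternative chance level: {chance_level(2):.2f}")
print(f"three-alternative chance level: {chance_level(3):.2f}")
print(f"P(at least {N_CORRECT}/{N_TRIALS} correct by guessing) = {p_guess:.4f}")
```

With two alternatives the chance level is one half and with three it is one third, so the same raw score is further above chance in a three-alternative task than in a two-alternative task.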
Most studies of child and adult CI users are marked by enormous individual differences
(Peterson, Pisoni, & Miyamoto, 2010). It is of obvious importance to understand the nature of
such variation and the factors that contribute to it, and there has been some recent headway in
this regard (Geers, Nicholas, & Moog, 2007; Sagi, Kaiser, Meyer, & Svirsky, 2009; Pisoni,
2008). At the same time, there is something to be gained from focusing on the skills of
successful CI users because that provides insight into achievements that are possible for CI users
under optimal or simply reasonable circumstances.
Some of the factors associated with positive language outcomes in children with CIs are
genetic non-syndromic congenital deafness (Kawasaki, Fukushima, Kataoka, Fukuda, &
Nishizaki, 2006; Wu, Lee, Chen, & Hsu, 2008), early age of implantation (Colletti, Carner,
Miorelli, Guida, Colletti, & Fiorino, 2005; Connor, Craig, Raudenbush, Heavner, & Zwolan,
2006; Svirsky, Chin, & Jester, 2007), longer use of processing strategies that emphasize spectral
information (Geers, Brenner, & Davidson, 2003), emphasis on oral communication (Nicholas &
Geers, 2006; Svirsky et al., 2000), non-verbal intelligence (Geers et al., 2003), and cognitive
processing variables, such as working memory capacity and verbal rehearsal speed (Pisoni,
2005). The CI sample of the present investigation incorporated a number of the favorable
circumstances noted above. Specifically, it consisted of a small number of young, congenitally or
prelingually deaf children who had been identified early and implanted at 3.5 years of age or
earlier when residual hearing was insufficient for successful amplification. They used similar
devices programmed with the Advanced Combination Encoder (ACE) processing strategy,
which is designed to enhance spectral information (Waltzman & Roland, 2006). At least half of
the CI users were congenitally deaf with a genetic etiology. All CI users were being raised and
educated in an exclusively oral environment, and they were free of cognitive disabilities.
The overarching goal of this thesis was to study these relatively privileged children with
a view to shedding light on the potential of CI users in three challenging domains: talker
discrimination, emotion identification, and music processing. In principle, the fruits of this
research could advance theory and clinical practice (e.g., therapeutic interventions) with this
population.
Study 1: Children with Bilateral Implants Differentiate Familiar and Unfamiliar Talkers
Abstract
The present study examined the ability of prelingually deaf children with bilateral implants to
identify familiar and unfamiliar talkers from utterances of varied duration. In Experiment 1
prelingually deaf children with bilateral cochlear implants classified sentences, short
exclamations, and words as spoken by a man, woman, or child. Child implant users achieved
near-perfect accuracy, as did children with normal hearing. In Experiment 2 children with
bilateral implants were required to identify three familiar cartoon characters from sentence-
length utterances. Their performance was well above chance levels but significantly less accurate
than that of normally hearing children. Several child implant users had error-free performance on
both tasks, which challenges the prevailing views about talker recognition in this population.
Introduction
In general, listeners have no difficulty identifying the gender or approximate age (child,
young adult, elderly adult) of unfamiliar talkers on the radio or telephone. At their disposal are
multiple cues to talker identity, including prosody (i.e., intonation and rhythm), voice quality
(i.e., timbre), and pitch level (Van Lancker et al., 1985). The situation is very different for deaf
individuals with cochlear implants (CIs). These prosthetic devices, designed to facilitate access
to spoken language and oral communication, are optimized for speech, which is coded by
amplitude variations over time.
Most CI users can understand speech in favorable (quiet) listening environments, but the
absence of temporal fine structure in the input provides them with degraded pitch and spectral
information (Loizou, 1998; Smith et al., 2002). As a result, they have difficulty deciphering
speech in noise (e.g., Friesen et al., 2001), which is probably exacerbated by unilateral input.
They also have difficulty perceiving music (see McDermott, 2004, for a review), recognizing
vocal emotion (Hopyan-Misakyan et al., 2009), and identifying talkers (Kovačić & Balaban,
2009, 2010). Pitch differences between male and female speakers (approximately an octave) are often
sufficient for the discrimination of voice gender (Fu et al., 2005), but Kovačić and Balaban
(2009) found that only half of child and adolescent CI users could identify voice gender. Within-
gender contrasts are generally considered difficult or impossible for adult (Fu et al., 2004) and
child CI users (Cleary et al., 2005).
Most studies of talker discrimination by adults and children with CIs have used isolated
syllables (e.g., Fu et al., 2005; Vongphoe & Zeng, 2005) or electronically altered utterances from
a single speaker (Cleary et al., 2005). Both approaches obscure individual variations in speaking
style that may be important for listeners with pitch-processing difficulties. For normal-hearing
(NH) listeners, differentiating talkers when pitch and timbre cues are unavailable is helped by
individual differences in phoneme articulation (e.g., Remez et al., 1997) and expressive timing
(Lander et al., 2007).
CIs provide limited pitch and timbre information, but they are effective at transmitting
timing cues. It is possible, then, that talker identification and discrimination would be enhanced
if CI users had access to timing cues from longer or more natural speech samples. Nevertheless,
Kovačić and Balaban (2009) found that children and adolescents with CIs experienced difficulty
with gender identification, even in the context of 2-s excerpts from naturally produced sentences.
They found that duration of deafness, or auditory deprivation, was a better predictor of
performance than age of implantation, with longer periods of auditory deprivation having
particularly adverse consequences. A smaller than usual difference in average fundamental
frequency (approximately half an octave) between the male and female speakers may have posed
an additional source of difficulty for the implant users. Moreover, the linguistic complexity of
the speech samples and the use of multiple talkers may have counteracted the benefits of
“natural” speech samples, especially for the younger participants.
Vongpaisal et al. (2010) examined cross-gender and within-gender identification in
children with CIs using scripted, sentence-length utterances (i.e., same content across speakers)
from familiar (mother) and unfamiliar talkers in a computerized game with feedback. Although
pediatric CI users’ performance was less accurate than that of their hearing peers, they succeeded
in distinguishing their mother’s voice from the voices of a man, a girl, and several unfamiliar
women. Not surprisingly, children were most accurate at distinguishing their mother’s voice
from the highly dissimilar man’s voice and least accurate when distinguishing it from other
women’s voices. The performance of child CI users was slightly less accurate for samples in
which the prosodic differences among speakers were reduced, which implies that prosody made
some contribution to identification. Finally, performance improved over the course of the test
session, indicating the contribution of exposure and feedback. Vongpaisal et al. (2010)
speculated that child CI listeners had made use of individual differences in phoneme articulation
and speaking rate to identify the talkers.
The purpose of the present investigation was to ascertain whether CI users younger than
those tested by Vongpaisal et al. (mean of 8.9 years) and Kovačić and Balaban (mean of 12.3
years) could differentiate talkers on the basis of sentence-length as well as briefer utterances. The
focus was on unfamiliar talkers in Experiment 1 and on familiar talkers in Experiment 2. In
contrast to previous studies in this domain, the present CI users were less diverse in
chronological age, hearing history, and prosthetic devices. Among the potential advantages of
child CI users in the current study were bilateral CIs and relatively short durations of deafness.
On the other hand, their young age was a potential disadvantage in view of young children’s
inefficient use of auditory cues when compared with older children (Stalinski, Schellenberg, &
Trehub, 2008).
Experiment 1
CI users 4 to 6 years of age and a control group of hearing children listened to samples of
natural speech from three unfamiliar talkers (man, woman, and girl). The samples included full
sentences, familiar exclamations, and isolated words. Feedback was provided after each trial to
facilitate learning and to motivate the children. Although young CI users were expected to
perform poorly compared to NH children, their modest durations of deafness were expected to
facilitate talker differentiation.
Method
Participants. The participants included 14 bilateral CI users (6 girls and 8 boys, M = 5.7
years, SD = 0.8; range 4.1-6.9) who were recruited from a large metropolitan area (for
background information, see Table 1). There were 4 children with progressive hearing loss from
birth and 10 who were congenitally or prelingually deaf. All participants used Nucleus 24
Contour and/or Nucleus Freedom Contour Advance implants programmed to analyze sound
using the Advanced Combination Encoder (ACE) processing strategy. The CI users had at least 2
years of implant experience (M = 4.3 years; SD = 0.9 years; range = 2.4−6.1 years). With the
exception of the 4 children with progressive hearing loss, their first implant was activated at 9 to
20 months of age (M = 1.1 years, SD = 0.3 years). When tested with their implants, absolute
thresholds for tones within the speech range were within normal limits (10-30 dB HL). All CI
children participated in Auditory-Verbal Therapy for at least two years after implantation. They
also communicated exclusively by auditory-oral means and were in age-appropriate school
classes with their NH peers. A comparison sample of NH children consisted of 19 four-year-olds (M
= 4.7 years, SD = 0.3) from the community. No NH child had a personal or family history of
hearing problems, and all were free of colds on the day of testing.
Apparatus and Stimuli. The stimuli consisted of utterances produced by a man, a
woman, and a 10-year-old girl. Each of them produced 18 utterances, consisting of 6 full
sentences, 6 one- or two-word exclamations, and 6 isolated words (nouns) with one to three
syllables (see Table 2). The “actors” were asked to talk in an animated and expressive manner as
if interacting with a child. Two tokens of each utterance type were used from each actor.
Table 1. CI participants: Background information.
Participant   Gender   Age at test (years); E1; E2   Age at 1st and 2nd CI activation (years)   Etiology of hearing loss
CI-1*         M        5.8; 5.8                      3.4; 3.4                                   Genetic
CI-2          M        5.2; 4.8                      0.8; 1.7                                   Genetic
CI-3          M        5.3; 5.3                      1.1; 1.1                                   Genetic
CI-4          F        6.3; 5.8                      1.0; 3.6                                   Genetic
CI-5*         F        6.9; 6.6                      2.5; 4.0                                   Unknown
CI-6          M        5.3; 5.0                      1.0; 4.6                                   Genetic
CI-7          M        5.8; 5.1                      0.9; 1.8                                   Genetic
CI-8          M        6.1; 6.1                      0.8; 1.5                                   Genetic
CI-9          F        5.8; 5.4                      1.7; 1.7                                   Unknown
CI-10*        M        6.5; 6.3                      3.1; 6.3                                   Mondini dysplasia
CI-11         F        4.8; 4.8                      1.1; 1.1                                   Genetic
CI-12         F        6.9; 6.1                      1.0; 3.5                                   Unknown
CI-13         F        4.1; 4.1                      1.1; 0.8                                   Unknown
CI-14*        M        5.5; -                        2.7; 2.7                                   Unknown
CI-15+        M        - ; 6.4                       1.3; 2.3                                   Genetic
CI-16+        M        - ; 5.5                       1.7; 2.7                                   Genetic
* Progressive hearing loss from birth
+ Participated in Experiment 2 only
Fundamental frequencies (F0s), amplitude ranges, and utterance durations of the talkers
were calculated using PRAAT software (Boersma & Weenink, 2005). Mean F0s for the
utterances of the man, woman and girl were 106.3 Hz (SD = 9.9 Hz), 243 Hz (SD = 21.8 Hz),
and 263.2 Hz (SD = 16.5 Hz), respectively. Mean amplitude ranges for the man, woman, and girl
were 40 dB (SD = 9.2 dB), 41.6 dB (SD = 5.5 dB), and 43 dB (SD = 4.6 dB), respectively. Figure
1 depicts a spectrogram of a sample utterance. Mean speaking rate values are presented in Table
3.
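The mean F0 values reported above can be expressed as musical intervals, which makes them easier to compare with the "approximately an octave" difference that typically separates male and female voices. The sketch below is an illustrative calculation from the reported means only; the function names are hypothetical, and the resulting intervals are derived here rather than reported in the thesis.

```python
from math import log2

# Mean F0 values (Hz) for the three talkers, as reported in the text.
MEAN_F0_HZ = {"man": 106.3, "woman": 243.0, "girl": 263.2}


def interval_octaves(f_low: float, f_high: float) -> float:
    """Size of the interval between two frequencies, in octaves."""
    return log2(f_high / f_low)


def interval_semitones(f_low: float, f_high: float) -> float:
    """Size of the interval between two frequencies, in equal-tempered semitones."""
    return 12.0 * interval_octaves(f_low, f_high)


for low, high in [("man", "woman"), ("woman", "girl")]:
    octaves = interval_octaves(MEAN_F0_HZ[low], MEAN_F0_HZ[high])
    semitones = interval_semitones(MEAN_F0_HZ[low], MEAN_F0_HZ[high])
    print(f"{low} -> {high}: {octaves:.2f} octaves ({semitones:.1f} semitones)")
```

On these means, the man and woman differ by somewhat more than an octave (roughly 14 semitones), whereas the woman and girl differ by less than two semitones, so mean pitch level alone offers little basis for separating the woman from the girl.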
Table 2. Utterances spoken by a man, woman, and girl.
Utterance type   Utterance
Sentences        We are going to the movies tonight.
                 These flowers are pretty.
                 Look, it’s raining!
                 What a beautiful morning!
                 Do you like this book?
                 Where’s my hat?
Exclamations     Wow!
                 Look!
                 Please!
                 Thank you!
                 Hi!
                 Here!
Words            cat
                 dog
                 rainbow
                 lion
                 elephant
                 umbrella
Table 3. Mean speaking rate values (syllables per second) of the three talkers.
Condition      Man              Woman            Girl
Sentences      4.5 (SD = 1.1)   4.7 (SD = 1.2)   4.0 (SD = 1.0)
Exclamations   1.9 (SD = 0.8)   2.0 (SD = 0.7)   1.7 (SD = 0.7)
Words          3.1 (SD = 1.1)   2.7 (SD = 0.8)   2.8 (SD = 1.0)
Stimuli were recorded in a 3 m x 2.5 m double-walled, sound-attenuating chamber
(Industrial Acoustics Corporation) with a microphone (Sony F-V30T) connected directly to a
Windows XP computer workstation. High-quality digital sound files (44.1 kHz, 16-bit, mono)
were created, and the average amplitude of speech samples was equated with a digital audio
editor (Sound Forge 6.0). Visual stimuli in the talker-identification task consisted of colored
digital photographs (headshots) of the voice actors presented against a white background.
Testing took place in a double-walled sound-attenuating booth, either at a university
laboratory (in the same booth in which the stimuli were recorded) or a comparable facility at a
major children’s hospital, at the convenience of parents. A computer workstation and amplifier
(Harman/Kardon HK3380) outside the university booth were connected with a 17-in
touch-screen monitor (Elo LCD Touch Systems) and two high-quality loudspeakers (Electro-Medical
Instrument Co.) inside the booth. At the hospital venue, a GSI 61 two-channel clinical
audiometer (Grason-Stadler Instruments) replaced the amplifier. In both locations, the
loudspeakers were placed at 45 degrees azimuth to the participant, with the touch-screen monitor
directly in front of the participant. An interactive computer program (customized for Windows
XP) presented stimuli and recorded response selections when the participant touched the screen.
14
A portable keyboard was available to the experimenter in case young children preferred to make
their selections by pointing to a picture rather than touching the screen. All stimuli were played
at a comfortable sound level of approximately 65 dB SPL.
Figure 1. A spectrogram of the word “elephant” spoken by a woman. The smooth line represents
the intensity contour and the dotted line represents the F0 contour.
Procedure. Participants were tested individually. At their request, a parent was present in
the booth with some CI participants. Parents were permitted to assist with explanations when the
task was initially described to the child, but they did not interact with the child in any way once
the test phase began. Children were told that they were going to hear people talking, and that
they had to indicate whether the talker was a man, a woman, or a girl by touching one of the
pictures on the screen. There were no practice trials. The stimuli were presented in three blocks,
corresponding to the three conditions. Presentation was in fixed order - sentences first,
exclamations next, and isolated words last - for a total of 108 trials. The fixed order was used on
the basis of previous research, which suggested that sentence-length stimuli would be least
difficult and single-word stimuli most difficult (Vongpaisal et al., 2010; Vongphoe & Zeng,
2005). Stimuli within each block (6 utterances X 3 talkers X 2 tokens of each utterance) were
presented randomly. The three colored photographs, consisting of the faces of a man, woman,
and girl, appeared on the screen at the beginning of each trial. Before testing, each participant
identified each photograph as that of a man, woman, or girl. The spatial arrangement of
photographs was identical across participants and trials. After listening to each stimulus,
participants responded by touching the photograph of the presumed talker. They received
feedback after each trial - a schematic smiling face for correct responses and a blank screen for
incorrect responses.
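The trial structure described above (three blocks in fixed order, with 6 utterances X 3 talkers X 2 tokens randomized within each block) can be illustrated with a minimal sketch. The function and variable names are hypothetical, and the code is not the presentation software used in the study; it only reproduces the stated design and the resulting trial counts.

```python
import random

CONDITIONS = ["sentences", "exclamations", "words"]  # fixed block order
TALKERS = ["man", "woman", "girl"]
N_UTTERANCES = 6   # utterances per condition (Table 2)
N_TOKENS = 2       # tokens of each utterance per talker


def build_trials(seed=None):
    """Return a list of blocks; each block is a shuffled list of trials."""
    rng = random.Random(seed)
    blocks = []
    for condition in CONDITIONS:
        block = [
            (condition, talker, utterance, token)
            for talker in TALKERS
            for utterance in range(N_UTTERANCES)
            for token in range(N_TOKENS)
        ]
        rng.shuffle(block)  # random order within a block
        blocks.append(block)
    return blocks


blocks = build_trials(seed=0)
print([len(b) for b in blocks])      # trials per block
print(sum(len(b) for b in blocks))   # total trials
```

Each block contains 36 trials, for 108 trials in all, and each talker contributes 12 trials per block (36 across the session).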
Results and Discussion
Preliminary analyses compared performance collapsed across the different utterances
with chance levels (12 correct) on the three-alternative forced-choice task (36 trials with three
response options on each trial) separately for each speaker and for both groups of children. One-
sample t-tests confirmed that performance exceeded chance levels in each instance, p < .0001.
Figure 2 depicts performance for the two groups of children (CI, NH) on the three utterance
types (Sentences, Exclamations, and Isolated Words). Both groups performed above 90% correct
in all conditions. Figure 3 depicts the performance of CI children as a function of talker (Man,
Woman, and Girl). Note the highly accurate classification of the man’s voice. His voice was
misclassified only once in the first condition. In the second and third conditions, performance on
the man’s voice was error-free. In two instances across conditions, the man was misidentified,
once as a woman and once as a girl. Otherwise, confusion between the utterances spoken by the
woman and girl was the only source of error for CI participants.
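The comparison with chance reported above can be sketched as follows. With 36 trials per talker and three response options, guessing yields 12 correct responses on average, and a one-sample t-test compares the group mean with that value. The scores below are hypothetical placeholders, not the data of the present study, and the helper name is an assumption introduced for illustration.

```python
import math
import statistics

N_TRIALS = 36        # trials per talker, collapsed across utterance types
N_ALTERNATIVES = 3   # man, woman, girl
CHANCE = N_TRIALS / N_ALTERNATIVES  # 12 correct expected by guessing


def one_sample_t(scores, mu):
    """Return (t, df) for a one-sample t-test of the mean of scores against mu."""
    n = len(scores)
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return (mean - mu) / se, n - 1


# Hypothetical numbers of correct responses (out of 36), NOT the thesis data.
example_scores = [30, 32, 34, 36, 33]
t, df = one_sample_t(example_scores, CHANCE)
print(f"chance = {CHANCE:.0f}, t({df}) = {t:.2f}")
```

The p value would then be obtained from the t distribution with n - 1 degrees of freedom.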
As expected, performance in the CI group was more variable than in the NH group, but
most CI participants scored within one standard deviation of the NH mean in all conditions.
Figure 4 shows individual accuracy scores of CI participants. Because performance by both
groups of children was at ceiling for the male talker’s three utterance types, this condition was
excluded from further consideration.
Figure 2. Performance of CI and NH children as a function of utterance type (Experiment 1).
Error bars represent standard errors.
[Figure: y-axis, % correct (30 to 100); x-axis, utterance type (Sentences, Exclamations, Words); series, CI and NH.]
A three-way mixed-design analysis of variance (ANOVA) examined performance as a
function of one between-subjects factor (Group: CI or NH) and two within-subjects factors
(Talker: woman or child; Utterance Type: sentences, exclamations, or isolated words). There was
no effect of Group, indicating that any apparent difference in performance between CI and NH
children was not reliable. The main effect of utterance type was significant, F(2, 62) = 5.49,
p < .006. Specifically, performance was significantly better for isolated words than for sentences,
t(32) = 2.93, p < .006, and exclamations, t(32) = 3.25, p < .005, which did not differ. Because
there was no interaction between group and utterance, F < 1, this effect was similar for both
groups of children. There were no other significant main effects or interactions.
Figure 3. Performance of CI children as a function of talker and utterance type (Experiment 1).
Error bars represent standard errors.
[Figure: y-axis, % correct (30 to 100); x-axis, utterance type (Sentences, Exclamations, Words); series, Man, Woman, and Girl.]
Further consideration of individual data revealed that 11 of the 14 CI participants
classified the woman’s or girl’s utterances correctly before receiving any feedback, that is, on the
very first trial of the first condition, and they continued to perform correctly on subsequent trials.
Moreover, 10 of 14 participants classified both talkers correctly on their respective first trials in
the second condition (Exclamations), and 12 of 14 performed correctly on the first trial in the
third condition (Isolated Words). It is likely that increasing familiarity with the talkers coupled
with feedback counteracted any increased difficulty resulting from decreased stimulus duration
(Exclamations and Isolated Words vs Sentences) and prosodic variability (Isolated Words vs
Exclamations or Sentences). Three participants performed at ceiling in all conditions.
Figure 4. Performance of individual CI children in Experiments 1 and 2.
[Figure: y-axis, % correct (30 to 100); x-axis, CI participant (1 to 16); series, Sentences, Exclamations, Words, and Cartoon.]
Response latencies are imperfect measures in young children because of fluctuations in
attention (Williams, 2006), and were therefore not measured in the present study. Informal
observations indicated, however, that the man’s utterances were classified confidently and
quickly, and more hesitation was evident on trials involving the woman and girl. Finally, there
were no systematic relations between performance and chronological age, age of implantation, or
duration of deafness. It is notable, however, that the child with the lowest scores (CI-13) was the
youngest CI user (4.1 years).
The principal goal of the present experiment was to determine whether very young
children who use CIs could correctly classify the gender and age of unfamiliar talkers.
Experienced child CI users 4 to 7 years of age successfully judged whether sentences, one- and
two-word exclamations, and isolated one-, two-, or three-syllable words were spoken by a man,
woman, or girl. Moreover, their performance did not differ significantly from that of NH
children. These findings stand in marked contrast with earlier research on talker differentiation in
CI children, which showed performance levels ranging from modest (Vongpaisal et al., 2010) to
poor (Cleary et al., 2005; Kovačić & Balaban, 2009).
The present findings extend those of Vongpaisal et al. (2010), who found that older child
CI users (M = 8.9 years) could differentiate their mother’s utterances from those of an unfamiliar
man, child, or other women. The authors suggested that the familiarity of the mother’s voice was
critical, as was the provision of sentence-length utterances. They suggested, moreover, that child
CI users capitalized on individual differences in temporal cues, specifically, on differences in
phoneme articulation and speaking rate. The present findings indicate that familiar voices are not
essential for successful talker differentiation, nor are full sentences, under certain circumstances,
at least. Nevertheless, motivational aspects of tasks used with children are undoubtedly
important, accounting, perhaps, for the substantial differences between the present findings and
those of Kovačić and Balaban (2009) with much older children (M = 12.3 years).
It is more difficult to pinpoint the contribution of timing cues to young CI users’ success.
The girl’s speaking rate (i.e., syllables per second) was slower than that of the woman for
sentences and exclamations, but differences were negligible for isolated words (see Table 3),
which yielded the highest performance levels. It is likely that CI users in the present study made
productive use of individual differences in consonant and vowel articulation, which enable
hearing listeners to identify talkers from severely degraded speech (Remez et al., 1997; Sheffert,
Pisoni, Fellowes, & Remez, 2002).
Child CI users’ greater accuracy in classifying the gender and age of talkers from isolated
words than from exclamations or sentences seems counterintuitive. However, the fixed order of
presentation in the present study, which was prompted by greater anticipated difficulty with
isolated words, precludes effective comparison of the relative ease of identifying the different
classes of stimuli. Because children received feedback on every trial, they had the opportunity to
learn about talker-specific features and to transfer that knowledge to subsequent conditions
involving novel utterances from the same speakers. Such learning effects indicate the potential of
young CI users to benefit from limited training with challenging material.
Child CI users demonstrated exceptional accuracy at classifying the male talker.
According to Fu et al. (2005), CIs provide sufficient pitch information to enable differentiation
of a man’s voice from that of a woman or child. The fact that all children accurately identified
the male talker on the first trial indicates that young CI children, like NH children, have long-
term representations of men’s voices based on their everyday experience.
Although classification of speech samples from the woman and the girl was more
difficult, more than half of the young CI users were correct on their very first trial, before having
an opportunity to compare talkers or make use of feedback. Moreover, three of these children
had error-free performance in all conditions, which is at odds with the view that device
limitations preclude successful talker identification (Kovačić & Balaban, 2009). Further research
could aim to identify the acoustic cues that underlie success in the best CI users as well as those
that are necessary for success in typical CI users.
Experiment 2
With a single exception (Vongpaisal et al., 2010), studies of talker identification in CI
users have focused on unfamiliar talkers. In most cases, as in Experiment 1 of the present study,
children and adults were required to differentiate general classes of stimuli (e.g., man or woman;
man, woman, or girl). It is possible, however, that children in Experiment 1 used person-specific
cues instead of, or as well as, general cues that distinguish the three classes of speakers.
Vongpaisal et al. (2010) claimed that the use of maternal voices, in particular, contributed to CI
children’s successful performance. Obviously, the mother’s voice is highly salient to children, as
is evident, for example, in maternal voice recognition in the neonatal period (DeCasper & Fifer,
1980). By the preschool period, children readily identify their classmates and teachers from
natural voice samples (Bartholomeus, 1973). They also recognize the voices of cartoon
characters from familiar television programs (Spence, Rollins, & Jerger, 2002). Whether children
with CIs would be capable of recognizing the voices of familiar cartoon characters or friends
remains unclear.
The purpose of the present experiment was to evaluate the ability of young bilateral CI
users to identify the voices of cartoon characters from television programs that they watch
regularly. Children would have considerably less exposure to the voices of specific cartoon
characters than to the voices of immediate family members or regular playmates. Nevertheless,
the voice quality and speaking style of TV characters aimed at child audiences are presumably
selected for their distinctiveness and memorability. The visual features of the characters and the
proliferation of toys based on these characters also enhance their overall familiarity and appeal.
We know that child CI users recognize the theme songs of children’s television programs
(Nakata, Trehub, Kanda, Mitani, & Schellenberg, 2005; Vongpaisal et al., 2006), which confirms
that these children attend to the soundtrack.
Method
Participants. With one exception (CI-14), bilateral CI users who participated in
Experiment 1 also took part in the present experiment. In addition, there were two boys (CI-15
and CI-16) who did not participate in Experiment 1, resulting in a final sample of 15 children (M
= 5.5 years, SD = 0.7 years, range = 4.1-6.6 years; Table 1). The control sample of NH children
consisted of the 19 children who participated in Experiment 1.
Apparatus and Stimuli. The apparatus was identical to that in Experiment 1. Twelve
cartoon characters were chosen from popular, age-appropriate TV shows. For each character,
five sentence-length utterances were selected from a variety of episodes. Special care was taken
to exclude stereotyped phrases that could provide cues to identity. Utterances were saved at a
sampling rate of 22.1 kHz (16-bit). Average amplitude was normalized across talkers,
resulting in roughly equal average amplitude across utterances (to eliminate loudness cues to
character identity) while preserving amplitude variations within an utterance. An iconic colored
picture of each character was used for purposes of visual identification. Instead of the
photographs of the man, woman, and girl that were displayed on the monitor in Experiment 1,
pictures of three cartoon characters were displayed.
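One way to equate average amplitude across utterances while preserving amplitude variations within an utterance is to apply a single gain to each utterance so that its root-mean-square (RMS) level matches a common target. The sketch below illustrates that idea only. The thesis does not state which amplitude measure or software routine was used, so the RMS criterion, the function names, the target level, and the sample values are all assumptions.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples, target_rms):
    """Scale all samples by one gain so the utterance's RMS equals target_rms."""
    current = rms(samples)
    if current == 0:
        return list(samples)  # silence: nothing to scale
    gain = target_rms / current
    return [s * gain for s in samples]


# Hypothetical sample values standing in for two recorded utterances.
quiet = [0.01, -0.02, 0.03, -0.01]
loud = [0.4, -0.6, 0.5, -0.2]
target = 0.1  # assumed common RMS level
for utterance in (quiet, loud):
    scaled = normalize_rms(utterance, target)
    print(round(rms(scaled), 6))
```

Because every sample in an utterance is multiplied by the same gain, the relative amplitude contour within the utterance is unchanged; only the overall level differs.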
Procedure. Prior to the test session, the experimenter asked the child and parent to
choose 3 characters from TV shows most familiar to the child from the 12 characters that were
available. Before the first trial, pictures of the three characters appeared simultaneously on the
computer monitor, and each child responded accurately when the experimenter pointed to each
picture in turn and asked “Who’s that? Tell me.” The spatial arrangement of pictures remained
constant throughout the test session. Auditory stimuli (3 characters X 5 utterances X 2
repetitions), consisting of 30 trials, were presented in random order at an average amplitude of
65 dB SPL. Children received no feedback for correct or incorrect responses.
Table 4. Cartoon characters and their utterances.
TV show             Character   Utterance
Dora the Explorer   Dora        I miss him so much.
                                I’m sure of it.
                                Jump up and down.
                                I promised you I’d get you out.
                                Put on your seatbelts.
                    Boots       That really was a shortcut.
                                How many umbrellas do we need?
                                That wouldn’t be too good.
                                We don’t have fancy clothes.
                                The contest is starting.
Spongebob           Spongebob   So I’m gonna get us back on track.
                                I’m in big trouble.
                                Are you ready to give up your life of crime?
                                Everything I could ever want is right here.
                                But it doesn’t make any sense.
                    Patrick     It’s not my wallet.
                                That’s it alright.
                                Hold on just one second.
                                Do it again, I wasn’t looking.
                                I thought I was doing a pretty good job.
                    Squidward   Now go spread the word.
                                Well, you can’t play music with a piece of paper.
                                He does the opposite.
                                Put my windows back.
                                I didn’t play any wrong note.
                    Sandy       That’s gotta hurt.
                                Well, I can’t argue with that.
                                I was being a little too sensitive.
                                You ready to go sand boarding again?
                                Come and get it.
Bob the Builder     Bob         I think I’m going to need you both.
                                I’ve already had that idea.
                                I’ll be right over.
                                Don’t worry, we’re going to put it up later.
                                I can’t wait to see it.
                    Wendy       We had a bit of a slow start.
                                That must be the special display materials.
                                And that’s going to be a big job.
                                We need to measure how deep the trench is.
                                It’s time to get that screen up.
Sesame Street       Bert        So long as I don’t take the pie this time.
                                Well it’s not exactly my favorite word.
                                Wait a minute.
                                I ended up saying you know what.
                                Forget it.
                    Ernie       Oh, but this word is different.
                                He takes the cake.
                                You’re a goat.
                                These are all our friends out there.
                                Have we got a final act yet?
                    Elmo        Is it fun being a birthday cake?
                                Are you a banana?
                                They just try to whistle.
                                And doggies have great hearing.
                                Tell us about yourself.
                    Kermit      I ordered a t-shirt with my name on it.
                                Just like it says on my shirt right here.
                                And you said it would be ready today?
                                You got the letters kind of mixed up there.
                                I don’t believe this.
Max and Ruby        Ruby        Now you see why I needed your help.
                                Don’t touch that lemonade.
                                Magicians never give away their secrets.
                                He’s just getting dressed.
                                Tools do not make music.
Table 5. F0 and speaking rate of TV characters selected by CI children.
Character    Average F0 across utterances (Hz)   Speaking rate (syllables/second)
Dora         328.8 (SD = 23.8)                   3.0 (SD = 0.9)
Boots        336.4 (SD = 23.3)                   3.5 (SD = 0.5)
Ruby         291.8 (SD = 60.2)                   3.5 (SD = 0.7)
Sponge Bob   259.2 (SD = 52.4)                   3.9 (SD = 0.4)
Patrick      248.0 (SD = 34.9)                   3.4 (SD = 1.0)
Squidward    204.8 (SD = 21.4)                   3.0 (SD = 0.5)
Elmo         330.0 (SD = 33.6)                   3.5 (SD = 0.8)
Results and Discussion
Preliminary analyses compared performance with chance levels on the three-alternative
forced-choice task separately for each group of children. One-sample t-tests confirmed that
performance exceeded chance levels in each instance, p < .001. Mean accuracy was 89.7% (SD =
9.3%) for the CI group, and 97% (SD = 4.5%) for the NH group (Figure 5). As in Experiment 1,
performance in the CI group was considerably more variable than in the NH group (see Figure
4). An independent-samples, unequal-variance t-test revealed that the performance of the CI
children, although highly accurate, was significantly less accurate than that of NH 4-year-olds,
t(19.2) = 2.8, p < .02.
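The unequal-variance (Welch) t-test can be approximated from the summary statistics reported above, using the group sizes given in the Method (15 CI children, 19 NH children). The following is a minimal sketch rather than the analysis script used in the study; the function name is hypothetical, and because the inputs are rounded means and standard deviations, the result can only approximate the reported values.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    v1 = sd1 ** 2 / n1  # squared standard error, group 1
    v2 = sd2 ** 2 / n2  # squared standard error, group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Rounded summary statistics reported in the text (percent correct).
t, df = welch_t(97.0, 4.5, 19, 89.7, 9.3, 15)  # NH group, then CI group
print(f"t({df:.1f}) = {t:.2f}")
```

With these rounded inputs the sketch gives a t value of about 2.79 on about 19.1 degrees of freedom, close to the reported t(19.2) = 2.8. The fractional degrees of freedom arise from the Welch-Satterthwaite approximation, which adjusts for the larger variance in the CI group.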
Figure 5. Recognition of cartoon characters by CI and NH children (Experiment 2). Error bars
represent standard errors.
The choice of characters in this experiment was necessarily driven by children’s familiarity
with the programs rather than the relative distinctiveness of talkers. Accordingly, the talkers
differed from one another in various respects, including voice quality, speaking rate, prosody,
and F0. Eight out of 15 CI children chose to listen to a subset of characters portrayed by a
woman (Ruby), a young girl (Dora), and a young boy (Boots). The voices of Dora and Boots
were almost identical in mean F0 across utterances, and the voices of Ruby and Boots were
virtually identical in speaking rate (see Table 5). Nevertheless, CI participants were 88% correct
(SD = 7.3%), on average, in differentiating these three characters. Cleary et al. (2005) found that
a difference of two semitones or more was necessary for talker differentiation by NH children
and the best-performing CI children. In the present experiment, CI users discriminated between
voices with much smaller F0 differences. In light of the degraded spectral input provided by their
implants, it is unlikely that children were using F0 differences alone for talker identification.
What was available instead were timing cues, both global cues such as speaking rate and
local cues involving idiosyncratic articulation of consonants and vowels. It is likely that
articulation and expressive timing “signatures” of the cartoon characters provided sufficient cues
to identify talkers that were similar in F0 and speaking rate.
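To relate the F0 values in Table 5 to the two-semitone criterion of Cleary et al. (2005), a frequency ratio can be converted to semitones as 12 times the base-2 logarithm of the ratio. The sketch below applies that standard conversion to the mean F0 values of the three characters chosen most often. The function name is hypothetical, and the calculation uses mean F0 only, so it ignores the within-character variability indicated by the standard deviations.

```python
import math


def semitone_difference(f0_a, f0_b):
    """Absolute pitch interval between two frequencies, in equal-tempered semitones."""
    return abs(12 * math.log2(f0_b / f0_a))


# Mean F0 values (Hz) from Table 5.
mean_f0 = {"Dora": 328.8, "Boots": 336.4, "Ruby": 291.8}

for a, b in [("Dora", "Boots"), ("Ruby", "Dora"), ("Ruby", "Boots")]:
    print(f"{a} vs {b}: {semitone_difference(mean_f0[a], mean_f0[b]):.2f} semitones")
```

By this measure, the mean F0 values of Dora and Boots differ by less than half a semitone, well below the two-semitone criterion.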
General Discussion
The goal of the present study was to examine young bilateral CI users’ ability to identify
familiar and unfamiliar talkers. In Experiment 1, the children were required to differentiate an
unfamiliar man, woman, and girl from a variety of utterance types. Child CI users achieved near-
perfect accuracy at differentiating the talkers, as did a control sample of children with normal
hearing. Although the NH children were one year younger, on average (4.7 years), than the CI
children (5.6 years), the CI children would be considered younger in hearing age (Fagan &
Pisoni, 2010), which reflects the number of years of auditory experience.
In Experiment 2, children were required to identify familiar talkers, specifically, the
cartoon characters from television programs that they watched regularly. Although CI users were
very successful at this task, their performance did not match that of the younger NH children. It
is possible that the provision of feedback for correct responses, like that provided in Experiment
1, would have improved the performance of CI children. There was no way of tracking
children’s cumulative exposure to the television programs, so it is also possible that NH children
watched the programs more frequently than CI children did. Parents of CI children are typically
encouraged to interact verbally with their children as much as they can, which may leave less
time for television viewing. Nevertheless, the
performance levels of child CI users were impressive and well above the levels reported in other
studies of talker identification among children with CIs.
Given the modest size of the present sample, it is impossible to isolate the factors
responsible for the outstanding performance of the CI users. In contrast to previous studies of
talker identification in child CI users, in which most children had unilateral implants (Cleary et
al., 2005; Kovačić & Balaban, 2009; Vongpaisal et al., 2010), all CI participants in the present
study had bilateral implants. To date, bilateral implants have been associated with improved
sound localization and perception of speech in noise (see Johnston, Durieux-Smith, Angus,
O’Connor, & Fitzpatrick, 2009, for a review), but there is no indication that bilateral electrical
input facilitates talker discrimination or identification. This question could be addressed in future
research.
Sharma, Dorman, and Kral (2005) have suggested a sensitive period (approximately 3.5
years of age) for bilateral cortical plasticity. Others have suggested that there are adverse
consequences on auditory plasticity from long delays between the first and second implant
(Gordon, Valero, & Papsin, 2007). Moreover, there are preliminary indications of enhanced
perceptual outcomes for children who receive their implants simultaneously rather than
sequentially (Chadha, Papsin, Jiwani, & Gordon, 2011).
There have been attempts to identify stellar CI performers or “stars” and the factors
associated with their success (e.g., Nicholas & Geers, 2006; Pisoni, 2005; Teagle & Eskridge,
2010). Although all CI users in the present study performed well, the few who exhibited error-
free performance are undeniably stars. Each of these children had demographic factors that have
been linked previously with successful auditory and language outcomes, including short duration
of bilateral deafness (Gilley, Sharma, & Dorman, 2008), early implantation (Connor et al., 2006;
Svirsky et al., 2007), emphasis on oral communication (Nicholas & Geers, 2006; Svirsky et al.,
2000), same home language as the language of the general community (Gordon, Tanaka, &
Papsin, 2005), and supportive parents with high levels of education and motivation (see Teagle
& Eskridge, 2010, for a review).
Developmental outcomes of CI users are highly variable (Nicholas & Geers, 2006;
Peterson et al., 2010) and, on average, below those of same-age peers. It is useful, however, to
study a small but privileged sample, like that in the present study, as a window on the potential
of child CI users and as a guide to their habilitation.
Study 2: Children with Bilateral Cochlear Implants Identify Emotion in Speech and Music
Abstract
The present study examined the ability of prelingually deaf children with bilateral implants to
identify emotion in speech and music. In Experiment 1 child implant users indicated whether
linguistically neutral utterances spoken in a child-directed manner sounded “happy” or “sad”.
Their performance levels were high but significantly lower than those of children with normal
hearing. Several child implant users had error-free performance, which challenges prevailing
views about their inability to perceive emotion in speech. In Experiment 2 children with bilateral
implants classified short piano excerpts as “happy” or “sad”. Their performance was well above
chance levels but significantly less accurate than that of normally hearing children.
Introduction
Cochlear implants (CIs) have made auditory-verbal communication accessible to many
children with profound hearing loss (e.g., Geers, 2003). The benefits for congenitally deaf
children seem greatest when they receive their implants early, say by 1-3 years of age (Geers,
2004; Nicholas & Geers, 2006; Waltzman & Cohen, 1998). Such early access to language
facilitates social and emotional development. For example, deaf children with good language
skills (oral or sign language) are reported to have fewer psychosocial difficulties than their peers
with more limited language skills (Dammeyer, 2010). For child CI users with hearing parents,
exposure to a spoken language has been linked to greater social well-being (Percy-Smith et al.,
2008), perhaps because few hearing parents are sufficiently proficient in sign language to enable
optimal parent-child communication.
A critical aspect of interpersonal communication in general and of parent-child
communication in particular is the exchange of affective messages, both verbal and non-verbal.
In addition to universal means of communicating affect visually by way of facial expression,
posture, and movement, there are auditory means of communicating affect that are equally
universal, including vocal modulation and music (Bachorowski, 1999; Scherer, 2003; Balkwill &
Thompson, 1999; Zentner, Grandjean, & Scherer, 2008). The ease with which children with CIs
identify the emotional valence of vocal, non-verbal sounds has implications for their emotional
well-being, as assessed by self-report (Schorr, Roth, & Fox, 2009). There is relatively little
research, however, on the ability of CI users to identify the intended emotion in speech and
music.
Portrayals of emotion in speech and music are guided both by universal aspects of
emotional expression and by socio-cultural conventions (Bachorowski, 1999; Balkwill &
Thompson, 1999; Scherer, Banse, & Wallbott, 2001). Well before children acquire the rudiments
of language, they are exposed to some of these modes of expression. For example, mothers
convey emotion in their speech to pre-verbal infants by means of exaggerated prosody (Fernald,
1991; Papoušek, 1992) and heightened affect (Trainor, Austin, & Desjardins, 2000), which
confer a musical flavor to such speech (Fernald, 1989; Trainor, Clark, Huntley, & Adams, 1997).
From the early months, infants prefer infant-directed speech to adult-directed speech, which is
more neutral in emotional tone (Cooper & Aslin, 1990; Fernald, 1985; Singh, Morgan, & Best,
2002). Mothers across cultures also sing expressively to their infants (Trehub, Trainor, & Unyk,
1993), sometimes for playful purposes and sometimes for soothing purposes (Trehub & Trainor,
1998). Infants prefer infant-directed singing over informal but non-infant-directed singing
(Trainor, 1996), even in the newborn period (Masataka, 1999), which suggests that such
expressive singing may have intrinsic appeal.
Congenitally deaf infants do not have access to these affective vocalizations in the early
months of life because, even with early diagnosis, they are unlikely to receive implants much
before 12 months of age (Holt & Svirsky, 2008). Because of delays in receiving expressive vocal
input, one would expect delays in child CI users’ ability to interpret and transmit vocal affect.
Even in the post-implant period, the degraded pitch and spectral cues provided by implants
(Geurts & Wouters, 2001; Loizou, 1998) interfere with the processing of vocal emotion.
Variations in fundamental frequency (F0), or pitch, contribute to the differentiation of emotions
in speech and music, although amplitude, tempo (rate), and rhythm are also important
(Bachorowski, 1999; Laukka, Juslin, & Bresin, 2005; Scherer, 1986, 2003). Amplitude and
timing variations are accessible to CI users (Loizou, 1998), but for normal-hearing (NH)
individuals, these cues are typically used in conjunction with pitch variations to differentiate
vocal emotions (Bachorowski, 1999; Scherer, 2003). The available literature indicates that CI
users have considerable difficulty identifying emotion in speech (Luo, Fu, & Galvin, 2007; Most
& Aviner, 2009; Hopyan-Misakyan et al., 2009). For example, prelingually deaf children 7-13
years of age with unilateral implants are as accurate at identifying facial expressions (happy, sad,
angry, fearful) as their hearing peers, but the CI users perform at chance levels on affective
speech prosody that poses little difficulty for NH children (Hopyan-Misakyan et al., 2009).
Variations in pitch, amplitude, and timing characterize music as well as speech (Kraus,
Skoe, Parbery-Clark, & Ashley, 2009). Music and speech also share some cues to specific
emotions (Juslin & Laukka, 2003). The adverse consequences of degraded pitch and spectral
information for music processing have been well documented (see McDermott, 2004, and Kraus
et al., 2009 for reviews). This research has focused largely on CI users’ difficulty differentiating
melodies (Galvin, Fu, & Nogaki, 2007; Vongpaisal, Trehub, & Schellenberg, 2006, 2009) and
identifying familiar music on the basis of pitch cues alone (Hsiao, 2008; Kong, Cruz, Jones, &
Zeng, 2004; Nimmons, 2007; Stordahl, 2002). Not surprisingly, this difficulty makes music
unpalatable to many postlingually deafened CI users (Gfeller et al., 2000; Lassaletta et al., 2007;
Leal et al., 2003), but congenitally or prelingually deaf children typically enjoy music listening
and music making (Gfeller et al., 2000; Nakata, Trehub, Mitani, & Kanda, 2006; Vongpaisal et
al., 2006). To date, there has been only one study of emotion identification in music by CI users
(Hopyan, Gordon, & Papsin, 2011). Hopyan et al. (2011) found that prelingually deaf children 7-
13 years of age with unilateral implants distinguished happy from sad musical excerpts but their
accuracy was significantly lower than that of same-age NH children.
The goal of the present investigation was to ascertain whether bilateral CI users 4-7 years
of age could identify happiness and sadness in speech and music. Previous research with
unilateral CI users 7-13 years of age indicated that they were unable to differentiate the emotions
of happiness, sadness, anger, and fear expressed prosodically in speech (Hopyan-Misakyan et al.,
2009), but they could distinguish happy from sad intentions in instrumental (piano) music
(Hopyan et al., 2011). Children’s judgments of emotion in speech were examined in Experiment
1, and their judgments of emotion in music were tested in Experiment 2.
Experiment 1
In typical studies of emotion identification in speech among NH listeners, participants
hear utterances with semantically neutral content spoken in a manner that portrays specific
emotions. They respond by selecting one of several emotions from a closed set (see Scherer,
2003, for a review). Luo et al. (2007) used such a task to present adult CI users with men’s and
women’s utterances that conveyed happiness, sadness, anger, anxiety, and emotional neutrality.
Amplitude cues were preserved in some cases and normalized in others. Even with amplitude
cues preserved, CI users identified less than half of the emotions, performing substantially below
the levels attained by NH adults. Most and Aviner (2009) compared children and adolescents
with CIs (10-17 years of age) with same-age NH individuals and with deaf hearing-aid users on
their identification of happiness, sadness, fear, anger, surprise, and disgust in auditory, visual,
and audio-visual contexts. The performance of both deaf groups was significantly worse than
that of hearing individuals in the auditory context. Although hearing individuals performed better
in audio-visual than in visual contexts, the addition of auditory cues provided no advantage for
CI and hearing-aid users. In other words, auditory cues resolved the ambiguity of some facial
expressions for hearing listeners but not for deaf listeners. CI users’ difficulty differentiating the
six vocal emotions is in line with 7- to 13-year-olds’ difficulty differentiating happy, sad, angry,
and fearful vocal emotions (Hopyan-Misakyan et al., 2009). The conclusion of the
aforementioned studies was that the pitch and spectral cues provided by CIs are insufficient for
the differentiation of vocal emotions.
Unquestionably, NH individuals outperform CI users on the identification of vocal
emotion, but even their performance is far from perfect. In general, NH adults identify 60% to
70% of the target emotions in tasks involving multiple emotions, with some emotions identified
more readily than others (Luo et al., 2007; Scherer, 2003). Moreover, NH individuals perform
better on some voices than on others, and their performance may be affected by talker familiarity
and other talker-specific factors (see Bachorowski, 1999, for a review). Luo et al. (2007) found
that CI users often confused happiness with anger or fear, and sadness with neutrality. Although
NH listeners were much more accurate than CI listeners, both had similar confusion patterns.
Furthermore, the removal of amplitude cues impaired the performance of NH as well as CI
listeners. Most and Aviner (2009) found similar confusion patterns in NH listeners and their
counterparts with hearing loss. Happiness was identified the most accurately, followed by anger
and disgust. It is clear, then, that vocal emotions are more difficult to discern than emotional
facial expressions, even for listeners with access to the full range of pitch and spectral cues.
Acoustic cues may be insufficient for accurate identification of discrete emotional
categories in adult speech (e.g., Bachorowski, 1999). Rather, such cues may be more useful for
indicating the talker’s level of nonspecific arousal (Bachorowski, 1999; Bänziger & Scherer,
2005). Indeed, NH listeners tend to confuse emotions associated with similar arousal levels. For
example, they confuse happiness and anger, which share high arousal levels and the acoustic
cues of high pitch, pitch variability, rapid speaking rate, and amplitude variability. They also
confuse neutrality and sadness, which share low arousal levels and the acoustic cues of low
pitch, slow speaking rate, and reduced pitch and amplitude variability. Perhaps CI listeners
would have more success in differentiating emotions if the emotions in question had contrastive
arousal levels, such as happiness and sadness. They might also be more successful with samples
of speech delivered in a child-directed rather than an adult-directed manner. Aside from the fact
that the child-directed manner of speech would be more familiar as well as more engaging to
young children, the emotional intentions are more transparent in child-directed than in adult-
directed speech both within and across cultures (Bryant & Barrett, 2007; Fernald, 1989).
In the present experiment, young bilateral CI users were required to identify child-
directed speech samples as happy or sad. CI users 5-7 years of age and a control group of NH
children listened to natural, expressive utterances produced by a man and a woman. The man’s
speech could offer potential processing advantages for CI users because of its lower pitch range
(Chatterjee & Peng, 2007; Vongpaisal et al., 2010). It is possible, however, that the woman
might express affect more distinctively than the man (Luo et al., 2007). Because of the use of a
child-directed speaking style and emotion categories that contrasted in arousal, young CI users in
the present experiment were expected to differentiate emotions more effectively than older CI
users in previous studies (Luo et al., 2007; Most & Aviner, 2009; Hopyan-Misakyan et al.,
2009). Nevertheless, because of child CI users’ diminished access to pitch and spectral cues, they
were expected to perform more poorly than their NH peers.
Method
Participants. The participants included 14 bilateral CI users (5 girls and 9 boys, M = 5.8
years, SD = 0.6; range: 5.1-7.0) from middle-class families who were recruited from a large
metropolitan area (see Table 6). Four children had progressive hearing loss from birth and 10
were congenitally or prelingually deaf. As can be seen in Table 6, all participants used Nucleus
implants with Contour or Freedom processors programmed with the Advanced Combinational
Encoder (ACE) processing strategy, and each had a minimum of 2.5 years of implant experience
(M = 4.3 years, SD = 0.8, range: 2.8−5.3).
Table 6. CI participants: Background information. Participant codes are preserved across studies.

Participant  Gender  Age at test (years) E1; E2  Age at 1st and 2nd CI activation  Etiology
CI-1*        M       6.4; 6.4                    3.4; 3.4                          Genetic
CI-2         M       5.2; 5.5                    0.8; 1.7                          Genetic
CI-3         M       5.4; 5.4                    1.1; 1.1                          Genetic
CI-4         F       6.3; 6.3                    1.0; 3.6                          Genetic
CI-5*        F       7.0; 6.9                    2.5; 4.0                          Unknown
CI-6         M       5.3; 5.3                    1.0; 4.6                          Genetic
CI-7         M       5.1; 5.8                    0.9; 1.8                          Genetic
CI-8         M       6.1; 6.3                    0.8; 1.5                          Genetic
CI-9         F       - ; 5.8                     1.7; 1.7                          Unknown
CI-10*       M       6.4; 6.4                    3.1; 6.3                          Mondini dysplasia
CI-11        F       5.1; -                      1.1; 1.1                          Genetic
CI-12        F       6.3; 6.9                    1.0; 3.5                          Unknown
CI-13        F       - ; 4.1                     1.1; 1.1                          Unknown
CI-14*       M       5.5; 5.5                    2.7; 2.7                          Unknown
CI-15        M       6.4; -                      1.3; 2.3                          Genetic
CI-17        F       5.1; 5.1                    1.1; 3.4                          Unknown

* progressive hearing loss from birth

With the exception of children with progressive hearing loss, their first implant was
activated between 9 and 27 months of age. When tested with their implants, absolute thresholds
for tones within the speech range were within normal limits (10-30 dB HL). All children with
CIs participated in auditory-verbal therapy for at least 2 years after implantation. They also
communicated exclusively by auditory-oral means and were in age-appropriate school classes
with their NH peers. A comparison sample of NH children consisted of 18 preschoolers (12 girls
and 6 boys, M = 5.4 years, SD = 0.5, range: 4.8-6.2) from middle-class families who were
recruited from the local community. The NH children were slightly younger, on average, than the
CI group, t(30) = 2.05, p = .049. No NH child had a personal or family history of hearing
problems, and all children were free of colds on the day of testing.
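
As an illustrative check on the age comparison reported above, the t statistic can be approximated from the group means, standard deviations, and sample sizes alone. The sketch below assumes an independent-samples t test with pooled (equal) variances, which is consistent with the reported 30 degrees of freedom; the function name is hypothetical, and because the inputs are the rounded summary values from the text, the result (about 2.06) differs slightly from the reported 2.05.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# CI group: M = 5.8, SD = 0.6, n = 14; NH group: M = 5.4, SD = 0.5, n = 18
t, df = pooled_t(5.8, 0.6, 14, 5.4, 0.5, 18)
print(f"t({df}) = {t:.2f}")
```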
Apparatus and stimuli. A man and a woman were instructed to produce “happy” and
“sad” versions of the following three sentences: The lamp is on the table, Flowers grow in the
garden, and A chair has four legs. The “actors” were asked to speak naturally but in an
expressive manner, as if talking to children. F0, amplitude range, and duration of all utterances
from the two talkers were calculated using PRAAT software (Boersma & Weenink, 2005). Table
7 provides information about mean F0, F0 range, amplitude, and duration for happy and sad
utterances produced by the man and woman. F0 contours of the stimuli produced by the two
talkers are illustrated in Figure 6. Stimuli largely conformed to cultural conventions for the vocal
portrayal of happiness and sadness, as described by Johnstone and Scherer (2000). Specifically,
the man’s and woman’s sad utterances had a smaller F0 range and were lower in overall
amplitude than their happy utterances. Although the man’s sad utterances were longer in duration
than his happy utterances, as expected, the woman’s happy utterances were longer in duration
than her sad utterances (Table 7). In other words, duration was not a reliable cue to the target
emotion.
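
The cue patterns described above can be verified directly from the values in Table 7. The following sketch, offered only as an illustration, averages each acoustic measure over the three sentences for every talker and emotion; the data are copied from Table 7, while the variable and function names are assumptions introduced here.

```python
from statistics import mean

# (emotion, talker, sentence, mean F0 in Hz, F0 range in Hz, mean amplitude in dB, duration in s)
STIMULI = [
    ("happy", "female", "Chair",   313.84, 335.71, 72.55, 1.77),
    ("happy", "female", "Flowers", 289.18, 349.38, 72.86, 1.92),
    ("happy", "female", "Lamp",    285.15, 330.07, 72.80, 1.81),
    ("happy", "male",   "Chair",   132.35, 108.31, 71.95, 2.00),
    ("happy", "male",   "Flowers", 156.21, 136.03, 72.86, 1.64),
    ("happy", "male",   "Lamp",    134.87, 107.01, 72.48, 1.44),
    ("sad",   "female", "Chair",   190.19, 114.62, 70.26, 1.66),
    ("sad",   "female", "Flowers", 176.23,  83.64, 69.58, 1.57),
    ("sad",   "female", "Lamp",    186.28,  57.37, 71.04, 1.41),
    ("sad",   "male",   "Chair",    94.15,  36.12, 69.22, 1.95),
    ("sad",   "male",   "Flowers",  96.66,  45.42, 70.34, 1.91),
    ("sad",   "male",   "Lamp",    102.13,  43.06, 71.08, 1.73),
]

def summarize(emotion, talker):
    """Average F0 range, amplitude, and duration over the three sentences."""
    rows = [r for r in STIMULI if r[0] == emotion and r[1] == talker]
    return {
        "f0_range_hz": mean(r[4] for r in rows),
        "amplitude_db": mean(r[5] for r in rows),
        "duration_s": mean(r[6] for r in rows),
    }

for talker in ("female", "male"):
    happy, sad = summarize("happy", talker), summarize("sad", talker)
    for cue in happy:
        direction = "happy > sad" if happy[cue] > sad[cue] else "happy <= sad"
        print(f"{talker:6s} {cue:13s} happy={happy[cue]:7.2f} sad={sad[cue]:7.2f} ({direction})")
```

On these values, F0 range and amplitude favor the happy utterances for both talkers, whereas the direction of the duration difference reverses between the woman and the man.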
The stimuli were recorded in a 3 m x 2.5 m double-walled, sound-attenuating chamber
(Industrial Acoustics Corporation) with a microphone (Sony F-V30T) connected directly to a
Windows XP computer workstation. High-quality digital sound files (44.1 kHz, 16-bit, mono)
were created using a digital audio editor (Sound Forge 6.0). Four colored digital photographs
(headshots) of the speakers were taken against a white background. Two were depictions of the
man with a facial expression consistent with happiness in one and an expression consistent with
sadness in the other. The two others were comparable depictions of the same expressions by the
woman.
Testing took place in a double-walled sound-attenuating booth, either at the university
laboratory (Industrial Acoustics Corporation, 3 m x 2.5 m) or at a local children’s hospital (4.3
m x 2.7 m). A computer workstation and amplifier (Harman/Kardon HK3380) located outside
the university booth were connected with a 17-in touch-screen monitor (Elo LCD
TouchSystems) and two high-quality loudspeakers (Electro-Medical Instrument Co.) inside the
booth. At the hospital venue, a GSI 61 two-channel clinical audiometer (Grason-Stadler
Instruments) replaced the amplifier. The loudspeakers were placed at 45° azimuth to the
participant with the touch-screen monitor at the midpoint. An interactive computer program
(customized for Windows XP) presented stimuli and recorded response selections when the
participant touched the screen. The experimenter could record the participants’ responses using a
portable keyboard connected to the workstation when young children preferred to point to the
selection rather than touching the screen or if their touch was not firm enough to activate the
screen. All stimuli were played at a comfortable sound level of approximately 65 dB SPL.
Procedure. Participants were tested individually. At their request, a parent was present in
the booth with some CI participants. Parents were permitted to assist with explanations when the
task was initially described to the child and during practice trials, but they did not interact with
the child in any way once the test phase began. The experimenter explained to the children that
they were going to hear a man or a lady talk and that they should decide whether the talker
sounded happy or sad. The participants were explicitly instructed to listen for how the talker
sounded. Children listened to four practice trials to familiarize them with the task. During practice
trials, the experimenter answered any of the children’s questions about the task, but she provided no
feedback about the accuracy of their responses. Children were also told that they could listen to
any stimulus more than once, in fact, as often as they liked.
Table 7. Happy- and sad-sounding speech stimuli.

Emotion/Gender  Sentence  Mean F0 (Hz)  F0 range (Hz)  Mean amplitude (dB)  Duration (s)
Happy/Female    Chair     313.84        335.71         72.55                1.77
Happy/Female    Flowers   289.18        349.38         72.86                1.92
Happy/Female    Lamp      285.15        330.07         72.8                 1.81
Happy/Male      Chair     132.35        108.31         71.95                2
Happy/Male      Flowers   156.21        136.03         72.86                1.64
Happy/Male      Lamp      134.87        107.01         72.48                1.44
Sad/Female      Chair     190.19        114.62         70.26                1.66
Sad/Female      Flowers   176.23        83.64          69.58                1.57
Sad/Female      Lamp      186.28        57.37          71.04                1.41
Sad/Male        Chair     94.15         36.12          69.22                1.95
Sad/Male        Flowers   96.66         45.42          70.34                1.91
Sad/Male        Lamp      102.13        43.06          71.08                1.73

Figure 6. Prosodic contours of “The chair has four legs” produced in a happy and a sad manner
by male and female talkers (Experiment 1).
Auditory stimuli were presented in two blocks, one for the male talker and one for the
female talker. The order of blocks was counterbalanced across participants. Stimuli within each
block consisted of the 3 sentences in each of the 2 emotions, with each version repeated 3 times in
random order, for 18 trials per block and 36 trials in total. As noted, the participant had the option of
hearing a stimulus repeated before making a decision. At the beginning of each trial, colored
photographs of a man or woman (depending on the trial block) with a happy and sad facial
expression were presented on the monitor. Before beginning each block of trials, the
experimenter verified that the child could identify the facial expression in each photograph by
asking “Is the man/lady in this picture happy or sad?” The spatial arrangement of photographs
(left/right) was identical for all participants across all trials. After listening to each stimulus,
children responded by touching one of the photographs. No feedback was provided, other than
general encouragement and praise that was offered periodically to maintain the children’s
enthusiasm and cooperation.
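
The trial structure described above (3 sentences x 2 emotions x 3 repetitions in each talker block, with block order counterbalanced) can be sketched as follows. This is not the software used in the experiment; the function names, the shorthand sentence labels, and the seeding scheme are assumptions made purely for illustration.

```python
import random
from itertools import product

SENTENCES = ("lamp", "flowers", "chair")  # shorthand labels for the three sentences
EMOTIONS = ("happy", "sad")
REPETITIONS = 3

def make_block(talker, rng):
    """All sentence-by-emotion combinations for one talker, repeated and shuffled."""
    trials = [(talker, s, e) for s, e in product(SENTENCES, EMOTIONS)] * REPETITIONS
    rng.shuffle(trials)
    return trials

def make_session(first_talker, seed=None):
    """Two talker blocks; counterbalancing amounts to alternating first_talker across children."""
    rng = random.Random(seed)
    second_talker = "woman" if first_talker == "man" else "man"
    return [make_block(first_talker, rng), make_block(second_talker, rng)]

session = make_session("man", seed=1)
print([len(block) for block in session], sum(len(block) for block in session))
```

Each block contains 18 trials, which yields the 36 trials on which the chance criterion is based.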
Results and Discussion
Preliminary analyses revealed that performance was not distributed normally for either
the man talker, p = .027, or the woman talker, p = .020, or for performance averaged across
talkers, p = .034 (Kolmogorov-Smirnov tests). Although overall performance in the CI group
was more variable than in the NH group, this difference fell short of statistical significance,
F(13, 17) = 3.37, p = .076 (Levene’s test). Nevertheless, subsequent analyses used nonparametric
tests.
We first examined the number of children who performed significantly better than
chance. With a binomial test (normal approximation, correcting for continuity, p < .05, one-
tailed), overall performance required 24 or more correct responses on the 36 trials to exceed
chance levels. For child CI users, 12 of 14 surpassed chance, and for NH children, 17 of 18 did
so. In short, performance was remarkably good in both groups and consistent across individuals.
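
The criterion of 24 correct responses follows from the normal approximation to the binomial distribution with a continuity correction. A minimal sketch of that calculation is given below; the function name and its default arguments are assumptions for illustration, with chance performance taken as .5 on a two-alternative task.

```python
import math
from statistics import NormalDist

def min_correct_normal_approx(n_trials, alpha=0.05, p_chance=0.5):
    """Smallest number correct whose continuity-corrected z exceeds the one-tailed critical value."""
    mu = n_trials * p_chance
    sigma = math.sqrt(n_trials * p_chance * (1 - p_chance))
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # A score k is significant when (k - 0.5 - mu) / sigma > z_crit,
    # i.e. when k > mu + 0.5 + z_crit * sigma; take the next integer above that bound.
    return math.floor(mu + 0.5 + z_crit * sigma) + 1

print(min_correct_normal_approx(36))  # 24
```

With 36 trials, the mean under chance is 18 and the standard deviation is 3, so the bound is roughly 23.4 and the first score that exceeds it is 24.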
A direct comparison of groups on overall performance confirmed that the NH children
performed significantly better than the child CI users, p = .047 (Mann-Whitney U Test). Poorer
performance for child CI users than for NH children was evident for the woman talker, p = .029,
but not for the man, p = .107 (Mann-Whitney U Tests). Although NH children showed
significant improvement from the first to the second block of trials, p = .013, comparable
improvement was not evident among the CI children, p = .135 (Wilcoxon Signed Rank Tests).
Performance of both groups as a function of block order is illustrated in Figure 7.
Figure 7. Performance of child CI users and NH children on happy and sad speech as a function
of block order (Experiment 1). Error bars represent standard errors.
Performance of the CI children collapsed across talkers was associated positively with
duration of implant use, r = .60, N = 14, p = .012 (one-tailed), which is an impressive finding in
view of the small sample. Figure 8 depicts individual scores of children with CIs. Two of them
exhibited error-free performance across talkers. An additional two had error-free performance for
the woman talker only, and two others had error-free performance for the man talker only.
Interestingly, only six NH children achieved error-free performance on both trial blocks,
although their performance in general was more consistent than that of children with CIs.
CI users mistook sad-sounding speech for happy (76% of all errors) more often than they
mistook happy-sounding speech for sad. This was especially noticeable for the poorest
performing CI users. This pattern of confusions was also reported by Luo et al. (2007) and Most
and Aviner (2009), which may reflect a response bias for positive emotions rather than greater
ease of identifying happy-sounding speech. In any case, young CI users performed successfully
on a two-alternative forced-choice task that required them to differentiate happy from sad
speech, unlike older CI users who failed to identify happy and sad utterances in the context of a
four-alternative forced-choice task (Hopyan-Misakyan et al., 2009).
Figure 8. Performance of individual CI users on happy and sad speech (Experiment 1)
ordered by scores on Block 1 from best to worst. Original participant codes are preserved.
As noted, natural speaking rate and amplitude variations were preserved in the present
stimuli and were available as potential cues. Differences in overall amplitude and amplitude
variation were consistent across talkers, but differences in speaking rate across emotion
categories were inconsistent across speakers (see Table 7). As a result, speaking rate, which
usually distinguishes happy- from sad-sounding speech, was an unreliable cue in the present
experiment. Differences in F0 range between happy and sad stimuli were large and consistent
across talkers (Table 7). Differences in F0 contour also provided reliable cues to happy and sad
speech, but they may have been inaccessible to CI users (Geurts and Wouters, 2001; Loizou,
1998). When contour contrasts are substantial, however, as in statements versus questions, CI
users perform above chance levels although well below the levels attained by NH listeners
(Meister et al., 2009; Most and Peled, 2007; Peng et al., 2008). It is possible that the young CI
users in the present experiment used a combination of acoustic cues to differentiate the happy
from sad utterances although amplitude cues would have been the most prominent of these. In
the absence of feedback, however, it is clear that CI users had reasonable representations of
happy and sad vocal qualities on the basis of their everyday experience. Their ability to
differentiate music excerpts expressing happy and sad emotions in conventional ways was
assessed in Experiment 2.
Experiment 2
There is increasing interest in the non-musical as well as musical consequences of
children’s short- and long-term involvement in musical activities (Kirschner & Tomasello, 2010;
Schellenberg, 2006). For example, listening to pleasant music has consequences for children’s
prosocial behavior (Kirschner & Tomasello, 2010) and their performance on a variety of tasks
(Schellenberg & Hallam, 2006; Schellenberg, Nakata, Hunter, & Tamoto, 2007). Moreover,
music lessons have been linked to long-term cognitive outcomes (Schellenberg, 2006, 2011).
There is also long-standing interest in adults’ and children’s ability to understand emotion
in music. Typical tasks in this realm require children to link musical excerpts to discrete
emotional categories such as happiness, sadness, anger, and fear. Despite the ubiquity of this
type of task, it seems less appropriate for describing emotional aspects of music than it does for
speech or facial expressions (Trehub, Hannon, & Schachner, 2010). Such labels aptly describe
the feelings associated with vocal or facial emotional expressions, but they seem much less
suitable for instrumental musical excerpts. Instead of discerning the emotions of the performer or
composer or the emotional consequences for the listener, which may involve non-specific
arousal or general feelings of pleasure (Salimpoor, Benovoy, Longo, Cooperstock, & Zatorre,
2009), listeners are typically expected to discern the emotional intentions of the performance. To
do so requires familiarity with culturally typical uses of emotional labels in relation to music.
Adults use some combination of tempo, loudness, pitch level, mode (major or minor),
and consonance or dissonance to judge the emotional intentions of music excerpts (see Hunter &
Schellenberg, 2010, for a review), but tempo and mode have received the most attention. In
general, Western adults judge music in the major mode and with rapid tempo as happy and music
in the minor mode and with slow tempo as sad (Peretz, Gagnon, & Bouchard, 1998). Unlike
tempo, which often differentiates happy- from sad-sounding music across cultures, mode does
not (e.g., Balkwill & Thompson, 1999). Despite the cross-cultural importance of tempo to
musical emotions, it is still necessary to learn the musical conventions involving tempo as well
as mode. With excerpts from the classical repertoire, 4-year-old children fail to link musical
mode, tempo, or their combination with happiness and sadness, 5-year-olds use tempo to
differentiate happy from sad musical excerpts, and 6- to 8-year-olds link both cues, separately
and in combination, to happiness and sadness (Dalla Bella, Peretz, Rousseau & Gosselin, 2001).
When the stimuli consist of children’s songs rather than classical music, 4-year-olds seem to use
tempo to distinguish happy from sad excerpts (Mote, 2011). In general, however, listeners of all
ages find high-arousal emotions (e.g., happiness, anger) easier to identify in music compared to
low-arousal emotions (e.g., sadness or peacefulness; Hunter, Schellenberg, & Stalinski, 2011),
but the difference is particularly strong among young children.
Given the musical pitch processing difficulties of CI users (see McDermott, 2004, for a
review), it is not surprising that they are unable to discriminate major from minor melodic
patterns (Vongpaisal et al., 2006). Although CI users can resolve timing differences and they use
such differences in music-recognition tasks (Hsiao, 2008; Kong et al., 2004; Stordahl, 2002), it is
unclear when child CI users first link those and other acoustic cues with musical emotions.
Hopyan et al. (2011) found that child CI users 7-13 years of age reliably differentiated happy
from sad musical pieces when the stimuli were synthesized piano versions of classical pieces that
have been used in previous research on musical emotions (Dalla Bella et al., 2001; Hunter et al.,
2011; Peretz et al., 1998; Schellenberg, Peretz, & Vieillard, 2008; Vieillard et al., 2008). Not
surprisingly, however, the CI users performed more poorly than same-age NH children.
Although the stimuli had tempo, mode, and other cues to emotion, the authors speculated that CI
users based their judgments primarily on tempo.
According to Dalla Bella et al. (2001), children younger than 5 are unable to use tempo
and those younger than 6 are unable to use mode to differentiate conventionally happy from sad
musical excerpts from the classical repertoire. In the present experiment, selected piano excerpts
from Vieillard et al. (2008), which were also from the classical repertoire, were used to evaluate
the ability of 4- to 7-year-old child CI users to distinguish happy- from sad-sounding music. On
average, these children had roughly 4 years of implant experience, compared to the 7 years of
average implant experience of children in the Hopyan et al. (2011) study. Clinically, the hearing
age of the present sample would be considered 4 years, corresponding to their years of auditory
input. In line with the findings by Dalla Bella et al. (2001), child CI users with less than 5 years of
auditory experience might be unable to associate classical music excerpts with happy and sad
emotions. Aside from their reduced quantity of auditory input relative to same-age peers, child
CI users would also have a considerably reduced quality of musical input. In view of their
limited auditory and musical experience, it was important to ascertain whether they would be
able to differentiate emotions from samples of classical music. Unlike the happy and sad speech
samples in Experiment 1, which differed in amplitude variability, the tones in all musical
excerpts were equivalent in amplitude, eliminating an important cue to happiness and sadness.
Mode cues were available in the present stimuli, but they were expected to be potentially useful
only for the control sample of NH listeners.
Method
Participants. The participants were 14 CI users (6 girls and 8 boys), 12 of whom took
part in Experiment 1. The average age of participants was 5.8 years (SD = 0.8, range: 4.1-6.9),
and the average duration of implant experience was 4.2 years (SD = 1.0, range: 2.8-5.9). The two
additional participants, both girls, were congenitally deaf and satisfied the criteria for participants
in Experiment 1. The control sample consisted of the same 18 NH children who were tested in
Experiment 1.
Apparatus and Stimuli. The musical stimuli consisted of 10 short (approximately 10-s)
synthesized piano excerpts, 5 happy and 5 sad, from the corpus of Vieillard et al. (2008), which
includes excerpts for both emotions. The excerpts, which were from the Western classical
repertoire, had the original pitch and duration values (corresponding to the musical score) but all
tones were of equal amplitude.
These particular excerpts were selected from a larger sample because their emotional
status is identified most reliably by adults (Hunter et al., 2011). The happy-sounding excerpts
were in the major mode and had a rapid tempo (mean of 137 beats per minute), in contrast to the
sad-sounding excerpts, which were in the minor mode and had a slow tempo (mean of 46 beats
per minute). Visual depictions of happiness and sadness consisted of close-ups of a frame from
each of two animated feature films by Hayao Miyazaki, “My Neighbour Totoro” (1988) and
“Spirited Away” (2001). The apparatus was identical to that of Experiment 1.
Procedure. The procedure was similar to that of Experiment 1. NH children heard 5
happy- and 5 sad-sounding excerpts presented randomly for a total of 10 trials. Children with CIs
heard two blocks of the 10 trials, with the order of stimuli randomized within both blocks. There
were no practice trials and no feedback for correct or incorrect responses.
Results and Discussion
On the first block of trials, NH children performed near ceiling (97.8% correct), and they
were much less variable than the child CI users, F(13, 17) = 22.84, p < .001 (Levene’s Test).
Moreover, performance was not distributed normally, p = .005 (Kolmogorov-Smirnov test).
Thus, nonparametric analyses were used, as in Experiment 1. We initially examined how many
children exceeded chance levels, which required scores of 9 or 10 correct on the 10 trials in each
block (binomial test, p < .05, one-tailed). On the first block, 5 of 14 CI children and 17 of 18 NH
children exceeded chance. On the second block of trials, 10 of 14 CI children surpassed chance.
For the first block (i.e., the only block completed by both groups), the difference between groups
in the proportion of children exceeding chance was significant, χ2(1, N = 32) = 12.64, p < .001.
A non-parametric comparison of actual scores, contrasting the NH children with CI users (first
block), also confirmed an advantage for the NH group, p < .001 (Mann-Whitney U Test). For
child CI users, improvement across trial blocks was not significant, p = .283 (Wilcoxon Signed
Rank Test).
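
Two of the values reported above can be reproduced from the information in the text. The sketch below, offered as an illustration only, computes the exact one-tailed binomial criterion for 10 trials and a Pearson chi-square for the 2 x 2 table of children who did or did not exceed chance on the first block (CI: 5 vs. 9; NH: 17 vs. 1). The function names are hypothetical, and the use of the uncorrected Pearson statistic is an assumption, although it does reproduce the reported value.

```python
import math

def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def min_correct_exact(n, alpha=0.05):
    """Smallest score whose exact upper-tail probability under chance is below alpha."""
    return next(k for k in range(n + 1) if binomial_tail(k, n) < alpha)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]; df = 1."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, chi-square is a squared standard normal, so P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

print(min_correct_exact(10))  # 9, since P(X >= 9) = 11/1024 but P(X >= 8) = 56/1024

# Rows: CI, NH; columns: exceeded chance, did not exceed chance (Block 1)
stat, p = chi_square_2x2(5, 9, 17, 1)
print(f"chi2(1, N = 32) = {stat:.2f}, p = {p:.4f}")
```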
Individual differences among the CI children, collapsed across blocks, are illustrated in
Figure 10. In general, the performance of 4- to 7-year-old CI users (first block: 76.4% correct;
second block: 83.6%) was comparable to that reported by Hopyan et al. (2011) for 7- to 13-year-old
CI users (78% correct), who were tested on a similar task with a larger sample of similar
music excerpts.
Child CI users readily distinguished happy- from sad-sounding music although not with
the extraordinary accuracy shown by their hearing peers, who could capitalize on pitch structure
as well as tempo cues. Despite their limited auditory and musical exposure, young CI users’
ability to identify happy and sad emotions in samples of instrumental music devoid of amplitude
cues implies that (1) they would fare far better with real-world samples that feature amplitude as
well as tempo cues, including children’s music (Mote, 2011), and (2) cognitive or task factors
must underlie the inability of some 4-year-old hearing children to identify musical emotions
(Dalla Bella et al., 2001).
For child CI users, the association between duration of implant use and performance
collapsed across blocks was significant, r = .49, N = 14, p = .038 (one-tailed), as it was in
Experiment 1, which is impressive once again in light of the small sample size. It is likely that
excellent cognitive skills in combination with auditory experience enabled some children to learn
which emotional labels are linked to which acoustic cues in speech and music. We also
correlated performance on the first block of trials with performance from Experiment 1
separately for the 18 NH children and the 12 CI users who participated in both experiments. For
the NH group, the correlation was not significant, p = .367, presumably because of high levels of
performance and little variation in either experiment. For the CI group, however, there was a
positive association (Figure 11), r = .51, N = 12, p = .043 (one-tailed). Although there are some
common cues to emotion in speech and music, such as pitch, tempo, and amplitude (Juslin &
Laukka, 2003), pitch cues to happiness and sadness were unlikely to have been useful for child
CI users, tempo cues were inconsistent in the speech samples of Experiment 1, and amplitude
cues were unavailable in the music excerpts.
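
To show why correlations of this size reach one-tailed significance even with 12 to 14 children, the sketch below converts each reported r to a t statistic with N - 2 degrees of freedom and approximates the upper-tail probability by numerical integration of the t density. It assumes the p values were derived from the usual t transformation of Pearson's r; the function names and the use of Simpson's rule are choices made here for illustration, and small discrepancies from the reported p values are expected because the r values are rounded.

```python
import math

def t_from_r(r, n):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom (t >= 0), via Simpson's rule."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric, so the area from 0 to t is subtracted from one half.
    return 0.5 - total * h / 3

REPORTED = [
    ("Speech vs. implant use (Exp. 1)", 0.60, 14),
    ("Music vs. implant use (Exp. 2)", 0.49, 14),
    ("Speech vs. music, CI group", 0.51, 12),
]
for label, r, n in REPORTED:
    t = t_from_r(r, n)
    print(f"{label}: t({n - 2}) = {t:.2f}, one-tailed p = {t_upper_tail(t, n - 2):.3f}")
```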
CI-8, the only CI user who performed perfectly in the speech and music tasks, had the
typical profile of so-called “star” performers, which includes genetic, non-syndromic congenital
deafness (Kawasaki et al., 2006; Wu et al., 2008), well-educated and highly involved parents
(Geers and Brenner, 2003; Teagle and Eskridge, 2010), and, at 6.3 years of age, over 5 years of
implant experience. At the time of testing, moreover, he had been taking piano lessons for 2
years.
Figure 9. Performance of child CI users (Block 1) and NH children on happy and sad music
(Experiment 2). Error bars represent standard errors.
Figure 10. Performance of individual child CI users on happy and sad music (Experiment 2).
Performance is averaged across the 2 blocks (20 trials) and ordered from best to worst. Original
participant codes are preserved.
Figure 11. Performance on emotion identification in speech and music for the 12 CI users who
participated in both tasks. Performance on each task is averaged across the 2 blocks (36 and 20
trials, respectively).
Figure 12. Accuracy of emotion identification in speech (Experiment 1) as a function of years
of implant use. Performance is averaged across the 2 blocks (36 trials).
Figure 13. Accuracy of emotion identification in music (Experiment 2) as a function of years of
implant use. Performance is averaged across the 2 blocks (20 trials).
General Discussion
In two experiments, 4- to 7-year-old deaf children with bilateral CIs and age-matched NH
children identified happiness and sadness in speech and music in the context of a two-alternative
forced-choice task. CI users performed well above chance levels but significantly below their
hearing peers. The present findings with speech stimuli are in marked contrast to the very poor
performance of CI users in previous studies of emotion identification in speech (Luo et al., 2007;
Most & Aviner, 2009; Hopyan-Misakyan et al., 2009). Note, however, that listeners in those
studies were required to identify vocal emotions from four or more alternatives, some of which
had overlapping acoustic cues arising from similar arousal levels (e.g., happiness and anger).
Variations in pitch or intonation, which contribute to emotion identification, are more likely to
pose difficulty for CI users, but intonation differences are often accompanied by differences in
speaking rate and amplitude, which would be accessible to CI users. Although hearing listeners
place considerable reliance on pitch cues to emotion, CI listeners are likely to make greater use
of alternative cues to emotion in the speech signal. Young CI users’ ability to identify basic
vocal emotions such as happiness or sadness on the basis of incidental exposure suggests that
training in this realm could lead to enhanced perception and production of emotional prosody. In
light of the consequences of vocal emotion identification for socialization and well-being in
young CI users (Schorr, 2009), the addition of such training to the current habilitation agenda for
child CI users seems warranted.
The ability of 4- to 7-year-old CI listeners to differentiate happy from sad classical
music excerpts extends the findings of Hopyan et al. (2011) to younger children with less
implant experience and adds to the growing literature on the accessibility of music to
prelingually deaf implant users. Although the present findings indicate that young CI users can
discern the emotional intentions expressed in musical excerpts, they shed no light on the
emotional consequences of music for CI users. There are indications that young CI users enjoy
music (Mitani et al., 2007; Stordahl, 2002; Vongpaisal et al., 2006), but it remains to be
determined whether they experience changes in arousal and mood comparable to those
experienced by individuals with normal hearing (e.g., Balkwill & Thompson, 1999; Husain,
Thompson, & Schellenberg, 2002). There are reports that music training results in improved
speech perception (Moreno et al., 2009), executive function (Kraus & Chandrasekaran, 2010;
Degé, Kubicek, & Schwarzer, 2011), and general cognitive functioning (Schellenberg, 2004) in
NH children. Music training may have even greater benefits for children with CIs. This
possibility awaits further research.
Study 3: Pitch and Timing Cues in Child Implant Users’ Recognition of Familiar Melodies
Abstract
The goal of the present study was to ascertain whether prelingually deaf children with bilateral
cochlear implants and a control sample of children with normal hearing could use pitch or timing
cues exclusively or in combination to identify familiar melodies. In the three conditions of
principal interest, children were required to identify the melody from the theme songs of TV
shows that they watched regularly on the basis of musical excerpts that preserved (1) the relative
pitch and timing cues but not the original instrumentation, (2) timing cues only (rhythm and
tempo), and (3) relative pitch cues only (pitch contour and intervals). The performance of child
implant users was well above chance levels and comparable to that of children with normal
hearing, except in the pitch-only condition, where they performed at chance levels. This is the
first demonstration that young implant users and normally hearing children can identify familiar
music on the basis of timing cues alone.
Introduction
Melodies are defined by relations between their successive pitches (melodic contour and
intervals) and by their temporal organization (meter and rhythm). Adults with normal hearing
(NH) rely primarily on pitch patterns and secondarily on rhythm when identifying songs in an
open-set task (Hébert & Peretz, 1997). The situation is different for listeners with electric rather
than acoustic hearing. Cochlear implants (CIs) were designed to facilitate deaf individuals’
access to speech, which is coded as amplitude variation over time (Loizou, 1998; Smith et al.,
2002). Information about the pitch patterns in speech (i.e., intonation) and music is largely
transmitted by temporal fine structure, which is absent from the input provided by cochlear
prostheses. The result is severe degradation of the pitch and spectral information available to CI
listeners (Gates & Miyamoto, 2003; Geurts & Wouters, 2001; Loizou, 1998; Smith et al., 2002),
with adverse consequences for melodic processing.
Although fundamental frequency, or pitch, variations in speech are relatively large
(Fitzsimmons, Sheahan, & Staunton, 2001) and specific pitch relations are not prescribed, the
perception of intonation is challenging for individuals with CIs (Chatterjee & Peng, 2008;
Meister et al., 2009; Most & Peled, 2007; Peng, Tomblin, & Turner, 2008). By contrast, music
typically moves in small pitch steps and precise pitch relations are prescribed (Vos & Troost,
1989). It is not surprising, then, that melodic processing is even more challenging for CI users
(Drennan & Rubinstein, 2008; Loizou, 1998; McDermott, 2004). Limited pitch resolution cannot
fully account for these difficulties. With isolated or repeating tones and same-different tasks,
some CI users detect pitch changes that are less than a semitone (Vongpaisal, Trehub, &
Schellenberg, 2006), but they are unable to differentiate brief melodies or tone sequences that
differ by one or two semitones (Cooper, Tobey, & Loizou, 2008; Galvin, Fu, & Nogaki, 2007;
Vongpaisal et al., 2006). Moreover, their ability to rank one pitch as higher or lower than another
typically requires differences of four or more semitones (Gfeller et al., 2007; Sucher &
McDermott, 2007), which implies that the sensations arising from melodies may be markedly
different for CI and NH listeners, perhaps involving timbre rather than pitch variations
(McDermott, 2004; Moore & Carlyon, 2005).
In contrast to pitch patterning cues in speech and music, timing cues are more readily
available in the input provided by CIs. Child CI users differentiate same-gender talkers on the
basis of subtle timing differences in articulation and global differences in speech rhythm and
speaking rate (Vongpaisal et al., 2010). Adult CI users’ ability to perceive musical tempo and
rhythm is thought to be comparable to that of individuals with normal hearing (Kong et al., 2004;
Cooper et al., 2008; Gfeller & Lansing, 1991; Gfeller et al., 1997). CI users’ recognition of
melodies is considerably poorer than that of NH listeners, not only in the absence of timing cues
(e.g., Nimmons et al., 2007) but also in their presence (Gfeller et al., 2002, 2005; Stordahl, 2002;
Vongpaisal et al., 2006, 2009), even though they derive clear benefit from the preservation of
rhythm cues (Hsiao, 2008; Kong et al., 2004; Stordahl, 2002). In sum, the available evidence
indicates that timing makes a more substantial contribution to music recognition in CI listeners
than in NH listeners.
Although timing cues are critical for melody recognition by CI users, it is unclear
whether such cues are sufficient for melody recognition in this population. In addition, relatively
little is understood about the contribution of pitch patterning to CI listeners’ long-term
representations of familiar music because studies comparing melody recognition with and
without timing cues preserve the original pitch patterns in both versions (Kong et al., 2004;
Galvin et al., 2007; Nimmons et al., 2007; Hsiao, 2008). In other words, no study compelled CI
users to rely entirely on timing cues, as would be necessary for patterns with unchanging pitch.
Some pitch patterns are meaningful to CI users both in speech and in music. CI users’
modest success in differentiating Cantonese lexical tones (Barry et al., 2002), some of which
contrast in pitch contour, implies that pitch contour processing is possible to some extent.
Moreover, adult CI users benefit from training on contour discrimination (Galvin et al., 2007),
which implies that limitations of the prosthesis for processing pitch patterns may be overstated.
Because of the role of music in the lives of young children (Hallam, 2010; Kirschner &
Tomasello, 2010; Trehub, 2003; Trehub, Hannon, & Schachner, 2010), it is important to
ascertain the cues that child CI users can use for music recognition, with the long-range goal of
enhancing their access to music in everyday contexts. Unfortunately, young CI children and even
NH children make poorer use of available cues than do older children and adults (Stalinski,
Schellenberg, & Trehub, 2008; Vongpaisal et al., 2006).
Our goal in the present study was to evaluate the ability of young CI and NH listeners to
use pitch or timing cues exclusively or in combination to identify familiar melodies. Previous
research demonstrated that child CI users could identify the theme songs of television programs
that they watched regularly (Mitani et al., 2007; Vongpaisal et al., 2009). In some cases, child CI
users could identify the music only when all original cues, instrumental and vocal, were intact
(Mitani et al., 2007). In others, they identified the music more poorly on instrumental and
monophonic flute versions than on the original versions, but their performance was above chance
levels for all versions (Vongpaisal et al., 2009). In the aforementioned studies of TV-song
identification, the original timing cues were preserved and CI children’s performance was
significantly worse than that of NH children. The discrepant performance across the two studies
may be attributable, in part, to age and correlated cognitive differences. The CI children who
identified vocal/instrumental versions only (Mitani et al., 2007) averaged 6.5 years of age (range
of 4-8 years), in contrast to an average of 8.4 years (range of 4.7-11.7 years) for those who also
identified instrumental and melody versions (Vongpaisal et al., 2009). To minimize the cognitive
demands on the present CI participants whose average age was 6 years, the current identification
task involved a closed set of two alternatives rather than the three or four alternatives used in the
earlier studies.
Bilateral CI users 5-7 years of age and a control sample of NH listeners listened to theme
songs from familiar television programs in various conditions. Three conditions were of
principal interest. In one, the melody was presented intact, with pitch and temporal patterns
preserved. In a second condition, the original tempo and rhythm were preserved but all pitch
cues were removed by using a percussion instrument with unvarying pitch. In a third condition,
the relative pitch patterns (i.e., melodic contour and intervals) were preserved but timing cues
were removed by having all notes (and inter-onset intervals) of equal duration. In all three
conditions, all notes were of equal amplitude. Isochronous versions of familiar melodies are
sometimes created by replacing long-duration notes with repeated short-duration notes (Kang et
al., 2009; Nimmons et al., 2008), which eliminates rhythmic or grouping cues but preserves
some metrical cues. In the present study, the intact melodies and altered versions had exactly
the same number of notes. Finally, to ensure that children in the present study could identify the
theme songs in their original form, even without lyrics, original and instrumental versions like
those in previous research (Mitani et al., 2005; Vongpaisal et al., 2009) were also included.
Method
Participants. The participants included eight bilateral CI users (4 girls and 4 boys, M =
6.2 years, SD = 0.7; range: 5.1-7.2) who were recruited from a large metropolitan area (for
background information, see Table 8). One child had progressive hearing loss from birth and
seven were congenitally or prelingually deaf. All participants used Nucleus 24 Contour and/or
Nucleus Freedom Contour Advance implants programmed with the Advanced Combination
Encoder (ACE) processing strategy, and they all had at least 4 years of implant experience (M =
5.0 years; SD = 0.6; range: 4.0−5.9). When tested with their implants, absolute thresholds for
tones indicated access to speech sounds at normal conversational levels (10-30 dB HL). All CI
children participated in auditory-verbal therapy for at least 2 years after implantation. They also
communicated exclusively by auditory-oral means and were in age-appropriate school classes
with their NH peers. Parents of the CI participants provided information about their children's
musical involvement. At the time of testing, participant CI-8 had been taking private piano
lessons for approximately 2 years, and participant CI-2 for approximately 4 months. Participants
CI-11 and CI-12 had no formal musical training, but they were participating in extracurricular
choral activities at their respective schools. The rest of the CI children were not involved in any
extracurricular musical activities, but were a part of the regular school arts program. A
comparison sample consisted of 16 NH children from the community, roughly matched to the CI
participants by hearing age (M = 5.1 years, SD = 0.6, range: 4.3-6.3). No NH child had a
personal or family history of hearing problems, and all were free of colds on the day of testing.
Table 8. CI participants: Background information. Participant codes are preserved across studies.
Participant   Gender   Age at test (years)   Age at 1st and 2nd CI activation (years)   Etiology
CI-2          M        5.7                   0.8; 1.7                                  Genetic
CI-3          M        5.5                   1.1; 1.1                                  Genetic
CI-4          F        6.8                   1.0; 3.6                                  Genetic
CI-5*         F        7.2                   2.5; 4.0                                  Unknown
CI-7          M        5.8                   0.9; 1.8                                  Genetic
CI-8          M        6.3                   0.8; 1.5                                  Genetic
CI-11         F        5.1                   1.1; 1.1                                  Genetic
CI-12         F        6.9                   1.0; 3.5                                  Unknown
* progressive hearing loss from birth
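The implant experience reported for the CI group can be checked against Table 8. The sketch below is illustrative only: it assumes that implant experience is age at test minus age at first CI activation (a definition the text does not state explicitly), and the variable names are hypothetical.

```python
# Values transcribed from Table 8:
# (participant, age at test, age at first CI activation), all in years.
table8 = [
    ("CI-2", 5.7, 0.8),
    ("CI-3", 5.5, 1.1),
    ("CI-4", 6.8, 1.0),
    ("CI-5", 7.2, 2.5),
    ("CI-7", 5.8, 0.9),
    ("CI-8", 6.3, 0.8),
    ("CI-11", 5.1, 1.1),
    ("CI-12", 6.9, 1.0),
]

# Assumed definition: experience = age at test - age at first activation.
# Rounding to one decimal matches the precision of the table entries.
experience = {code: round(age - first, 1) for code, age, first in table8}

mean_experience = sum(experience.values()) / len(experience)
min_experience = min(experience.values())
max_experience = max(experience.values())

print(f"M = {mean_experience:.1f} years, "
      f"range: {min_experience}-{max_experience}")
```

Under this assumption, the mean and range agree with the values given in the Participants paragraph.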
Apparatus and Stimuli. Testing took place in a double-walled sound-attenuating booth,
either at a university laboratory or a comparable facility at a major children’s hospital, according
to the convenience of parents. A computer workstation and amplifier (Harman/Kardon HK3380)
outside the university booth were connected with a 17-inch touch-screen monitor (Elo LCD
Touch Systems) and two high-quality loudspeakers (Electro-Medical Instrument Co.) inside the
booth. At the hospital, a GSI 61 two-channel clinical audiometer (Grason-Stadler Instruments)
replaced the amplifier. In both locations, the loudspeakers were placed at 45 degrees azimuth to
the participant, with the touch-screen monitor directly in front of the participant. An interactive
computer program (customized for Windows XP) presented stimuli and recorded response
selections when the participant touched the screen. A portable keyboard was available to the
experimenter in case young children preferred to make their selections by pointing to a picture
rather than touching the screen. All stimuli were played at a comfortable sound level of
approximately 65 dB SPL.
Table 9. Key, pitch range and tempo of melodies extracted from the TV-show theme songs.
Show/Song            Key        Pitch range (semitones, Hz)                    Tempo (BPM)
Dora the Explorer    C major    C5-A5 (523-880), instrumental and melodic;     107
                                C4-A4 (262-440), pitch-only
Diego                E major    D#4-C#5 (311-554)                              118
Backyardigans        D major    D4-D5 (293-587)                                95
Franklin             D major    F#4-F#5 (369-738)                              94
Hannah Montana       Db major   Bb3-Bb4 (233-466)                              124
Suitelife on Deck    C major    C4-A4 (261-440)                                108
Blues’ Clues*        E major    C#4-B4 (277-493)                               107
Wiggles*             A major    D#4-C5 (311-523)                               95
* songs not chosen by CI children
The 40 stimuli consisted of 8 musical excerpts, with each excerpt presented in 5 different
versions: original, instrumental, melodic, timing-only, and pitch-only. The originals were taken
directly from theme songs played at the beginning of popular children’s TV programs (Table 9)
by re-recording the audio track as digital sound files. Instrumental and melodic versions were
created by a professional musician in a recording studio. In the instrumental versions, the
original vocal portions (i.e., the sung melody with lyrics) were replaced by a synthesized flute,
and the accompaniment duplicated the timbre and timing of the original recordings, as in Mitani
et al. (2007), Nakata et al. (2005), and Vongpaisal et al. (2009). The melodic versions consisted
of the same flute melodies in the original tempo and key but without instrumental
accompaniment (Table 9). The synthesized flute melody in the instrumental and melodic
versions of one song (from Dora the Explorer) was approximately one octave higher than the
other melodies. Timing-only and pitch-only versions were created with Finale 2009 software
(MakeMusic Inc., 2008) and converted to digital audio files.
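The note names and frequencies listed in Table 9, and the one-octave difference noted for Dora the Explorer, follow from standard tuning. The following is a minimal sketch, assuming twelve-tone equal temperament with A4 = 440 Hz (the tuning reference is not stated in the text); the function name is hypothetical.

```python
# Semitone offsets of the natural notes above C within one octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_hz(name: str, a4_hz: float = 440.0) -> float:
    """Convert a note name such as 'C#4' or 'Bb3' to a frequency in Hz.

    Assumes equal temperament and the MIDI convention that A4 is note 69.
    """
    letter, rest = name[0], name[1:]
    accidental = 0
    if rest[0] == "#":
        accidental, rest = 1, rest[1:]
    elif rest[0] == "b":
        accidental, rest = -1, rest[1:]
    octave = int(rest)
    midi = 12 * (octave + 1) + NOTE_OFFSETS[letter] + accidental
    return a4_hz * 2 ** ((midi - 69) / 12)

# One octave corresponds to a doubling of frequency.
for low, high in [("C5", "A5"), ("C4", "A4"), ("Bb3", "Bb4")]:
    print(low, round(note_to_hz(low), 1), high, round(note_to_hz(high), 1))
```

Small discrepancies from the tabled values can arise from rounding or truncation of the computed frequencies.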
Examples of melodic, timing-only, and pitch-only versions are depicted in Figure 14. The
timing-only versions, rendered in Wood Blocks timbre (selected from the Musical Instrument
Digital Interface, or MIDI, Instruments list), preserved the tempo and rhythmic structure of the
original melodies without reference to pitch. A meter track—rendered in a different timbre (Bass
Drum, MIDI)—provided a regular accompanying beat. The pitch-only versions, rendered in a
synthetic flute timbre in the original key, preserved the original intervals between successive
tones. All songs in this condition were presented in a similar pitch register. As a result, the pitch
level of one song (from Dora the Explorer) was one octave lower than its melodic version.
Moreover, all tones for each excerpt were of equal duration, and the tempo was normalized (to
90 beats per minute) across excerpts. Notably, in these versions, long-duration notes in the
originals were not represented by repeated short-duration notes. Instead, the long-duration notes were
shortened to match all other note durations. This manipulation resulted in a disruption of the
original tempo, rhythm, and meter, in effect leaving no distinctive timing cues. Excerpts in the
original, instrumental, and timing-only conditions were approximately 15 s in duration. Because
of the substitution of short-duration notes for long-duration notes in the pitch-only condition,
those excerpts were approximately 10 s in duration.
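The two manipulations described above can be sketched as transformations of a note list. This is an illustration, not the procedure carried out in Finale: the `Note` class, the example melody, and the choice of the shortest note value as the common duration are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Note:
    pitch: Optional[int]  # MIDI note number; None marks an unpitched percussion hit
    beats: float          # duration (and inter-onset interval) in beats

def timing_only(melody: List[Note]) -> List[Note]:
    """Keep rhythm and note count; discard all pitch information."""
    return [Note(pitch=None, beats=n.beats) for n in melody]

def pitch_only(melody: List[Note]) -> List[Note]:
    """Keep pitches and note count; give every note the same duration.

    Assumption: the common duration is the shortest note value in the melody.
    """
    shortest = min(n.beats for n in melody)
    return [Note(pitch=n.pitch, beats=shortest) for n in melody]

def duration_seconds(melody: List[Note], bpm: float) -> float:
    """Total duration of the excerpt at the given tempo."""
    return sum(n.beats for n in melody) * 60.0 / bpm

# Hypothetical five-note melody; not taken from any of the theme songs.
melody = [Note(62, 1.0), Note(64, 0.5), Note(66, 0.5), Note(69, 2.0), Note(66, 1.0)]

print(duration_seconds(melody, bpm=95))              # intact version, original tempo
print(duration_seconds(timing_only(melody), bpm=95)) # same duration as intact version
print(duration_seconds(pitch_only(melody), bpm=90))  # shorter, normalized tempo
```

Because long notes are shortened rather than subdivided, the pitch-only version keeps the note count of the original while becoming shorter overall, as reported for the actual excerpts.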
Procedure. Participants were tested individually. At their request, a parent was present
in the booth with some CI participants. Parents were permitted to assist with explanations when
the task was initially described to the child and during practice trials, but they did not interact
with the child in any way once the test phase began. Prior to the test session, the experimenter
asked the child and parent to choose two TV shows most familiar to the child from the eight that
were available. Before the first trial, pictorial representations of the two shows appeared
simultaneously on the computer monitor, and each child responded accurately when the
experimenter pointed to each picture in turn and asked, “Who’s that? Tell me.” Children were
told that they were going to hear songs from the two TV shows, and that they were to indicate
“which show the song belongs to” by touching one of the pictures on the screen. The stimuli
were presented in five blocks, corresponding to the five conditions. Presentation was in fixed
order — the original versions first, followed by the instrumental, melodic, timing-only, and
pitch-only versions. Each block was preceded by two practice trials. Before each trial, children
heard pre-recorded instructions (“Listen to the music! Who’s that? Show me!”) spoken by a
woman in a child-directed manner. Stimuli within each block (2 shows × 5 repetitions of each
song) were presented randomly for a total of 10 trials per block. After listening to each stimulus,
participants responded by touching the picture corresponding to the presumed show. They were
free to respond as soon as they recognized the music. Children received feedback after each trial
(including practice trials)—a smiley face for correct responses and a blank screen for incorrect
responses.
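The block structure of the session can be sketched as follows. This is an illustration rather than the customized testing program: the function names are hypothetical, and the composition of the two practice trials (one per show) is an assumption, since the text gives only their number.

```python
import random

# Fixed block order reported in the Procedure.
CONDITIONS = ["original", "instrumental", "melodic", "timing-only", "pitch-only"]

def build_session(show_a, show_b, repetitions=5, seed=None):
    """Return one block per condition with 2 practice and 10 test trials."""
    rng = random.Random(seed)
    session = []
    for condition in CONDITIONS:
        # Assumption: the two practice trials present each show once.
        practice = [show_a, show_b]
        rng.shuffle(practice)
        # 2 shows x 5 repetitions, in random order within the block.
        test = [show_a, show_b] * repetitions
        rng.shuffle(test)
        session.append({"condition": condition, "practice": practice, "test": test})
    return session

def feedback(response, target):
    """Feedback given after every trial, practice trials included."""
    return "smiley face" if response == target else "blank screen"

def proportion_correct(responses, targets):
    """Accuracy for one block of test trials."""
    return sum(r == t for r, t in zip(responses, targets)) / len(targets)

session = build_session("Backyardigans", "Diego", seed=1)
for block in session:
    print(block["condition"], block["test"])
```

With two response options and 10 test trials per block, chance performance corresponds to 5 correct responses per block.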
Figure 14. Examples of the melodic, timing-only and pitch-only conditions for two TV-show
theme songs: “Backyardigans” and “Diego”. Panels: Melodic, Timing-only, Pitch-only.
Results
Preliminary analyses compared performance in each condition with chance levels (i.e., 5 correct
on 10 trials, with 2 response options per trial) on the two-alternative forced-choice task
separately for both groups of children. For the NH group, one-sample t-tests confirmed that
performance exceeded chance levels in each instance, p < .0001. For the CI group, performance
was above chance in the original, instrumental, and melodic conditions (p < .0001). In the
timing-only condition, the difference from chance approached significance, t(7) = 2.26, p < .06.
In the pitch-only condition, CI children’s accuracy did not exceed chance levels. The
performance of child CI users and NH listeners is depicted in Figure 15.
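The comparison with chance can be illustrated with a one-sample t-test against 5 correct out of 10. The sketch below uses only the Python standard library; the helper names are hypothetical, the p value is obtained by numerical integration of the Student t density, and a two-tailed test is assumed because the text does not specify the direction.

```python
import math
import statistics

def one_sample_t(scores, mu):
    """t statistic for the mean of `scores` against a fixed value `mu`."""
    standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
    return (statistics.mean(scores) - mu) / standard_error

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=20000):
    """Two-tailed p value: 1 minus the area between -|t| and |t|.

    The area from 0 to |t| is computed with Simpson's rule (`steps` must be
    even) and doubled, because the density is symmetric about zero.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total

# Statistic reported for the CI group in the timing-only condition: t(7) = 2.26.
p_timing_only = two_tailed_p(2.26, 7)
print(f"t(7) = 2.26, two-tailed p = {p_timing_only:.3f}")
```

Given a list of individual scores out of 10, `one_sample_t(scores, 5)` would return the corresponding t statistic with n - 1 degrees of freedom.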
Figure 15. Performance of child CI users and NH listeners. Error bars represent standard errors.
We first verified that all children could recognize the two target songs, with and without
the lyrics, by examining performance on the original and instrumental conditions. Performance
in these conditions was unrelated theoretically to our principal question about melody
perception. A two-way mixed-design analysis of variance (ANOVA) examined identification
accuracy as a function of one between-subjects factor (group: CI or NH) and one within-subjects
factor (condition: original or instrumental). Because performance approached ceiling levels for
both groups in both conditions, neither main effect was significant, p > .05, and there was no
two-way interaction, p > .1. Rather, consistently high levels of performance (> 91% correct)
confirmed that both groups could recognize the songs even without lyrics, which legitimized our
subsequent tests of melody recognition.
The principal analyses examined performance differences among the melodic, timing-
only, and pitch-only conditions. Because the assumption of sphericity was violated, p < .05, we
used a repeated-measures multivariate analysis of variance (MANOVA) with condition as a
repeated measure and group as a between-subjects variable. Although there was no main effect
of group, p > .1, the main effect of condition was significant, F(2, 21) = 11.48, p < .001, as was
the two-way interaction between condition and group, F(2, 21) = 6.57, p < .01. Follow-up tests
revealed that the two groups did not differ in the melodic, p > .3, or timing-only, p > .1,
conditions, but the NH group outperformed the CI group in the pitch-only condition, t(22) =
2.24, p < .05. Alternative analyses compared differences between conditions separately for the
two groups. The CI group performed better in the melodic condition than in either the timing-
only condition, t(7) = 2.59, p < .05, or the pitch-only condition, t(7) = 4.25, p < .005, which did
not differ, p > .50. By contrast, performance of the NH group did not differ among the three
conditions, ps > .40.
Examination of individual performance (see Figure 16) revealed a more complex picture.
Bearing in mind that the probability of guessing 8 or more answers out of 10 correctly is roughly
5% (binomial test, p = .055), only 3 CI participants (CI-5, CI-7, and CI-11) actually performed at
chance levels in both timing-only and pitch-only conditions, with the youngest participant, CI-
11, demonstrating the poorest accuracy. Three child CI users achieved perfect (CI-3 and CI-4) or
near-perfect (CI-2) accuracy in the timing-only condition, but were at chance in the pitch-only
condition. In contrast, participant CI-12 was error-free in the pitch-only condition but performed
very poorly in the timing-only condition. Participant CI-8 performed reasonably well (80%)
in the pitch-only condition but achieved only 70% accuracy in the timing-only condition.
With the worst performer (CI-11) excluded, CI children’s scores in these two conditions were
negatively correlated, r = -.73, at levels approaching significance, p = .06, suggesting the
possibility of a “trade-off” between the use of timing and pitch cues by CI children.
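The individual criterion and the correlation test in this paragraph can be checked with a short calculation. The sketch assumes a guessing probability of .5 per trial and a two-tailed test of the correlation with n - 2 degrees of freedom; the function names are hypothetical.

```python
from math import comb, sqrt

def binomial_tail(k, n, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def correlation_t(r, n):
    """t statistic for a Pearson correlation r based on n pairs (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# Chance of 8 or more correct out of 10 by guessing: 56/1024, close to the 5% mark.
p_eight_or_more = binomial_tail(8, 10)
print(f"P(X >= 8 | n = 10, p = .5) = {p_eight_or_more:.4f}")

# Reported correlation with the lowest scorer excluded: r = -.73 with 7 children.
t_value = correlation_t(-0.73, 7)
print(f"t(5) = {t_value:.2f}")

# The two-tailed .05 critical value of t with 5 degrees of freedom is 2.571,
# so a correlation of this size falls just short of conventional significance.
print(abs(t_value) < 2.571)
```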
Figure 16. Performance of individual CI children. Original participant codes are preserved.
Despite the discrepancies in performance, the individual data confirm that the pitch-only
condition generally presented greater problems for CI listeners than did the timing-only
condition. In contrast, the majority of NH children performed comparably well in both
conditions, and only three demonstrated a comparable “trade-off” between rhythm and pitch. We
did not systematically document the strategies used by CI participants, but participant CI-2
commented that the difference in tempo was helpful in the timing-only condition (“This guy was
faster than that guy”), and participant CI-4 reported linking the rhythm in the timing-only
condition to the lyrics (“counted where the words were supposed to be”).
Figure 17. Performance of individual NH children in the timing-only and pitch-only conditions.
Discussion
The goal of the present investigation was to ascertain whether 5- to 7-year-old children
with bilateral CIs could identify familiar TV songs from pitch or timing cues alone or in
combination. First, we established that young CI users were highly accurate at identifying the
original vocal/instrumental versions as well as versions that preserved the original
instrumentation without the lyrics. These results confirm earlier findings in some respects
(Mitani et al., 2007; Vongpaisal et al., 2009). Unlike previous findings, however, the
performance levels of young CI users matched those of NH children, and the instrumental
versions were identified as successfully as the original versions. Undoubtedly, the reduced
cognitive demands of the present task, which featured two alternative responses rather than the
three or four in previous studies, contributed to the exceptionally high performance levels.
Of principal interest were the melodic condition, which provided pitch and timing cues,
the timing-only condition, which provided timing cues but no pitch cues, and the pitch-only
condition, which provided pitch cues but no timing cues. As was the case for NH children, child
CI users performed comparably on the melodic condition and on the original and instrumental
conditions (see Figure 15). In absolute terms, moreover, child CI users actually performed better
than NH children in the melodic condition. In previous research, Japanese children of similar age
were unable to recognize TV songs from comparable melodic cues (Mitani et al., 2007), and
older Canadian children could do so but they performed more poorly on melodic versions than
on the original versions. In both cases, child CI users exhibited a substantial decrement in
performance when the cues available at test were different from the cues at original exposure
(i.e., while watching TV at home). Those findings were attributed to CI children’s representation
of the music being less robust than that of NH children, who were less affected by the elimination of
timbre and texture cues. With the minimal cognitive demands of the current two-alternative task,
children’s performance was unaffected by such changes, which implies that child CI users’
representation of music is more general than previously envisioned.
The finding that children with CIs and NH controls could identify familiar music on the
basis of timing cues alone is the unique contribution of the present study. In fact, the
performance of the CI and NH groups did not differ significantly on this task, which is consistent
with the results from studies of rhythm perception in adult CI users (Cooper et al., 2008; Gfeller
& Lansing, 1991; Gfeller et al., 1997; Kong et al., 2004). It is also consistent with child CI users’ reliance on timing
cues to differentiate one talker from another (Vongpaisal et al., 2010). In principle, child CI users
and NH controls could have used tempo in addition to rhythm cues to identify the timing-only
patterns, but the extent to which they did so remains unclear. Undoubtedly, children would have
much greater difficulty identifying timing-only versions from three or more alternatives and they
would be entirely unsuccessful on an open-set task. Adults with normal hearing correctly name
only about 5% of highly familiar songs from timing cues alone, 50% from pitch cues alone, and
90% from combined pitch and timing cues (Hébert & Peretz, 1997). Interestingly, the pitch-only
versions that adults cannot name sound familiar to them, in contrast to the timing-only versions,
which do not.
Child CI users’ performance differed significantly from that of NH children only on the
pitch-only versions, where their performance was at chance levels. Their failure to identify songs
on the basis of pitch cues alone might lead one to conclude that they rely entirely on timing cues.
That interpretation is not borne out by child CI users’ significantly better performance on the
melody versions, which had pitch and timing cues, than on the timing-only versions, which had
timing alone. The implication is that child CI users derived some benefit from pitch cues.
Children whose program selections included Dora the Explorer could have used pitch register
cues instead of or in addition to pitch contour cues in the melody condition but not in the pitch-
only condition.
Although the overall performance of child CI users was at chance levels on the pitch-only
versions, CI-12 achieved error-free performance on this and other versions except for the timing-
only version, on which she performed poorly. Instead of musical pitch cues being inaccessible to
child CI users because of device limitations, these cues may be weak or of relatively low
salience. Perhaps the effective salience of pitch cues could be enhanced by training, which would
also have implications for the perception of speech in noise.
The tendency for child CI users who performed well on the timing-only versions to
perform poorly on the pitch-only versions implies that they were relatively inflexible in their
listening strategies, unlike NH children, who readily switched from one strategy to another,
depending on the task at hand. For children with CIs, listening in general is likely to be more
effortful or cognitively demanding than it is for NH children, with listening to music being
particularly effortful. One consequence may be the use of similar listening strategies across
disparate contexts, even when those strategies are ineffectual.
In short, the present findings suggest that CI users who receive their implants early have
more complex representations of music than one would predict based on previous research
(Mitani et al., 2007; Vongpaisal et al., 2009). These representations may include precise
information about timing and coarser information about pitch contour and pitch register. Further
research with a larger sample is necessary to establish the contribution of demographic and
experiential factors to music recognition and the links between music and speech perception.
Finally, in light of the increasing links that have been identified between music and well-being
(Hanser, 2010) and between musical and non-musical skills (Degé et al., in press; Kraus &
Chandrasekaran, 2010; Moreno et al., 2009; Schellenberg, 2004; Wong, Skoe, Russo, Dees, &
Kraus, 2007), it is important to ascertain the extent to which music perception in child CI users
can be improved with limited intervention.
Supplementary Comments
This thesis examined the ability of bilateral child CI users who were 4-7 years of age to perceive
speech and music in optimal circumstances. In Study 1, the children were asked to differentiate
talkers contrasting in gender and age. In Study 2, they were required to identify affective
intentions (happy or sad) in speech and music, and in Study 3, they attempted to identify familiar
melodies. In previous studies, these types of tasks posed substantial problems for adults and
children with CIs (Cleary et al., 2005; Fu et al., 2004, 2005; Hopyan-Misakyan et al., 2009; Luo
et al., 2007; Vongpaisal et al., 2006, 2009), and the problems were attributed to intrinsic
limitations of implants for pitch and spectral processing (Loizou, 1998; Smith et al., 2002).
Unquestionably, such tasks pose difficulty for CI users, but do they pose insurmountable
difficulty for all CI users? The approach in the present study was to evaluate a small,
advantaged sample of prelingually deaf children, with the goal of shedding light on the potential
of young CI users in contrast to the usual approach of focusing on typical or average
performance in this population.
Overall, the performance of this selective group of child CI users surpassed that of adult
and child CI users in previous investigations, except for the identification of emotion in music. In
that instance, the present CI users performed equivalently to older CI users (7-13 years) on a
similar task (Hopyan et al., 2011). Remarkably, the performance of the present child CI users did
not differ significantly from that of NH children on a number of tasks. For example, child CI
users in Study 1 identified the gender and age of talkers (man, woman, or girl) with comparable
accuracy to that shown by NH children. Moreover, CI users in Study 3 were as accurate as their
NH counterparts in identifying familiar melodies when relative pitch and timing cues were
available. On other tasks involving the identification of familiar talkers (Study 1, Experiment 2),
emotion in speech and music (Study 2), and familiar melodies from relative pitch cues alone
(Study 3), the performance of CI users did not equal that of their NH peers. In each case,
however, one or more child CI users performed as well as NH children, sometimes achieving
error-free performance. Regardless of the specific factors that underlie the success of these CI
users, which are as yet undetermined, one thing is clear. The poor performance of CI users in other
studies of talker, emotion, and melody identification cannot be attributed to device limitations
alone. After all, no cues were available to the present children beyond those provided by their
implants.
Aside from the advantageous circumstances of the present sample of child CI users—
early implantation and committed, well-educated parents, among others—the specific tasks in
the present study optimized children’s performance by using highly engaging speech or musical
stimuli (excerpts of classical music being one exception), closed-set tasks and, in some cases,
feedback. The findings are consistent with the view that timing cues are critical for CI users
when differentiating talkers (Vongpaisal et al., 2010), melodies (Hsiao, 2008; Kong et al., 2004;
Stordahl, 2002), and emotions (Hopyan-Misakyan et al., 2009), in contrast to NH listeners, who
rely primarily on pitch and spectral cues for those purposes (Nimmons et al., 2007; Scherer,
2003; Van Lancker et al., 1985). For example, children with CIs identified familiar melodies
when timing cues were available but not otherwise. On the whole, their performance was more
similar to that of NH children when timing cues were consistent (e.g., talker identification,
melody identification) rather than inconsistent (e.g., emotion identification in speech,
identification of isochronous melodies), with one notable exception (emotion identification in
music).
An important question is whether child CI users relied exclusively on timing cues in
performing the various tasks in the present study. This possibility seems unlikely for a variety of
reasons. For one thing, all CI users identified the male talker correctly from the very first trial,
before receiving any feedback, and several CI users did likewise for the woman and girl. It is
plausible that they capitalized on voice quality and pitch register cues from their everyday
experience with male, female, and child talkers. The implausible alternative is that they had clear
expectations about articulatory timing or speaking rate for various classes of talkers. In future
research, young CI users’ sensitivity to the spectral attributes of talker identity could be
examined directly by using temporally reversed speech samples (e.g., Sheffert et al., 2003) that
retain pitch and voice quality cues while removing phonetic and articulatory timing cues.
Child CI users’ successful identification of happy and sad utterances in the absence of
timing cues (Study 2, Experiment 1) or feedback also raises the possibility that they relied, to
some extent, on pitch variability or pitch contour. This evidence is less compelling, however,
because amplitude cues were also available. The contribution of pitch-related cues could be
established definitively by controlling amplitude cues in future research. In any case, the fact that
several CI users achieved perfect or near-perfect accuracy on this task implies that they were
using knowledge gained from everyday listening experience.
Finally, CI users’ identification of familiar melodies was significantly more accurate when
pitch and timing cues were available rather than timing cues alone, suggesting that pitch contour,
pitch range, or pitch level played some role. Moreover, although CI users as a group performed
at chance levels at identifying melodies on the basis of pitch cues alone, two children with CIs
were successful on this task, indicating that pitch relations are perceptible to some listeners with
electrical hearing.
The relatively high performance levels in the present investigation and the exceptional
performance of some individuals suggest that the potential of CI users with respect to talker
discrimination, emotion differentiation in speech and music, and melody identification has been
underestimated. The findings suggest, moreover, that habilitation or training efforts in these
domains should be pursued. Such training could have direct benefits in the trained domains as
well as potential transfer to other domains.
A much larger sample than that of the present investigation would be necessary to
ascertain the background factors linked to enhanced or compromised performance. Nevertheless,
a number of children participated in several tasks, so it is possible to review their performance
across tasks and speculate about possible combinations of background factors that may have
enabled the top performers to realize their potential as implant users. In fact, 11 CI users
participated in 4 or more of 5 representative tasks from the present investigation — talker
classification, talker recognition, vocal emotion identification, musical emotion
identification, and melody recognition — with 7 participants completing all tasks.
A cumulative score, expressed as percent (%) correct averaged across tasks, provided a
very rough estimate of individual CI users’ overall success. The cumulative score was based on
four or all five of the following five tasks, as available (see Table 10). Scores for tasks with more
than one block were calculated by averaging the participant’s % correct scores across blocks.
• talker classification (3 blocks)
• average talker recognition (1 block)
• average vocal emotion identification (2 blocks)
• average musical emotion identification (2 blocks)
• melody recognition (unaccompanied melody with pitch and timing cues preserved; 1
block). The original and instrumental conditions of the music recognition task (Study 3)
were excluded from consideration because they were of secondary interest, used only for
purposes of replication. The timing-only and pitch-only conditions were excluded
because of performance at or below chance levels by more than half of CI participants,
rendering the relations among those scores meaningless.
The cumulative scores of 10 CI users ranged from 88% to 100%, with one child performing at
65% correct.
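The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming an unweighted mean across blocks within a task and then across completed tasks. The function names, the dictionary layout, and the example values are hypothetical and are not taken from the thesis data.

```python
from statistics import mean

MIN_TASKS = 4  # children were included if they completed four or five of the five tasks


def task_score(block_scores):
    """Return one task's score: the mean % correct across that task's blocks."""
    return mean(block_scores)


def cumulative_score(blocks_by_task, min_tasks=MIN_TASKS):
    """Average the per-task scores over the tasks a child completed.

    blocks_by_task maps a task name to a list of block % correct scores,
    or to None when the child did not complete that task.
    """
    completed = [task_score(b) for b in blocks_by_task.values() if b]
    if len(completed) < min_tasks:
        raise ValueError("fewer than %d tasks completed" % min_tasks)
    return mean(completed)


# Hypothetical child (invented numbers, not data from Table 10).
example_child = {
    "talker classification": [90.0, 95.0, 100.0],   # 3 blocks -> 95.0
    "talker recognition": [80.0],                    # 1 block  -> 80.0
    "vocal emotion identification": [85.0, 95.0],    # 2 blocks -> 90.0
    "musical emotion identification": [70.0, 80.0],  # 2 blocks -> 75.0
    "melody recognition": None,                      # not completed
}
example_score = cumulative_score(example_child)  # (95 + 80 + 90 + 75) / 4 = 85.0
```

Because block scores are averaged within a task before tasks are averaged, each task carries equal weight in the cumulative score regardless of how many blocks it contained.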
An examination of the children’s case histories, early communication assessments, and
reports from their auditory-verbal therapists revealed that few children had a completely
uneventful history (e.g., one or more problematic episodes with their implants). However, all
children appeared to have normal cognitive abilities, although this was not confirmed
definitively by psychometric assessment, and all were in age-appropriate classes in regular
schools. Scores on standardized open-set speech perception tests, which were administered at regular
intervals as part of their clinical follow-up, were within the normal range of other CI users of
similar age and hearing history.
Electrically evoked auditory brainstem responses (EABRs) were collected as part of
an ongoing study on the effects of bilateral implantation on auditory development and plasticity.
Changes in EABR latencies with implant use provide an index of auditory brainstem maturation
in individuals with electrical hearing (Gordon et al., 2005; Thai-Van et al., 2007). In all
instances, the EABRs for the CI users in the present sample were age-appropriate, at least in the
better ear.
Socio-economic status (SES) of the families of these child implant users was estimated
from census data, based on median after-tax income of families with children in their
neighborhood and compared with the median after-tax income for families with children in the
province of Ontario (data from 2005 census, available online through Statistics Canada:
http://www12.statcan.gc.ca/census-recensement/2006/dp-pd/prof/92-597/index.cfm?Lang=E).
Based on this measure, participating families were in the mid-range of the Ontario
population. SES estimated by family income rather than educational attainment of parents may
underestimate SES in this sample because a number of mothers of CI children had chosen to
remain out of the workforce to optimize their child’s opportunities.
One factor that distinguishes the families in the present sample of child CI users from
those in the general population of child CI users is parental willingness to commit to the
demands of the present research project. Completion of four or more tasks in the present
investigation required several laboratory visits, often after school and on weekends, over and
above other research, medical appointments related to the implants, and auditory-verbal therapy.
As a result, the current sample consisted of a self-selected group of enthusiastic, well-informed,
and supportive middle-class parents whose children were doing well enough to motivate
participation in time-consuming research that had no direct benefits.
Although child CI participants performed well overall, they found some tasks more difficult than
others. One notable exception was participant CI-8, who was the only child to achieve a perfect
overall score. His etiology of genetic, non-syndromic congenital deafness, which was identified
early, has been associated with better speech outcomes than other etiologies of deafness
(Kawasaki et al., 2006; Wu et al., 2008). He had no notable health issues at birth or thereafter.
He was at or slightly above the mean age of CI participants at the time of participating in all
tasks. His relatively long experience as a CI user (5.3-5.5 years for various tasks) was
advantageous for him, as it was for the group as a whole, as reflected in significantly positive
associations between duration of implant use and performance on the emotion identification
tasks (Study 2). CI-8 received his initial implant before his first birthday and his second implant
some months later. As such, he avoided some of the adverse consequences associated with
longer delays between implants (Gordon et al., 2007). He never experienced technical problems
with either implant, which occurred sporadically for other child CI users in the present sample.
His parents were highly educated and relatively affluent, and they enrolled him in many
extracurricular activities, including piano lessons. English was used exclusively at home, and it
was the language of school and auditory-verbal therapy.
As noted, no assessments of intelligence, non-verbal or otherwise, were available for any
children in the sample. It was clear, however, that CI-8 grasped all tasks quickly, and he
exhibited highly focused attention and goal-directedness. He had all the hallmarks of a highly
intelligent, conscientious, and cooperative child. No other participant had a constellation of
background factors as favorable as that of CI-8. The second best performer was CI-4. Her
etiology of deafness, early diagnosis and implantation, CI experience, apparent cognitive ability,
and motivation were similar to CI-8’s. Clearly, it is foolhardy to search for a pattern of
background variables to account for the data from such a small sample. It is notable, however,
that the children with higher cumulative scores tended to have congenital, genetic non-syndromic
hearing loss whereas those with lower overall scores (including the poorest performer, CI-10)
differed in their onset and etiology of hearing loss. Four of the children with lower cumulative
scores were exposed to more than one language at home; for three, English was not the primary
language of the family. Several children (including CI-4) had experienced technical problems
with their implants, with one case (CI-2) necessitating re-implantation. Despite variations in age,
motivation, compliance, procedure, and family SES, most CI users achieved high levels of
performance that exceeded expectations based on the available literature. It remains to be
determined whether their stellar performance will be sustained over the long run.
Table 10. Cumulative scores and demographic profiles of the 11 CI users who completed four or
more tasks comprising the present investigation.

| Code | Cumulative score (%) | Hearing loss (onset, etiology) | Age range across sessions (years) | Age at activation 1 and 2 (years) | History of problems with CI use | Language at home | Estimated income, percentile |
|---|---|---|---|---|---|---|---|
| CI-8 | 100 | congenital, genetic non-syndromic | 6.1-6.3 | 0.8; 1.5 (delay < 1) | no | English | 86 |
| CI-4 | 97.4 | congenital, genetic non-syndromic | 5.8-6.8 | 1.0; 3.6 (delay > 2) | yes** | English | 51 |
| CI-3 | 92.4 | congenital, genetic non-syndromic | 5.3-5.5 | 1.1; 1.1 (delay = 0) | no | Mandarin, English | 29 |
| CI-11 | 92.3* | congenital, genetic non-syndromic | 4.8-5.1 | 1.1; 1.1 (delay = 0) | no | English | 65 |
| CI-1 | 92.1* | progressive, genetic non-syndromic (?) | 5.8-6.4 | 3.4; 3.4 (delay = 0) | no | Dari, English | 73 |
| CI-6 | 90.3* | congenital, genetic non-syndromic | 5.0-5.3 | 1.0; 4.6 (delay > 3) | no | Mandarin, English | 86 |
| CI-7 | 89.6 | congenital, genetic non-syndromic | 5.1-5.8 | 0.9; 1.8 (delay < 1) | yes** | English | 65 |
| CI-2 | 88.8 | congenital, Usher I syndrome | 4.8-5.7 | 0.8; 1.7 (delay < 2) | yes** | English | 65 |
| CI-12 | 88.7 | pre-lingual, life-saving intervention after birth | 6.1-6.9 | 1.0; 3.5 (delay > 2) | no | English, Punjabi | 51 |
| CI-5 | 87.9 | progressive, non-genetic | 6.6-7.2 | 2.5; 4.0 (delay < 2) | no | English | 56 |
| CI-10 | 65.1* | progressive, Mondini dysplasia | 6.3-6.5 | 3.1; 6.3 (delay > 2) | no | English | 65 |

*Four tasks completed. **One side only.
References
Bachorowski, J. A. (1999). Vocal expression and perception of emotion. Current Directions in
Psychological Science, 8, 53-57.
Balkwill, L.-L., & Thompson, W. F. (1999). A cross-cultural investigation of the perception of
emotion in music: Psychophysical and cultural cues. Music Perception, 17, 43-64.
Bartholomeus, B. (1973). Voice identification by nursery school children. Canadian Journal of
Psychology, 27, 464–472.
Barry, J. G., Blamey, P. J., Martin, L. F. A., Lee, K. Y.-S., Tang, T., Ming, Y. Y., & Van Hassel,
C. A. (2002). Tone discrimination in Cantonese-speaking children using a cochlear
implant. Clinical Linguistics and Phonetics, 16, 79-99.
Boersma, P., & Weenink, D. (2005). Praat: Doing phonetics by computer (Version 4.3.01)
[Computer program]. Retrieved from http://www.praat.org/
Chatterjee, M., & Peng, S.C. (2007). Processing F0 with cochlear implants: Modulation
frequency discrimination and speech intonation recognition. Hearing Research, 235,
143–156.
Chadha, N. K., Papsin, B. C., Jiwani, S., & Gordon, K. A. (2011). Speech detection in noise and
spatial unmasking in children with simultaneous versus sequential bilateral cochlear
implants. Otology and Neurotology, 32, 1057-1064.
Cleary, M., Pisoni, D. B., & Kirk, K. I. (2005). Influence of voice similarity on talker
discrimination in children with normal hearing and children with cochlear implants.
Journal of Speech, Language, and Hearing Research, 48, 204–223.
Coletti, V., Carner, M., Miorelli, V., Guida, M., Coletti, L., & Fiorino, F. G. (2005). Cochlear
implantation at under 12 months: Report on 10 patients. Laryngoscope, 115, 445-449.
Connor, C. M., Craig, H. K., Raudenbush, S. W., Heavner, K., & Zwolan, T. A. (2006). The age
at which young deaf children receive cochlear implants and their vocabulary and speech-
production growth: Is there an added value for early implantation? Ear and Hearing, 27,
628-644.
Cooper, R. P., & Aslin, R. N. (1990). Preference for infant-directed speech in the first month after
birth. Child Development, 61, 1584-1595.
Cooper, W. B., Tobey, E., & Loizou, P. C. (2008). Music perception by cochlear implant and
normal hearing listeners as measured by the Montreal Battery for Evaluation of Amusia.
Ear and Hearing, 29, 618-626.
Cullington, H. E., & Zeng, F. G. (2008). Speech recognition with varying numbers and types of
competing talkers by normal-hearing, cochlear-implant, and implant simulation subjects.
Journal of the Acoustical Society of America, 123, 450-461.
Dalla Bella, S., Peretz, I., Rousseau, L., & Gosselin, N. (2001). A developmental study of the
affective value of tempo and mode in music. Cognition, 80, B1-B10.
Dammeyer, J. (2010). Psychosocial development in a Danish population of children with
cochlear implants and deaf and hard-of-hearing children. Journal of Deaf Studies and
Deaf Education, 15, 50-58.
Degé, F., Kubicek, C., & Schwarzer, G. (2011). Music lessons and intelligence: A relation
mediated by executive functions. Music Perception, 29, 195-201.
Drennan, W. R., & Rubinstein, J. T. (2006). Sound processors in cochlear implants. In S. B.
Waltzman & J. T. Roland (Eds.), Cochlear implants (2nd ed., pp. 40-47). New York:
Thieme Medical Publishers.
Drennan, W. R., & Rubinstein, J. T. (2008). Music perception in cochlear implant users and its
relationship with psychophysical capabilities. Journal of Rehabilitation Research and
Development, 45, 779-790.
Fagan, M. K., & Pisoni, D. B. (2010). Hearing experience and receptive vocabulary development
in deaf children with cochlear implants. Journal of Deaf Studies & Deaf Education, 15,
149-161.
Fernald, A. (1985). Four-month-old infants prefer to listen to motherese. Infant Behavior and
Development, 8, 181-195.
Fernald, A. (1989). Intonation and communicative intent in mothers' speech to infants: Is the
melody the message? Child Development, 60, 1497-1510.
Fernald, A. (1991). Prosody in speech to children: Prelinguistic and linguistic functions. In R.
Vasta (Ed.), Annals of child development (Vol. 8, pp. 43-80). London, UK: Jessica
Kingsley Publishers.
Fitzsimmons, M., Sheahan, N., & Staunton, H. (2001). Gender and the integration of acoustic
dimensions of prosody: Implications for clinical studies. Brain and Language, 78, 94–
108.
Friesen, L. M., Shannon, R.V., Baskent, D., & Wang, X. (2001). Speech recognition in noise as a
function of the number of spectral channels: Comparison of acoustic hearing and cochlear
implants. Journal of the Acoustical Society of America, 110, 1150-1163.
Fu, Q. J., Chinchilla, S., Nogaki, G., & Galvin, J. J., 3rd. (2005). Voice gender identification by
cochlear implant users: The role of spectral and temporal resolution. Journal of the
Acoustical Society of America, 118, 1711–1718.
Galvin, J. J., Fu, Q. J., & Nogaki, G. (2007). Melodic contour identification by cochlear implant
listeners. Ear and Hearing, 28, 302-319.
Gates, G. A., & Miyamoto, R. T. (2003). Cochlear implants. New England Journal of Medicine,
349, 421–423.
Geers, A. E. (2004). Speech, language, and reading skills after early cochlear implantation. Head
& Neck Surgery, 130, 634-638.
Geers, A. E. (2006). Spoken language in children with cochlear implants. In P. E. Spencer, & M.
Marschark (Eds.), Advances in the spoken language development of deaf and hard-of-
hearing children. Perspectives on deafness (pp. 244-270). New York: Oxford University
Press.
Geers, A., & Brenner, C. (2003). Background and educational characteristics of prelingually deaf
children implanted by five years of age. Ear and Hearing, 24, 2S-14S.
Geers, A., Brenner, C., & Davidson, L. (2003). Factors associated with development of speech
perception skills in children implanted by age five. Ear and Hearing, 24, 24S-35S.
Geers, A., Nicholas, J., & Moog, J. (2007). Estimating the influence of cochlear implantation on
language development in children. Audiological Medicine, 5, 262–273.
Geers, A. E., Nicholas, J. G., & Sedey, A. L. (2003). Language skills of children with early
cochlear implantation. Ear and Hearing, 24, 46S–58S.
Geurts, L., & Wouters, J. (2001). Coding of the fundamental frequency in continuous
interleaved sampling processors for cochlear implants. Journal of the Acoustical Society
of America, 109, 713-726.
Gfeller, K., Christ, A., Knutson, J. F., Witt, S., Murray, K. T., & Tyler, R. S. (2000). Musical
backgrounds, listening habits, and aesthetic enjoyment of adult cochlear implant
recipients. Journal of the American Academy of Audiology, 11, 390-406.
Gfeller, K., & Lansing, C. R. (1991). Melodic, rhythmic, and timbral perception of adult
cochlear implant users. Journal of Speech and Hearing Research, 34, 916-920.
Gfeller, K., Olszewski, C., Rychener, M., Sena, K., Knutson, J. F., Witt, S., & Macpherson,
B. (2005). Recognition of “real-world” musical excerpts by cochlear implant recipients
and normal-hearing adults. Ear and Hearing, 26, 237–250.
Gfeller, K., Turner, C., Mehr, M., Woodworth, G., Fearn, R., Knutson, J. F., Witt, S., &
Stordahl, J. (2002). Recognition of familiar melodies by adult cochlear implant recipients
and normal-hearing adults. Cochlear Implants International, 3, 29-53.
Gfeller, K., Turner, C., Oleson, J., Zhang, X., Gantz, B., Froman, R., & Olszewski, C. (2007).
Accuracy of cochlear implant recipients on pitch perception, melody recognition and
speech reception in noise. Ear and Hearing, 28, 412-423.
Gfeller, K., Woodworth, G., Robin, D. A., Witt, S., & Knutson, J. F. (1997). Perception of
rhythmic and sequential pitch patterns by normally hearing adults and cochlear implant
users. Ear and Hearing, 18, 252-260.
Gilley, P. M., Sharma, A., & Dorman, M. F. (2008). Cortical reorganization in children with
cochlear implants. Brain Research, 1239, 56-65.
Gordon, K. A., Tanaka, S., & Papsin, B. C. (2005). Atypical cortical responses underlie poor
speech perception in children using cochlear implants. Neuroreport, 16, 2041-2045.
Gordon, K. A., Valero, J., & Papsin, B. C. (2007). Auditory brainstem activity in children with
9-30 months of bilateral cochlear implant use. Hearing Research, 233, 97-107.
Hallam, S. (2010). Music education: The role of affect. In P. N. Juslin & J. A. Sloboda (Eds.),
Handbook of music and emotion: Theory, research, applications (pp. 791-817). New
York: Oxford University Press.
Hanser, S. B. (2010). Music, health, and well-being. In P. N. Juslin & J. A. Sloboda (Eds.),
Handbook of music and emotion: Theory, research, applications (pp. 791-817). New
York: Oxford University Press.
Hébert, S., & Peretz, I. (1997). Recognition of music in long-term memory: Are melodic and
temporal patterns equal partners? Memory and Cognition, 25, 518-533.
Holt, R. F., & Svirsky, M. A. (2008). An exploratory look at pediatric cochlear implantation: Is
earliest always best? Ear and Hearing, 29, 492-511.
Hopyan, T., Gordon, K. A., & Papsin, B. C. (2011). Identifying emotions in music through
electrical hearing in deaf children using cochlear implants. Cochlear Implants
International, 12, 21-26.
Hopyan-Misakyan, T. M., Gordon, K. A., Dennis, M., & Papsin, B. C. (2009). Recognition of
affective speech prosody and facial affect in deaf children with unilateral right cochlear
implants. Child Neuropsychology, 15, 136-146.
Hsiao, F. (2008). Mandarin melody recognition by pediatric cochlear implant recipients. Journal
of Music Therapy, 45, 390-404.
Hunter, P. G., & Schellenberg, E. G. (2010). Music and emotion. In M. R. Jones, R. R. Fay, & A.
N. Popper (Eds.), Music perception (pp. 129-164). New York: Springer.
Hunter, P. G., Schellenberg, E. G., & Stalinski, S. M. (2011). Liking and identifying emotionally
expressive music: Age and gender differences. Journal of Experimental Child
Psychology, 110, 80-93.
Husain, G., Thompson, W. F., & Schellenberg, E. G. (2002). Effects of musical tempo and
mode on arousal, mood, and spatial abilities. Music Perception, 20, 151-171.
Johnston, C. J., Durieux-Smith, A., Angus, D., O’Connor, A., & Fitzpatrick, E. (2009). Bilateral
paediatric cochlear implants: A critical review. International Journal of Audiology, 48,
601-617.
Johnstone, T., & Scherer, K. R. (2000). Vocal communication of emotion. In M. Lewis & J.
Haviland (Eds.), Handbook of emotion (2nd ed., pp. 220–235). New York: Guilford.
Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music
performance: Different channels, same code? Psychological Bulletin, 129, 770-814.
Kawasaki, A., Fukushima, K., Kataoka, Y., Fukuda, S., & Nishizaki, K. (2006). Using
assessment of higher brain functions of children with GJB2-associated deafness and
cochlear implants as a procedure to evaluate language development. International
Journal of Pediatric Otorhinolaryngology, 70, 1343-1349.
Kirschner, S., & Tomasello, M. (2010). Joint music making promotes prosocial behavior in 4-
year-old children. Evolution and Human Behavior, 31, 354-364.
Kong, Y.-Y., Cruz, R., Jones, J. A., & Zeng, F.-G. (2004). Music perception with temporal cues
in acoustic and electric hearing. Ear and Hearing, 25, 173-185.
Kovačić, D., & Balaban, E. (2009). Voice gender perception by cochlear implantees. Journal of
the Acoustical Society of America, 126, 762–775.
Kovačić, D., & Balaban, E. (2010). Hearing history influences voice gender perceptual
performance in cochlear implant users. Ear and Hearing, 31, 806-814.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills.
Nature Reviews Neuroscience, 11, 599-605.
Kraus, N., Skoe, E., Parbery-Clark, A., & Ashley, R. (2009). Experience-induced malleability in
neural encoding of pitch, timbre, and timing. Annals of the New York Academy of
Sciences, 1169, 543-557.
Lander, K., Hill, H., Kamachi, M., & Vatikiotis-Bateson, E. (2007). It's not what you say but the
way you say it: Matching faces and voices. Journal of Experimental Psychology: Human
Perception and Performance, 33, 905-914.
Lassaletta, L., Castro, A., Bastarrica, M., Pérez-Mora, R., Madero, R., De Sarriá, J., & Gavilán,
J. (2007). Does music perception have an impact on quality of life following cochlear
implantation? Acta Oto-Laryngologica, 127, 682-686.
Laukka, P., Juslin, P. N., & Bresin, R. (2005). A dimensional approach to vocal expression of
emotion. Cognition and Emotion, 19, 633-653.
Leal, M. C., Young, J., Laborde, M.-L., Calmels, M.-N., Verges, S., Lugardon, S., Andrieu, S.,
Deguine, O., & Fraysse, B. (2003). Music perception in adult cochlear implant recipients.
Acta Oto-Laryngologica, 123, 826-835.
Loizou, P. (1998). Mimicking the human ear. IEEE Signal Processing Magazine, 15, 101-130.
Luo, X., Fu, Q. J., & Galvin, J. (2007). Vocal emotion recognition by normal-hearing listeners
and cochlear implant users. Trends in Amplification, 11, 301-315.
Masataka, N. (1999). Preference for infant-directed singing in 2-day-old hearing infants of deaf
parents. Developmental Psychology, 35, 1001-1005.
McDermott, H. J. (2004). Music perception with cochlear implants: A review. Trends in
Amplification, 8, 49-82.
Meister, H., Landwehr, M., Pyschny, V., Walger, M., & von Wedel, H. (2009). The perception of
prosody and speaker gender in normal-hearing listeners and cochlear implant recipients.
International Journal of Audiology, 48, 38-48.
Mitani, C., Nakata, T., Trehub, S. E., Kanda, Y., Kumagami, H., Takasaki, K., Miyamoto, I., &
Takahashi, H. (2007). Music recognition, music listening, and word recognition by deaf
children with cochlear implants. Ear and Hearing, 28, 29S-33S.
Moore, B. C. J., & Carlyon, R. P. (2005). Perception of pitch by people with cochlear hearing
loss and by cochlear implant users. In C. J. Plack, A. J. Oxenham, R. R. Fay, & A. N.
Popper (Eds.), Pitch: Neural coding and perception (pp. 234-277). New York: Springer.
Moreno, S., Marques, C., Santos, A., Santos, M., Castro, S.L., & Besson, M. (2009). Musical
training influences linguistic abilities in 8-year-old children: More evidence for brain
plasticity. Cerebral Cortex, 19, 712-723.
Most, T., & Aviner, C. (2009). Auditory, visual, and auditory–visual perception of emotions by
individuals with cochlear implants, hearing aids, and normal hearing. Journal of Deaf
Studies and Deaf Education, 14, 449-464.
Most, T., & Peled, M. (2007). Perception of suprasegmental features of speech by children with
cochlear implants and children with hearing aids. Journal of Deaf Studies and Deaf
Education, 12, 350–361.
Mote, J. (2011). The effects of tempo and familiarity on children’s affective interpretation of
music. Emotion, 11, 618-622.
Nakata, T., Trehub, S.E., Kanda, Y., Mitani, C., & Schellenberg, E.G. (2005). Music recognition
by Japanese children with cochlear implants. Journal of Physiological Anthropology and
Applied Human Science, 24, 29-32.
Nakata, T., Trehub, S.E., Mitani, C., & Kanda, Y. (2006). Pitch and timing in the songs of deaf
children with cochlear implants. Music Perception, 24, 147-154.
Nicholas, J. G., & Geers, A. E. (2006). Effects of early auditory experience on the spoken
language of deaf children at 3 years of age. Ear and Hearing, 27, 286-298.
Nicholas, J. G., & Geers, A. E. (2007). Will they catch up? The role of age at cochlear
implantation in the spoken language development of children with severe to profound
hearing loss. Journal of Speech, Language, and Hearing Research, 50, 1048-1062.
Nimmons, G. L., Kang, R. S., Drennan, W. R., Longnion, J., Ruffin, C., Worman, T., Yueh, B.,
& Rubinstein, J. T. (2007). Clinical assessment of music perception in cochlear implant
listeners. Otology and Neurotology, 29, 149-155.
Orchard, T. L., & Yarmey, A. D. (1995). The effects of whispers, voice-sample duration, and
voice distinctiveness on criminal speaker identification. Applied Cognitive Psychology, 9,
249-260.
Papoušek, M. (1992). Early ontogeny of vocal communication in parent–infant interactions. In
H. Papoušek, U. Jürgens, & M. Papoušek (Eds.), Nonverbal vocal communication:
Comparative and developmental approaches (pp. 230-261). New York: Cambridge
University Press.
Peng, S.-C., Tomblin, J. B., & Turner, C. W. (2008). Production and perception of speech
intonation in pediatric cochlear implant recipients and individuals with normal hearing.
Ear and Hearing, 29, 336–351.
Percy-Smith, L., Jensen, J. H., Caye-Thomasen, P., Thomsen, J., Gudman, M., & Lopez, A.G.
(2008). Factors that affect the social well-being of children with cochlear implants.
Cochlear Implants International, 9, 199-214.
Peretz, I., Gagnon, L., & Bouchard, B. (1998). Music and emotion: Perceptual determinants,
immediacy, and isolation after brain damage. Cognition, 68, 111-141.
Peterson, N. R., Pisoni, D. B., & Miyamoto, R. T. (2010). Cochlear implants and spoken
language processing abilities: Review and assessment of the literature. Restorative
Neurology and Neuroscience, 28, 237-250.
Pisoni, D. B. (2005). Speech perception in deaf children with cochlear implants. In D. B. Pisoni
& R. E. Remez (Eds.), The handbook of speech perception. Blackwell handbooks in
linguistics (pp. 494-523). Malden: Blackwell Publishing.
Remez, R. E., Fellowes, J. M., & Rubin, P. E. (1997). Talker identification based on phonetic
information. Journal of Experimental Psychology: Human Perception and Performance,
23, 651–666.
Sagi, E., Kaiser, A. R., Meyer, T. A., & Svirsky, M. A. (2009). The effect of temporal gap
identification on speech perception by users of cochlear implants. Journal of Speech,
Language, and Hearing Research, 52, 385-395.
Salimpoor, V. N., Benovoy, M., Longo, G., Cooperstock, J. R., & Zatorre, R. J. (2009). The
rewarding aspects of music listening are related to degree of emotional arousal. PLoS
ONE, 4, e7487.
Schellenberg, E. G. (2004). Music lessons enhance IQ. Psychological Science, 15, 511-514.
Schellenberg, E. G. (2006). Long-term positive associations between music lessons and IQ.
Journal of Educational Psychology, 98, 457-468.
Schellenberg, E. G. (2011). Examining the association between music lessons and intelligence.
British Journal of Psychology, 102, 283-302.
Schellenberg, E. G., & Hallam, S. (2005). Music listening and cognitive abilities in 10- and 11-
year-olds: The Blur effect. Annals of the New York Academy of Sciences, 1060, 202-209.
Schellenberg, E. G., Nakata, T., Hunter, P. G., & Tamoto, S. (2007). Exposure to music and
cognitive performance: Tests of children and adults. Psychology of Music, 35, 5-19.
Schellenberg, E. G., Peretz, I., & Vieillard, S. (2008). Liking for happy- and sad-sounding music:
Effects of exposure. Cognition and Emotion, 22, 218-237.
Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research.
Psychological Bulletin, 99, 143-165.
Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech
Communication, 40, 227-256.
Scherer, K. R., Banse, R., & Wallbott, H. G. (2001). Emotion inferences from vocal expression
correlate across languages and cultures. Journal of Cross-Cultural Psychology, 32, 76-92.
Schorr, E. A., Roth, F. P., & Fox, N. A. (2009). Quality of life for children with cochlear
implants: Perceived benefits and problems and the perception of single words and
emotional sounds. Journal of Speech, Language, and Hearing Research, 52, 141-152.
Sharma, A., Dorman, M. F., & Kral, A. (2005). The influence of a sensitive period on central
auditory development in children with unilateral and bilateral cochlear implants. Hearing
Research, 203, 134-143.
Sheffert, S. M., Pisoni, D. B., Fellowes, J. M., & Remez, R. E. (2003). Learning to recognize
talkers from natural, sinewave, and reversed speech samples. Journal of Experimental
Psychology: Human Perception and Performance, 28, 1447-1469.
Singh, L., Morgan, J. L., & Best, C. T. (2002). Infants' listening preferences: Baby talk or happy
talk? Infancy, 3, 365-394.
Smith, Z. M., Delgutte, B., & Oxenham, A. J. (2002). Chimaeric sounds reveal dichotomies in
auditory perception. Nature, 416, 87-90.
Spence, M. J., Rollins, P. J., & Jerger, S. (2002). Children’s recognition of cartoon voices.
Journal of Speech, Language, and Hearing Research, 45, 214–222.
Stalinski, S. M., Schellenberg, E. G., & Trehub, S. E. (2008). Developmental changes in the
perception of pitch contour: Distinguishing up from down. Journal of the Acoustical
Society of America, 124, 1759-1763.
Stordahl, J. (2002). Song recognition and appraisal: A comparison of children who use cochlear
implants and normally hearing children. Journal of Music Therapy, 39, 2-19.
Sucher, C. M., & McDermott, H. J. (2009). Bimodal stimulation: Benefits for music perception
and sound quality. Cochlear Implants International, 10, 96–99.
Svirsky, M. A., Chin, S. B., & Jester, A. (2007). The effects of age at implantation on speech
intelligibility in pediatric cochlear implant users: Clinical outcomes and sensitive periods.
Audiological Medicine, 5, 293-306.
Svirsky, M. A., Robbins, A. M., Kirk, K. I., Pisoni, D. B., & Miyamoto, R. T. (2000). Language
development in profoundly deaf children with cochlear implants. Psychological Science,
11, 153-158.
Teagle, H. F. B., & Eskridge, H. (2010). Predictors of success for children with cochlear
implants: The impact of individual differences. In A. L. Weiss (Ed.), Perspectives on
individual differences affecting therapeutic change in communication disorders. New
directions in communications disorders research (pp. 251-272). New York: Psychology
Press.
Thai-Van, H., Cozma, S., Boutitie, F., Disant, F., Trui, E., & Collet, L. (2007). The pattern of
auditory brainstem response wave V maturation in cochlear-implanted children. Clinical
Neurophysiology, 118, 176-189.
Trainor, L. J. (1996). Infant preferences for infant-directed versus noninfant-directed playsongs
and lullabies. Infant Behavior and Development, 19, 83-92.
Trainor, L. J., Austin, C. M., & Desjardins, R. N. (2000). Is infant-directed speech prosody a
result of the vocal expression of emotion? Psychological Science, 11, 188–195.
Trainor, L. J., Clark, E. D., Huntley, A., & Adams, B. A. (1997). The acoustic basis of
preferences for infant-directed singing. Infant Behavior and Development, 20, 383-396.
Trehub, S. E., Hannon, E. E., & Schachner, A. (2010). Perspectives on music and affect in the
early years. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion:
Theory, research, applications (pp. 645-668). New York: Oxford University Press.
Trehub, S. E., & Trainor, L. J. (1998). Singing to infants: Lullabies and play songs. Advances in
Infancy Research, 12, 43-77.
Trehub, S. E., Trainor, L. J., & Unyk, A. M. (1993). Music and speech processing in the first
year of life. In H. W. Reese (Ed.), Advances in child development and behavior (Vol. 24,
pp. 1-35). San Diego: Academic Press.
Van Lancker, D., Kreiman, J., & Emmorey, K. (1985). Familiar voice recognition: Patterns and
parameters: Part I—Recognition of backward voices. Journal of Phonetics, 13, 19–38.
Vieillard, S., Peretz, I., Gosselin, N., Khalfa, S., Gagnon, L., & Bouchard, B. (2008). Happy, sad,
scary and peaceful musical excerpts for research on emotions. Cognition and Emotion,
22, 720-752.
Vongpaisal, T., Trehub, S. E., & Schellenberg, E. G. (2006). Song recognition by children and
adolescents with cochlear implants. Journal of Speech, Language, and Hearing Research,
49, 1091–1103.
Vongpaisal, T., Trehub, S. E., & Schellenberg, E. G. (2009). Identification of TV tunes by
children with cochlear implants. Music Perception, 27, 17–24.
Vongpaisal, T., Trehub, S. E., Schellenberg, E. G., van Lieshout, P., & Papsin, B. C. (2010).
Children with cochlear implants recognize their mother’s voice. Ear and Hearing, 31,
555-566.
Vongphoe, M., & Zeng, F. G. (2005). Speaker recognition with temporal cues in acoustic and
electric hearing. Journal of the Acoustical Society of America, 118, 1055–1061.
Vos, P. G., & Troost, J. M. (1989). Ascending and descending melodic intervals: Statistical
findings and their perceptual relevance. Music Perception, 6, 383-396.
Waltzman, S. B., & Cohen, N. L. (1998). Cochlear implantation in children younger than 2 years
old. American Journal of Otology, 19, 158-162.
Williams, B. R. (2006). Inconsistency in reaction time: Normal development and group
differences between those with attention deficit / hyperactivity disorder and controls.
Unpublished doctoral dissertation, University of Victoria.
Wilson, B. S., Schatzer, R., Lopez-Poveda, E. A., Sun, X., Lawson, D. T., & Wolford, R. D.
(2005). Two new directions in speech processor design for cochlear implants. Ear and
Hearing, 26, 73S-81S.
Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience
shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience, 10,
420-422.
Wu, C. C., Lee, Y. C., Chen, P. J., & Hsu, C. J. (2008). Predominance of genetic diagnosis and
imaging results as predictors in determining the speech perception performance outcome
after cochlear implantation in children. Archives of Pediatrics and Adolescent Medicine,
162, 269-276.
Zentner, M., Grandjean, D., & Scherer, K. R. (2008). Emotions evoked by the sound of music:
Characterization, classification, and measurement. Emotion, 8, 494-521.