Talker Discrimination, Emotion Identification, and Melody Recognition by Young Children with Bilateral Cochlear Implants

by

Anna Volkova

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Graduate Department of Psychology University of Toronto

© Copyright by Anna Volkova 2012

Abstract

Users of cochlear implants typically have difficulty differentiating talkers, identifying vocal

expressions of emotion, and recognizing familiar melodies because of the degraded spectral cues

provided by conventional implants. This thesis examined these abilities in a small, relatively

privileged sample of young bilateral implant users. In Study 1 child implant users and a control

sample of hearing children were required to judge whether various utterances were produced by

a man, woman, or girl (Experiment 1) and to identify the voices of cartoon characters from

familiar television programs (Experiment 2). Child implant users’ performance on talker

classification was comparable to that of hearing children. Their identification of cartoon

characters’ voices was less accurate than that of hearing children but well above chance levels.

These findings challenge conventional wisdom about the talker identification difficulties of

implant users. In Study 2 the children were required to indicate whether semantically neutral

utterances (Experiment 1) or classical piano excerpts (Experiment 2) sounded “happy” or “sad”.

In both cases, implant users performed less accurately than hearing children but well above

chance levels. Although the findings on emotion recognition in music are in line with those of

previous research, the findings on emotion in speech are at odds with claims that young implant

users are insensitive to vocal affect. In Study 3 the children were required to identify the theme

songs from familiar television programs on the basis of combined timing and pitch cues as well

as timing or pitch cues alone. Implant users’ performance was comparable to that of hearing

children except when the cues were restricted to pitch relations, which resulted in performance at

chance levels. The findings suggest that the musical representations of young implanted listeners

include precise information about timing and coarser information about pitch. They also

demonstrate, for the first time, that children, both implant users and those with normal hearing,

can identify familiar music on the basis of timing cues alone. Overall, the findings highlight the

importance of timing cues for implant users, the range of individual differences, and habilitation

possibilities for the recognition of talkers, emotion, and music.

Acknowledgements

I am indebted to participants and their families whose cooperation, enthusiasm, and

commitment made this research possible. I thank my supervisors, Dr. Sandra E. Trehub and Dr.

E. Glenn Schellenberg, for generous advice, unfailing support, and encouragement. I also thank

Dr. Karen A. Gordon for valuable suggestions and research accommodations.

I am grateful to Deanna Feltracco, Judy Plantinga, Sasha Poon, and Lily Zhou for

assistance in data collection; to Marieke van Heugten, Stephen Feltracco, Rebekah Prince, and

Laura Prince for help with stimuli creation; and to Steve Hong for computer programming and

technical support. I gratefully acknowledge the assistance of Vicky Papaioannou, Gina Goulding,

Jerome Valero, and Stephanie Jewell. I am also grateful to Ann Lang for invaluable help in

administrative matters.

Last, but not least, I thank my daughter, Maria Komech, my parents, Margarita Petrosyan

and Dr. Leonid Volkov, and my friend, Dr. Denis Kosygin, for patience and unconditional

support.

Table of Contents

General Abstract: Talker Discrimination, Emotion Identification, and Melody Recognition by Young Children with Bilateral Cochlear Implants
Acknowledgements
Table of Contents
List of Tables
List of Figures
Introductory Comments
Study 1: Children with Bilateral Implants Differentiate Familiar and Unfamiliar Talkers
    Introduction
    Experiment 1
        Method
        Results and Discussion
    Experiment 2
        Method
    General Discussion
Study 2: Children with Bilateral Cochlear Implants Identify Emotion in Speech and Music
    Introduction
    Experiment 1
        Method
    Experiment 2
        Method
    General Discussion
Study 3: Pitch and Timing Cues in Child Implant Users’ Recognition of Familiar Melodies
    Introduction
    Method
    Results
    Discussion
Supplementary Comments
References

List of Tables

Table 1. CI participants: Background information (Study 1).

Table 2. Utterances spoken by a man, woman, and girl.

Table 3. Mean speaking rate values (syllables per second) of the three talkers.

Table 4. TV characters and their utterances.

Table 5. F0 and speaking rate of TV characters selected by CI children.

Table 6. CI participants: Background information (Study 2). Participant codes are preserved across studies.

Table 7. Happy- and sad-sounding speech stimuli.

Table 8. CI participants: Background information (Study 3). Participant codes are preserved across studies.

Table 9. Key, pitch range and tempo of melodies extracted from the TV-show theme songs.

Table 10. Cumulative scores and demographic profiles of the 11 CI users who completed four or more tasks comprising the present investigation.

List of Figures

Figure 1. A spectrogram of the word “elephant” spoken by a woman. The smooth line represents the intensity contour and the dotted line represents the F0 contour.

Figure 2. Performance of CI and NH children as a function of utterance type (Experiment 1). Error bars represent standard errors.

Figure 3. Performance of CI children as a function of talker and utterance type (Experiment 1).

Figure 4. Recognition of cartoon voices by CI and NH children (Experiment 2). Error bars represent standard errors.

Figure 5. Performance of individual CI children in Experiments 1 and 2.

Figure 6. Prosodic contours of “The chair has four legs” produced in a happy and a sad manner by male and female talkers (Experiment 1).

Figure 7. Performance of child CI users and NH children on happy and sad speech as a function of block order (Experiment 1). Error bars represent standard errors.

Figure 8. Performance of individual child CI users on happy and sad speech (Experiment 1) ordered by scores on Block 1 from best to worst. Original participant codes are preserved.

Figure 9. Performance of child CI users (Block 1) and NH children on happy and sad music (Experiment 2). Error bars represent standard errors.

Figure 10. Performance of individual child CI users on happy and sad music (Experiment 2). Performance is averaged across the 2 blocks (20 trials) and ordered from best to worst. Original participant codes are preserved.

Figure 11. Performance on emotion identification in speech and music for the 12 CI users who participated in both tasks. Performance on each task is averaged across the 2 blocks (36 and 20 trials, respectively).

Figure 12. Accuracy of emotion identification in speech (Experiment 1) as a function of years of implant use. Performance is averaged across the 2 blocks (36 trials).

Figure 13. Accuracy of emotion identification in music (Experiment 2) as a function of years of implant use. Performance is averaged across the 2 blocks (20 trials).

Figure 14. Examples of the melodic, timing-only and pitch-only conditions for two TV-show theme songs: “Backyardigans” and “Diego”.

Figure 15. Performance of child CI users and NH listeners. Error bars represent standard errors.

Figure 16. Performance of individual CI children. Original participant codes are preserved.

Figure 17. Performance of individual NH children in the timing-only and pitch-only conditions.

Introductory Comments

Cochlear implants (CIs) were developed to make spoken language accessible to

individuals with profound sensorineural hearing loss. The prostheses elicit auditory sensations by

direct stimulation of the auditory nerve. They restore partial hearing to postlingually deaf adults,

enabling many to resume oral conversational interactions in person and over the telephone

(Loizou, 1998). CIs also provide auditory sensations to congenitally or prelingually deafened

children, many of whom have been able to acquire good oral language skills, attend regular

schools, and function successfully in the general community (Geers, 2004; Svirsky, Robbins,

Kirk, Pisoni, & Miyamoto, 2000). Remarkably, the speech perception and production skills of

some child implant users equal those of their hearing peers (Geers, 2006; Nicholas & Geers,

2007).

The signal-processing techniques of contemporary CIs are designed to mimic the

function of the normal cochlea as closely as possible (Gates & Miyamoto, 2003; Loizou, 1998),

but they are inadequate for transmitting some aspects of the auditory signal. More specifically,

CIs deliver temporal envelope cues that are sufficient for speech intelligibility under favorable

(quiet) conditions (Loizou, 1998), but they largely discard fine-structure cues (Smith, Delgutte,

& Oxenham, 2002), limiting listeners’ access to spectral information (Wilson et al., 2005). These

limitations interfere with CI users’ ability to perceive speech in noise (Cullington & Zeng, 2008;

Friesen, Shannon, Baskent, & Wang, 2001), talker identity (Cleary, Pisoni, & Kirk, 2005; Fu,

Chinchilla, & Galvin, 2004; Vongpaisal, Trehub, Schellenberg, Van Lieshout, & Papsin, 2010),

lexical tones (Barry et al., 2002), speech prosody (Chatterjee & Peng, 2007; Luo, Fu, & Galvin,

2007; Peng, Tomblin, & Turner, 2008), and music (McDermott, 2004).

This thesis focuses on perceptual skills that rely, to a considerable extent, on spectral

features in the auditory signal: talker discrimination (Study 1), identification of emotion in

speech and music (Study 2), and melody recognition (Study 3). Individuals with normal hearing

(NH) use voice pitch and voice quality to determine the gender, age, and identity of talkers (Van

Lancker, Kreiman, & Emmorey, 1985). On the basis of intonation patterns or paralinguistic cues,

they differentiate various vocal emotions (Bachorowski, 1999; Scherer, 2003). NH listeners can

also recognize a familiar melody based on relations between successive pitches (e.g., Nimmons

et al., 2007), which enables them to recognize a familiar tune played at different pitch levels and

on different instruments. By contrast, CI users, whether children, adolescents, or adults, have

difficulty differentiating talkers (Cleary et al., 2005; Fu et al., 2004; Fu, Chinchilla, Nogaki, &

Galvin, 2005; Kovačić & Balaban, 2009, 2010), identifying vocal expressions of emotion

(Hopyan-Misakyan, Gordon, Dennis, & Papsin, 2009; Luo et al., 2007; Most & Aviner, 2009),

and identifying familiar melodies (Leal et al., 2003; Nimmons et al., 2007; Olszewski, Gfeller,

Froman, Stordahl, & Tomblin, 2005; Stordahl, 2002; Vongpaisal, Trehub, & Schellenberg, 2006,

2009).

Many studies with CI users have used stimuli that were not ecologically valid and tasks that

were not particularly engaging. The information provided by those studies is of unquestionable

importance, but it may underestimate the abilities of CI users in everyday contexts in which

additional cues may be accessible and useful. In the case of talker identification, for example,

studies with one-syllable utterances (Fu et al., 2005) have revealed extremely poor performance

by adult CI users, yet we know that for NH listeners, longer utterances can lead to increased

accuracy of talker identification (Orchard & Yarmey, 1995). When Cleary et al. (2005) examined

talker identification in child CI users, they used sentence-length utterances but they simulated

variations among talkers by electronically manipulating the pitch and spectral features of a single

talker. Those manipulations eliminated temporal cues to talker identity, which are useful to NH

listeners (Lander, Hill, Kamachi, & Vatikiotis-Bateson, 2007; Remez, Fellowes, & Rubin, 1997).

As a result, child CI users had to rely exclusively on spectral cues, which may account for the

very poor outcomes. Vongpaisal et al. (2010) demonstrated that child CI users can make

effective use of temporal cues, such as speaking rate and individual variations in articulation.

Studies of emotional prosody have had relatively poor outcomes with CI users of various

ages (Hopyan-Misakyan et al., 2009; Luo et al., 2007; Most & Peled, 2007), leading many to

conclude that emotion in speech is largely inaccessible to this population. However, these studies

used four or more emotional categories, some of which had considerable overlap in acoustic cues

(Bachorowski, 1999; Scherer, 2003), raising the possibility of better differentiation of emotional

categories that are more acoustically contrastive. Moreover, studies of child CI users’ recognition

of “familiar” melodies have often used melodies that are familiar to the general population

(Stordahl, 2002; Olszewski et al., 2005), but those melodies may be much less familiar to

children who are prelingually deaf than they are to NH children. Child CI users’ performance

has been more successful in studies that have used theme music from television programs that

the children watch regularly (Mitani et al., 2007; Vongpaisal et al., 2009).

The overall strategy in the present thesis was to optimize young CI users’ performance by

using ecologically valid stimuli as much as possible. Accordingly, talker identification was

studied with the use of utterances spoken in a child-directed manner in Studies 1 and 2, including

the voices of TV characters that were familiar and much loved. This strategy also led to the use of

theme songs from children’s favorite TV programs, in line with Mitani et al. (2007) and

Vongpaisal et al. (2009). Every effort was made to reduce the cognitive demands of the tasks so

that variations in performance would reflect children’s relative ease or difficulty with the stimuli.

Accordingly, all tasks featured forced-choice responses with two or three alternatives, with

feedback in some cases to help children focus on the relevant cues. We also optimized children’s

engagement by embedding all tasks in an interactive game-like environment on a computer.
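
The chance levels implied by two- and three-alternative forced-choice tasks can be illustrated with a short calculation. The following is a minimal sketch, not the analysis used in this thesis: the function names, the exact binomial tail, and the example score of 14 correct out of 18 trials are all hypothetical and serve only to show what "above chance" means for tasks of this kind.

```python
from math import comb


def chance_level(n_alternatives: int) -> float:
    """Proportion correct expected from guessing in an n-alternative forced-choice task."""
    if n_alternatives < 2:
        raise ValueError("a forced-choice task needs at least two alternatives")
    return 1.0 / n_alternatives


def p_at_least(n_correct: int, n_trials: int, n_alternatives: int) -> float:
    """Exact binomial probability of n_correct or more correct responses by guessing alone."""
    p = chance_level(n_alternatives)
    return sum(
        comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Hypothetical child: 14 of 18 trials correct in a three-alternative task.
print(f"chance level: {chance_level(3):.3f}")
print(f"probability of 14 or more correct by guessing: {p_at_least(14, 18, 3):.6f}")
```

With three alternatives, guessing yields one third correct on average, so a score well above that proportion is unlikely to arise from guessing alone.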

Most studies of child and adult CI users are marked by enormous individual differences

(Peterson, Pisoni, & Miyamoto, 2010). It is of obvious importance to understand the nature of

such variation and the factors that contribute to it, and there has been some recent headway in

this regard (Geers, Nicholas, & Moog, 2007; Sagi, Kaiser, Meyer, & Svirsky, 2009; Pisoni,

2008). At the same time, there is something to be gained from focusing on the skills of

successful CI users because that provides insight into achievements that are possible for CI users

under optimal or simply reasonable circumstances.

Some of the factors associated with positive language outcomes in children with CIs are

genetic non-syndromic congenital deafness (Kawasaki, Fukushima, Kataoka, Fukuda, &

Nishizaki, 2006; Wu, Lee, Chen, & Hsu, 2008), early age of implantation (Coletti, Carner,

Miorelli, Guida, Coletti, & Fiorino, 2005; Connor, Craig, Raudenbush, Heavner, & Zwolan,

2006; Svirsky, Chin, & Jester, 2007), longer use of processing strategies that emphasize spectral

information (Geers, Brenner, & Davidson, 2003), emphasis on oral communication (Nicholas &

Geers, 2006; Svirsky et al., 2000), non-verbal intelligence (Geers et al., 2003), and cognitive

processing variables, such as working memory capacity and verbal rehearsal speed (Pisoni,

2005). The CI sample of the present investigation incorporated a number of the favorable

circumstances noted above. Specifically, it consisted of a small number of young, congenitally or

prelingually deaf children who had been identified early and implanted at 3.5 years of age or

earlier when residual hearing was insufficient for successful amplification. They used similar

devices programmed with the Advanced Combination Encoder (ACE) processing strategy,

which is designed to enhance spectral information (Waltzman & Roland, 2006). At least half of

the CI users were congenitally deaf with a genetic etiology. All CI users were being raised and

educated in an exclusively oral environment, and they were free of cognitive disabilities.

The over-arching goal of this thesis was to study these relatively privileged children with

a view to shedding light on the potential of CI users in three challenging domains: talker

discrimination, emotion identification, and music processing. In principle, the fruits of this

research could advance theory and clinical practice (e.g., therapeutic interventions) with this

population.

Study 1: Children with Bilateral Implants Differentiate Familiar and Unfamiliar Talkers

Abstract

The present study examined the ability of prelingually deaf children with bilateral implants to

identify familiar and unfamiliar talkers from utterances of varied duration. In Experiment 1

prelingually deaf children with bilateral cochlear implants classified sentences, short

exclamations, and words as spoken by a man, woman, or child. Child implant users achieved

near-perfect accuracy, as did children with normal hearing. In Experiment 2 children with

bilateral implants were required to identify three familiar cartoon characters from sentence-

length utterances. Their performance was well above chance levels but significantly less accurate

than that of normally hearing children. Several child implant users had error-free performance on

both tasks, which challenges the prevailing views about talker recognition in this population.

Introduction

In general, listeners have no difficulty identifying the gender or approximate age (child,

young adult, elderly adult) of unfamiliar talkers on the radio or telephone. At their disposal are

multiple cues to talker identity, including prosody (i.e., intonation and rhythm), voice quality

(i.e., timbre), and pitch level (Van Lancker et al., 1985). The situation is very different for deaf

individuals with cochlear implants (CIs). These prosthetic devices, designed to facilitate access

to spoken language and oral communication, are optimized for speech, which is coded by

amplitude variations over time.

Most CI users can understand speech in favorable (quiet) listening environments, but the

absence of temporal fine structure in the input provides them with degraded pitch and spectral

information (Loizou, 1998; Smith et al., 2002). As a result, they have difficulty deciphering

speech in noise (e.g., Friesen et al., 2001), which is probably exacerbated by unilateral input.

They also have difficulty perceiving music (see McDermott, 2004, for a review), recognizing

vocal emotion (Hopyan-Misakyan et al., 2009), and identifying talkers (Kovačić & Balaban,

2009, 2010). Pitch differences between male and female speakers (approximately one octave) are often

sufficient for the discrimination of voice gender (Fu et al., 2005), but Kovačić and Balaban

(2009) found that only half of child and adolescent CI users could identify voice gender. Within-

gender contrasts are generally considered difficult or impossible for adult (Fu et al., 2004) and

child CI users (Cleary et al., 2005).

Most studies of talker discrimination by adults and children with CIs have used isolated

syllables (e.g., Fu et al., 2005; Vongphoe & Zeng, 2005) or electronically altered utterances from

a single speaker (Cleary et al., 2005). Both approaches obscure individual variations in speaking

style that may be important for listeners with pitch-processing difficulties. For normal-hearing

(NH) listeners, differentiating talkers when pitch and timbre cues are unavailable is helped by

individual differences in phoneme articulation (e.g., Remez et al., 1997) and expressive timing

(Lander et al., 2007).

CIs provide limited pitch and timbre information, but they are effective at transmitting

timing cues. It is possible, then, that talker identification and discrimination would be enhanced

if CI users had access to timing cues from longer or more natural speech samples. Nevertheless,

Kovačić and Balaban (2009) found that children and adolescents with CIs experienced difficulty

with gender identification, even in the context of 2-s excerpts from naturally produced sentences.

They found that duration of deafness, or auditory deprivation, was a better predictor of

performance than age of implantation, with longer periods of auditory deprivation having

particularly adverse consequences. A smaller than usual difference in average fundamental

frequency (approximately half an octave) between the male and female speakers may have posed

an additional source of difficulty for the implant users. Moreover, the linguistic complexity of

the speech samples and the use of multiple talkers may have counteracted the benefits of

“natural” speech samples, especially for the younger participants.

Vongpaisal et al. (2010) examined cross-gender and within-gender identification in

children with CIs using scripted, sentence-length utterances (i.e., same content across speakers)

from familiar (mother) and unfamiliar talkers in a computerized game with feedback. Although

pediatric CI users’ performance was less accurate than that of their hearing peers, they succeeded

in distinguishing their mother’s voice from the voices of a man, a girl, and several unfamiliar

women. Not surprisingly, children were most accurate at distinguishing their mother’s voice

from the highly dissimilar man’s voice and least accurate when distinguishing it from other

women’s voices. The performance of child CI users was slightly less accurate for samples in

which the prosodic differences among speakers were reduced, which implies that prosody made

some contribution to identification. Finally, performance improved over the course of the test

session, indicating the contribution of exposure and feedback. Vongpaisal et al. (2010)

speculated that child CI listeners had made use of individual differences in phoneme articulation

and speaking rate to identify the talkers.

The purpose of the present investigation was to ascertain whether CI users younger than

those tested by Vongpaisal et al. (mean of 8.9 years) and Kovačić and Balaban (mean of 12.3

years) could differentiate talkers on the basis of sentence-length as well as briefer utterances. The

focus was on unfamiliar talkers in Experiment 1 and on familiar talkers in Experiment 2. In

contrast to previous studies in this domain, the present CI users were less diverse in

chronological age, hearing history, and prosthetic devices. Among the potential advantages of

child CI users in the current study were bilateral CIs and relatively short durations of deafness.

On the other hand, their young age was a potential disadvantage in view of young children’s

inefficient use of auditory cues when compared with older children (Stalinski, Schellenberg, &

Trehub, 2008).

Experiment 1

CI users 4 to 6 years of age and a control group of hearing children listened to samples of

natural speech from three unfamiliar talkers (man, woman, and girl). The samples included full

sentences, familiar exclamations, and isolated words. Feedback was provided after each trial to

facilitate learning and to motivate the children. Although young CI users were expected to

perform poorly compared to NH children, their modest durations of deafness were expected to

facilitate talker differentiation.

Method

Participants. The participants included 14 bilateral CI users (6 girls and 8 boys, M = 5.7

years, SD = 0.8 years; range = 4.1−6.9 years) who were recruited from a large metropolitan area (for

background information, see Table 1). There were 4 children with progressive hearing loss from

birth and 10 who were congenitally or prelingually deaf. All participants used Nucleus 24

Contour and/or Nucleus Freedom Contour Advance implants programmed to analyze sound

using the Advanced Combination Encoder (ACE) processing strategy. The CI users had at least 2

years of implant experience (M = 4.3 years; SD = 0.9 years; range = 2.4−6.1 years). With the

exception of the 4 children with progressive hearing loss, their first implant was activated at 9 to

20 months of age (M = 1.1 years, SD = 0.3 years). When tested with their implants, absolute

thresholds for tones within the speech range were within normal limits (10-30 dB HL). All CI

children participated in Auditory-Verbal Therapy for at least two years after implantation. They

also communicated exclusively by auditory-oral means and were in age-appropriate school

classes with their NH peers. A comparison sample of NH children consisted of 19 4-year-olds (M

= 4.7 years, SD = 0.3) from the community. No NH child had a personal or family history of

hearing problems, and all were free of colds on the day of testing.

Apparatus and Stimuli. The stimuli consisted of utterances produced by a man, a

woman, and a 10-year-old girl. Each of them produced 18 utterances, consisting of 6 full

sentences, 6 one- or two-word exclamations, and 6 isolated words (nouns) with one to three

syllables (see Table 2). The “actors” were asked to talk in an animated and expressive manner as

if interacting with a child. Two tokens of each utterance type were used from each actor.

Table 1. CI participants: Background information.

Participant  Gender  Age at test (years): E1; E2  Age at 1st and 2nd CI activation (years)  Etiology of hearing loss
CI-1*        M       5.8; 5.8                     3.4; 3.4                                  Genetic
CI-2         M       5.2; 4.8                     0.8; 1.7                                  Genetic
CI-3         M       5.3; 5.3                     1.1; 1.1                                  Genetic
CI-4         F       6.3; 5.8                     1.0; 3.6                                  Genetic
CI-5*        F       6.9; 6.6                     2.5; 4.0                                  Unknown
CI-6         M       5.3; 5.0                     1.0; 4.6                                  Genetic
CI-7         M       5.8; 5.1                     0.9; 1.8                                  Genetic
CI-8         M       6.1; 6.1                     0.8; 1.5                                  Genetic
CI-9         F       5.8; 5.4                     1.7; 1.7                                  Unknown
CI-10*       M       6.5; 6.3                     3.1; 6.3                                  Mondini dysplasia
CI-11        F       4.8; 4.8                     1.1; 1.1                                  Genetic
CI-12        F       6.9; 6.1                     1.0; 3.5                                  Unknown
CI-13        F       4.1; 4.1                     1.1; 0.8                                  Unknown
CI-14*       M       5.5; -                       2.7; 2.7                                  Unknown
CI-15+       M       - ; 6.4                      1.3; 2.3                                  Genetic
CI-16+       M       - ; 5.5                      1.7; 2.7                                  Genetic

* Progressive hearing loss from birth. + Participated in Experiment 2 only.
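
As a worked check, the descriptive statistics reported for the Experiment 1 sample can be recomputed from the ages at test listed in Table 1. This is a minimal sketch: the list below is transcribed from the first age column for CI-1 to CI-14, and the use of the sample standard deviation (n − 1 denominator) is an assumption, because the text does not state which formula was applied.

```python
from statistics import mean, stdev

# Age at test (years) in Experiment 1 for CI-1 to CI-14, transcribed from Table 1.
AGE_AT_TEST_E1 = [5.8, 5.2, 5.3, 6.3, 6.9, 5.3, 5.8, 6.1, 5.8, 6.5, 4.8, 6.9, 4.1, 5.5]

m = mean(AGE_AT_TEST_E1)
sd = stdev(AGE_AT_TEST_E1)  # sample standard deviation (assumed; n - 1 denominator)

print(
    f"n = {len(AGE_AT_TEST_E1)}, M = {m:.1f}, SD = {sd:.1f}, "
    f"range = {min(AGE_AT_TEST_E1)}-{max(AGE_AT_TEST_E1)}"
)
```

Rounded to one decimal place, these values agree with the mean, standard deviation, and range given in the Participants section.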

Fundamental frequencies (F0s), amplitude ranges, and utterance durations of the talkers

were calculated using PRAAT software (Boersma & Weenink, 2005). Mean F0s for the

utterances of the man, woman and girl were 106.3 Hz (SD = 9.9 Hz), 243 Hz (SD = 21.8 Hz),

and 263.2 Hz (SD = 16.5 Hz), respectively. Mean amplitude ranges for the man, woman, and girl

were 40 dB (SD = 9.2 dB), 41.6 dB (SD = 5.5 dB), and 43 dB (SD = 4.6 dB), respectively. Figure

1 depicts a spectrogram of a sample utterance. Mean speaking rate values are presented in Table

3.
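
The mean F0 values above can be expressed as musical intervals, which makes the relative size of the cross-gender and woman-girl pitch differences easier to compare. The sketch below is illustrative only: the conversion to equal-tempered semitones (12 × log2 of the frequency ratio) is a standard formula, but the function and variable names are hypothetical and the calculation is not part of the original analysis.

```python
from math import log2

# Mean F0 values (Hz) reported for the three talkers.
MEAN_F0_HZ = {"man": 106.3, "woman": 243.0, "girl": 263.2}


def interval_in_semitones(f_low_hz: float, f_high_hz: float) -> float:
    """Size of the interval between two frequencies in equal-tempered semitones."""
    return 12.0 * log2(f_high_hz / f_low_hz)


for low, high in [("man", "woman"), ("man", "girl"), ("woman", "girl")]:
    semitones = interval_in_semitones(MEAN_F0_HZ[low], MEAN_F0_HZ[high])
    print(f"{low} -> {high}: {semitones:.1f} semitones ({semitones / 12:.2f} octaves)")
```

By this calculation the man's mean F0 lies a little more than an octave below the woman's, whereas the woman's and girl's mean F0s differ by less than two semitones.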

Table 2. Utterances spoken by a man, woman, and girl.

Utterance type   Utterances
Sentences        We are going to the movies tonight.
                 These flowers are pretty.
                 Look, it’s raining!
                 What a beautiful morning!
                 Do you like this book?
                 Where’s my hat?
Exclamations     Wow!
                 Look!
                 Please!
                 Thank you!
                 Hi!
                 Here!
Words            cat
                 dog
                 rainbow
                 lion
                 elephant
                 umbrella

13

Table 3. Mean speaking rate values (syllables per second) of the three talkers.

Condition      Man              Woman            Girl
Sentences      4.5 (SD = 1.1)   4.7 (SD = 1.2)   4.0 (SD = 1.0)
Exclamations   1.9 (SD = 0.8)   2.0 (SD = 0.7)   1.7 (SD = 0.7)
Words          3.1 (SD = 1.1)   2.7 (SD = 0.8)   2.8 (SD = 1.0)

Stimuli were recorded in a 3 m x 2.5 m double-walled, sound-attenuating chamber
(Industrial Acoustics Corporation) with a microphone (Sony F-V30T) connected directly to a
Windows XP computer workstation. High-quality digital sound files (44.1 kHz, 16-bit, mono)
were created, and the average amplitude of speech samples was equated with a digital audio
editor (Sound Forge 6.0). Visual stimuli in the talker-identification task consisted of colored
digital photographs (headshots) of the voice actors presented against a white background.

Testing took place in a double-walled sound-attenuating booth, either at a university
laboratory (in the same booth in which the stimuli were recorded) or a comparable facility at a
major children’s hospital, at the convenience of parents. A computer workstation and amplifier
(Harman/Kardon HK3380) outside the university booth were connected with a 17-in touch-screen
monitor (Elo LCD Touch Systems) and two high-quality loudspeakers (Electro-Medical
Instrument Co.) inside the booth. At the hospital venue, a GSI 61 two-channel clinical
audiometer (Grason-Stadler Instruments) replaced the amplifier. In both locations, the
loudspeakers were placed at 45 degrees azimuth to the participant, with the touch-screen monitor
directly in front of the participant. An interactive computer program (customized for Windows
XP) presented stimuli and recorded response selections when the participant touched the screen.


A portable keyboard was available to the experimenter in case young children preferred to make

their selections by pointing to a picture rather than touching the screen. All stimuli were played

at a comfortable sound level of approximately 65 dB SPL.

Figure 1. A spectrogram of the word “elephant” spoken by a woman. The smooth line represents

the intensity contour and the dotted line represents the F0 contour.

Procedure. Participants were tested individually. At their request, a parent was present in

the booth with some CI participants. Parents were permitted to assist with explanations when the

task was initially described to the child, but they did not interact with the child in any way once

the test phase began. Children were told that they were going to hear people talking, and that

they had to indicate whether the talker was a man, a woman, or a girl by touching one of the

pictures on the screen. There were no practice trials. The stimuli were presented in three blocks,

corresponding to the three conditions. Presentation was in fixed order - sentences first,


exclamations next, and isolated words last - for a total of 108 trials. The fixed order was used on

the basis of previous research, which suggested that sentence-length stimuli would be least

difficult and single-word stimuli most difficult (Vongpaisal et al., 2010; Vongphoe & Zeng,

2005). Stimuli within each block (6 utterances X 3 talkers X 2 tokens of each utterance) were

presented randomly. The three colored photographs, consisting of the faces of a man, woman,

and girl, appeared on the screen at the beginning of each trial. Before testing, each participant

identified each photograph as that of a man, woman, or girl. The spatial arrangement of

photographs was identical across participants and trials. After listening to each stimulus,

participants responded by touching the photograph of the presumed talker. They received

feedback after each trial - a schematic smiling face for correct responses and a blank screen for

incorrect responses.
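The trial structure described in the Procedure can be sketched in a few lines of Python. This is only an illustration of the design (three blocks in fixed order, 6 utterances X 3 talkers X 2 tokens randomized within each block), not the customized presentation program used in the study. The function name, the tuple layout, and the `seed` parameter are hypothetical.

```python
import random

# Fixed block order reported in the Procedure.
UTTERANCE_TYPES = ["sentences", "exclamations", "words"]
TALKERS = ["man", "woman", "girl"]
N_UTTERANCES = 6  # utterances per type (Table 2)
N_TOKENS = 2      # tokens of each utterance


def build_trials(seed=None):
    """Return a list of (block, talker, utterance, token) tuples.

    Blocks appear in fixed order; trials are shuffled within each block.
    """
    rng = random.Random(seed)
    trials = []
    for block in UTTERANCE_TYPES:
        block_trials = [
            (block, talker, utterance, token)
            for talker in TALKERS
            for utterance in range(1, N_UTTERANCES + 1)
            for token in range(1, N_TOKENS + 1)
        ]
        rng.shuffle(block_trials)  # random order within the block only
        trials.extend(block_trials)
    return trials


trials = build_trials(seed=0)
print(len(trials))  # total number of trials across the three blocks
```

Each block contains 36 trials (12 per talker), which is the basis for the chance level of 12 correct per talker in the analyses that follow.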

Results and Discussion

Preliminary analyses compared performance collapsed across the different utterances

with chance levels (12 correct) on the three-alternative forced-choice task (36 trials with three

response options on each trial) separately for each speaker and for both groups of children. One-

sample t-tests confirmed that performance exceeded chance levels in each instance, p < .0001.

Figure 2 depicts performance for the two groups of children (CI, NH) on the three utterance

types (Sentences, Exclamations, and Isolated Words). Both groups performed above 90% correct

in all conditions. Figure 3 depicts the performance of CI children as a function of talker (Man,

Woman, and Girl). Note the highly accurate classification of the man’s voice. His voice was

misclassified only once in the first condition. In the second and third conditions, performance on

the man’s voice was error-free. In two instances across conditions, the man was misidentified,


once as a woman and once as a girl. Otherwise, confusion between the utterances spoken by the

woman and girl was the only source of error for CI participants.
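The comparison with chance reported at the start of this section can be illustrated with a short Python sketch. The scores below are hypothetical placeholders, not data from the study; only the chance level (12 of 36 trials per talker on a three-alternative task) comes from the text. The helper `one_sample_t` is an illustrative name and implements the standard one-sample t statistic.

```python
import math
import statistics

N_TRIALS_PER_TALKER = 36  # 108 trials / 3 talkers
N_ALTERNATIVES = 3        # man, woman, girl
CHANCE_CORRECT = N_TRIALS_PER_TALKER / N_ALTERNATIVES  # 12 correct


def one_sample_t(scores, mu):
    """Return (t, df) for a one-sample t-test of scores against mu."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 in the denominator)
    t = (mean - mu) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical number-correct scores (out of 36) for one talker.
hypothetical_scores = [36, 35, 33, 36, 34, 36, 32, 35]
t_stat, df = one_sample_t(hypothetical_scores, CHANCE_CORRECT)
print(f"t({df}) = {t_stat:.2f}")
```

With scores this close to ceiling, the statistic is very large, which is consistent with the small p values reported above.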

As expected, performance in the CI group was more variable than in the NH group, but

most CI participants scored within one standard deviation of the NH mean in all conditions.

Figure 4 shows individual accuracy scores of CI participants. Because performance by both

groups of children was at ceiling for the male talker’s three utterance types, this condition was

excluded from further consideration.

Figure 2. Performance of CI and NH children as a function of utterance type (Experiment 1). Error bars represent standard errors.

[Figure: x-axis, utterance type (Sentences, Exclamations, Words); y-axis, % correct (30-100); series, CI and NH.]

A three-way mixed-design Analysis of Variance (ANOVA) examined performance as a

function of one between-subjects factor (Group: CI or NH) and two within-subjects factors

(Talker: woman or girl; Utterance Type: sentences, exclamations, or isolated words). There was

no effect of Group, indicating that any apparent difference in performance between CI and NH


children was not reliable. The main effect of utterance type was significant, F(2, 62) = 5.49,

p < .006. Specifically, performance was significantly better for isolated words than for sentences,

t(32) = 2.93, p < .006, and exclamations, t(32) = 3.25, p < .005, which did not differ. Because

there was no interaction between group and utterance, F < 1, this effect was similar for both

groups of children. There were no other significant main effects or interactions.

Figure 3. Performance of CI children as a function of talker and utterance type (Experiment 1). Error bars represent standard errors.

[Figure: x-axis, utterance type (Sentences, Exclamations, Words); y-axis, % correct (30-100); series, Man, Woman, and Girl.]

Further consideration of individual data revealed that 11 of the 14 CI participants

classified the woman’s or girl’s utterances correctly before receiving any feedback, that is, on the

very first trial of the first condition, and they continued to perform correctly on subsequent trials.

Moreover, 10 of 14 participants classified both talkers correctly on their respective first trials in

the second condition (Exclamations), and 12 of 14 performed correctly on the first trial in the


third condition (Isolated Words). It is likely that increasing familiarity with the talkers coupled

with feedback counteracted any increased difficulty resulting from decreased stimulus duration

(Exclamations and Isolated Words vs Sentences) and prosodic variability (Isolated Words vs

Exclamations or Sentences). Three participants performed at ceiling in all conditions.

Figure 4. Performance of individual CI children in Experiments 1 and 2.

[Figure: x-axis, CI participant (1-16); y-axis, % correct (30-100); series, Sentences, Exclamations, Words, and Cartoon.]

Response latencies are imperfect measures in young children because of fluctuations in

attention (Williams, 2006), and were therefore not examined in the present study. Informal

observations indicated, however, that the man’s utterances were classified confidently and

quickly, and more hesitation was evident on trials involving the woman and girl. Finally, there

were no systematic relations between performance and chronological age, age of implantation, or

duration of deafness. It is notable, however, that the child with the lowest scores (CI-13) was the

youngest CI user (4.1 years).


The principal goal of the present experiment was to determine whether very young

children who use CIs could correctly classify the gender and age of unfamiliar talkers.

Experienced child CI users 4 to 7 years of age successfully judged whether sentences, one- and

two-word exclamations, and isolated one-, two-, or three-syllable words were spoken by a man,

woman, or girl. Moreover, their performance did not differ significantly from that of NH

children. These findings stand in marked contrast with earlier research on talker differentiation in

CI children, which showed performance levels ranging from modest (Vongpaisal et al., 2010) to

poor (Cleary et al., 2005; Kovačić & Balaban, 2009).

The present findings extend those of Vongpaisal et al. (2010), who found that older child

CI users (M = 8.9 years) could differentiate their mother’s utterances from those of an unfamiliar

man, child, or other women. The authors suggested that the familiarity of the mother’s voice was

critical, as was the provision of sentence-length utterances. They suggested, moreover, that child

CI users capitalized on individual differences in temporal cues, specifically, on differences in

phoneme articulation and speaking rate. The present findings indicate that familiar voices are not

essential for successful talker differentiation, nor are full sentences, under certain circumstances,

at least. Nevertheless, motivational aspects of tasks used with children are undoubtedly

important, accounting, perhaps, for the substantial differences between the present findings and

those of Kovačić and Balaban (2009) with much older children (M = 12.3 years).

It is more difficult to pinpoint the contribution of timing cues to young CI users’ success.

The girl’s speaking rate (i.e., syllables per second) was slower than that of the woman for

sentences and exclamations, but differences were negligible for isolated words (see Table 3),

which yielded the highest performance levels. It is likely that CI users in the present study made

productive use of individual differences in consonant and vowel articulation, which enable


hearing listeners to identify talkers from severely degraded speech (Remez et al., 1997; Sheffert,

Pisoni, Fellowes, & Remez, 2002).

Child CI users’ greater accuracy in classifying the gender and age of talkers from isolated

words than from exclamations or sentences seems counterintuitive. However, the fixed order of

presentation in the present study, which was prompted by greater anticipated difficulty with

isolated words, precludes effective comparison of the relative ease of identifying the different

classes of stimuli. Because children received feedback on every trial, they had the opportunity to

learn about talker-specific features and to transfer that knowledge to subsequent conditions

involving novel utterances from the same speakers. Such learning effects indicate the potential of

young CI users to benefit from limited training with challenging material.

Child CI users demonstrated exceptional accuracy at classifying the male talker.

According to Fu et al. (2005), CIs provide sufficient pitch information to enable differentiation

of a man’s voice from that of a woman or child. The fact that all children accurately identified

the male talker on the first trial indicates that young CI children, like NH children, have long-

term representations of men’s voices based on their everyday experience.

Although classification of speech samples from the woman and the girl was more

difficult, more than half of the young CI users were correct on their very first trial, before having

an opportunity to compare talkers or make use of feedback. Moreover, three of these children

had error-free performance in all conditions, which is at odds with the view that device

limitations preclude successful talker identification (Kovačić & Balaban, 2009). Further research

could aim to identify the acoustic cues that underlie success in the best CI users as well as those

that are necessary for success in typical CI users.


Experiment 2

With a single exception (Vongpaisal et al., 2010), studies of talker identification in CI

users have focused on unfamiliar talkers. In most cases, as in Experiment 1 of the present study,

children and adults were required to differentiate general classes of stimuli (e.g., man or woman;

man, woman, or girl). It is possible, however, that children in Experiment 1 used person-specific

cues instead of, or as well as, general cues that distinguish the three classes of speakers.

Vongpaisal et al. (2010) claimed that the use of maternal voices, in particular, contributed to CI

children’s successful performance. Obviously, the mother’s voice is highly salient to children, as

is evident, for example, in maternal voice recognition in the neonatal period (DeCasper & Fifer,

1980). By the preschool period, children readily identify their classmates and teachers from

natural voice samples (Bartholomeus, 1973). They also recognize the voices of cartoon

characters from familiar television programs (Spence, Rollins, & Jerger, 2002). Whether children

with CIs would be capable of recognizing the voices of familiar cartoon characters or friends

remains unclear.

The purpose of the present experiment was to evaluate the ability of young bilateral CI

users to identify the voices of cartoon characters from television programs that they watch

regularly. Children would have considerably less exposure to the voices of specific cartoon

characters than to the voices of immediate family members or regular playmates. Nevertheless,

the voice quality and speaking style of TV characters aimed at child audiences are presumably

selected for their distinctiveness and memorability. The visual features of the characters and the

proliferation of toys based on these characters also enhance their overall familiarity and appeal.

We know that child CI users recognize the theme songs of children’s television programs

(Nakata, Trehub, Kanda, Mitani, & Schellenberg, 2005; Vongpaisal et al., 2006), which confirms

that these children attend to the soundtrack.


Method

Participants. With one exception (CI-14), bilateral CI users who participated in

Experiment 1 also took part in the present experiment. In addition, there were two boys (CI-15

and CI-16) who did not participate in Experiment 1, resulting in a final sample of 15 children (M

= 5.5 years, SD = 0.7 years, range = 4.1-6.6 years; Table 1). The control sample of NH children

consisted of the 19 children who participated in Experiment 1.

Apparatus and Stimuli. The apparatus was identical to that in Experiment 1. Twelve

cartoon characters were chosen from popular, age-appropriate TV shows. For each character,

five sentence-length utterances were selected from a variety of episodes. Special care was taken

to exclude stereotyped phrases that could provide cues to identity. Utterances were saved at a

sampling rate of 22.1 kHz (16-bit). Amplitude was normalized across talkers, which

resulted in roughly equal average amplitude across utterances (to eliminate loudness cues to

character identity) while preserving amplitude variations within an utterance. An iconic colored

picture of each character was used for purposes of visual identification. Instead of the

photographs of the man, woman, and girl that were displayed on the monitor in Experiment 1,

pictures of three cartoon characters were displayed.
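The amplitude normalization described above can be sketched as follows. The text does not state which measure of average amplitude was used, so root-mean-square (RMS) level is an assumption here, and the function names are hypothetical. Applying a single gain to the whole utterance equates average level across utterances while leaving the relative amplitude variations within each utterance unchanged.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples, target_rms):
    """Scale an utterance by one constant gain so its RMS equals target_rms.

    RMS as the measure of "average amplitude" is an assumption; the
    audio editor used in the study may apply a different rule.
    """
    current = rms(samples)
    if current == 0:
        return list(samples)  # silent input: nothing to scale
    gain = target_rms / current
    return [s * gain for s in samples]


# Two toy "utterances" with different overall levels.
quiet = [0.10, -0.20, 0.05, -0.10]
loud = [0.40, -0.80, 0.20, -0.40]
quiet_norm = normalize_rms(quiet, target_rms=0.25)
loud_norm = normalize_rms(loud, target_rms=0.25)
print(rms(quiet_norm), rms(loud_norm))
```

Because the gain is constant within an utterance, the ratio between any two samples is preserved, which corresponds to "preserving amplitude variations within an utterance."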

Procedure. Prior to the test session, the experimenter asked the child and parent to

choose 3 characters from TV shows most familiar to the child from the 12 characters that were

available. Before the first trial, pictures of the three characters appeared simultaneously on the

computer monitor, and each child responded accurately when the experimenter pointed to each

picture in turn and asked “Who’s that? Tell me.” The spatial arrangement of pictures remained

constant throughout the test session. Auditory stimuli (3 characters X 5 utterances X 2


repetitions), consisting of 30 trials, were presented in random order at an average amplitude of

65 dB SPL. Children received no feedback for correct or incorrect responses.

Table 4. Cartoon characters and their utterances.

TV show
  Character
    Utterance

Dora the Explorer
  Dora
    I miss him so much.
    I’m sure of it.
    Jump up and down.
    I promised you I’d get you out.
    Put on your seatbelts.
  Boots
    That really was a shortcut.
    How many umbrellas do we need?
    That wouldn’t be too good.
    We don’t have fancy clothes.
    The contest is starting.

Spongebob
  Spongebob
    So I’m gonna get us back on track.
    I’m in big trouble.
    Are you ready to give up your life of crime?
    Everything I could ever want is right here.
    But it doesn’t make any sense.
  Patrick
    It’s not my wallet.
    That’s it alright.
    Hold on just one second.
    Do it again, I wasn’t looking.
    I thought I was doing a pretty good job.
  Squidward
    Now go spread the word.
    Well, you can’t play music with a piece of paper.
    He does the opposite.
    Put my windows back.
    I didn’t play any wrong note.
  Sandy
    That’s gotta hurt.
    Well, I can’t argue with that.
    I was being a little too sensitive.
    You ready to go sand boarding again?
    Come and get it.

Bob the Builder
  Bob
    I think I’m going to need you both.
    I’ve already had that idea.
    I’ll be right over.
    Don’t worry, we’re going to put it up later.
    I can’t wait to see it.
  Wendy
    We had a bit of a slow start.
    That must be the special display materials.
    And that’s going to be a big job.
    We need to measure how deep the trench is.
    It’s time to get that screen up.

Sesame Street
  Bert
    So long as I don’t take the pie this time.
    Well it’s not exactly my favorite word.
    Wait a minute.
    I ended up saying you know what.
    Forget it.
  Ernie
    Oh, but this word is different.
    He takes the cake.
    You’re a goat.
    These are all our friends out there.
    Have we got a final act yet?
  Elmo
    Is it fun being a birthday cake?
    Are you a banana?
    They just try to whistle.
    And doggies have great hearing.
    Tell us about yourself.
  Kermit
    I ordered a t-shirt with my name on it.
    Just like it says on my shirt right here.
    And you said it would be ready today?
    You got the letters kind of mixed up there.
    I don’t believe this.

Max and Ruby
  Ruby
    Now you see why I needed your help.
    Don’t touch that lemonade.
    Magicians never give away their secrets.
    He’s just getting dressed.
    Tools do not make music.

Table 5. F0 and speaking rate of TV characters selected by CI children.

Character    Average F0 across utterances (Hz)   Speaking rate (syllables/second)
Dora         328.8 (SD = 23.8)                   3.0 (SD = 0.9)
Boots        336.4 (SD = 23.3)                   3.5 (SD = 0.5)
Ruby         291.8 (SD = 60.2)                   3.5 (SD = 0.7)
Sponge Bob   259.2 (SD = 52.4)                   3.9 (SD = 0.4)
Patrick      248.0 (SD = 34.9)                   3.4 (SD = 1.0)
Squidward    204.8 (SD = 21.4)                   3.0 (SD = 0.5)
Elmo         330.0 (SD = 33.6)                   3.5 (SD = 0.8)


Results and Discussion

Preliminary analyses compared performance with chance levels on the three-alternative

forced-choice task separately for each group of children. One-sample t-tests confirmed that


performance exceeded chance levels in each instance, p < .001. Mean accuracy was 89.7% (SD =

9.3%) for the CI group, and 97% (SD = 4.5%) for the NH group (Figure 5). As in Experiment 1,

performance in the CI group was considerably more variable than in the NH group (see Figure

4). An independent samples, unequal variance t-test revealed that the performance of the CI

children, although highly accurate, was significantly less accurate than that of NH 4-year-olds,

t(19.2) = 2.8, p < .02.
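The unequal-variance (Welch) t-test reported above can be approximately reproduced from the summary statistics in this paragraph. The sketch below assumes group sizes of 15 (CI) and 19 (NH), as stated in the Method, and uses the rounded means and standard deviations, so small discrepancies from the reported values are expected. The function name is hypothetical.

```python
import math


def welch_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from group means, sample SDs, and group sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error, group 1
    v2 = sd2 ** 2 / n2  # squared standard error, group 2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# NH: M = 97%, SD = 4.5%, n = 19; CI: M = 89.7%, SD = 9.3%, n = 15.
t_stat, df = welch_t_from_summary(97.0, 4.5, 19, 89.7, 9.3, 15)
print(f"t({df:.1f}) = {t_stat:.1f}")
```

The fractional degrees of freedom arise because the Welch procedure does not pool the two variances, which is appropriate here given that the CI group was roughly twice as variable as the NH group.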

Figure 5. Recognition of cartoon characters by CI and NH children (Experiment 2). Error bars

represent standard errors.

The choice of characters in this experiment was necessarily driven by children’s familiarity

with the programs rather than the relative distinctiveness of talkers. Accordingly, the talkers

differed from one another in various respects, including voice quality, speaking rate, prosody,

and F0. Eight out of 15 CI children chose to listen to a subset of characters portrayed by a

woman (Ruby), a young girl (Dora), and a young boy (Boots). The voices of Dora and Boots

were almost identical in mean F0 across utterances, and the voices of Ruby and Boots were


virtually identical in speaking rate (see Table 5). Nevertheless, CI participants were 88% correct

(SD = 7.3%), on average, in differentiating these three characters. Cleary et al. (2005) found that

a difference of two semitones or more was necessary for talker differentiation by NH children

and the best-performing CI children. In the present experiment, CI users discriminated between

voices with much smaller F0 differences. In light of the degraded spectral input provided by their

implants, it is unlikely that children were using F0 differences alone for talker identification.

What was available instead were timing cues, both global cues such as speaking rate and

local cues involving idiosyncratic articulation of consonants and vowels. It is likely that

articulation and expressive timing “signatures” of the cartoon characters provided sufficient cues

to identify talkers that were similar in F0 and speaking rate.
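To make the comparison with the two-semitone criterion of Cleary et al. (2005) concrete, the mean F0 values in Table 5 can be converted to semitone differences with the standard formula 12 × log2(f1 / f2). The sketch below uses only the mean F0s of the three most commonly chosen characters. It is an illustration based on mean values alone and ignores F0 variation within and across utterances; the function name is hypothetical.

```python
import math


def semitone_difference(f0_a, f0_b):
    """Absolute distance in semitones between two frequencies in Hz."""
    return abs(12.0 * math.log2(f0_a / f0_b))


# Mean F0 across utterances (Hz), taken from Table 5.
MEAN_F0 = {"Dora": 328.8, "Boots": 336.4, "Ruby": 291.8}

pairs = [("Dora", "Boots"), ("Ruby", "Boots"), ("Dora", "Ruby")]
differences = {
    pair: semitone_difference(MEAN_F0[pair[0]], MEAN_F0[pair[1]])
    for pair in pairs
}
for (a, b), diff in differences.items():
    print(f"{a} vs {b}: {diff:.2f} semitones")
```

On these values, Dora and Boots differ by less than half a semitone in mean F0, whereas Ruby differs from each of them by roughly two semitones or slightly more.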

General Discussion

The goal of the present study was to examine young bilateral CI users’ ability to identify

familiar and unfamiliar talkers. In Experiment 1, the children were required to differentiate an

unfamiliar man, woman, and girl from a variety of utterance types. Child CI users achieved near-

perfect accuracy at differentiating the talkers, as did a control sample of children with normal

hearing. Although the NH children were one year younger, on average (4.7 years), than the CI

children (5.6 years), the CI children would be considered younger in hearing age (Fagan &

Pisoni, 2010), which reflects the number of years of auditory experience.

In Experiment 2, children were required to identify familiar talkers, specifically, the

cartoon characters from television programs that they watched regularly. Although CI users were

very successful at this task, their performance did not match that of the younger NH children. It

is possible that the provision of feedback for correct responses, like that provided in Experiment

1, would have improved the performance of CI children. There was no way of tracking

children’s cumulative exposure to the television programs, so it is also possible that NH children


watched the programs more frequently than CI children did. Parents of CI children are typically

encouraged to interact verbally with their children as much as they can, which may leave less time for television viewing. Nevertheless, the

performance levels of child CI users were impressive and well above the levels reported in other

studies of talker identification among children with CIs.

Given the modest size of the present sample, it is impossible to isolate the factors

responsible for the outstanding performance of the CI users. In contrast to previous studies of

talker identification in child CI users, in which most children had unilateral implants (Cleary et

al., 2005; Kovačić & Balaban, 2009; Vongpaisal et al., 2010), all CI participants in the present

study had bilateral implants. To date, bilateral implants have been associated with improved

sound localization and perception of speech in noise (see Johnston, Durieux-Smith, Angus,

O’Connor, & Fitzpatrick, 2009, for a review), but there is no indication that bilateral electrical

input facilitates talker discrimination or identification. This question could be addressed in future

research.

Sharma, Dorman, and Kral (2005) have suggested a sensitive period (up to approximately 3.5

years of age) for bilateral cortical plasticity. Others have suggested that long delays between

the first and second implant have adverse consequences for auditory plasticity

(Gordon, Valero, & Papsin, 2007). Moreover, there are preliminary indications of enhanced

perceptual outcomes for children who receive their implants simultaneously rather than

sequentially (Chadha, Papsin, Jiwani, & Gordon, 2011).

There have been attempts to identify stellar CI performers or “stars” and the factors

associated with their success (e.g., Nicholas & Geers, 2006; Pisoni, 2005; Teagle & Eskridge,

2010). Although all CI users in the present study performed well, the few who exhibited error-

free performance are undeniably stars. Each of these children had demographic factors that have


been linked previously with successful auditory and language outcomes, including short duration

of bilateral deafness (Gilley, Sharma, & Dorman, 2008), early implantation (Connor et al., 2006;

Svirsky et al., 2007), emphasis on oral communication (Nicholas & Geers, 2006; Svirsky et al.,

2000), same home language as the language of the general community (Gordon, Tanaka, &

Papsin, 2005), and supportive parents with high levels of education and motivation (see Teagle

& Eskridge, 2010, for a review).

Developmental outcomes of CI users are highly variable (Nicholas & Geers, 2006;

Peterson et al., 2010) and, on average, below those of same-age peers. It is useful, however, to

study a small but privileged sample, like that in the present study, as a window on the potential

of child CI users and as a guide to their habilitation.


Study 2: Children with Bilateral Cochlear Implants Identify Emotion in Speech and Music

Abstract

The present study examined the ability of prelingually deaf children with bilateral implants to

identify emotion in speech and music. In Experiment 1 child implant users indicated whether

linguistically neutral utterances spoken in a child-directed manner sounded “happy” or “sad”.

Their performance levels were high but significantly lower than those of children with normal

hearing. Several child implant users had error-free performance, which challenges prevailing

views about their inability to perceive emotion in speech. In Experiment 2 children with bilateral

implants classified short piano excerpts as “happy” or “sad”. Their performance was well above

chance levels but significantly less accurate than that of normally hearing children.


Introduction

Cochlear implants (CIs) have made auditory-verbal communication accessible to many

children with profound hearing loss (e.g., Geers, 2003). The benefits for congenitally deaf

children seem greatest when they receive their implants early, say by 1-3 years of age (Geers,

2004; Nicholas & Geers, 2006; Waltzman & Cohen, 1998). Such early access to language

facilitates social and emotional development. For example, deaf children with good language

skills (oral or sign language) are reported to have fewer psychosocial difficulties than their peers

with more limited language skills (Dammeyer, 2010). For child CI users with hearing parents,

exposure to a spoken language has been linked to greater social well-being (Percy-Smith et al.,

2008), perhaps because few hearing parents are sufficiently proficient in sign language to enable

optimal parent-child communication.

A critical aspect of interpersonal communication in general and of parent-child

communication in particular is the exchange of affective messages, both verbal and non-verbal.

In addition to universal means of communicating affect visually by way of facial expression,

posture, and movement, there are auditory means of communicating affect that are equally

universal, including vocal modulation and music (Bachorowski, 1999; Scherer, 2003; Balkwill &

Thompson, 1999; Zentner, Grandjean, & Scherer, 2008). The ease with which children with CIs

identify the emotional valence of vocal, non-verbal sounds has implications for their emotional

well-being, as assessed by self-report (Schorr, Roth, & Fox, 2009). There is relatively little

research, however, on the ability of CI users to identify the intended emotion in speech and

music.

Portrayals of emotion in speech and music are guided both by universal aspects of

emotional expression and by socio-cultural conventions (Bachorowski, 1999; Balkwill &

Thompson, 1999; Scherer, Banse, & Wallbott, 2001). Well before children acquire the rudiments


of language, they are exposed to some of these modes of expression. For example, mothers

convey emotion in their speech to pre-verbal infants by means of exaggerated prosody (Fernald,

1991; Papoušek, 1992) and heightened affect (Trainor, Austin, & Desjardins, 2000), which

confer a musical flavor to such speech (Fernald, 1989; Trainor, Clark, Huntley, & Adams, 1997).

From the early months, infants prefer infant-directed speech to adult-directed speech, which is

more neutral in emotional tone (Cooper & Aslin, 1990; Fernald, 1985; Singh, Morgan, & Best,

2002). Mothers across cultures also sing expressively to their infants (Trehub, Trainor, & Unyk,

1993), sometimes for playful purposes and sometimes for soothing purposes (Trehub & Trainor,

1998). Infants prefer infant-directed singing over informal but non-infant-directed singing

(Trainor, 1996), even in the newborn period (Masataka, 1999), which suggests that such

expressive singing may have intrinsic appeal.

Congenitally deaf infants do not have access to these affective vocalizations in the early

months of life because, even with early diagnosis, they are unlikely to receive implants much

before 12 months of age (Holt & Svirsky, 2008). Because of delays in receiving expressive vocal

input, one would expect delays in child CI users’ ability to interpret and transmit vocal affect.

Even in the post-implant period, the degraded pitch and spectral cues provided by implants

(Geurts & Wouters, 2001; Loizou, 1998) interfere with the processing of vocal emotion.

Variations in fundamental frequency (F0), or pitch, contribute to the differentiation of emotions

in speech and music, although amplitude, tempo (rate), and rhythm are also important

(Bachorowski, 1999; Laukka, Juslin, & Bresin, 2005; Scherer, 1986, 2003). Amplitude and

timing variations are accessible to CI users (Loizou, 1998), but for normal-hearing (NH)

individuals, these cues are typically used in conjunction with pitch variations to differentiate

vocal emotions (Bachorowski, 1999; Scherer, 2003). The available literature indicates that CI

users have considerable difficulty identifying emotion in speech (Luo, Fu & Galvin, 2007; Most


& Aviner, 2009; Hopyan-Misakyan et al., 2009). For example, prelingually deaf children 7-13

years of age with unilateral implants are as accurate at identifying facial expressions (happy, sad,

angry, fearful) as their hearing peers, but the CI users perform at chance levels on affective

speech prosody that poses little difficulty for NH children (Hopyan-Misakyan et al., 2009).

Variations in pitch, amplitude, and timing characterize music as well as speech (Kraus,

Skoe, Parbery-Clark, & Ashley, 2009). Music and speech also share some cues to specific

emotions (Juslin & Laukka, 2003). The adverse consequences of degraded pitch and spectral

information for music processing have been well documented (see McDermott, 2004, and Kraus

et al., 2009 for reviews). This research has focused largely on CI users’ difficulty differentiating

melodies (Galvin, Fu, & Nogaki, 2007; Vongpaisal, Trehub, & Schellenberg, 2006, 2009) and

identifying familiar music on the basis of pitch cues alone (Hsiao, 2008; Kong, Cruz, Jones, &

Zeng, 2004; Nimmons, 2007; Stordahl, 2002). Not surprisingly, this difficulty makes music

unpalatable to many postlingually deafened CI users (Gfeller et al., 2000; Lassaletta et al., 2007;

Leal et al., 2003), but congenitally or prelingually deaf children typically enjoy music listening

and music making (Gfeller et al., 2000; Nakata, Trehub, Mitani, & Kanda, 2006; Vongpaisal et

al., 2006). To date, there has been only one study of emotion identification in music by CI users

(Hopyan, Gordon, & Papsin, 2011). Hopyan et al. (2011) found that prelingually deaf children 7-

13 years of age with unilateral implants distinguished happy from sad musical excerpts but their

accuracy was significantly lower than that of same-age NH children.

The goal of the present investigation was to ascertain whether bilateral CI users 4-7 years

of age could identify happiness and sadness in speech and music. Previous research with

unilateral CI users 7-13 years of age indicated that they were unable to differentiate the emotions

of happiness, sadness, anger, and fear expressed prosodically in speech (Hopyan-Misakyan et al.,

2009), but they could distinguish happy from sad intentions in instrumental (piano) music

(Hopyan et al., 2011). Children’s judgments of emotion in speech were examined in Experiment

1, and their judgments of emotion in music were tested in Experiment 2.

Experiment 1

In typical studies of emotion identification in speech among NH listeners, participants

hear utterances with semantically neutral content spoken in a manner that portrays specific

emotions. They respond by selecting one of several emotions from a closed set (see Scherer,

2003, for a review). Luo et al. (2007) used such a task to present adult CI users with men’s and

women’s utterances that conveyed happiness, sadness, anger, anxiety, and emotional neutrality.

Amplitude cues were preserved in some cases and normalized in others. Even with amplitude

cues preserved, CI users identified less than half of the emotions, performing substantially below

the levels attained by NH adults. Most and Aviner (2009) compared children and adolescents

with CIs (10-17 years of age) with same-age NH individuals and with deaf hearing-aid users on

their identification of happiness, sadness, fear, anger, surprise, and disgust in auditory, visual,

and audio-visual contexts. The performance of both deaf groups was significantly worse than

that of hearing individuals in the auditory context. Although hearing individuals performed better

in audio-visual than in visual contexts, the addition of auditory cues provided no advantage for

CI and hearing-aid users. In other words, auditory cues resolved the ambiguity of some facial

expressions for hearing listeners but not for deaf listeners. CI users’ difficulty differentiating the

six vocal emotions is in line with 7- to 13-year-olds’ difficulty differentiating happy, sad, angry,

and fearful vocal emotions (Hopyan-Misakyan et al., 2009). The conclusion of the

aforementioned studies was that the pitch and spectral cues provided by CIs are insufficient for

the differentiation of vocal emotions.

Unquestionably, NH individuals outperform CI users on the identification of vocal

emotion, but even their performance is far from perfect. In general, NH adults identify 60% to

70% of the target emotions in tasks involving multiple emotions, with some emotions identified

more readily than others (Luo et al., 2007; Scherer, 2003). Moreover, NH individuals perform

better on some voices than on others, and their performance may be affected by talker familiarity

and other talker-specific factors (see Bachorowski, 1999, for a review). Luo et al. (2007) found

that CI users often confused happiness with anger or fear, and sadness with neutrality. Although

NH listeners were much more accurate than CI listeners, both had similar confusion patterns.

Furthermore, the removal of amplitude cues impaired the performance of NH as well as CI

listeners. Most and Aviner (2009) found similar confusion patterns in NH listeners and their

counterparts with hearing loss. Happiness was identified the most accurately, followed by anger

and disgust. It is clear, then, that vocal emotions are more difficult to discern than emotional

facial expressions, even for listeners with access to the full range of pitch and spectral cues.

Acoustic cues may be insufficient for accurate identification of discrete emotional

categories in adult speech (e.g., Bachorowski, 1999). Rather, such cues may be more useful for

indicating the talker’s level of nonspecific arousal (Bachorowski, 1999; Bänziger & Scherer,

2005). Indeed, NH listeners tend to confuse emotions associated with similar arousal levels. For

example, they confuse happiness and anger, which share high arousal levels and the acoustic

cues of high pitch, pitch variability, rapid speaking rate, and amplitude variability. They also

confuse neutrality and sadness, which share low arousal levels and the acoustic cues of low

pitch, slow speaking rate, and reduced pitch and amplitude variability. Perhaps CI listeners

would have more success in differentiating emotions if the emotions in question had contrastive

arousal levels, such as happiness and sadness. They might also be more successful with samples

of speech delivered in a child-directed rather than an adult-directed manner. Aside from the fact

that the child-directed manner of speech would be more familiar as well as more engaging to

young children, the emotional intentions are more transparent in child-directed than in adult-

directed speech both within and across cultures (Bryant & Barrett, 2007; Fernald, 1989).

In the present experiment, young bilateral CI users were required to identify child-

directed speech samples as happy or sad. CI users 5-7 years of age and a control group of NH

children listened to natural, expressive utterances produced by a man and a woman. The man’s

speech could offer potential processing advantages for CI users because of its lower pitch range

(Chatterjee & Peng, 2007; Vongpaisal et al., 2010). It is possible, however, that the woman

might express affect more distinctively than the man (Luo et al., 2007). Because of the use of a

child-directed speaking style and emotion categories that contrasted in arousal, young CI users in

the present experiment were expected to differentiate emotions more effectively than older CI

users in previous studies (Luo et al., 2007; Most & Aviner, 2009; Hopyan-Misakyan et al.,

2009). Nevertheless, because of child CI users’ diminished access to pitch and spectral cues, they

were expected to perform more poorly than their NH peers.

Method

Participants. The participants included 14 bilateral CI users (5 girls and 9 boys, M = 5.8

years, SD = 0.6; range: 5.1-7.0) from middle-class families who were recruited from a large

metropolitan area (see Table 1). Four children had progressive hearing loss from birth and 10

were congenitally or prelingually deaf. As can be seen in Table 1, all participants used Nucleus

implants with Contour or Freedom processors programmed with the Advanced Combinational

Encoder (ACE) processing strategy, and each had a minimum of 2.5 years of implant experience

(M = 4.3 years, SD = 0.8, range: 2.8−5.3).

Table 6. CI participants: Background information. Participant codes are preserved across studies.

Participant   Gender   Age at test (years)   Age at 1st and 2nd   Etiology
                       E1; E2                CI activation

CI-1*         M        6.4; 6.4              3.4; 3.4             Genetic
CI-2          M        5.2; 5.5              0.8; 1.7             Genetic
CI-3          M        5.4; 5.4              1.1; 1.1             Genetic
CI-4          F        6.3; 6.3              1.0; 3.6             Genetic
CI-5*         F        7.0; 6.9              2.5; 4.0             Unknown
CI-6          M        5.3; 5.3              1.0; 4.6             Genetic
CI-7          M        5.1; 5.8              0.9; 1.8             Genetic
CI-8          M        6.1; 6.3              0.8; 1.5             Genetic
CI-9          F        - ; 5.8               1.7; 1.7             Unknown
CI-10*        M        6.4; 6.4              3.1; 6.3             Mondini dysplasia
CI-11         F        5.1; -                1.1; 1.1             Genetic
CI-12         F        6.3; 6.9              1.0; 3.5             Unknown
CI-13         F        - ; 4.1               1.1; 1.1             Unknown
CI-14*        M        5.5; 5.5              2.7; 2.7             Unknown
CI-15         M        6.4; -                1.3; 2.3             Genetic
CI-17         F        5.1; 5.1              1.1; 3.4             Unknown

* progressive hearing loss from birth

With the exception of children with progressive hearing loss, their first implant was

activated between 9 and 27 months of age. When tested with their implants, absolute thresholds

for tones within the speech range were within normal limits (10-30 dB HL). All children with

CIs participated in auditory-verbal therapy for at least 2 years after implantation. They also

communicated exclusively by auditory-oral means and were in age-appropriate school classes

with their NH peers. A comparison sample of NH children consisted of 18 preschoolers (12 girls

and 6 boys, M = 5.4 years, SD = 0.5, range: 4.8-6.2) from middle-class families who were

recruited from the local community. They were slightly younger, on average, than the

CI group, t(30) = 2.05, p = .049. No NH child had a personal or family history of hearing

problems, and all children were free of colds on the day of testing.

Apparatus and stimuli. A man and a woman were instructed to produce “happy” and

“sad” versions of the following three sentences: The lamp is on the table, Flowers grow in the

garden, and A chair has four legs. The “actors” were asked to speak naturally but in an

expressive manner, as if talking to children. F0, amplitude range, and duration of all utterances

from the two talkers were calculated using PRAAT software (Boersma & Weenink, 2005). Table

7 provides information about mean F0, F0 range, amplitude, and duration for happy and sad

utterances produced by the man and woman. F0 contours of the stimuli produced by the two

talkers are illustrated in Figure 6. Stimuli largely conformed to cultural conventions for the vocal

portrayal of happiness and sadness, as described by Johnstone and Scherer (2000). Specifically,

the man’s and woman’s sad utterances had a smaller F0 range and were lower in overall

amplitude than their happy utterances. Although the man’s sad utterances were longer in duration

than his happy utterances, as expected, the woman’s happy utterances were longer in duration

than her sad utterances (Table 7). In other words, duration was not a reliable cue to the target

emotion.
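The consistency of each cue across talkers can be checked directly from the values in Table 7. The following is a minimal sketch, not part of the original analysis: the data are transcribed from Table 7, and the function names (cue_means, happy_minus_sad) are illustrative assumptions. It averages each cue over the three sentences and reports the happy-minus-sad difference for each talker.

```python
from statistics import mean

# Values transcribed from Table 7:
# (emotion, talker, sentence, mean F0 in Hz, F0 range in Hz,
#  mean amplitude in dB, duration in s)
STIMULI = [
    ("happy", "female", "Chair",   313.84, 335.71, 72.55, 1.77),
    ("happy", "female", "Flowers", 289.18, 349.38, 72.86, 1.92),
    ("happy", "female", "Lamp",    285.15, 330.07, 72.80, 1.81),
    ("happy", "male",   "Chair",   132.35, 108.31, 71.95, 2.00),
    ("happy", "male",   "Flowers", 156.21, 136.03, 72.86, 1.64),
    ("happy", "male",   "Lamp",    134.87, 107.01, 72.48, 1.44),
    ("sad",   "female", "Chair",   190.19, 114.62, 70.26, 1.66),
    ("sad",   "female", "Flowers", 176.23,  83.64, 69.58, 1.57),
    ("sad",   "female", "Lamp",    186.28,  57.37, 71.04, 1.41),
    ("sad",   "male",   "Chair",    94.15,  36.12, 69.22, 1.95),
    ("sad",   "male",   "Flowers",  96.66,  45.42, 70.34, 1.91),
    ("sad",   "male",   "Lamp",    102.13,  43.06, 71.08, 1.73),
]

# Column index of each cue within a STIMULI row (hypothetical naming).
CUE_COLUMNS = {"f0_range": 4, "amplitude": 5, "duration": 6}


def cue_means(emotion, talker):
    """Average each cue over the three sentences for one emotion and talker."""
    rows = [row for row in STIMULI if row[0] == emotion and row[1] == talker]
    return {cue: mean(row[col] for row in rows) for cue, col in CUE_COLUMNS.items()}


def happy_minus_sad(talker):
    """Difference between happy and sad cue means for one talker."""
    happy = cue_means("happy", talker)
    sad = cue_means("sad", talker)
    return {cue: happy[cue] - sad[cue] for cue in CUE_COLUMNS}


for talker in ("female", "male"):
    for cue, diff in happy_minus_sad(talker).items():
        print(f"{talker:6s} {cue:9s} happy - sad = {diff:+.2f}")
```

A cue is informative about emotion only if the sign of the difference is the same for both talkers. In this sketch that holds for F0 range and amplitude but not for duration.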

The stimuli were recorded in a 3 m x 2.5 m double-walled, sound-attenuating chamber

(Industrial Acoustics Corporation) with a microphone (Sony F-V30T) connected directly to a

Windows XP computer workstation. High-quality digital sound files (44.1 kHz, 16-bit, mono)

were created using a digital audio editor (Sound Forge 6.0). Four colored digital photographs

(headshots) of the speakers were taken against a white background. Two were depictions of the

man with a facial expression consistent with happiness in one and an expression consistent with

sadness in the other. The two others were comparable depictions of the same expressions by the

woman.

Testing took place in a double-walled sound-attenuating booth, either at the university

laboratory (Industrial Acoustics Corporation, 3 m x 2.5 m) or at a local children’s hospital (4.3

m x 2.7 m). A computer workstation and amplifier (Harman/Kardon HK3380) located outside

the university booth were connected with a 17-in touch-screen monitor (Elo LCD

TouchSystems) and two high-quality loudspeakers (Electro-Medical Instrument Co.) inside the

booth. At the hospital venue, a GSI 61 two-channel clinical audiometer (Grason-Stadler

Instruments) replaced the amplifier. The loudspeakers were placed at 45° azimuth to the

participant with the touch-screen monitor at the midpoint. An interactive computer program

(customized for Windows XP) presented stimuli and recorded response selections when the

participant touched the screen. The experimenter could record the participants’ responses using a

portable keyboard connected to the workstation when young children preferred to point to the

selection rather than touching the screen or if their touch was not firm enough to activate the

screen. All stimuli were played at a comfortable sound level of approximately 65 dB SPL.

Procedure. Participants were tested individually. At their request, a parent was present in

the booth with some CI participants. Parents were permitted to assist with explanations when the

task was initially described to the child and during practice trials, but they did not interact with

the child in any way once the test phase began. The experimenter explained to the children that

they were going to hear a man or a lady talk and that they should decide whether the talker

sounded happy or sad. The participants were explicitly instructed to listen for how the talker

sounded. Children completed four practice trials to familiarize themselves with the task. During practice

trials, the experimenter answered any of the children’s questions about the task, but she provided no

feedback about the accuracy of their responses. Children were also told that they could listen to

any stimulus more than once, in fact, as often as they liked.

Table 7. Happy- and sad-sounding speech stimuli.

Emotion/Gender   Sentence   Mean F0 (Hz)   F0 range (Hz)   Mean amplitude (dB)   Duration (s)

Happy/Female     Chair      313.84         335.71          72.55                 1.77
Happy/Female     Flowers    289.18         349.38          72.86                 1.92
Happy/Female     Lamp       285.15         330.07          72.8                  1.81
Happy/Male       Chair      132.35         108.31          71.95                 2
Happy/Male       Flowers    156.21         136.03          72.86                 1.64
Happy/Male       Lamp       134.87         107.01          72.48                 1.44
Sad/Female       Chair      190.19         114.62          70.26                 1.66
Sad/Female       Flowers    176.23          83.64          69.58                 1.57
Sad/Female       Lamp       186.28          57.37          71.04                 1.41
Sad/Male         Chair       94.15          36.12          69.22                 1.95
Sad/Male         Flowers     96.66          45.42          70.34                 1.91
Sad/Male         Lamp       102.13          43.06          71.08                 1.73

Figure 6. Prosodic contours of “The chair has four legs” produced in a happy and a sad manner

by male and female talkers (Experiment 1).

Auditory stimuli were presented in two blocks, one for the male talker and one for the

female talker. The order of blocks was counterbalanced across participants. Stimuli within each

block consisted of the 3 sentences in each emotion, with each utterance presented 3 times in random

order, for 18 trials per block and a total of 36 trials across both blocks. As noted, the participant had the option of

hearing a stimulus repeated before making a decision. At the beginning of each trial, colored

photographs of a man or woman (depending on the trial block) with a happy and sad facial

expression were presented on the monitor. Before beginning each block of trials, the

experimenter verified that the child could identify the facial expression in each photograph by

asking “Is the man/lady in this picture happy or sad?” The spatial arrangement of photographs

(left/right) was identical for all participants across all trials. After listening to each stimulus,

children responded by touching one of the photographs. No feedback was provided, other than

general encouragement and praise that was offered periodically to maintain the children’s

enthusiasm and cooperation.
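The trial structure described above (two talker blocks, 3 sentences x 2 emotions x 3 repetitions per block, random order within a block, block order counterbalanced across participants) can be sketched as follows. This is an illustrative reconstruction only. The original presentation program is not described in detail, so every name here (build_block, build_session, the sentence labels) is hypothetical, and alternating block order by participant index is an assumed counterbalancing scheme.

```python
import random
from itertools import product

SENTENCES = ("lamp", "flowers", "chair")   # labels for the three sentences
EMOTIONS = ("happy", "sad")
REPETITIONS = 3                            # each utterance is presented 3 times


def build_block(talker, rng):
    """Return the 18 trials for one talker in random order."""
    utterances = [(talker, sentence, emotion)
                  for sentence, emotion in product(SENTENCES, EMOTIONS)]
    trials = utterances * REPETITIONS
    rng.shuffle(trials)
    return trials


def build_session(participant_index, seed=None):
    """Return two blocks; block order alternates with participant index."""
    rng = random.Random(seed)
    if participant_index % 2 == 0:
        talker_order = ("man", "woman")
    else:
        talker_order = ("woman", "man")
    return [build_block(talker, rng) for talker in talker_order]


session = build_session(participant_index=0, seed=1)
print([len(block) for block in session])   # trials per block
print(sum(len(block) for block in session))  # trials per session
```

Each block contains every sentence-emotion combination equally often, so a child who always chose the same photograph would score exactly 50% correct.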


Preliminary analyses revealed that performance was not distributed normally for either

the man talker, p = .027, or the woman talker, p = .020, or for performance averaged across

talkers, p = .034 (Kolmogorov-Smirnov tests). Although overall performance in the CI group

was more variable than in the NH group, this difference fell short of statistical significance,

F(13, 17) = 3.37, p = .076 (Levene’s test). Nevertheless, subsequent analyses used nonparametric

tests.

We first examined the number of children who performed significantly better than

chance. With a binomial test (normal approximation, correcting for continuity, p < .05, one-

tailed), overall performance required 24 or more correct responses on the 36 trials to exceed

chance levels. For child CI users, 12 of 14 surpassed chance, and for NH children, 17 of 18 did

so. In short, performance was remarkably good in both groups and consistent across individuals.
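The criterion of 24 correct responses can be reproduced from the description of the test. The sketch below is an illustration under stated assumptions: chance is taken as p = .5 on each trial, the one-tailed critical z is taken as 1.645, and the function names are hypothetical. It computes the criterion by the normal approximation with continuity correction and, for comparison, by the exact binomial distribution.

```python
from math import ceil, comb, sqrt


def min_correct_normal_approx(n_trials, z_crit=1.645, p_chance=0.5):
    """Smallest score k with (k - 0.5 - mean) / sd >= z_crit."""
    mean = n_trials * p_chance
    sd = sqrt(n_trials * p_chance * (1 - p_chance))
    return ceil(mean + 0.5 + z_crit * sd)


def exact_upper_tail(k, n_trials, p_chance=0.5):
    """Exact probability of k or more correct responses under chance."""
    return sum(comb(n_trials, i) * p_chance**i * (1 - p_chance)**(n_trials - i)
               for i in range(k, n_trials + 1))


def min_correct_exact(n_trials, alpha=0.05, p_chance=0.5):
    """Smallest score whose exact upper-tail probability is below alpha."""
    for k in range(n_trials + 1):
        if exact_upper_tail(k, n_trials, p_chance) < alpha:
            return k
    return None


print(min_correct_normal_approx(36))       # criterion by normal approximation
print(min_correct_exact(36))               # criterion by exact binomial
print(round(exact_upper_tail(24, 36), 4))  # chance probability of 24 or more
```

Under these assumptions the two methods agree on the criterion for 36 trials, so the choice of the normal approximation does not affect the count of children who exceeded chance.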

A direct comparison of groups on overall performance confirmed that the NH children

performed significantly better than the child CI users, p = .047 (Mann-Whitney U Test). Poorer

performance for child CI users than for NH children was evident for the woman talker, p = .029,

but not for the man, p = .107 (Mann-Whitney U Tests). Although NH children showed

significant improvement from the first to the second block of trials, p = .013, comparable

improvement was not evident among the CI children, p = .135 (Wilcoxon Signed Rank Tests).

Performance of both groups as a function of block order is illustrated in Figure 7.

Figure 7. Performance of child CI users and NH children on happy and sad speech as a function

of block order (Experiment 1). Error bars represent standard errors.

Performance of the CI children collapsed across talkers was associated positively with

duration of implant use, r = .60, N = 14, p = .012 (one-tailed), which is an impressive finding in

view of the small sample. Figure 8 depicts individual scores of children with CIs. Two of them

exhibited error-free performance across talkers. An additional two had error-free performance for

the woman talker only, and two others had error-free performance for the man talker only.

Interestingly, only six NH children achieved error-free performance on both trial blocks,

although their performance in general was more consistent than that of children with CIs.
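The one-tailed probability attached to a correlation can be checked from r and N alone by converting r to a t statistic with N - 2 degrees of freedom. The following sketch is illustrative rather than a record of the original computation: the function names are hypothetical, and the t distribution's tail area is obtained by Simpson's rule so that the example depends only on the standard library. It assumes a positive r.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, integrating the density on [0, t] by Simpson's rule."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def one_tailed_p_for_r(r, n):
    """Convert a positive Pearson r from n cases to (t, one-tailed p)."""
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r * r)
    return t, t_upper_tail(t, df)


t_value, p_value = one_tailed_p_for_r(0.60, 14)
print(f"t(12) = {t_value:.2f}, one-tailed p = {p_value:.3f}")
```

Because r is reported to two decimal places, a value recomputed this way can differ from the reported probability in the third decimal.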

CI users mistook sad-sounding speech for happy speech (76% of all errors) more often than they

mistook happy-sounding speech for sad speech. This was especially noticeable for the poorest

performing CI users. This pattern of confusions was also reported by Luo et al. (2007) and Most

and Aviner (2009), which may reflect a response bias for positive emotions rather than greater

ease of identifying happy-sounding speech. In any case, young CI users performed successfully

on a two-alternative forced-choice task that required them to differentiate happy from sad

speech, unlike older CI users who failed to identify happy and sad utterances in the context of a

four-alternative forced-choice task (Hopyan-Misakyan et al., 2009).

Figure 8. Performance of individual CI users on happy and sad speech (Experiment 1)

ordered by scores on Block 1 from best to worst. Original participant codes are preserved.

As noted, natural speaking rate and amplitude variations were preserved in the present

stimuli and were available as potential cues. Differences in overall amplitude and amplitude

variation were consistent across talkers, but differences in speaking rate across emotion

categories were inconsistent across speakers (see Table 7). As a result, speaking rate, which

usually distinguishes happy- from sad-sounding speech, was an unreliable cue in the present

experiment. Differences in F0 range between happy and sad stimuli were large and consistent

across talkers (Table 7). Differences in F0 contour also provided reliable cues to happy and sad

speech, but they may have been inaccessible to CI users (Geurts and Wouters, 2001; Loizou,

1998). When contour contrasts are substantial, however, as in statements versus questions, CI

users perform above chance levels although well below the levels attained by NH listeners

(Meister et al., 2009; Most and Peled, 2007; Peng et al., 2008). It is possible that the young CI

users in the present experiment used a combination of acoustic cues to differentiate the happy

from sad utterances although amplitude cues would have been the most prominent of these. In

the absence of feedback, however, it is clear that CI users had reasonable representations of

happy and sad vocal qualities on the basis of their everyday experience. Their ability to

differentiate music excerpts expressing happy and sad emotions in conventional ways was

assessed in Experiment 2.

Experiment 2

There is increasing interest in the non-musical as well as musical consequences of

children’s short- and long-term involvement in musical activities (Kirschner & Tomasello, 2010;

Schellenberg, 2006). For example, listening to pleasant music has consequences for children’s

prosocial behavior (Kirschner & Tomasello, 2010) and their performance on a variety of tasks

(Schellenberg & Hallam, 2006; Schellenberg, Nakata, Hunter, & Tamoto, 2007). Moreover,

music lessons have been linked to long-term cognitive outcomes (Schellenberg, 2006, 2011).

There is also long-standing interest in adults’ and children’s ability to understand emotion

in music. Typical tasks in this realm require children to link musical excerpts to discrete

emotional categories such as happiness, sadness, anger, and fear. Despite the ubiquity of this

type of task, it seems less appropriate for describing emotional aspects of music than it does for

speech or facial expressions (Trehub, Hannon, & Schachner, 2010). Such labels aptly describe

the feelings associated with vocal or facial emotional expressions, but they seem much less

suitable for instrumental musical excerpts. Instead of discerning the emotions of the performer or

composer or the emotional consequences for the listener, which may involve non-specific

arousal or general feelings of pleasure (Salimpoor, Benovoy, Longo, Cooperstock, & Zatorre,

2009), listeners are typically expected to discern the emotional intentions of the performance. To

do so requires familiarity with culturally typical uses of emotional labels in relation to music.

Adults use some combination of tempo, loudness, pitch level, mode (major or minor),

and consonance or dissonance to judge the emotional intentions of music excerpts (see Hunter &

Schellenberg, 2010, for a review), but tempo and mode have received the most attention. In

general, Western adults judge music in the major mode and with rapid tempo as happy and music

in the minor mode and with slow tempo as sad (Peretz, Gagnon, & Bouchard, 1998). Unlike

tempo, which often differentiates happy- from sad-sounding music across cultures, mode does

not (e.g., Balkwill & Thompson, 1999). Despite the cross-cultural importance of tempo to

musical emotions, it is still necessary to learn the musical conventions involving tempo as well

as mode. With excerpts from the classical repertoire, 4-year-old children fail to link musical

mode, tempo, or their combination with happiness and sadness, 5-year-olds use tempo to

differentiate happy from sad musical excerpts, and 6- to 8-year-olds link both cues, separately

and in combination, to happiness and sadness (Dalla Bella, Peretz, Rousseau & Gosselin, 2001).

When the stimuli consist of children’s songs rather than classical music, 4-year-olds seem to use

tempo to distinguish happy from sad excerpts (Mote, 2011). In general, however, listeners of all

ages find high-arousal emotions (e.g., happiness, anger) easier to identify in music compared to

low-arousal emotions (e.g., sadness or peacefulness; Hunter, Schellenberg, & Stalinski, 2011),

but the difference is particularly strong among young children.

Given the musical pitch processing difficulties of CI users (see McDermott, 2004, for a

review), it is not surprising that they are unable to discriminate major from minor melodic

patterns (Vongpaisal et al., 2006). Although CI users can resolve timing differences and they use

such differences in music-recognition tasks (Hsiao, 2008; Kong et al., 2004; Stordahl, 2002), it is

unclear when child CI users first link those and other acoustic cues with musical emotions.

Hopyan et al. (2011) found that child CI users 7-13 years of age reliably differentiated happy

from sad musical pieces when the stimuli were synthesized piano versions of classical pieces that

have been used in previous research on musical emotions (Dalla Bella et al., 2001; Hunter et al.,

2011; Peretz et al., 1998; Schellenberg, Peretz, & Vieillard, 2008; Vieillard et al., 2008). Not

surprisingly, however, the CI users performed more poorly than same-age NH children.

Although the stimuli had tempo, mode, and other cues to emotion, the authors speculated that CI

users based their judgments primarily on tempo.

According to Dalla Bella et al. (2001), children younger than 5 are unable to use tempo

and those younger than 6 are unable to use mode to differentiate conventionally happy from sad

musical excerpts from the classical repertoire. In the present experiment, selected piano excerpts

from Vieillard et al. (2008), which were also from the classical repertoire, were used to evaluate

the ability of 4- to 7-year-old child CI users to distinguish happy- from sad-sounding music. On

average, these children had roughly 4 years of implant experience, compared to the 7 years of

average implant experience of children in the Hopyan et al. (2011) study. Clinically, the hearing

age of the present sample would be considered 4 years, corresponding to their years of auditory

input. In line with the findings by Dalla Bella et al. (2001), child CI users with less than 5 years of

auditory experience might be unable to associate classical music excerpts with happy and sad

emotions. Aside from their reduced quantity of auditory input relative to same-age peers, child

CI users would also have a considerably reduced quality of musical input. In view of their

limited auditory and musical experience, it was important to ascertain whether they would be

able to differentiate emotions from samples of classical music. Unlike the happy and sad speech

samples in Experiment 1, which differed in amplitude variability, the tones in all musical

excerpts were equivalent in amplitude, eliminating an important cue to happiness and sadness.

Mode cues were available in the present stimuli, but they were expected to be potentially useful

only for the control sample of NH listeners.

Method

Participants. The participants were 14 CI users (6 girls and 8 boys), 12 of whom took

part in Experiment 1. The average age of participants was 5.8 years (SD = 0.8, range: 4.1-6.9),

and the average duration of implant experience was 4.2 years (SD = 1.0, range: 2.8-5.9). The two

additional participants, both girls, were congenitally deaf and satisfied the criteria for participants

in Experiment 1. The control sample consisted of the same 18 NH children who were tested in

Experiment 1.

Apparatus and Stimuli. The musical stimuli consisted of 10 short (approximately 10-s)

synthesized piano excerpts, 5 happy and 5 sad, from the corpus of Vieillard et al. (2008), which

includes excerpts for both emotions. The excerpts, which were from the Western classical

repertoire, had the original pitch and duration values (corresponding to the musical score) but all

tones were of equal amplitude.

These particular excerpts were selected from a larger sample because their emotional

status is identified most reliably by adults (Hunter et al., 2011). The happy-sounding excerpts

were in the major mode and had a rapid tempo (mean of 137 beats per minute), in contrast to the

sad-sounding excerpts, which were in the minor mode and had a slow tempo (mean of 46 beats

per minute). Visual depictions of happiness and sadness consisted of close-ups of a frame from

each of two animated feature films by Hayao Miyazaki, “My Neighbour Totoro” (1988) and

“Spirited Away” (2001). The apparatus was identical to that of Experiment 1.

Procedure. The procedure was similar to that of Experiment 1. NH children heard 5

happy- and 5 sad-sounding excerpts presented randomly for a total of 10 trials. Children with CIs

heard two blocks of the 10 trials, with the order of stimuli randomized within both blocks. There

were no practice trials and no feedback for correct or incorrect responses.


On the first block of trials, NH children performed near ceiling (97.8% correct), and they

were much less variable than the child CI users, F(13, 17) = 22.84, p < .001 (Levene’s Test).

Moreover, performance was not distributed normally, p = .005 (Kolmogorov-Smirnov test).

Thus, nonparametric analyses were used, as in Experiment 1. We initially examined how many

children exceeded chance levels, which required scores of 9 or 10 correct on the 10 trials in each

block (binomial test, p < .05, one-tailed). On the first block, 5 of 14 CI children and 17 of 18 NH

children exceeded chance. On the second block of trials, 10 of 14 CI children surpassed chance.

For the first block (i.e., the only block completed by both groups), the difference between groups

in the proportion of children exceeding chance was significant, χ2(1, N = 32) = 12.64, p < .001.

A non-parametric comparison of actual scores, contrasting the NH children with CI users (first

block), also confirmed an advantage for the NH group, p < .001 (Mann-Whitney U Test). For

child CI users, improvement across trial blocks was not significant, p = .283 (Wilcoxon Signed

Rank Test).
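The group difference in the proportion of children exceeding chance on the first block can be recomputed from the counts given above (5 of 14 CI users and 17 of 18 NH children). The sketch below assumes a Pearson chi-square test without continuity correction, which is an inference from the reported value rather than something stated in the text. The function name is hypothetical.

```python
from math import erfc, sqrt


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Rows: CI users, NH children. Columns: exceeded chance, did not exceed chance.
chi2, p_value = pearson_chi_square_2x2(5, 9, 17, 1)
print(f"chi-square(1, N = 32) = {chi2:.2f}, p = {p_value:.4f}")
```

One expected cell count in this table falls below 5, so an exact test would be a reasonable cross-check with samples of this size.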

Individual differences among the CI children, collapsed across blocks, are illustrated in

Figure 10. In general, the performance of 4- to 7-year-old CI users (first block: 76.4% correct;

second block: 83.6%) was comparable to that reported by Hopyan et al. (2011) for 7- to 13-year-old

CI users (78% correct), who were tested on a similar task with a larger sample of similar

music excerpts.

Child CI users readily distinguished happy- from sad-sounding music although not with

the extraordinary accuracy shown by their hearing peers, who could capitalize on pitch structure

as well as tempo cues. Despite their limited auditory and musical exposure, young CI users’

ability to identify happy and sad emotions in samples of instrumental music devoid of amplitude

cues implies that (1) they would fare far better with real-world samples that feature amplitude as

well as tempo cues, including children’s music (Mote, 2011), and (2) cognitive or task factors

must underlie the inability of some 4-year-old hearing children to identify musical emotions

(Dalla Bella et al., 2001).

For child CI users, the association between duration of implant use and performance

collapsed across blocks was significant, r = .49, N = 14, p = .038 (one-tailed), as it was in

Experiment 1, which is impressive once again in light of the small sample size. It is likely that

excellent cognitive skills in combination with auditory experience enabled some children to learn

which emotional labels are linked to which acoustic cues in speech and music. We also

correlated performance on the first block of trials with performance from Experiment 1

separately for the 18 NH children and the 12 CI users who participated in both experiments. For

the NH group, the correlation was not significant, p = .367, presumably because of high levels of

performance and little variation in either experiment. For the CI group, however, there was a

positive association (Figure 11), r = .51, N = 12, p = .043 (one-tailed). Although there are some

common cues to emotion in speech and music, such as pitch, tempo, and amplitude (Juslin &

Laukka, 2003), pitch cues to happiness and sadness were unlikely to have been useful for child

CI users, tempo cues were inconsistent in the speech samples of Experiment 1, and amplitude

cues were unavailable in the music excerpts.

CI-8, the only CI user who performed perfectly in the speech and music tasks, had the

typical profile of so-called “star” performers, which includes genetic, non-syndromic congenital

deafness (Kawasaki et al., 2006; Wu et al., 2008), well-educated and highly involved parents

(Geers and Brenner, 2003; Teagle and Eskridge, 2010), and, at 6.3 years of age, over 5 years of

implant experience. At the time of testing, moreover, he had been taking piano lessons for 2

years.

Figure 9. Performance of child CI users (Block 1) and NH children on happy and sad music

(Experiment 2). Error bars represent standard errors.

Figure 10. Performance of individual child CI users on happy and sad music (Experiment 2).

Performance is averaged across the 2 blocks (20 trials) and ordered from best to worst. Original

participant codes are preserved.

Figure 11. Performance on emotion identification in speech and music for the 12 CI users who

participated in both tasks. Performance on each task is averaged across the 2 blocks (36 and 20

trials, respectively).

Figure 12. Accuracy of emotion identification in speech (Experiment 1) as a function of years

of implant use. Performance is averaged across the 2 blocks (36 trials).

Figure 13. Accuracy of emotion identification in music (Experiment 2) as a function of years of

implant use. Performance is averaged across the 2 blocks (20 trials).

General Discussion

In two experiments, 4- to 7-year-old deaf children with bilateral CIs and age-matched NH

children identified happiness and sadness in speech and music in the context of a two-alternative

forced-choice task. CI users performed well above chance levels but significantly below their

hearing peers. The present findings with speech stimuli are in marked contrast to the very poor

performance of CI users in previous studies of emotion identification in speech (Luo et al., 2007;

Most & Aviner, 2009; Hopyan-Misakyan et al., 2009). Note, however, that listeners in those

studies were required to identify vocal emotions from four or more alternatives, some of which

had overlapping acoustic cues arising from similar arousal levels (e.g., happiness and anger).

Variations in pitch or intonation, which contribute to emotion identification, are more likely to

pose difficulty for CI users, but intonation differences are often accompanied by differences in

speaking rate and amplitude, which would be accessible to CI users. Although hearing listeners

place considerable reliance on pitch cues to emotion, CI listeners are likely to make greater use

of alternative cues to emotion in the speech signal. Young CI users’ ability to identify basic

vocal emotions such as happiness or sadness on the basis of incidental exposure suggests that

training in this realm could lead to enhanced perception and production of emotional prosody. In

light of the consequences of vocal emotion identification for socialization and well-being in

young CI users (Schorr, 2009), the addition of such training to the current habilitation agenda for

child CI users seems warranted.

The ability of 4- to 7-year-old CI listeners to differentiate happy from sad classical

music excerpts extends the findings of Hopyan et al. (2011) to younger children with less

implant experience and adds to the growing literature on the accessibility of music to

prelingually deaf implant users. Although the present findings indicate that young CI users can

discern the emotional intentions expressed in musical excerpts, they shed no light on the

emotional consequences of music for CI users. There are indications that young CI users enjoy

music (Mitani et al., 2007; Stordahl, 2002; Vongpaisal et al., 2006), but it remains to be

determined whether they experience changes in arousal and mood comparable to those

experienced by individuals with normal hearing (e.g., Balkwill & Thompson, 1999; Husain,

Thompson, & Schellenberg, 2002). There are reports that music training results in improved

speech perception (Moreno et al., 2009), executive function (Kraus & Chandrasekaran, 2010;

Degé, Kubicek, & Schwarzer, 2011), and general cognitive functioning (Schellenberg, 2004) in

NH children. Music training may have even greater benefits for children with CIs. This

possibility awaits further research.

Study 3: Pitch and Timing Cues in Child Implant Users’ Recognition of Familiar Melodies

Abstract

The goal of the present study was to ascertain whether prelingually deaf children with bilateral

cochlear implants and a control sample of children with normal hearing could use pitch or timing

cues exclusively or in combination to identify familiar melodies. In the three conditions of

principal interest, children were required to identify the melody from the theme songs of TV

shows that they watched regularly on the basis of musical excerpts that preserved (1) the relative

pitch and timing cues but not the original instrumentation, (2) timing cues only (rhythm and

tempo), and (3) relative pitch cues only (pitch contour and intervals). The performance of child

implant users was well above chance levels and comparable to that of children with normal

hearing, except on the pitch-only condition where they performed at chance levels. This is the

first demonstration that young implant users and normally hearing children can identify familiar

music on the basis of timing cues alone.

Introduction

Melodies are defined by relations between their successive pitches (melodic contour and

intervals) and by their temporal organization (meter and rhythm). Adults with normal hearing

(NH) rely primarily on pitch patterns and secondarily on rhythm when identifying songs in an

open-set task (Hébert & Peretz, 1997). The situation is different for listeners with electric rather

than acoustic hearing. Cochlear implants (CIs) were designed to facilitate deaf individuals’

access to speech, which is coded as amplitude variation over time (Loizou, 1998; Smith et al.,

2002). Information about the pitch patterns in speech (i.e., intonation) and music is largely

transmitted by temporal fine structure, which is absent from the input provided by cochlear

prostheses. The result is severe degradation of the pitch and spectral information available to CI

listeners (Gates & Miyamoto, 2003; Geurts & Wouters, 2001; Loizou, 1998; Smith et al., 2002),

with adverse consequences for melodic processing.

Although fundamental frequency, or pitch, variations in speech are relatively large

(Fitzsimmons, Sheahan, & Staunton, 2001) and specific pitch relations are not prescribed, the

perception of intonation is challenging for individuals with CIs (Chatterjee & Peng, 2008;

Meister et al., 2009; Most & Peled, 2007; Peng, Tomblin, & Turner, 2008). By contrast, music

typically moves in small pitch steps and precise pitch relations are prescribed (Vos & Troost,

1989). It is not surprising, then, that melodic processing is even more challenging for CI users

(Drennan & Rubinstein, 2008; Loizou, 1998; McDermott, 2004). Limited pitch resolution cannot

fully account for these difficulties. With isolated or repeating tones and same-different tasks,

some CI users detect pitch changes that are less than a semitone (Vongpaisal, Trehub, &

Schellenberg, 2006), but they are unable to differentiate brief melodies or tone sequences that

differ by one or two semitones (Cooper, Tobey, & Loizou, 2008; Galvin, Fu, & Nogaki, 2007;

Vongpaisal et al., 2006). Moreover, their ability to rank one pitch as higher or lower than another

typically requires differences of four or more semitones (Gfeller et al., 2007; Sucher &

McDermott, 2007), which implies that the sensations arising from melodies may be markedly

different for CI and NH listeners, perhaps involving timbre rather than pitch variations

(McDermott, 2004; Moore & Carlyon, 2005).
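
To give a sense of scale for the semitone differences discussed above, the sketch below converts an interval in semitones to a frequency ratio. It assumes twelve-tone equal temperament, in which each semitone multiplies frequency by the twelfth root of 2; the function names are illustrative only.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio spanned by an interval, assuming equal temperament."""
    return 2.0 ** (semitones / 12.0)


def shifted_frequency(f0_hz: float, semitones: float) -> float:
    """Frequency reached by moving `semitones` up (or down, if negative) from f0_hz."""
    return f0_hz * semitone_ratio(semitones)


# Intervals mentioned in the text: one, two, and four semitones.
for step in (1, 2, 4):
    pct = (semitone_ratio(step) - 1.0) * 100.0
    print(f"{step} semitone(s): ratio {semitone_ratio(step):.4f} ({pct:.1f}% higher)")
```

On this scale a one-semitone step is a frequency change of roughly 6%, whereas a four-semitone step is a change of roughly 26%.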

In contrast to pitch patterning cues in speech and music, timing cues are more readily

available in the input provided by CIs. Child CI users differentiate same-gender talkers on the

basis of subtle timing differences in articulation and global differences in speech rhythm and

speaking rate (Vongpaisal et al., 2010). Adult CI users’ ability to perceive musical tempo and

rhythm is thought to be comparable to that of individuals with normal hearing (Kong et al., 2004;

Cooper et al., 2008; Gfeller & Lansing, 1991; Gfeller et al., 1997). CI users’ recognition of

melodies is considerably poorer than that of NH listeners, not only in the absence of timing cues

(e.g., Nimmons et al., 2007) but also in their presence (Gfeller et al., 2002, 2005; Stordahl, 2002;

Vongpaisal et al., 2006, 2009), even though they derive clear benefit from the preservation of

rhythm cues (Hsiao, 2008; Kong et al., 2004; Stordahl, 2002). In sum, the available evidence

indicates that timing makes a more substantial contribution to music recognition in CI listeners

than it does for NH listeners.

Although timing cues are critical for melody recognition by CI users, it is unclear

whether such cues are sufficient for melody recognition in this population. In addition, relatively

little is understood about the contribution of pitch patterning to CI listeners’ long-term

representations of familiar music because studies comparing melody recognition with and

without timing cues preserve the original pitch patterns in both versions (Kong et al., 2004;

Galvin et al., 2007; Nimmons et al., 2007; Hsiao, 2008). In other words, no study compelled CI

users to rely entirely on timing cues, as would be necessary for patterns with unchanging pitch.

Some pitch patterns are meaningful to CI users both in speech and in music. CI users’

modest success in differentiating Cantonese lexical tones (Barry et al., 2002), some of which

contrast in pitch contour, implies that pitch contour processing is possible to some extent.

Moreover, adult CI users benefit from training on contour discrimination (Galvin et al., 2007),

which implies that limitations of the prosthesis for processing pitch patterns may be overstated.

Because of the role of music in the lives of young children (Hallam, 2010; Kirschner &

Tomasello, 2010; Trehub, 2003; Trehub, Hannon, & Schachner, 2010), it is important to

ascertain the cues that child CI users can use for music recognition, with the long-range goal of

enhancing their access to music in everyday contexts. Unfortunately, young CI children and even

NH children make poorer use of available cues than do older children and adults (Stalinski,

Schellenberg, & Trehub, 2008; Vongpaisal et al., 2006).

Our goal in the present study was to evaluate the ability of young CI and NH listeners to

use pitch or timing cues exclusively or in combination to identify familiar melodies. Previous

research demonstrated that child CI users could identify the theme songs of television programs

that they watched regularly (Mitani et al., 2007; Vongpaisal et al., 2009). In some cases, child CI

users could identify the music only when all original cues, instrumental and vocal, were intact

(Mitani et al., 2007). In others, they identified the music more poorly on instrumental and

monophonic flute versions than on the original versions, but their performance was above chance

levels for all versions (Vongpaisal et al., 2009). In the aforementioned studies of TV-song

identification, the original timing cues were preserved and CI children’s performance was

significantly worse than that of NH children. The discrepant performance across the two studies

may be attributable, in part, to age and correlated cognitive differences. The CI children who

identified vocal/instrumental versions only (Mitani et al., 2007) averaged 6.5 years of age (range

of 4-8 years), in contrast to an average of 8.4 years (range of 4.7-11.7 years) for those who also

identified instrumental and melody versions (Vongpaisal et al., 2009). To minimize the cognitive

demands on the present CI participants whose average age was 6 years, the current identification

task involved a closed set of two alternatives rather than the three or four alternatives used in the

earlier studies.

Bilateral CI users 5-7 years of age and a control sample of NH listeners listened to theme

songs from familiar television programs in various conditions. Three conditions were of

principal interest. In one, the melody was presented intact, with pitch and temporal patterns

preserved. In a second condition, the original tempo and rhythm were preserved but all pitch

cues were removed by using a percussion instrument with unvarying pitch. In a third condition,

the relative pitch patterns (i.e., melodic contour and intervals) were preserved but timing cues

were removed by having all notes (and inter-onset intervals) of equal duration. In all three

conditions, all notes were of equal amplitude. Isochronous versions of familiar melodies are

sometimes created by replacing long-duration notes with repeated short-duration notes (Kang et

al., 2009; Nimmons et al., 2008), which eliminates rhythmic or grouping cues but preserves

some metrical cues. In the present study, the intact melodies and altered versions had exactly the

same number of notes. Finally, to ensure that children in the present study could identify the

theme songs in their original form, even without lyrics, original and instrumental versions like

those in previous research (Mitani et al., 2005; Vongpaisal et al., 2009) were also included.

Method

Participants. The participants included eight bilateral CI users (4 girls and 4 boys, M =

6.2 years, SD = 0.7; range: 5.1-7.2) who were recruited from a large metropolitan area (for

background information, see Table 8). One child had progressive hearing loss from birth and

seven were congenitally or prelingually deaf. All participants used Nucleus 24 Contour and/or

Nucleus Freedom Contour Advance implants programmed with the Advanced Combination

Encoder (ACE) processing strategy, and they all had at least 4 years of implant experience (M =

5.0 years; SD = 0.6; range: 4.0−5.9). When tested with their implants, absolute thresholds for

tones indicated access to speech sounds at normal conversational levels (10-30 dB HL). All CI

children participated in auditory-verbal therapy for at least 2 years after implantation. They also

communicated exclusively by auditory-oral means and were in age-appropriate school classes

with their NH peers. Parents of the CI participants provided information about their children's

musical involvement. At the time of testing, participant CI-8 had been taking private piano

lessons for approximately 2 years, and participant CI-2 for approximately 4 months. Participants

CI-11 and CI-12 had no formal musical training, but they were participating in extracurricular

choral activities at their respective schools. The rest of the CI children were not involved in any

extracurricular musical activities, but were a part of the regular school arts program. A

comparison sample consisted of 16 NH children from the community, roughly matched to the CI

participants by hearing age (M = 5.1 years, SD = 0.6, range: 4.3-6.3). No NH child had a

personal or family history of hearing problems, and all were free of colds on the day of testing.

Table 8. CI participants: Background information. Participant codes are preserved across studies.

Participant   Gender   Age at test (years)   Age at 1st and 2nd CI activation   Etiology
CI-2          M        5.7                   0.8; 1.7                           Genetic
CI-3          M        5.5                   1.1; 1.1                           Genetic
CI-4          F        6.8                   1.0; 3.6                           Genetic
CI-5*         F        7.2                   2.5; 4.0                           Unknown
CI-7          M        5.8                   0.9; 1.8                           Genetic
CI-8          M        6.3                   0.8; 1.5                           Genetic
CI-11         F        5.1                   1.1; 1.1                           Genetic
CI-12         F        6.9                   1.0; 3.5                           Unknown

* progressive hearing loss from birth

Apparatus and Stimuli. Testing took place in a double-walled sound-attenuating booth,

either at a university laboratory or a comparable facility at a major children’s hospital, according

to the convenience of parents. A computer workstation and amplifier (Harman/Kardon HK3380)

outside the university booth were connected with a 17-inch touch-screen monitor (Elo LCD

Touch Systems) and two high-quality loudspeakers (Electro-Medical Instrument Co.) inside the

booth. At the hospital, a GSI 61 two-channel clinical audiometer (Grason-Stadler Instruments)

replaced the amplifier. In both locations, the loudspeakers were placed at 45 degrees azimuth to

the participant, with the touch-screen monitor directly in front of the participant. An interactive

computer program (customized for Windows XP) presented stimuli and recorded response

selections when the participant touched the screen. A portable keyboard was available to the

experimenter in case young children preferred to make their selections by pointing to a picture

rather than touching the screen. All stimuli were played at a comfortable sound level of

approximately 65 dB SPL.

Table 9. Key, pitch range and tempo of melodies extracted from the TV-show theme songs.

Show/Song            Key        Pitch range (notes, Hz)                        Tempo (BPM)
Dora the Explorer    C major    C5-A5 (523-880), instrumental and melodic;     107
                                C4-A4 (262-440), pitch-only
Diego                E major    D#4-C#5 (311-554)                              118
Backyardigans        D major    D4-D5 (293-587)                                95
Franklin             D major    F#4-F#5 (369-738)                              94
Hannah Montana       Db major   Bb3-Bb4 (233-466)                              124
Suite Life on Deck   C major    C4-A4 (261-440)                                108
Blue’s Clues*        E major    C#4-B4 (277-493)                               107
Wiggles*             A major    D#4-C5 (311-523)                               95

* songs not chosen by CI children

The 40 stimuli consisted of 8 musical excerpts, with each excerpt presented in 5 different

versions: original, instrumental, melodic, timing-only, and pitch-only. The originals were taken

directly from theme songs played at the beginning of popular children’s TV programs (Table 9)

by re-recording the audio track as digital sound files. Instrumental and melodic versions were

created by a professional musician in a recording studio. In the instrumental versions, the

original vocal portions (i.e., the sung melody with lyrics) were replaced by a synthesized flute,

and the accompaniment duplicated the timbre and timing of the original recordings, as in Mitani

et al. (2007), Nakata et al. (2005), and Vongpaisal et al. (2009). The melodic versions consisted

of the same flute melodies in the original tempo and key but without instrumental

accompaniment (Table 9). The synthesized flute melody in the instrumental and melodic

versions of one song (from Dora the Explorer) was approximately one octave higher than the

other melodies. Timing-only and pitch-only versions were created with Finale 2009 software

(MakeMusic Inc., 2008) and converted to digital audio files.
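
The pitch ranges in Table 9 pair note names with frequencies in hertz. The correspondence can be sketched as follows, assuming equal temperament with A4 tuned to 440 Hz and the usual MIDI numbering in which A4 is note 69. The helper name and the parsing of note names are illustrative and are not part of the stimulus-preparation software described above.

```python
# Semitone offsets of the natural notes within an octave, counted from C.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(name: str, a4_hz: float = 440.0) -> float:
    """Convert a note name such as 'C5', 'D#4', or 'Bb3' to a frequency in Hz."""
    letter, rest = name[0].upper(), name[1:]
    semitone = NOTE_OFFSETS[letter]
    # Apply any accidentals: '#' raises and 'b' lowers the note by one semitone.
    while rest and rest[0] in "#b":
        semitone += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    midi = 12 * (octave + 1) + semitone  # MIDI note number (A4 = 69)
    return a4_hz * 2.0 ** ((midi - 69) / 12.0)


for low, high in [("C5", "A5"), ("C4", "A4"), ("Bb3", "Bb4")]:
    print(f"{low}-{high}: {note_to_hz(low):.1f}-{note_to_hz(high):.1f} Hz")
```

Values computed this way may differ from the tabled values by a hertz or two, depending on how the tabled values were rounded.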

Examples of melodic, timing-only, and pitch-only versions are depicted in Figure 14. The

timing-only versions, rendered in Wood Blocks timbre (selected from the Musical Instrument

Digital Interface, or MIDI, Instruments list), preserved the tempo and rhythmic structure of the

original melodies without reference to pitch. A meter track—rendered in a different timbre (Bass

Drum, MIDI)—provided a regular accompanying beat. The pitch-only versions, rendered in a

synthetic flute timbre in the original key, preserved the original intervals between successive

tones. All songs in this condition were presented in a similar pitch register. As a result, the pitch

level of one song (from Dora the Explorer) was one octave lower than its melodic version.

Moreover, all tones for each excerpt were of equal duration, and the tempo was normalized (to

90 beats per minute) across excerpts. Notably, in these versions, long-duration notes in the

originals were not replaced by repeated short-duration notes. Instead, the long-duration notes were

shortened to match all other note durations. This manipulation resulted in a disruption of the

original tempo, rhythm, and meter, in effect leaving no distinctive timing cues. Excerpts in the

original, instrumental, and timing-only conditions were approximately 15 s in duration. Because

of the substitution of short-duration notes for long-duration notes in the pitch-only condition,

those excerpts were approximately 10 s in duration.
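
The two manipulations described above can be summarized as operations on a list of notes. The sketch below is an illustration under stated assumptions and does not describe the Finale workflow itself: the `Note` record, the function names, the placeholder pitch used for the timing-only version, and the example fragment are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Note:
    midi_pitch: int        # MIDI note number (60 = middle C)
    onset_beats: float     # onset time, in beats from the start
    duration_beats: float  # sounding duration, in beats


def timing_only(melody: List[Note], fixed_pitch: int = 60) -> List[Note]:
    """Keep onsets and durations; replace every pitch with one fixed value."""
    return [Note(fixed_pitch, n.onset_beats, n.duration_beats) for n in melody]


def pitch_only(melody: List[Note], note_beats: float = 1.0) -> List[Note]:
    """Keep the pitch sequence; give every note the same duration and spacing."""
    return [Note(n.midi_pitch, i * note_beats, note_beats)
            for i, n in enumerate(melody)]


def duration_seconds(melody: List[Note], bpm: float) -> float:
    """Total length of a note list at the given tempo."""
    end_beats = max(n.onset_beats + n.duration_beats for n in melody)
    return end_beats * 60.0 / bpm


# Hypothetical four-note fragment (not taken from any of the theme songs).
melody = [Note(62, 0.0, 1.0), Note(66, 1.0, 0.5), Note(69, 1.5, 0.5), Note(74, 2.0, 2.0)]

print(timing_only(melody))
print(pitch_only(melody, note_beats=0.5))
print(duration_seconds(pitch_only(melody, note_beats=0.5), bpm=90))
```

Both transformations return the same number of notes as the input, in keeping with the requirement that intact and altered versions contain the same number of notes. Shortening the long notes to a common duration also shortens the excerpt as a whole.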

Procedure. Participants were tested individually. At their request, a parent was present

in the booth with some CI participants. Parents were permitted to assist with explanations when

the task was initially described to the child and during practice trials, but they did not interact

with the child in any way once the test phase began. Prior to the test session, the experimenter

asked the child and parent to choose two TV shows most familiar to the child from the eight that

were available. Before the first trial, pictorial representations of the two shows appeared

simultaneously on the computer monitor, and each child responded accurately when the

experimenter pointed to each picture in turn and asked, “Who’s that? Tell me.” Children were

told that they were going to hear songs from the two TV shows, and that they were to indicate

“which show the song belongs to” by touching one of the pictures on the screen. The stimuli

were presented in five blocks, corresponding to the five conditions. Presentation was in fixed

order — the original versions first, followed by the instrumental, melodic, timing-only, and

pitch-only versions. Each block was preceded by two practice trials. Before each trial, children

heard pre-recorded instructions (“Listen to the music! Who’s that? Show me!”) spoken by a

woman in a child-directed manner. Stimuli within each block (2 shows × 5 repetitions of each

song) were presented randomly for a total of 10 trials per block. After listening to each stimulus,

participants responded by touching the picture corresponding to the presumed show. They were

free to respond as soon as they recognized the music. Children received feedback after each trial

(including practice trials)—a smiley face for correct responses and a blank screen for incorrect

responses.
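
The trial structure described above (five blocks in fixed order, each containing 2 shows × 5 repetitions in random order) can be sketched as follows. This is not the testing program itself: the function name and the seeding are assumptions, practice trials are omitted, and "random" is taken to mean a simple shuffle with no constraint on runs of the same song.

```python
import random
from typing import List, Optional, Tuple

# Fixed block order, as described in the Procedure.
CONDITIONS = ["original", "instrumental", "melodic", "timing-only", "pitch-only"]


def build_session(show_a: str, show_b: str, repetitions: int = 5,
                  seed: Optional[int] = None) -> List[Tuple[str, List[str]]]:
    """Return (condition, shuffled trial list) pairs for one participant."""
    rng = random.Random(seed)
    session = []
    for condition in CONDITIONS:
        trials = [show_a, show_b] * repetitions  # 2 shows x 5 repetitions = 10 trials
        rng.shuffle(trials)
        session.append((condition, trials))
    return session


# Example with two of the shows listed in Table 9.
for condition, trials in build_session("Diego", "Franklin", seed=1):
    print(condition, trials)
```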

Figure 14. Examples of the melodic, timing-only and pitch-only conditions for two TV-show

theme songs: “Backyardigans” and “Diego”. Panels: Melodic; Timing-only; Pitch-only.

Results

Preliminary analyses compared performance in each condition with chance levels (i.e., 5 correct

on 10 trials, with 2 response options per trial) on the two-alternative forced-choice task

separately for both groups of children. For the NH group, one-sample t-tests confirmed that

performance exceeded chance levels in each instance, p < .0001. For the CI group, performance

was above chance in the original, instrumental, and melodic conditions (p < .0001). In the

timing-only condition, the difference from chance approached significance, t(7) = 2.26, p < .06.

In the pitch-only condition, CI children’s accuracy did not exceed chance levels. The

performance of child CI users and NH listeners is depicted in Figure 15.
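
A comparison with chance of the kind reported above is a one-sample t-test of the number correct against 5 out of 10. The sketch below shows the computation under stated assumptions: the scores are hypothetical and are not the participants' data, the function names are illustrative, and the p value is obtained by numerically integrating the t density rather than by calling a statistics package. Applied to the reported statistic, t(7) = 2.26, the two-tailed p value comes out just under .06.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p value, using Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # area between 0 and |t|
    return 2.0 * (0.5 - central_half)


def one_sample_t(scores: Sequence[float], mu0: float) -> Tuple[float, int]:
    """t statistic and degrees of freedom for a test of mean(scores) against mu0."""
    n = len(scores)
    t = (mean(scores) - mu0) / (stdev(scores) / math.sqrt(n))
    return t, n - 1


# Hypothetical numbers correct out of 10 for eight children (illustration only).
scores = [8, 6, 7, 5, 9, 6, 7, 4]
t, df = one_sample_t(scores, mu0=5)
print(f"t({df}) = {t:.2f}, p = {two_tailed_p(t, df):.3f}")

# The statistic reported in the text for the timing-only condition.
print(f"t(7) = 2.26, p = {two_tailed_p(2.26, 7):.3f}")
```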

Figure 15. Performance of child CI users and NH listeners. Error bars represent standard errors.

We first verified that all children could recognize the two target songs, with and without

the lyrics, by examining performance on the original and instrumental conditions. Performance

in these conditions was unrelated theoretically to our principal question about melody

perception. A two-way mixed-design analysis of variance (ANOVA) examined identification

accuracy as a function of one between-subjects factor (group: CI or NH) and one within-subjects

factor (condition: original or instrumental). Because performance approached ceiling levels for

both groups in both conditions, neither main effect was significant, p > .05, and there was no

two-way interaction, p > .1. Rather, consistently high levels of performance (> 91% correct)

confirmed that both groups could recognize the songs even without lyrics, which legitimized our

subsequent tests of melody recognition.

The principal analyses examined performance differences among the melodic, timing-

only, and pitch-only conditions. Because the assumption of sphericity was violated, p < .05, we

used a repeated-measures multivariate analysis of variance (MANOVA) with condition as a

repeated measure and group as a between-subjects variable. Although there was no main effect

of group, p > .1, the main effect of condition was significant, F(2, 21) = 11.48, p < .001, as was

the two-way interaction between condition and group, F(2, 21) = 6.57, p < .01. Follow-up tests

revealed that the two groups did not differ in the melodic, p > .3, or timing-only, p > .1,

conditions, but the NH group outperformed the CI group in the pitch-only condition, t(22) =

2.24, p < .05. Alternative analyses compared differences between conditions separately for the

two groups. The CI group performed better in the melodic condition than in either the timing-

only condition, t(7) = 2.59, p < .05, or the pitch-only condition, t(7) = 4.25, p < .005, which did

not differ, p > .50. By contrast, performance of the NH group did not differ across the three

conditions, ps > .40.

Examination of individual performance (see Figure 16) revealed a more complex picture.

Bearing in mind that the probability of guessing 8 or more answers out of 10 correctly is roughly

5% (binomial test, p ≈ .055), only 3 CI participants (CI-5, CI-7, and CI-11) actually performed at

chance levels in both timing-only and pitch-only conditions, with the youngest participant, CI-

11, demonstrating the poorest accuracy. Three child CI users achieved perfect (CI-3 and CI-4) or

near-perfect (CI-2) accuracy in the timing-only condition, but were at chance in the pitch-only

condition. In contrast, participant CI-12 was error-free in the pitch-only condition but performed

very poorly in the timing-only condition. Participant CI-8 performed reasonably well (80%) in the

pitch-only condition but achieved only 70% accuracy in the timing-only condition.

With the worst performer (CI-11) excluded, CI children’s scores in these two conditions were

negatively correlated, r = -.73, at levels approaching significance, p = .06, suggesting the

possibility of a “trade-off” between the use of timing and pitch cues by CI children.
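
The binomial criterion used above for individual performance can be reproduced directly. The sketch below assumes that each of the 10 trials is an independent guess with a probability of .5 of being correct; the function name is illustrative. Under that assumption, the probability of 8 or more correct is 56/1024 and the probability of 9 or more correct is 11/1024.

```python
from math import comb


def binomial_upper_tail(k_min: int, n: int, p: float = 0.5) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Probability of reaching each score or better by guessing on 10 two-alternative trials.
for k in (7, 8, 9, 10):
    print(f"P(X >= {k}) = {binomial_upper_tail(k, 10):.4f}")
```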

Figure 16. Performance of individual CI children. Original participant codes are preserved.

Despite the discrepancies in performance, the individual data confirm that the pitch-only

condition generally presented greater problems for CI listeners than did the timing-only

condition. In contrast, the majority of NH children performed comparably well in both

conditions, and only three demonstrated a comparable “trade-off” between rhythm and pitch. We

did not systematically document the strategies used by CI participants, but participant CI-2

commented that the difference in tempo was helpful in the timing-only condition (“This guy was

faster than that guy”), and participant CI-4 reported linking the rhythm in the timing-only

condition to the lyrics (“counted where the words were supposed to be”).

Figure 17. Performance of individual NH children in the timing-only and pitch-only conditions.

Discussion

The goal of the present investigation was to ascertain whether 5- to 7-year-old children

with bilateral CIs could identify familiar TV songs from pitch or timing cues alone or in

combination. First, we established that young CI users were highly accurate at identifying the

original vocal/instrumental versions as well as versions that preserved the original

instrumentation without the lyrics. These results confirm earlier findings in some respects

(Mitani et al., 2007; Vongpaisal et al., 2009). Unlike previous findings, however, the

performance levels of young CI users matched those of NH children, and the instrumental

versions were identified as successfully as the original versions. Undoubtedly, the reduced

cognitive demands of the present task, which featured two alternative responses rather than the

three or four in previous studies, contributed to the exceptionally high performance levels.

Of principal interest were the melodic condition, which provided pitch and timing cues,

the timing-only condition, which provided timing cues but no pitch cues, and the pitch-only

condition, which provided pitch cues but no timing cues. As was the case for NH children, child

CI users performed comparably on the melodic condition and on the original and instrumental

conditions (see Figure 15). In absolute terms, moreover, child CI users actually performed better

than NH children in the melodic condition. In previous research, Japanese children of similar age

were unable to recognize TV songs from comparable melodic cues (Mitani et al., 2007), and

older Canadian children could do so but they performed more poorly on melodic versions than

on the original versions. In both cases, child CI users exhibited a substantial decrement in

performance when the cues available at test were different from the cues at original exposure

(i.e., while watching TV at home). Those findings were attributed to CI children’s less robust

representation of the music relative to that of NH children, who were less affected by the elimination of

timbre and texture cues. With the minimal cognitive demands of the current two-alternative task,

children’s performance was unaffected by such changes, which implies that child CI users’

representation of music is more general than previously envisioned.

The finding that children with CIs and NH controls could identify familiar music on the

basis of timing cues alone is the unique contribution of the present study. In fact, the

performance of the CI and NH groups did not differ significantly on this task, which is consistent

with the results from studies of rhythm perception in adult CI users (Cooper et al., 2008; Gfeller

& Lansing, 1991; Gfeller et al., 1997; Kong et al., 2004). It is also consistent with child CI users’ reliance on timing

cues to differentiate one talker from another (Vongpaisal et al., 2010). In principle, child CI users

and NH controls could have used tempo in addition to rhythm cues to identify the timing-only

patterns, but the extent to which they did so remains unclear. Undoubtedly, children would have

much greater difficulty identifying timing-only versions from three or more alternatives and they

would be entirely unsuccessful on an open-set task. Adults with normal hearing correctly name

only about 5% of highly familiar songs from timing cues alone, 50% from pitch cues alone, and

90% from combined pitch and timing cues (Hébert & Peretz, 1997). Interestingly, the pitch-only

versions that adults cannot name sound familiar to them, in contrast to the timing-only versions,

which do not.

Child CI users’ performance differed significantly from that of NH children only on the

pitch-only versions, where their performance was at chance levels. Their failure to identify songs

on the basis of pitch cues alone might lead one to conclude that they rely entirely on timing cues.

That interpretation is not borne out by child CI users’ significantly better performance on the

melody versions, which had pitch and timing cues, than on the timing-only versions, which had

timing alone. The implication is that child CI users derived some benefit from pitch cues.

Children whose program selections included Dora the Explorer could have used pitch register

cues instead of or in addition to pitch contour cues in the melody condition but not in the pitch-

only condition.

Although the overall performance of child CI users was at chance levels on the pitch-only

versions, CI-12 achieved error-free performance on this and other versions except for the timing-

only version, on which she performed poorly. Instead of musical pitch cues being inaccessible to

child CI users because of device limitations, these cues may be weak or of relatively low

salience. Perhaps the effective salience of pitch cues could be enhanced by training, which would

also have implications for the perception of speech in noise.

The tendency for child CI users who performed well on the timing-only versions to

perform poorly on the pitch-only versions implies that they were relatively inflexible in their

listening strategies, unlike NH children, who readily switched from one strategy to another,

depending on the task at hand. For children with CIs, listening in general is likely to be more

effortful or cognitively demanding than it is for NH children, with listening to music being

particularly effortful. One consequence may be the use of similar listening strategies across

disparate contexts, even when those strategies are ineffectual.

In short, the present findings suggest that CI users who receive their implants early have

more complex representations of music than one would predict based on previous research

(Mitani et al., 2007; Vongpaisal et al., 2009). These representations may include precise

information about timing and coarser information about pitch contour and pitch register. Further

research with a larger sample is necessary to establish the contribution of demographic and

experiential factors to music recognition and the links between music and speech perception.

Finally, in light of the increasing links that have been identified between music and well-being

(Hanser, 2010) and between musical and non-musical skills (Degé et al., 2011; Kraus &

Chandrasekaran, 2010; Moreno et al., 2009; Schellenberg, 2004; Wong, Skoe, Russo, Dees, &

Kraus, 2007), it is important to ascertain the extent to which music perception in child CI users

can be improved with limited intervention.

Supplementary Comments

This thesis examined the ability of bilateral child CI users who were 4-7 years of age to perceive

speech and music in optimal circumstances. In Study 1, the children were asked to differentiate

talkers contrasting in gender and age. In Study 2, they were required to identify affective

intentions (happy or sad) in speech and music, and in Study 3, they attempted to identify familiar

melodies. In previous studies, these types of tasks posed substantial problems for adults and

children with CIs (Cleary et al., 2005; Fu et al., 2004, 2005; Hopyan-Misakyan et al., 2009; Luo

et al., 2007; Vongpaisal et al., 2006, 2009), and the problems were attributed to intrinsic

limitations of implants for pitch and spectral processing (Loizou, 1998; Smith et al., 2002).

Unquestionably, such tasks pose difficulty for CI users, but do they pose insurmountable

difficulty for all CI users? The approach, in the present study, was to evaluate a small,

advantaged sample of prelingually deaf children, with the goal of shedding light on the potential

of young CI users in contrast to the usual approach of focusing on typical or average

performance in this population.

Overall, the performance of this selective group of child CI users surpassed that of adult

and child CI users in previous investigations, except for the identification of emotion in music. In

that instance, the present CI users performed equivalently to older CI users (7-13 years) on a

similar task (Hopyan et al., 2011). Remarkably, the performance of the present child CI users did

not differ significantly from that of NH children on a number of tasks. For example, child CI

users in Study 1 identified the gender and age of talkers (man, woman, or girl) with comparable

accuracy to that shown by NH children. Moreover, CI users in Study 3 were as accurate as their

NH counterparts in identifying familiar melodies when relative pitch and timing cues were

available. On other tasks involving the identification of familiar talkers (Study 1, Experiment 2),

emotion in speech and music (Study 2), and familiar melodies from relative pitch cues alone

(Study 3), the performance of CI users did not equal that of their NH peers. In each case,

however, one or more child CI users performed as well as NH children, sometimes achieving

error-free performance. Regardless of the specific factors that underlie the success of these CI

users, which are as yet undetermined, one thing is clear. The poor performance of CI users in other

studies of talker, emotion, and melody identification cannot be attributed to device limitations

alone. After all, no cues were available to the present children beyond those provided by their

implants.

Aside from the advantageous circumstances of the present sample of child CI users—

early implantation and committed, well-educated parents, among others—the specific tasks in

the present study optimized children’s performance by using highly engaging speech or musical

stimuli (excerpts of classical music being one exception), closed-set tasks and, in some cases,

feedback. The findings are consistent with the view that timing cues are critical for CI users

when differentiating talkers (Vongpaisal et al., 2010), melodies (Hsiao, 2008; Kong et al., 2004;

Stordahl, 2002), and emotions (Hopyan-Misakyan et al., 2009), in contrast to NH listeners, who

rely primarily on pitch and spectral cues for those purposes (Nimmons et al., 2007; Scherer,

2003; Van Lancker et al., 1985). For example, children with CIs identified familiar melodies

when timing cues were available but not otherwise. On the whole, their performance was more

similar to that of NH children when timing cues were consistent (e.g., talker identification,

melody identification) rather than inconsistent (e.g., emotion identification in speech,

identification of isochronous melodies), with one notable exception (emotion identification in

music).

An important question is whether child CI users relied exclusively on timing cues in

performing the various tasks in the present study. This possibility seems unlikely for a variety of

reasons. For one thing, all CI users identified the male talker correctly from the very first trial,

before receiving any feedback, and several CI users did likewise for the woman and girl. It is

plausible that they capitalized on voice quality and pitch register cues from their everyday

experience with male, female, and child talkers. The implausible alternative is that they had clear

expectations about articulatory timing or speaking rate for various classes of talkers. In future

research, young CI users’ sensitivity to the spectral attributes of talker identity could be

examined directly by using temporally reversed speech samples (e.g., Sheffert et al., 2003) that

retain pitch and voice quality cues while removing phonetic and articulatory timing cues.

Child CI users’ successful identification of happy and sad utterances in the absence of

timing cues (Study 2, Experiment 1) or feedback also raises the possibility that they relied, to

some extent, on pitch variability or pitch contour. This evidence is less compelling, however,

because amplitude cues were also available. The contribution of pitch-related cues could be

established definitively by controlling amplitude cues in future research. In any case, the fact that

several CI users achieved perfect or near-perfect accuracy on this task implies that they were

using knowledge gained from everyday listening experience.

Finally, CI users’ identification of familiar melodies was significantly more accurate when

pitch and timing cues were available rather than timing cues alone, suggesting that pitch contour,

pitch range, or pitch level played some role. Moreover, although CI users as a group performed

at chance levels at identifying melodies on the basis of pitch cues alone, two children with CIs

were successful on this task, indicating that pitch relations are perceptible to some listeners with

electrical hearing.

The relatively high performance levels in the present investigation and the exceptional

performance of some individuals suggest that the potential of CI users with respect to talker

discrimination, emotion differentiation in speech and music, and melody identification has been

underestimated. The findings suggest, moreover, that habilitation or training efforts in these

domains should be pursued. Such training could have direct benefits in the trained domains as

well as potential transfer to other domains.

A much larger sample than that of the present investigation would be necessary to

ascertain the background factors linked to enhanced or compromised performance. Nevertheless,

a number of children participated in several tasks, so it is possible to review their performance

across tasks and speculate about possible combinations of background factors that may have

enabled the top performers to realize their potential as implant users. In fact, 11 CI users

participated in 4 or more of 5 representative tasks from the present investigation — talker

classification, talker recognition, vocal emotion identification, musical emotion

identification, and melody recognition — with 7 participants completing all tasks.

A cumulative score, expressed as percent (%) correct averaged across tasks, provided a

very rough estimate of individual CI users’ overall success. The cumulative score was based on

four or all five of the following five tasks, as available (see Table 10). Scores for tasks with more

than one block were calculated by averaging the participant’s % correct scores across blocks.

• talker classification (3 blocks)

• average talker recognition (1 block)

• average vocal emotion identification (2 blocks)

• average musical emotion identification (2 blocks)

• melody recognition (unaccompanied melody with pitch and timing cues preserved; 1

block). The original and instrumental conditions of the music recognition task (Study 3)

were excluded from consideration because they were of secondary interest, used only for

purposes of replication. The timing-only and pitch-only conditions were excluded

because of performance at or below chance levels by more than half of CI participants,

rendering the relations among those scores meaningless.

The cumulative scores of 10 CI users ranged from 88% to 100%, with one child performing at

65% correct.
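
The two-stage averaging described above (blocks within a task, then tasks within a participant) can be illustrated with a minimal sketch. The function names, the task labels, and the numbers in the example are hypothetical and are not taken from the thesis data; the sketch only assumes that each block yields a percent-correct score and that a cumulative score requires at least four completed tasks.

```python
from statistics import mean


def task_score(block_scores):
    """Mean percent correct across the blocks of a single task."""
    return mean(block_scores)


def cumulative_score(task_blocks, min_tasks=4):
    """Mean of task-level scores over the tasks a participant completed.

    task_blocks maps a task name to a list of per-block percent-correct
    scores, or to None when the participant did not complete that task.
    """
    completed = [task_score(blocks) for blocks in task_blocks.values() if blocks]
    if len(completed) < min_tasks:
        raise ValueError("too few completed tasks for a cumulative score")
    return mean(completed)


# Hypothetical participant (illustrative numbers, not data from Table 10).
example = {
    "talker_classification": [90.0, 95.0, 100.0],  # 3 blocks, task mean 95.0
    "talker_recognition": [80.0],                  # 1 block
    "vocal_emotion": [85.0, 95.0],                 # 2 blocks, task mean 90.0
    "musical_emotion": [70.0, 80.0],               # 2 blocks, task mean 75.0
    "melody_recognition": None,                    # task not completed
}

print(round(cumulative_score(example), 1))  # prints 85.0
```

Because each task contributes one value regardless of its number of blocks, a three-block task carries the same weight in the cumulative score as a one-block task.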

An examination of the children’s case histories, early communication assessments, and

reports from their auditory-verbal therapists revealed that few children had a completely

uneventful history (e.g., one or more problematic episodes with their implants). However, all

children appeared to have normal cognitive abilities, although this was not confirmed

definitively by psychometric assessment, and all were in age-appropriate classes in regular

schools. Scores on standardized open-set speech perception tests, which were administered at regular

intervals as part of their clinical follow-up, were within the normal range of other CI users of

similar age and hearing history.

Electrically evoked auditory brainstem responses (EABRs) were collected as part of

an ongoing study on the effects of bilateral implantation on auditory development and plasticity.

Changes in EABR latencies with implant use provide an index of auditory brainstem maturation

in individuals with electrical hearing (Gordon et al., 2005; Thai-Van et al., 2007). In all

instances, the EABRs for the CI users in the present sample were age-appropriate, at least in the

better ear.

Socio-economic status (SES) of the families of these child implant users was estimated

from census data, based on median after-tax income of families with children in their

neighborhood and compared with the median after-tax income for families with children in the

province of Ontario (data from 2005 census, available online through Statistics Canada:

http://www12.statcan.gc.ca/census-recensement/2006/dp-pd/prof/92-597/index.cfm?Lang=E).

Based on this measure, participating families were in the mid-range of the Ontario

population. SES estimated by family income rather than educational attainment of parents may

underestimate SES in this sample because a number of mothers of CI children had chosen to

remain out of the workforce to optimize their child’s opportunities.

One factor that distinguishes the families in the present sample of child CI users from

those in the general population of child CI users is parental willingness to commit to the

demands of the present research project. Completion of four or more tasks in the present

investigation required several laboratory visits, often after school and on weekends over and

above other research, medical appointments related to the implants, and auditory-verbal therapy.

As a result, the current sample consisted of a self-selected group of enthusiastic, well-informed,

and supportive middle-class parents whose children were doing well enough to motivate

participation in time-consuming research that had no direct benefits.

Although child CI participants performed well overall, they found some tasks more difficult than

others. One notable exception was participant CI-8, who was the only child to achieve a perfect

overall score. His etiology of genetic, non-syndromic congenital deafness, which was identified

early, has been associated with better speech outcomes than other etiologies of deafness

(Kawasaki et al., 2006; Wu et al., 2008). He had no notable health issues at birth or thereafter.

He was at or slightly above the mean age of CI participants at the time of participating in all

tasks. His relatively long experience as a CI user (5.3-5.5 years for various tasks) was

advantageous for him, as it was for the group as a whole, as reflected in significantly positive

associations between duration of implant use and performance on the emotion identification

tasks (Study 2). CI-8 received his initial implant before his first birthday and his second implant

some months later. As such, he avoided some of the adverse consequences associated with

longer delays between implants (Gordon et al., 2007). He never experienced technical problems

with either implant, which occurred sporadically for other child CI users in the present sample.

His parents were highly educated and relatively affluent, and they enrolled him in many

extracurricular activities, including piano lessons. English was used exclusively at home, and it

was the language of school and auditory-verbal therapy.

As noted, no assessments of intelligence, non-verbal or otherwise, were available for any

children in the sample. It was clear, however, that CI-8 grasped all tasks quickly, and he

exhibited highly focused attention and goal-directedness. He had all the hallmarks of a highly

intelligent, conscientious, and cooperative child. No other participant had a constellation of

background factors as favorable as that of CI-8. The second best performer was CI-4. Her

etiology of deafness, early diagnosis and implantation, CI experience, apparent cognitive ability,

and motivation were similar to CI-8’s. Clearly, it is foolhardy to search for a pattern of

background variables to account for the data from such a small sample. It is notable, however,

that the children with higher cumulative scores tended to have congenital, genetic non-syndromic

hearing loss whereas those with lower overall scores (including the poorest performer, CI-10)

differed in their onset and etiology of hearing loss. Four of the children with lower cumulative

scores were exposed to more than one language at home; for three, English was not the primary

language of the family. Several children (including CI-4) had experienced technical problems

with their implants, with one case (CI-2) necessitating re-implantation. Despite variations in age,

motivation, compliance, procedure, and family SES, most CI users achieved high levels of

performance that exceeded expectations based on the available literature. It remains to be

determined whether their stellar performance will be sustained over the long run.

Table 10. Cumulative scores and demographic profiles of the 11 CI users who completed four or more tasks comprising the present investigation.

Code | Cumulative score (%) | Hearing loss (onset, etiology) | Age range across sessions (years) | Age at activation 1 and 2 (years) | History of problems with CI use | Language at home | Estimated income, percentile
CI-8 | 100 | congenital, genetic non-syndromic | 6.1-6.3 | 0.8; 1.5 (delay < 1) | no | English | 86
CI-4 | 97.4 | congenital, genetic | | 1.0; 3.6 (delay > 2) | yes** | English | 51
CI-3 | 92.4 | congenital, genetic | | 1.1; 1.1 (delay = 0) | no | Mandarin, English | 29
CI-11 | 92.3* | congenital, genetic | | 1.1; 1.1 (delay = 0) | no | English | 65
CI-1 | 92.1* | progressive, genetic non-syndromic (?) | 5.8-6.4 | 3.4; 3.4 (delay = 0) | no | Dari, English | 73
CI-6 | 90.3* | congenital, genetic | | 1.0; 4.6 (delay > 3) | no | Mandarin, English | 86
CI-7 | 89.6 | congenital, genetic | | 0.9; 1.8 (delay < 1) | yes** | English | 65
CI-2 | 88.8 | congenital, Usher I syndrome | 4.8-5.7 | 0.8; 1.7 (delay < 2) | yes** | English | 65
CI-12 | 88.7 | pre-lingual, life-saving intervention after birth | 6.1-6.9 | 1.0; 3.5 (delay > 2) | no | English, Punjabi | 51
CI-5 | 87.9 | progressive, non-genetic | 6.6-7.2 | 2.5; 4.0 (delay < 2) | no | English | 56
CI-10 | 65.1* | progressive, Mondini dysplasia | 6.3-6.5 | 3.1; 6.3 (delay > 2) | no | English | 65

*Four tasks completed ** One side only

References

Bachorowski, J. A. (1999). Vocal expression and perception of emotion. Current Directions in

Psychological Science, 8, 53-57.

Balkwill, L.-L., & Thompson, W. F. (1999). A cross-cultural investigation of the perception of

emotion in music: Psychophysical and cultural cues. Music Perception, 17, 43-64.

Bartholomeus, B. (1973). Voice identification by nursery school children. Canadian Journal of

Psychology, 27, 464–472.

Barry, J. G., Blamey, P. J., Martin, L. F. A., Lee, K. Y.-S., Tang, T., Ming, Y. Y., & Van Hassel,

C. A. (2002). Tone discrimination in Cantonese-speaking children using a cochlear

implant. Clinical Linguistics and Phonetics, 16, 79-99.

Boersma, P., & Weenink, D. (2005). Praat: Doing phonetics by computer (Version 4.3.01)

[Computer program]. Retrieved from http://www.praat.org/

Chatterjee, M., & Peng, S. C. (2007). Processing F0 with cochlear implants: Modulation

frequency discrimination and speech intonation recognition. Hearing Research, 235,

143–156.

Chadha, N. K., Papsin, B. C., Jiwani, S., & Gordon, K. A. (2011). Speech detection in noise and

spatial unmasking in children with simultaneous versus sequential bilateral cochlear

implants. Otology and Neurotology, 32, 1057-1064.

Cleary, M., Pisoni, D. B., & Kirk, K. I. (2005). Influence of voice similarity on talker

discrimination in children with normal hearing and children with cochlear implants.

Journal of Speech, Language, and Hearing Research, 48, 204–223.

Coletti, V., Carner, M., Miorelli, V., Guida, M., Coletti, L., & Fiorino, F. G. (2005). Cochlear

implantation at under 12 months: Report on 10 patients. Laryngoscope, 115, 445-449.

Connor, C. M., Craig, H. K., Raudenbush, S. W., Heavner, K., & Zwolan, T. A. (2006). The age

at which young deaf children receive cochlear implants and their vocabulary and speech-

production growth: Is there an added value for early implantation? Ear and Hearing, 27,

628-644.

Cooper, R. P. & Aslin, R. N. (1990). Preference for infant-directed speech in the first month after

birth. Child Development, 61, 1584-1595.

Cooper, W. B., Tobey, E., & Loizou, P. C. (2008). Music perception by cochlear implant and

normal hearing listeners as measured by the Montreal Battery for Evaluation of Amusia.

Ear and Hearing, 29, 618-626.

Cullington, H. E., & Zeng, F. G. (2008). Speech recognition with varying numbers and types of

competing talkers by normal-hearing, cochlear-implant, and implant simulation subjects.

Journal of the Acoustical Society of America, 123, 450-461.

Dalla Bella, S., Peretz, I., Rousseau, L., & Gosselin, N. (2001). A developmental study of the

affective value of tempo and mode in music. Cognition, 80, B1-B10.

Dammeyer, J. (2010). Psychosocial development in a Danish population of children with

cochlear implants and deaf and hard-of-hearing children. Journal of Deaf Studies and

Deaf Education, 15, 50-58.

Degé, F., Kubicek, C., & Schwarzer, G. (2011). Music lessons and intelligence: A relation

mediated by executive functions. Music Perception, 29, 195-201.

Drennan, W. R., & Rubinstein, J. T. (2006). Sound processors in cochlear implants. In S. B.

Waltzman & J. T. Roland (Eds.), Cochlear implants (2nd ed., pp. 40-47). New York:

Thieme Medical Publishers.

Drennan, W. R., & Rubinstein, J. T. (2008). Music perception in cochlear implant users and its

relationship with psychophysical capabilities. Journal of Rehabilitation Research and

Development, 45, 779-790.

Fagan, M. K., & Pisoni, D. B. (2010). Hearing experience and receptive vocabulary development

in deaf children with cochlear implants. Journal of Deaf Studies & Deaf Education, 15,

149-161.

Fernald, A. (1985). Four-month-old infants prefer to listen to motherese. Infant Behavior and

Development, 8, 181-195.

Fernald, A. (1989). Intonation and communicative intent in mothers' speech to infants: Is the

melody the message? Child Development, 60, 1497-1510.

Fernald, A. (1991). Prosody in speech to children: Prelinguistic and linguistic functions. In R.

Vasta (Ed.), Annals of child development (Vol. 8, pp. 43-80). London, UK: Jessica

Kingsley Publishers.

Fitzsimmons, M., Sheahan, N., & Staunton, H. (2001). Gender and the integration of acoustic

dimensions of prosody: Implications for clinical studies. Brain and Language, 78, 94–

108.

Friesen, L. M., Shannon, R. V., Baskent, D., & Wang, X. (2001). Speech recognition in noise as a

function of the number of spectral channels: Comparison of acoustic hearing and cochlear

implants. Journal of the Acoustical Society of America, 110, 1150-1163.

Fu, Q. J., Chinchilla, S., Nogaki, G., & Galvin, J. J., 3rd. (2005). Voice gender identification by

cochlear implant users: The role of spectral and temporal resolution. Journal of the

Acoustical Society of America, 118, 1711–1718.

Galvin, J. J., Fu, Q. J., & Nogaki, G. (2007). Melodic contour identification by cochlear implant

listeners. Ear and Hearing, 28, 302-319.

Gates, G. A. & Miyamoto, R. T. (2003). Cochlear implants. New England Journal of Medicine,

349, 421–423.

Geers, A. E. (2004). Speech, language, and reading skills after early cochlear implantation. Head

& Neck Surgery, 130, 634-638.

Geers, A. E. (2006). Spoken language in children with cochlear implants. In P. E. Spencer, & M.

Marschark (Eds.), Advances in the spoken language development of deaf and hard-of-

hearing children. Perspectives on deafness (pp. 244-270). New York: Oxford University

Press.

Geers, A. & Brenner, C. (2003). Background and educational characteristics of prelingually deaf

children implanted by five years of age. Ear and Hearing, 24, 2S-14S.

Geers, A., Brenner, C., & Davidson, L. (2003). Factors associated with development of speech

perception skills in children implanted by age five. Ear and Hearing, 24, 24S-35S.

Geers, A., Nicholas, J., & Moog, J. (2007). Estimating the influence of cochlear implantation on

language development in children. Audiological Medicine, 5, 262–273.

Geers, A. E., Nicholas, J. G., & Sedey, A. L. (2003). Language skills of children with early

cochlear implantation. Ear and Hearing, 24, 46S–58S.

Geurts, L., & Wouters, J. (2001). Coding of the fundamental frequency in continuous

interleaved sampling processors for cochlear implants. Journal of the Acoustical Society

of America, 109, 713-726.

Gfeller, K., Christ, A., Knutson, J. F., Witt, S., Murray, K. T., & Tyler, R. S. (2000). Musical

backgrounds, listening habits, and aesthetic enjoyment of adult cochlear implant

recipients. Journal of the American Academy of Audiology, 11, 390-406.

Gfeller, K., & Lansing, C. R. (1991). Melodic, rhythmic, and timbral perception of adult

cochlear implant users. Journal of Speech and Hearing Research, 34, 916-920.

Gfeller, K., Olszewski, C., Rychener, M., Sena, K., Knutson, J. F., Witt, S., & Macpherson,

B. (2005). Recognition of “real-world” musical excerpts by cochlear implant recipients

and normal-hearing adults. Ear and Hearing, 26, 237–250.

Gfeller, K., Turner, C., Mehr, M., Woodworth, G., Fearn, R., Knutson, J. F., Witt, S., &

Stordahl, J. (2002). Recognition of familiar melodies by adult cochlear implant recipients

and normal-hearing adults. Cochlear Implants International, 3, 29-53.

Gfeller, K., Turner, C., Oleson, J., Zhang, X., Gantz, B., Froman, R., & Olszewski, C. (2007).

Accuracy of cochlear implant recipients on pitch perception, melody recognition and

speech reception in noise. Ear and Hearing, 28, 412-423.

Gfeller, K., Woodworth, G., Robin, D. A., Witt, S., & Knutson, J. F. (1997). Perception of

rhythmic and sequential pitch patterns by normally hearing adults and cochlear implant

users. Ear and Hearing, 18, 252-260.

Gilley, P. M., Sharma, A., & Dorman, M. F. (2008). Cortical reorganization in children with

cochlear implants. Brain Research, 1239, 56-65.

Gordon, K. A., Tanaka, S., & Papsin, B. C. (2005). Atypical cortical responses underlie poor

speech perception in children using cochlear implants. Neuroreport, 16, 2041-2045.

Gordon, K. A., Valero, J., & Papsin, B. C. (2007). Auditory brainstem activity in children with

9-30 months of bilateral cochlear implant use. Hearing Research, 233, 97-107.

Hallam, S. (2010). Music education: The role of affect. In P. N. Juslin & J. A. Sloboda (Eds.),

Handbook of music and emotion: Theory, research, applications (pp. 791-817). New

York: Oxford University Press.

Hanser, S. B. (2010). Music, health, and well-being. In P. N. Juslin & J. A. Sloboda (Eds.),

Handbook of music and emotion: Theory, research, applications (pp. 791-817). New

York: Oxford University Press.

Hébert, S., & Peretz, I. (1997). Recognition of music in long-term memory: Are melodic and

temporal patterns equal partners? Memory and Cognition, 25, 518-533.

Holt, R. F., & Svirsky, M. A. (2008). An exploratory look at pediatric cochlear implantation: is

earliest always best? Ear and Hearing, 29, 492-511.

Hopyan, T., Gordon, K. A., & Papsin, B. C. (2011). Identifying emotions in music through

electrical hearing in deaf children using cochlear implants. Cochlear Implants

International, 12, 21-26.

Hopyan-Misakyan, T. M., Gordon, K. A., Dennis, M., & Papsin, B. C. (2009). Recognition of

affective speech prosody and facial affect in deaf children with unilateral right cochlear

implants. Child Neuropsychology, 15, 136-146.

Hsiao, F. (2008). Mandarin melody recognition by pediatric cochlear implant recipients. Journal

of Music Therapy, 45, 390-404.

Hunter, P. G., & Schellenberg, E. G. (2010). Music and emotion. In M. R. Jones, R. R. Fay, & A.

N. Popper (Eds.), Music perception (pp. 129-164). New York: Springer.

Hunter, P. G., Schellenberg, E. G., & Stalinski, S. M. (2011). Liking and identifying emotionally

expressive music: Age and gender differences. Journal of Experimental Child

Psychology, 110, 80-93.

Husain, G., Thompson, W. F., & Schellenberg, E. G. (2002). Effects of musical tempo and

mode on arousal, mood, and spatial abilities. Music Perception, 20, 151-171.

Johnston, C. J., Durieux-Smith, A., Angus, D., O’Connor, A., & Fitzpatrick, E. (2009). Bilateral

paediatric cochlear implants: A critical review. International Journal of Audiology, 48,

601-617.

Johnstone, T., & Scherer, K. R. (2000). Vocal communication of emotion. In M. Lewis & J.

Haviland (Eds.), Handbook of emotion (2nd Ed, pp. 220–235). New York: Guilford.

Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music

performance: Different channels, same code? Psychological Bulletin, 129, 770-814.

Kawasaki, A., Fukushima, K., Kataoka, Y., Fukuda, S., & Nishizaki, K. (2006). Using

assessment of higher brain functions of children with GJB2-associated deafness and

cochlear implants as a procedure to evaluate language development. International

Journal of Pediatric Otorhinolaryngology, 70, 1343-1349.

Kirschner, S. & Tomasello, M. (2010). Joint music making promotes prosocial behavior in 4-

year-old children. Evolution and Human Behavior, 31, 354-364.

Kong, Y.-Y., Cruz, R., Jones, J. A., & Zeng, F.-G. (2004). Music perception with temporal cues

in acoustic and electric hearing. Ear and Hearing, 25, 173-185.

Kovačić, D., & Balaban, E. (2009). Voice gender perception by cochlear implantees. Journal of

the Acoustical Society of America, 126, 762–775.

Kovačić, D., & Balaban, E. (2010). Hearing history influences voice gender perceptual

performance in cochlear implant users. Ear and Hearing, 31, 806-814.

Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills.

Nature Reviews Neuroscience, 11, 599-605.

Kraus, N., Skoe, E., Parbery-Clark, A., & Ashley, R. (2009). Experience-induced malleability in

neural encoding of pitch, timbre, and timing. Annals of the New York Academy of

Sciences, 1169, 543-557.

Lander, K., Hill, H., Kamachi, M., & Vatikiotis-Bateson, E. (2007). It's not what you say but the

way you say it: Matching faces and voices. Journal of Experimental Psychology: Human

Perception and Performance, 33, 905-914.

Lassaletta, L., Castro, A., Bastarrica, M., Pérez-Mora, R., Madero, R., De Sarriá, J., & Gavilán,

J. (2007). Does music perception have an impact on quality of life following cochlear

implantation? Acta Oto-Laryngologica, 127, 682-686.

Laukka, P., Juslin, P. N., & Bresin, R. (2005). A dimensional approach to vocal expression of

emotion. Cognition and Emotion, 19, 633-653.

Leal, M. C., Young, J., Laborde, M.-L., Calmels, M.-N., Verges, S., Lugardon, S., Andrieu, S.,

Deguine, O., & Fraysse, B. (2003). Music perception in adult cochlear implant recipients.

Acta Oto-Laryngologica, 123, 826-835.

Loizou, P. (1998). Mimicking the human ear. IEEE Signal Processing Magazine, 15, 101-130.

Luo, X., Fu, Q. J., & Galvin, J. (2007). Vocal emotion recognition by normal-hearing listeners

and cochlear implant users. Trends in Amplification, 11, 301-315.

Masataka, N. (1999). Preference for infant-directed singing in 2-day-old hearing infants of deaf

parents. Developmental Psychology, 35, 1001-1005.

McDermott, H. J. (2004). Music perception with cochlear implants: A review. Trends in

Amplification, 8, 49-82.

Meister, H., Landwehr, M., Pyschny, V., Walger, M., & von Wedel, H. (2009). The perception of

prosody and speaker gender in normal-hearing listeners and cochlear implant recipients.

International Journal of Audiology, 48, 38-48.

Mitani, C., Nakata, T., Trehub, S. E., Kanda, Y., Kumagami, H., Takasaki, K., Miyamoto, I., &

Takahashi, H. (2007). Music recognition, music listening, and word recognition by deaf

children with cochlear implants. Ear and Hearing, 28, 29S-33S.

Moore, B. C. J., & Carlyon, R. P. (2005). Perception of pitch by people with cochlear hearing

loss and by cochlear implant users. In C. J. Plack, A. J. Oxenham, R. R. Fay, & A. N.

Popper (Eds.), Pitch: Neural coding and perception (pp. 234-277). New York: Springer.

Moreno, S., Marques, C., Santos, A., Santos, M., Castro, S. L., & Besson, M. (2009). Musical

training influences linguistic abilities in 8-year-old children: More evidence for brain

plasticity. Cerebral Cortex, 19, 712-723.

Most, T., & Aviner, C. (2009). Auditory, visual, and auditory–visual perception of emotions by

individuals with cochlear implants, hearing aids, and normal hearing. Journal of Deaf

Studies and Deaf Education, 14, 449-464.

Most, T., & Peled, M. (2007). Perception of suprasegmental features of speech by children with

cochlear implants and children with hearing aids. Journal of Deaf Studies and Deaf

Education, 12, 350–361.

Mote, J. (2011). The effects of tempo and familiarity on children’s affective interpretation of

music. Emotion, 11, 618-622.

Nakata, T., Trehub, S. E., Kanda, Y., Mitani, C., & Schellenberg, E. G. (2005). Music recognition

by Japanese children with cochlear implants. Journal of Physiological Anthropology and

Applied Human Science, 24, 29-32.

Nakata, T., Trehub, S. E., Mitani, C., & Kanda, Y. (2006). Pitch and timing in the songs of deaf

children with cochlear implants. Music Perception, 24, 147-154.

Nicholas, J. G., & Geers, A. E. (2006). Effects of early auditory experience on the spoken

language of deaf children at 3 years of age. Ear and Hearing, 27, 286-298.

Nicholas, J. G., & Geers, A. E. (2007). Will they catch up? The role of age at cochlear

implantation in the spoken language development of children with severe to profound

hearing loss. Journal of Speech, Language, and Hearing Research, 50, 1048-1062.

Nimmons, G. L., Kang, R. S., Drennan, W. R., Longnion, J., Ruffin, C., Worman, T., Yueh, B.,

& Rubinstein, J. T. (2007). Clinical assessment of music perception in cochlear implant

listeners. Otology and Neurotology, 29, 149-155.

Orchard, T. L., & Yarmey, A. D. (1995). The effects of whispers, voice-sample duration, and

voice distinctiveness on criminal speaker identification. Applied Cognitive Psychology, 9,

249-260.

Papoušek, M. (1992). Early ontogeny of vocal communication in parent–infant interactions. In

H. Papoušek, U. Jürgens, & M. Papoušek (Eds.), Nonverbal vocal communication:

Comparative and developmental approaches (pp. 230-261). New York: Cambridge

University Press.

Peng, S.-C., Tomblin, J. B., & Turner, C. W. (2008). Production and perception of speech

intonation in pediatric cochlear implant recipients and individuals with normal hearing.

Ear and Hearing, 29, 336–351.

Percy-Smith, L., Jensen, J. H., Caye-Thomasen, P., Thomsen, J., Gudman, M., & Lopez, A.G.

(2008). Factors that affect the social well-being of children with cochlear implants.

Cochlear Implants International, 9, 199-214.

Peretz, I., Gagnon, L., & Bouchard, B. (1998). Music and emotion: Perceptual determinants,

immediacy, and isolation after brain damage. Cognition, 68, 111-141.

Peterson, N. R., Pisoni, D. B., & Miyamoto, R. T. (2010). Cochlear implants and spoken

language processing abilities: Review and assessment of the literature. Restorative

Neurology and Neuroscience, 28, 237-250.

Pisoni, D. B. (2005). Speech perception in deaf children with cochlear implants. In D. B. Pisoni

& R. E. Remez (Eds.), The handbook of speech perception. Blackwell handbooks in

linguistics (pp. 494-523). Malden: Blackwell Publishing.

Remez, R. E., Fellowes, J. M., & Rubin, P. E. (1997). Talker identification based on phonetic

information. Journal of Experimental Psychology: Human Perception and Performance,

23, 651–666.

Sagi, E., Kaiser, A. R., Meyer, T. A., & Svirsky, M. A. (2009). The effect of temporal gap

identification on speech perception by users of cochlear implants. Journal of Speech,

Language, and Hearing Research, 52, 385-395.

Salimpoor, V. N., Benovoy, M., Longo, G., Cooperstock, J. R., & Zatorre, R. J. (2009). The

rewarding aspects of music listening are related to degree of emotional arousal. PLoS

ONE, 4, e7487.

Schellenberg, E. G. (2004). Music lessons enhance IQ. Psychological Science, 15, 511-514.

Schellenberg, E. G. (2006). Long-term positive associations between music lessons and IQ.

Journal of Educational Psychology, 98, 457-468.

Schellenberg, E. G. (2011). Examining the association between music lessons and intelligence.

British Journal of Psychology, 102, 283-302.

Schellenberg, E. G., & Hallam, S. (2005). Music listening and cognitive abilities in 10- and 11-

year-olds: The Blur effect. Annals of the New York Academy of Sciences, 1060, 202-209.

Schellenberg, E. G., Nakata, T., Hunter, P. G., & Tamoto, S. (2007). Exposure to music and

cognitive performance: Tests of children and adults. Psychology of Music, 35, 5-19.

Schellenberg, E. G., Peretz, I., & Vieillard, S. (2008). Liking for happy- and sad-sounding music:

Effects of exposure. Cognition and Emotion, 22, 218-237.

Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research.

Psychological Bulletin, 99, 143-165.

Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech

Communication, 40, 227-256.

Scherer, K. R., Banse, R., & Wallbott, H. G. (2001). Emotion inferences from vocal expression

correlate across languages and cultures. Journal of Cross-Cultural Psychology, 32, 76-92.

Schorr, E. A., Roth, F. P., & Fox, N. A. (2009). Quality of life for children with cochlear

implants: Perceived benefits and problems and the perception of single words and

emotional sounds. Journal of Speech, Language, and Hearing Research, 52, 141-152.

Sharma, A., Dorman, M.F., & Kral, A. (2005). The influence of a sensitive period on central

auditory development in children with unilateral and bilateral cochlear implants. Hearing

Research, 203, 134-143.

Sheffert, S. M., Pisoni, D. B., Fellowes, J. M., & Remez, R. E. (2003). Learning to recognize

talkers from natural, sinewave, and reversed speech samples. Journal of Experimental

Psychology: Human Perception and Performance, 28, 1447-1469.

Singh, L., Morgan, J. L., & Best, C. T. (2002). Infants' listening preferences: Baby talk or happy

talk? Infancy, 3, 365-394.

Smith, Z. M., Delgutte, B., & Oxenham, A. J. (2002). Chimaeric sounds reveal dichotomies in

auditory perception. Nature, 416, 87-90.

Spence, M. J., Rollins, P. J., & Jerger, S. (2002). Children’s recognition of cartoon voices.

Journal of Speech, Language, and Hearing Research, 45, 214–222.

Stalinski, S. M., Schellenberg, E. G., & Trehub, S. E. (2008). Developmental changes in the

perception of pitch contour: Distinguishing up from down. Journal of the Acoustical

Society of America, 124, 1759-1763.

Stordahl, J. (2002). Song recognition and appraisal: A comparison of children who use cochlear

implants and normally hearing children. Journal of Music Therapy, 39, 2-19.

Sucher, C. M., & McDermott, H. J. (2009). Bimodal stimulation: Benefits for music perception

and sound quality. Cochlear Implants International, 10, 96–99.

Svirsky, M. A., Chin, S. B., & Jester, A. (2007). The effects of age at implantation on speech

intelligibility in pediatric cochlear implant users: Clinical outcomes and sensitive periods.

Audiological Medicine, 5, 293-306.

Svirsky, M. A., Robbins, A. M., Kirk, K. I., Pisoni, D. B., & Miyamoto, R. T. (2000). Language

development in profoundly deaf children with cochlear implants. Psychological Science,

11, 153-158.

Teagle, H. F. B., & Eskridge, H. (2010). Predictors of success for children with cochlear

implants: The impact of individual differences. In A. L. Weiss (Ed.), Perspectives on

individual differences affecting therapeutic change in communication disorders. New

directions in communications disorders research (pp. 251-272). New York: Psychology

Press.

Thai-Van, H., Cozma, S., Boutitie, F., Disant, F., Trui, E., & Collet, L. (2007). The pattern of

auditory brainstem response wave V maturation in cochlear-implanted children. Clinical

Neurophysiology, 118, 176-189.

Trainor, L. J. (1996). Infant preferences for infant-directed versus noninfant-directed playsongs

and lullabies. Infant Behavior and Development, 19, 83-92.

Trainor, L. J., Austin, C. M., & Desjardins, R. N. (2000). Is infant-directed speech prosody a

result of the vocal expression of emotion? Psychological Science, 11, 188–195.

Trainor, L. J., Clark, E. D., Huntley, A., & Adams, B. A. (1997). The acoustic basis of

preferences for infant-directed singing. Infant Behavior and Development, 20, 383-396.

Trehub, S. E., Hannon, E. E., & Schachner, A. (2010). Perspectives on music and affect in the

early years. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion:

Theory, research, applications (pp. 645-668). New York: Oxford University Press.

Trehub, S. E., & Trainor, L. J. (1998). Singing to infants: Lullabies and play songs. Advances in

Infancy Research, 12, 43-77.

Trehub, S. E., Trainor, L. J., & Unyk, A. M. (1993). Music and speech processing in the first

year of life. In H. W. Reese (Ed.), Advances in child development and behavior (Vol. 24,

pp. 1-35). San Diego: Academic Press.

Van Lancker, D., Kreiman, J., & Emmorey, K. (1985). Familiar voice recognition: Patterns and

parameters: Part I—Recognition of backward voices. Journal of Phonetics, 13, 19–38.

Vieillard, S., Peretz, I., Gosselin, N., Khalfa, S., Gagnon, L., & Bouchard, B. (2008). Happy, sad,

scary and peaceful musical excerpts for research on emotions. Cognition and Emotion,

22, 720-752.

Vongpaisal, T., Trehub, S. E., & Schellenberg, E. G. (2006). Song recognition by children and

adolescents with cochlear implants. Journal of Speech, Language, and Hearing Research,

49, 1091–1103.

Vongpaisal, T., Trehub, S. E., & Schellenberg, E. G. (2009). Identification of TV tunes by

children with cochlear implants. Music Perception, 27, 17–24.

Vongpaisal, T., Trehub, S. E., Schellenberg, E. G., van Lieshout, P., & Papsin, B. C. (2010).

Children with cochlear implants recognize their mother’s voice. Ear and Hearing, 31,

555-566.

Vongphoe, M., & Zeng, F. G. (2005). Speaker recognition with temporal cues in acoustic and

electric hearing. Journal of the Acoustical Society of America, 118, 1055–1061.

Vos, P. G., & Troost, J. M. (1989). Ascending and descending melodic intervals: Statistical

findings and their perceptual relevance. Music Perception, 6, 383-396.

Waltzman, S. B., & Cohen, N. L. (1998). Cochlear implantation in children younger than 2 years

old. American Journal of Otology, 19, 158-162.

Williams, B. R. (2006). Inconsistency in reaction time: Normal development and group

differences between those with attention deficit / hyperactivity disorder and controls.

Unpublished doctoral dissertation, University of Victoria.

Wilson, B. S., Schatzer, R., Lopez-Poveda, E. A., Sun, X., Lawson, D. T., & Wolford, R. D.

(2005). Two new directions in speech processor design for cochlear implants. Ear and

Hearing, 26, 73S-81S.

Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience

shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience, 10,

420-422.

Wu, C. C., Lee, Y. C., Chen, P. J., & Hsu, C. J. (2008). Predominance of genetic diagnosis and

imaging results as predictors in determining the speech perception performance outcome

after cochlear implantation in children. Archives of Pediatrics and Adolescent Medicine,

162, 269-276.

Zentner, M., Grandjean, D., & Scherer, K. R. (2008). Emotions evoked by the sound of music:

Characterization, classification, and measurement. Emotion, 8, 494-521.
