
British Journal of Psychology (2014)

© 2014 The British Psychological Society

www.wileyonlinelibrary.com

Musical ability and non-native speech-sound processing are linked through sensitivity to pitch and spectral information

Vera Kempe1*, Dennis Bublitz2 and Patricia J. Brooks2

1 Division of Psychology, Abertay University, Dundee, UK
2 Department of Psychology, College of Staten Island, City University of New York, USA

Is the observed link between musical ability and non-native speech-sound processing due

to enhanced sensitivity to acoustic features underlying both musical and linguistic

processing? To address this question, native English speakers (N = 118) discriminated

Norwegian tonal contrasts and Norwegian vowels. Short tones differing in temporal,

pitch, and spectral characteristics were used to measure sensitivity to the various acoustic

features implicated in musical and speech processing. Musical ability was measured using

Gordon’s Advanced Measures of Musical Audiation. Results showed that sensitivity to

specific acoustic features played a role in non-native speech-sound processing:

Controlling for non-verbal intelligence, prior foreign language-learning experience, and

sex, sensitivity to pitch and spectral information partially mediated the link between

musical ability and discrimination of non-native vowels and lexical tones. The findings

suggest that while sensitivity to certain acoustic features partially mediates the

relationship between musical ability and non-native speech-sound processing, complex

tests of musical ability also tap into other shared mechanisms.

Compared to children, adults show a reduced ability to distinguish and produce

non-native speech sounds, due to perceptual narrowing associated with tuning into the

ambient language during the first year of life (e.g., Kuhl et al., 2006). Despite this acknowledged disadvantage, adults still exhibit considerable individual differences in

their ability to distinguish subtle phonetic contrasts in a non-native language (Chandrasekaran, Sampath, & Wong, 2010; Golestani & Zatorre, 2009). Recently researchers

have begun to explore whether sensitivity to non-native speech contrasts may be linked

to an individual’s musical ability, that is, their ability to process and produce musical

sounds. This ability relies on sensitivity to relevant acoustic features of musical sounds,

but may also include a wider range of associated component abilities, such as superior

short-term memory capacity that can facilitate learning of tonal and rhythmical patterns over time (Williamson, Baddeley, & Hitch, 2010). There are two ways in which a link

between musical ability and speech-sound processing could arise: On the one hand,

*Correspondence should be addressed to Vera Kempe, Division of Psychology, Abertay University, Bell Street, Dundee DD1 1HG, United Kingdom (email: [email protected]).

A subset of the findings was presented at the 2013 Eastern Psychological Association meeting, and at the 35th Annual Conference of the Cognitive Science Society.

DOI:10.1111/bjop.12092


individual differences in aptitude might arise from genetically mediated differences in the

neuroanatomy of subcortical and cortical pathways that result in superior auditory

processing, which may benefit the processing of acoustic features shared between the

sounds of music and language. Individuals with greater aptitude may then be more likely to take up pursuits that are compatible with these pre-dispositions, such as musical

training (Schellenberg, 2011), which can explain the well-attested observation that

musicians exhibit superior processing of speech sounds (e.g., Kraus, Skoe, Parbery-Clark, & Ashley, 2009; Musacchia, Sams, Skoe, & Kraus, 2007; Wong, Skoe, Russo,

Dees, & Kraus, 2007). However, since musical aptitude does not necessarily result in

musical training, a relationship between musical aptitude and speech-sound processing

can also be found in non-musicians (e.g., Delogu, Lampis, & Belardinelli, 2006; Slevc &

Miyake, 2006). On the other hand, musical expertise, operationalized as the amount of musical

training, might exert a direct causal influence on an individual’s ability to discriminate

speech sounds because studying and playing music, among other things, requires and

gradually trains the very precise processing of certain acoustical features (Patel, 2011), as

well as the attentional control associated with auditory processing (Kraus, Strait, &

Parbery-Clark, 2012). Even though a causal relationship between musical expertise and

speech-sound processing cannot reliably be established using quasi-experimental designs

that compare musicians and non-musicians, there is evidence for a correlation between the amount of musical training and the processing of acoustical features relevant in speech

(Kraus et al., 2009), providing tentative evidence that musical expertise might indeed

play a causal role in improving speech processing.

Since native phonetic categories can often be identified by multiple, partially

redundant acoustic and contextual cues (Toscano & McMurray, 2010), the superior

auditory sensitivity associated with musical ability is unlikely to benefit native

speech-sound processing substantially (Surprenant & Watson, 2001) except when it

occurs under adverse conditions, such as age-related hearing loss (Zendel & Alain, 2012) or noise (Kraus et al., 2012; Parbery-Clark, Skoe, Lam, & Kraus, 2009).

However, musical ability should prove especially beneficial for the processing of

novel speech sounds in non-native languages. Indeed, a number of studies have

confirmed a positive relationship between musical ability and non-native speech-sound processing, be it in second language learners (Milovanov, Huotilainen,

Välimäki, Esquef, & Tervaniemi, 2008; Posedel, Emery, Souza, & Fountain, 2011;

Slevc & Miyake, 2006), or in laboratory studies testing discrimination of pitch

differences in non-native prosody (Besson, Schön, Moreno, Santos, & Magne, 2007; Deguchi et al., 2012; Magne, Schön & Besson, 2006; Marques, Moreno, Castro &

Besson, 2007) and of unfamiliar non-native speech sounds (Delogu et al., 2006;

Delogu, Lampis, & Belardinelli, 2010; Marie, Delogu, Lampis, Belardinelli, & Besson,

2011; Wong et al., 2007).

Regardless of whether musical ability benefits non-native speech-sound processing

because of innate aptitude, training, or both (Nardo & Reiterer, 2009), the central idea

linking these two domains involves shared processing of acoustic features, such that

increased sensitivity to specific features associated with musical ability might also benefit language. In other words, sensitivity to certain acoustic features might mediate the link

between musical ability and non-native speech-sound processing. However, very few

studies have explored the mediating effects of sensitivity to various acoustic features

directly, and those that have done so have typically focused mainly on the mediating role

of sensitivity to pitch (Deguchi et al., 2012; Milovanov et al., 2008; Posedel et al., 2011).


What is missing is a fuller picture of the range of acoustic features that might mediate the

links between musical ability and processing of specific non-native speech sounds. This

study aims to fill this gap.

Sensitivity to pitch is, of course, the most obvious candidate to mediate a link between musical ability and non-native speech processing (Deguchi et al., 2012; Milovanov et al.,

2008; Posedel et al., 2011), although the degree of precision in pitch processing required

for music is likely to surpass the precision required for processing phonological

information in speech (Patel, 2011). In addition, pitch sensitivity may be critical for the

processing of lexical tones or pitch accents in languages such as Mandarin, Thai or

Norwegian, wherein tonal information requires listeners to process pitch values along

with the direction of pitch change (Chandrasekaran et al., 2010). In line with this idea,

findings linking non-native speech processing with musical ability, operationalized either by tests of musical aptitude or by amount of musical training, have been more consistent

for tonal contrasts (Marie et al., 2011; Wong et al., 2007) than for phonological contrasts

(Delogu et al., 2006, 2010; Slevc & Miyake, 2006).

In addition to pitch processing, musical ability is also associated with processing of a

wide variety of other features of sound, such as harmony, timbre, loudness, duration,

tempo, and melodic contour (Patel, 2003), and the enhanced sensitivity to these features

may benefit discrimination of certain non-native phonological contrasts. For example,

high precision in identifying musical harmony or the timbre of musical instruments – both of which are features characterized by spectral information – may benefit the processing

of vowels, which are distinguished by the spectral information associated with the first

and second formants. Furthermore, sensitivity to temporal information associated with

beat and rhythm may benefit the perception of consonants, which are distinguished by

rapid temporal changes in various features, such as voice onset time or formant

transitions. This study aims to test these more specific predictions by exploring whether

sensitivity to several non-linguistic acoustic features, encompassing pitch, spectral and

temporal information,mediates links betweenmusical ability and sensitivity to non-nativephonological and tonal contrasts.

Method

Musical aptitude and musical experience were assessed separately to ascertain whether

they are differentially predictive of individual differences in non-native speech-sound processing.

Assessment of musical aptitude typically involves administration of complex

measures such as the Seashore test (Seashore, 1919, 1939), the Wing test (Wing,

1968) or Gordon’s Advanced Measures of Musical Audiation (AMMA; Gordon, 1989).

These tests require participants to compare musical phrases to detect subtle

differences, for example, in melody or rhythm, thereby tapping into sensory and

perceptual abilities associated with musical ability, but also into attention control and

working memory capacity required to remember and compare the phrases (Nardo & Reiterer, 2009). They do not, however, assess the motor skills attained with musical

practice, presumably a reflection of the fact that they are designed for selection of

individuals into musical training. In this study, we administered the AMMA because it is

considered to be independent of an individual’s level of musical expertise (Nardo &

Reiterer, 2009). Furthermore, we used AX discrimination tasks to assess the potential

mediating effects of sensitivity to three types of non-linguistic acoustic information –


temporal, pitch, and spectral – on sensitivity to two unfamiliar Norwegian speech

sounds – a vowel and a tonal contrast.1 The Norwegian vowel contrast was the /i/ - /y/

contrast which does not exist in English. The tonal contrast was based on one of several

Norwegian dialects with lexical tones consisting of rising and falling-rising pitch accents, which distinguish minimal pairs of segmentally identical bi-syllabic words. For

example, ‘Hammer’, spoken with the rising tone, is a Norwegian proper noun while

‘hammer’, spoken with the falling-rising tone, denotes the tool. These contrasts

encompass temporal changes in fundamental frequency extending over several

hundreds of milliseconds. Testing Norwegian tonal contrasts provides an interesting

cross-linguistic extension to the more commonly studied non-native tonal contrasts in

languages like Mandarin or Thai. To ensure that it was the pitch contour contained in

these natural language stimuli, and not some other phonetic information, that was the critical feature used for discrimination, we also tested sensitivity to artificially

synthesized pitch contours which were non-linguistic, pure-tone analogues of the

Norwegian tonal contrasts.

Crucially, we examined to what extent temporal, pitch, and spectral sensitivity

mediated links between musical ability and non-native speech-sound processing. To

control for non-verbal intelligence, we administered Cattell’s Culture Fair Intelligence

Test (Cattell & Cattell, 1973). Music and language background questionnaires inquired

about participants’ length of musical training and number of languages learned at home or at school, and elicited self-ratings of proficiency in each language (L2 and

L3). Previous research (Bowles, Silbert, Jackson, & Doughy, 2011; Kempe et al., 2012)

had found a male advantage in processing of non-speech and speech sounds

encompassing rapid temporal changes; hence, participant sex was also considered in

the analyses.

Participants
One hundred and eighteen speakers of American English (59 women, 59 men), mean age

20 years, range 19–31 years, participated in the study. L3 proficiency self-ratings were

missing from three participants, and pitch discrimination data from one participant. These

participants were excluded from analyses including these variables.

Materials

Advanced Measures of Musical Audiation (AMMA)

Gordon (1989) developed the AMMA to measure the ability to detect subtle tonal and

rhythmical differences in music. It consists of 30 items, each comprising two musical

phrases – a short musical ‘statement’ followed after four seconds by an ‘answer’ of the same length. The duration of the musical phrases ranges from 4 to 11 s. Each item

contains either one or more tonal changes, or one or more rhythmic changes, but

never both. Participants have to decide whether the phrases are the same or different.

For ‘different’ phrases, participants have to decide whether the difference involves a

tonal or rhythm change. The test yields tone and rhythm scores, as well as a composite

score.

1 In a previous study (Kempe, Thoresen, Kirk, Schaeffler, & Brooks, 2012), we had also included an unfamiliar Norwegian consonant contrast, the /ç/ - /ʃ/ contrast, which proved to be too easy for native English participants, resulting in ceiling effects. We therefore did not include this contrast in this study.


AX (‘same-different’) tasks

All six AX tasks comprised 32 ‘same’ and 32 ‘different’ trials. The individual sound files

comprising the stimulus pairs are provided in the Supplementary Materials.

(a) Testing sensitivity to temporal information

We synthesized eight 250 Hz sinusoidal carrier waves with an overall duration of 600 ms

differing in amplitude envelope onset rise times, and otherwise devoid of segmental,

spectral, and pitch information. The onset of the amplitude envelope was faded in with a

depth of 100% so that rise times reached a maximum amplitude at 0, 10, 20, 30, 60, 70, 80

and 90 ms. ‘Different’ trials comprised pairs of sounds differing in rise times by 60 ms

(e.g., 0 ms vs. 60 ms, 10 ms vs. 70 ms, etc.), centred around 45 ms, a value which has been reported as the category boundary between ‘bowed’ and ‘plucked’ sounds (Cutting

& Rosner, 1974). All sounds had a fade-out period of 50 ms so that the overall duration of

the sound at maximum amplitude was comparable to the duration of sounds in the pitch

and spectral sensitivity tasks described below.
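For illustration only (not part of the original materials), a minimal Python sketch of how tones with the rise times described above could be synthesized; the sampling rate and the linear ramp shape are assumptions, as they are not specified in the text.

```python
import numpy as np

def rise_time_tone(rise_ms, dur_ms=600, freq=250.0, fade_ms=50, sr=44100):
    """250 Hz sinusoid whose amplitude envelope is faded in over `rise_ms`
    (100% depth) and faded out over the final 50 ms. The 44.1 kHz sampling
    rate and linear ramps are assumptions."""
    n = int(sr * dur_ms / 1000)
    t = np.arange(n) / sr
    carrier = np.sin(2 * np.pi * freq * t)
    env = np.ones(n)
    n_rise = int(sr * rise_ms / 1000)
    if n_rise > 0:
        env[:n_rise] = np.linspace(0.0, 1.0, n_rise)   # onset rise time
    n_fade = int(sr * fade_ms / 1000)
    env[-n_fade:] *= np.linspace(1.0, 0.0, n_fade)     # 50 ms fade-out
    return carrier * env

# A 'different' pair: rise times differing by 60 ms, centred on the 45 ms
# 'bowed'/'plucked' boundary (e.g., 0 ms vs. 60 ms).
pair = (rise_time_tone(0), rise_time_tone(60))
```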

(b) Testing sensitivity to pitch

We created eight 500 ms pure tone sinusoidal carrier waves ranging from 100 to 3,000 Hz. The tones increased following a quadratic trend resulting in tones of 100, 200,

400, 700, 1,100, 1,600, 2,200, and 3,000 Hz. Each tone was paired with a contrast tone

with a frequency exceeding its counterpart by 2%, resulting in tones of 102, 204, 408, 714,

1,122, 1,632, 2,244, and 3,060 Hz. The 2% difference was chosen to be slightly below

pitch discrimination thresholds established in previous studies examining untrained

listeners (e.g., Halliday, Moore, Taylor, & Amitay, 2011; Surprenant & Watson, 2001). The

cumulative increase was designed to create sound pairs that subjectively sampled the

pitch range at roughly similar intervals, taking into account the non-linearity of pitch perception. For the ‘different’ trials, each sound was paired with its corresponding

contrast sound, resulting in pairs of 100 Hz versus 102 Hz, 200 Hz versus 204 Hz, 400 Hz

versus 408 Hz etc.

(c) Testing sensitivity to spectral information

For this test, we combined the pure tones created for the pitch test into complex tones of

500 ms duration, with each complex tone comprising low (e.g., 100 Hz or 200 Hz), middle (e.g., 700 Hz or 1,100 Hz) and high (e.g., 2,200 Hz or 3,000 Hz) frequencies.

These frequencies were chosen to broadly mimic the fundamental frequency and the first

two formants of speech, which are crucial for vowel perception. For ‘different’ pairs, one

of the component tones was increased by 2%, and this change affected either the middle or

the high frequency, for example, a ‘different’ pair might include a complex tone

consisting of frequencies of 100, 1,100 and 3,000 Hz and a complex tone consisting of

frequencies of 100, 1,122 and 3,000 Hz.
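The pitch and spectral stimuli can be sketched in the same way; only the frequency series and the 2% increment are taken from the text, while the sampling rate and the equal-amplitude summation of components are assumptions.

```python
import numpy as np

SR = 44100                                            # assumed sampling rate
BASE = [100, 200, 400, 700, 1100, 1600, 2200, 3000]   # Hz, quadratic series

def pure_tone(freq, dur_ms=500, sr=SR):
    t = np.arange(int(sr * dur_ms / 1000)) / sr
    return np.sin(2 * np.pi * freq * t)

# Pitch task: each standard paired with a tone 2% higher (e.g., 100 vs. 102 Hz).
pitch_pairs = [(pure_tone(f), pure_tone(f * 1.02)) for f in BASE]

# Spectral task: complex tones with a low, middle and high component
# (broadly mimicking F0, F1 and F2); a 'different' tone raises one
# non-low component by 2%.
def complex_tone(freqs, dur_ms=500, sr=SR):
    return sum(pure_tone(f, dur_ms, sr) for f in freqs) / len(freqs)

same_tone = complex_tone([100, 1100, 3000])
diff_tone = complex_tone([100, 1100 * 1.02, 3000])    # middle component +2%
```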

(d) Testing sensitivity to the Norwegian tonal contrast

Recordings by a male native speaker of Norwegian of eight minimal pairs of Norwegian

words containing a contrast between rising and falling-rising tonal contours (see

Appendix) were taken from Kempe et al. (2012). Each stimulus word was recorded


twice, with ‘same’ trials using two within-category exemplars (e.g., ‘Hammer1’,

‘Hammer2’) and ‘different’ trials using two different-category exemplars (e.g., ‘Hammer1’,

‘hammer1’). Four word pairs contained short vowels in the first (stressed) syllable (mean

length 64 ms); the remaining four pairs contained long vowels (mean length 187 ms). Crucially, words with rising and with falling-rising tones did not differ in the length of the

first vowel (118 vs. 133 ms, p = .5), overall word length (396 vs. 417 ms, p = .2), and

metric stress; thus, duration and stress could not be used as additional cues to tonal

contrasts. Corresponding short and long vowel pairs were matched for initial phonemes.

To ensure that a male advantage in discriminating the tonal contrasts (as reported in

Kempe et al., 2012) was not just an artifact of using a male voice in the recordings, we also

created a synthetic female voice version of the stimuli. Synthesizing the female voice was

necessary because our previous recordings by a female speaker from the same region of Norway had a slower speech rate, and retained greater articulatory clarity even after time

compression to match the male speech rate while maintaining the pitch characteristics of

the female speech. Pilot studies demonstrated that these differing features of the female

recordings affected the degree and range of sensitivity to the tonal contrast. To match for

articulatory clarity and indexical features, we therefore submitted the male voice stimuli

to the voice gender change algorithm in PRAAT (Boersma & Weenink, 2011) using a

fundamental frequency of 220 Hz and scaling the first formant up by 20%. All results

below are averaged over the male- and female-voiced versions of the AX task.

(e) Testing sensitivity to pitch contours comprising non-speech analogues of the tonal contrast

The non-speech analogues of the Norwegian tonal contrast comprised sine waves with

pitch contours extracted from the fundamental frequency modulation of both the male-

and female-voiced versions of the Norwegian tonal contrasts, resulting in contours in the

male pitch range and contours in the female pitch range. These stimuli contained no other

information than the pitch contour of the Norwegian tones. The male-voiced and simulated female-voiced versions of the AX task were administered separately; sensitivity

measures were then averaged over these male- and female-voiced versions.
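A rough sketch of the resynthesis idea (turning an extracted F0 contour into a pure tone) is given below; the pitch-tracking step itself is omitted, and the interpolation, sampling rate and phase-integration details are assumptions rather than the exact procedure used in the study.

```python
import numpy as np

def f0_to_sine(times, f0, sr=44100):
    """Resynthesize a pitch contour as a sine wave: interpolate the F0 track
    (times in s, f0 in Hz, e.g. from a pitch tracker) to the audio rate and
    integrate instantaneous frequency to obtain the phase."""
    t = np.arange(int(times[-1] * sr)) / sr
    inst_f0 = np.interp(t, times, f0)
    phase = 2 * np.pi * np.cumsum(inst_f0) / sr
    return np.sin(phase)
```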

(f) Norwegian vowel contrast

We used eight minimal pairs of Norwegian monosyllabic words containing the vowel /i:/

or /ɪ/ versus /y:/ or /ʏ/ to test discrimination of a contrast between high front unrounded

and rounded vowels that does not exist in English (see Appendix). Recordings of a male native speaker of Norwegian were taken from Kempe et al. (2012). Each stimulus word

was recorded twice, with ‘same’ trials using two within-category exemplars (e.g., ‘rykk1’,

‘rykk2’) and ‘different’ trials using two different-category exemplars (e.g., ‘rykk1’, ‘rikk1’).

Four word pairs contained the short vowels /ɪ/ and /ʏ/ (mean length 67 ms), and the

remaining four word pairs contained the long vowels /i:/ and /y:/ (mean length 150 ms).

Again, on average both members of a minimal pair did not differ in vowel length (108 vs.

108 ms, p = .9) and overall word length (381 vs. 382 ms, p = .95); thus, duration could not serve as an additional cue.

Other measures

Participants also completed (1) Cattell’s Culture-Fair Test of Nonverbal Intelligence, Scale

3, Form A (Cattell & Cattell, 1973); (2) a music background questionnaire on which they


provided information about the extent of their musical training (if any); and (3) a language

background questionnaire on which they indicated the number of languages learned, and

rated their reading, writing, speaking and comprehension abilities in all languages on a

scale from 1 (very poor) to 6 (native-like). If participants listed no foreign language, proficiency was coded as 0.

Procedure

AX discrimination tasks were presented in three blocks, with the first block containing the

temporal, pitch and spectral AX tasks, the second block containing the male- and

female-voiced tonal AX tasks as well as the vowel AX task, and the third block containing

the two AX tasks presenting the extracted non-speech pitch contours of the male- and female-voiced Norwegian tonal stimuli. The fixed block sequence ensured that the

non-speech pitch contours could not prime the processing of the tonal contrast, and that

variance due to order effects was not confounded with participant variance, although task

order was counterbalanced within each block. AMMA and Culture Fair Intelligence Test

were interspersed between blocks with their order counterbalanced as well. Informed

consent was obtained and background questionnaires were completed prior to any of the

tasks. The entire session lasted around 90 min.

In each of the AX tasks, participants received eight practice trials with feedback, randomly selected from the entire set of trials. These were followed by 64 test trials

presented without feedback, 32 ‘same’ and 32 ‘different’ trials. Each AX task was

constructed to test eight different instantiations of the contrast of interest. In the

‘different’ trials, each of the eight stimulus pairings representing the contrast was

repeated four times, twice in each order. In the ‘same’ trials, each of the 16 items was

paired with itself; these identical pairs were repeated twice. Note, however, that for the

Norwegian tonal and vowel contrasts, identical pairs comprised different within-category

exemplars, obtained by recording multiple examples of each word, as described above. Within each trial, sound stimuli were separated by an inter-stimulus interval of 200 ms,

and the inter-trial interval was 500 ms. Participants were asked to press the ‘s’ key if they

perceived the sounds to be the same and the ‘d’ key if they perceived them to be different.

Each AX task lasted approximately 5 min.
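As an illustration of the trial structure just described, the sketch below builds one 64-trial AX block from eight contrast pairings; the counts and orderings follow the text, whereas the stimulus identifiers and the randomization scheme are assumptions.

```python
import random

def build_ax_trials(pairings, seed=None):
    """Build one AX block: each of the eight 'different' pairings occurs four
    times (twice in each order), and each of the 16 items is paired with
    itself twice, giving 32 'different' and 32 'same' trials."""
    trials = []
    for a, b in pairings:                        # eight contrast pairings
        trials += [(a, b, 'different')] * 2      # A-B order, twice
        trials += [(b, a, 'different')] * 2      # B-A order, twice
    items = {x for pair in pairings for x in pair}   # 16 individual items
    for item in items:
        trials += [(item, item, 'same')] * 2
    random.Random(seed).shuffle(trials)          # assumed randomization
    return trials

# Hypothetical identifiers standing in for the sound files.
pairings = [(f'stim{i}_A', f'stim{i}_B') for i in range(1, 9)]
block = build_ax_trials(pairings, seed=1)
assert len(block) == 64
```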

Results

Participants’ performance on the AX tasks was converted into A′, a sensitivity measure that corrects for differences in response bias; A′ scores range from 0 to 1, with 0.5 corresponding to chance. Table 1 presents the means, standard deviations, and ranges for musical training (years), foreign language background measures, AMMA tone, rhythm and

composite scores, as well as the auditory sensitivity and speech-sound processing tasks.
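The paper does not spell out which A′ formula was used; for orientation, the sketch below implements one standard non-parametric estimate computed from the hit rate (responding ‘different’ on ‘different’ trials) and the false-alarm rate (responding ‘different’ on ‘same’ trials). Treat it as an assumption rather than the authors’ exact computation.

```python
def a_prime(hit_rate, fa_rate):
    """Non-parametric sensitivity estimate: 0.5 = chance, 1.0 = perfect."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))

# Example: 28/32 'different' trials and 26/32 'same' trials answered correctly.
print(round(a_prime(28 / 32, 6 / 32), 3))
```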

Comparing performance of different speech sounds

Since previous research had demonstrated a male advantage in discriminating some

non-native speech sounds (Bowles et al., 2011; Kempe et al., 2012), results for all

measures are presented for male and female participants separately. For the speech

processing tasks, a 3 (Sound Type: tonal, pitch contour, vowel) × 2 (Sex) mixed-type ANOVA with Sound Type as a within-subjects factor yielded a main effect of Sound Type,


F(2, 232) = 11.9, p < .001. Bonferroni-corrected post-hoc tests with a revised alpha-level

of .017 indicated that performance was superior for the vowels compared to the tonal and

pitch contour contrasts, all t’s > 3.9, all p’s < .001, and that tonal contrasts and pitch

contour contrasts did not differ from each other, p = .7. The interaction between Sex and

Sound Type fell short of significance, F(2, 202) = 2.05, p = .13, indicating that the

expected male advantage for processing the tonal contrasts (Bowles et al., 2011; Kempe

et al., 2012) could not be confirmed in this study.

Zero-order correlations

For a preliminary exploration of the links between linguistic background, musical

aptitude, musical expertise, sensitivity to acoustic features, and non-native speech-sound

processing, we computed zero-order correlations, see Table 2. After Bonferroni correction using a revised alpha-level of 0.0006, we found that all language background measures

were positively correlated. There was also a strong positive correlation between the

AMMA tonal and rhythm scores. Sensitivity to the tonal contrast and the non-linguistic pitch contour were also strongly correlated, suggesting that pitch contour was indeed the

dominant cue for discrimination of the Norwegian tonal stimuli. Note that the positive

correlation between years of musical training and musical aptitude fell short of

significance only after Bonferroni correction, which is a very conservative way to guard

against Type 1 error (p = .002 for AMMA tonal score, p = .006 for AMMA rhythm score).

Crucially, the zero-order correlations confirmed the link between musical aptitude and

non-native speech-sound processing: The AMMA tonal score was positively correlated

with discrimination of the tonal and vowel contrasts, and the AMMA rhythm score was positively correlated with discrimination of the tonal contrast. Note that musical training

failed to show any significant zero-order correlations with non-native speech-sound

processing.

Table 1. Means, standard deviations and ranges (the latter two in parentheses) for all variables, as well as results of t-tests comparing men and women. AMMA, Advanced Measures of Musical Audiation

                                    Men                  Women                t (df), p
Culture fair intelligence           24.1 (4.5; 12–37)    23.8 (5.1; 14–36)    0.3 (116), n.s.
No. of languages (incl. Eng.)       2.5 (0.7; 1–4)       2.4 (0.6; 1–4)       0.7 (116), n.s.
L2-rating                           2.6 (1.4; 0–6)       2.5 (1.4; 0–6)       0.3 (114), n.s.
L3-rating                           0.8 (1.3; 0–5)       0.7 (1.1; 0–3.75)    0.5 (112), n.s.
Musical training (yrs)              1.7 (2.3; 0–8)       4.3 (5.5; 0–20)      −3.5 (116), <.001
AMMA tone                           23.2 (4.0; 14–32)    23.7 (4.5; 13–34)    −0.6 (116), n.s.
AMMA rhythm                         25.3 (3.5; 15–31)    25.4 (4.0; 15–36)    −0.2 (116), n.s.
AMMA composite                      48.5 (6.6; 35–63)    48.9 (8.6; 27–70)    −0.3 (116), n.s.
Auditory sensitivity measures (A′)
  Temporal sensitivity              0.64 (0.14)          0.65 (0.15)          −0.4 (116), n.s.
  Pitch sensitivity                 0.67 (0.15)          0.61 (0.14)          2.1 (116), <.05
  Spectral sensitivity              0.61 (0.13)          0.59 (0.15)          0.9 (116), n.s.
Speech-sound processing (A′)
  Tonal contrast                    0.77 (0.11)          0.75 (0.10)          1.2 (116), n.s.
  Pitch contour                     0.78 (0.10)          0.74 (0.10)          1.9 (116), .06
  Vowel contrast                    0.80 (0.10)          0.80 (0.11)          −0.1 (116), n.s.


We also found a link between sensitivity to pitch and spectral information and

discrimination of the tonal contrast, and between sensitivity to temporal and spectral information and discrimination of the vowel contrast. The zero-order correlations

between musical aptitude and musical training on the one hand, and sensitivity to the

acoustical features on the other hand were not significant except for the correlation

between the AMMA rhythm score and sensitivity to pitch. However, these links need to be

revisited after controlling for the background variables (sex, non-verbal intelligence and

prior language experience), which is achieved by the regression and mediation analyses

reported below.

Regression analyses

Next, we performed regression analyses to examine whether musical aptitude and

musical training made independent contributions to explaining the variance in non-native

speech-sound processing, after sex, non-verbal intelligence and prior language experience were partialled out. Prior language experience comprised three variables: number of

known languages and self-rated proficiency in L2 and L3. Since the ANOVA presented

above had not revealed any differences in overall performance between linguistic and non-linguistic pitch contour discrimination, and because of the strong correlation

between these two measures, we omitted pitch contour discrimination as a criterion

variable from further analyses. Furthermore, because the AMMA tonal and rhythm scores

were highly correlated, we used the composite AMMA score in subsequent analyses to

avoid collinearity in the models.
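The hierarchical logic of these analyses (background variables entered first, then the AMMA composite or years of training, with the increment tested by a partial F) can be sketched as follows; the data frame and column names are illustrative assumptions, not the original data file.

```python
import statsmodels.formula.api as smf
from scipy import stats

def partial_f(reduced, full):
    """F-test for the increment of the full over the reduced (nested) model."""
    num = (reduced.ssr - full.ssr) / (reduced.df_resid - full.df_resid)
    den = full.ssr / full.df_resid
    f = num / den
    p = stats.f.sf(f, reduced.df_resid - full.df_resid, full.df_resid)
    return f, p

def amma_increment(df):
    # Step 1: background variables only.
    base = smf.ols('tonal_Aprime ~ sex + culture_fair + n_languages'
                   ' + l2_rating + l3_rating', data=df).fit()
    # Step 2: add the AMMA composite score.
    plus_amma = smf.ols('tonal_Aprime ~ sex + culture_fair + n_languages'
                        ' + l2_rating + l3_rating + amma_composite',
                        data=df).fit()
    f, p = partial_f(base, plus_amma)
    return plus_amma.rsquared_adj - base.rsquared_adj, f, p
```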

For tonal contrasts as the criterion variable, a stepwise regression analysis revealed that

the AMMA composite score accounted for 12.5% of adjusted variance over and above the

background variables, partial F(1,106) = 17.13, p < .001, and years of musical training did not explain any additional adjusted variance (partial F < 1). In contrast, entering years

of musical training first after the background variables accounted for a marginally

Table 2. Zero-order Pearson correlations between predictor variables

                               2     3     4     5     6     7     8     9     10    11    12    13
1. Culture fair intelligence   .32   .02   .22   .11   .24   .28   .26   .13   .01   .23   .22   .14
2. No. of languages                  .44*  .78*  .08   .16   .24   .29   .18   .09   .19   .09   .16
3. L2-rating                               .56*  .15   .06   .05   .18   .08   .07   .18   .03   .17
4. L3-rating                                     .12   .21   .19   .24   .13   .08   .20   .06   .30
5. Musical training (years)                            .29   .25   .13   .12   .27   .18   .05   .12
6. AMMA tone                                                 .71*  .18   .27   .21   .39*  .24   .32*
7. AMMA rhythm                                                     .13   .33*  .26   .45*  .36*  .28
8. Temporal sensitivity                                                  .13   .05   .29   .23   .34*
9. Pitch sensitivity                                                           .50*  .49*  .29   .28
10. Spectral sensitivity                                                             .42*  .33*  .36*
11. Tonal contrast                                                                         .62*  .47*
12. Pitch contrast                                                                               .42*
13. Vowel contrast

Note. N’s range from 112 to 118 depending on missing values. AMMA, Advanced Measures of Musical Audiation.
*Indicates significance after Bonferroni correction at p < .0006.


significant 2.0% of adjusted variance, partial F(1,106) = 3.21, p = .08, while the AMMA

composite score entered next accounted for another 10.3% of adjusted variance, partial F

(1,105) = 14.14, p < .001.

For vowel contrasts as criterion variable, the AMMA composite score, entered after the background variables, accounted for 8.0% of adjusted variance, partial F(1,106) = 10.94,

p < .01, while years of musical training did not explain any additional adjusted variance

(partial F < 1). When years of musical training were entered first after the background

variables, they accounted for no additional adjusted variance (partial F < 1) while the

AMMA composite score, entered next, accounted for another 7.5% of adjusted variance,

partial F(1,105) = 10.26, p < .01. This analysis indicates that musical aptitude, but not

musical experience, was related to non-native speech-sound processing.

Mediation analyses

The next set of analyses explored whether the link between musical aptitude and

non-native speech-sound processing was mediated by sensitivity to temporal, pitch and

spectral information. To this end, we performed a series of mediation analyses, employing

bootstrapping to estimate the 95% confidence intervals of the indirect effect using a

procedure introduced by Hayes (2013) (the SPSS-macro PROCESS, downloadable from

http://afhayes.com/introduction-to-mediation-moderation-and-conditional-process-analysis.html). An indirect effect is deemed to be statistically significant at p = .05 if

the confidence interval does not include zero. The mediation analyses included Culture

Fair intelligence, sex, number of studied languages, and self-ratings in L2 and L3 as

covariates. For the tonal contrast (see Figure 1), the mediation analysis (based on 10,000

bootstrap samples) with temporal, pitch, and spectral sensitivity as mediators confirmed

an indirect effect of the AMMA score through pitch sensitivity (ab = .0011, 95% CI:

.0003–.0025) and through spectral sensitivity (ab = .0007, 95% CI: .0001–.0020) as well

as a direct effect of the AMMA score (c′ = .0031, 95% CI: .0008–.0053). For the vowel contrast (see Figure 2), the mediation analysis also confirmed an indirect effect through

spectral sensitivity (ab = .0009, 95% CI: .0002–.0024) and a direct effect of the AMMA

score (c′ = .0028, 95% CI: .0004–.0051). There was no indirect effect through temporal

sensitivity in the mediation analyses, as temporal sensitivity was not linked to the AMMA in

the first place. These analyses show that the link between musical aptitude and non-native

speech-sound processing is partially mediated by spectral sensitivity, and, for tonal

contrasts, it is also partially mediated by pitch sensitivity; however, the findings also show

that there remains a residual direct link between musical aptitude and sensitivity to the non-native speech sound contrasts.
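For readers who want to reproduce the logic of the bootstrapped indirect effect, a conceptual stand-in for the PROCESS-style analysis described above is sketched below (single mediator, covariates partialled out, percentile confidence interval). The column names are illustrative assumptions, and the code is not the PROCESS macro itself.

```python
import numpy as np
import statsmodels.formula.api as smf

def boot_indirect(df, x='amma_composite', m='spectral_Aprime', y='tonal_Aprime',
                  covs=('sex', 'culture_fair', 'n_languages',
                        'l2_rating', 'l3_rating'),
                  n_boot=10000, seed=0):
    """Percentile-bootstrap CI for the indirect effect a*b of X on Y via M."""
    cov_terms = ' + '.join(covs)
    a_model = f'{m} ~ {x} + {cov_terms}'        # path a: X -> M
    b_model = f'{y} ~ {m} + {x} + {cov_terms}'  # path b: M -> Y, X partialled
    rng = np.random.default_rng(seed)
    ab = np.empty(n_boot)
    for i in range(n_boot):
        boot = df.sample(len(df), replace=True,
                         random_state=int(rng.integers(1 << 31)))
        a = smf.ols(a_model, data=boot).fit().params[x]
        b = smf.ols(b_model, data=boot).fit().params[m]
        ab[i] = a * b
    lo, hi = np.percentile(ab, [2.5, 97.5])
    return ab.mean(), (lo, hi)   # 'significant' at p = .05 if CI excludes zero
```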

Even though there was no significant correlation between years of musical training and

sensitivity to the non-native speech contrasts, we fitted the same mediation model using

years of musical training as a predictor.2 This analysis revealed only indirect effects

through spectral sensitivity on performance for each of the speech sound tasks (ab’s from

.0016 to .0021, CI’s between .0001 and .0053), but showed no direct effects of years of

musical training (all CI’s encompassed 0). Thus, the mediation analyses suggest that both

2 We also obtained musicality self-ratings which were highly correlated with years of musical training (r = .71, p < .001) but only weakly correlated with the AMMA scores (r = .33, p < .001) suggesting that participants based the appraisal of their own musical abilities predominantly on how much musical training they had received. The results of the mediation analyses are virtually identical when musicality self-ratings are used instead of years of musical training as a predictor.


musical expertise and musical aptitude are linked to non-native speech-sound processing

via pitch and spectral processing.

Discussion

The aim of this study was to determine whether sensitivity to specific acoustic features

mediates the link between musical ability and non-native speech-sound processing, and

whether musical aptitude or musical expertise is the better predictor of non-native

speech-sound processing.

Links between musical ability and sensitivity to acoustic features

When effects of sex, intelligence, and language background were controlled for, both

musical aptitude and musical expertise were linked to the ability to process frequency

(pitch and spectral) information, but not temporal information in the range below 100 ms

(see paths on the left in Figures 1 and 2). This is perhaps not surprising given that the

temporal changes relevant to music involve a longer time scale in the order of hundreds or thousands of milliseconds, which is different from the rapid temporal changes in the order of tens of milliseconds that are present in segmental linguistic units, and were simulated in our temporal sensitivity test. While such sensitivity to very rapid temporal changes is

fundamental for language processing (e.g., Goswami et al., 2002), as indicated by the link

between the ability to discriminate subtle temporal changes in amplitude envelope onset

rise times and discrimination of the non-native tonal and vowel contrasts, it seems to be

Figure 1. Results of mediation analysis testing the direct link, and the indirect link through temporal,

pitch, and spectral sensitivity, between the composite Advanced Measures of Musical Audiation (AMMA)

score and Norwegian tonal contrast processing, controlling for sex, Culture Fair Intelligence, number of

previously learned languages, and proficiency self-ratings in L2 and L3. Only significant links and associated

standardized coefficients are shown. Significant effects of the covariates are indicated by solid grey lines.

***p < .001; **p < .01; *p < .05.


less relevant for musical processing. This qualifies claims that music benefits speech due

to requirements for more precise processing of auditory information (Patel, 2011) by

suggesting that this does not apply as much to temporal processing as it applies to

frequency-related information.

Links between sensitivity to acoustic features and non-native speech-sound processing

We had hypothesized that the discrimination of tonal contrasts would be linked to temporal, pitch, and spectral sensitivity, whereas discrimination of vowel contrasts would

mainly be linked to spectral sensitivity. Indeed, for tonal contrasts this prediction was borne out

after controlling for sex, language background and intelligence, especially with respect to

the effects of temporal and spectral sensitivity (see the paths on the right in Figure 1).

Thus, the ability to process both rapid temporal changes and frequency information

proved important for the discrimination of the tonal contrasts. This confirms the

previously observed role of temporal information (Kempe et al., 2012), in addition to

pitch and spectral information, in the processing of pitch contours – a finding that is in line with other investigations of tonal contour discrimination, which have shown that when

non-musicians were successful in processing non-native tonal contrasts they tended to

rely on the direction of pitch change over time, whereas non-musicians who were

unsuccessful in the same task tended to rely on pitch height discrimination (Chandrasekaran et al., 2010).

Figure 2. Results of mediation analyses testing the direct link, and the indirect link through temporal,

pitch, and spectral sensitivity, between the composite AdvancedMeasures of Musical Audiation (AMMA)

score and Norwegian /i/ - /y/ vowel discrimination, controlling for sex, Culture Fair Intelligence, number

of previously learned languages, and proficiency self-ratings in L2 and L3. Only significant links and

associated standardized coefficients are shown. Significant effects of the covariates are indicated by solid

grey lines. The figure also lists the L2s which are most likely responsible for the link between L3-rating and

vowel processing. ***p < .001; **p < .01; *p < .05.


For vowel contrasts, we found the expected link to spectral sensitivity as well as a link

to temporal sensitivity (see the paths on the right in Figure 2). The link between spectral

sensitivity and vowel processing is not surprising, as vowels are characterized by

differences in spectral information. However, the finding that temporal sensitivity was also a significant predictor for vowel processing was unexpected. Although we carefully

controlled for vowel length and metrical stress to exclude temporal information as a cue,

participants might have been sensitive to subtle durational differences in other segments

when trying to discriminate the Norwegian words that differed with respect to their

vowels. It should also be noted that performance on the vowel contrast was positively

related to self-ratings in an L3. Most likely this reflects the fact that some participants had

studied languages containing the /i/ - /y/ contrast (Albanian, French, and German), and

that prior experience with this contrast may have benefitted vowel processing. Purely coincidentally, in our sample these languages had been learned more often as L3s rather

than as L2s.

The observation that temporal, spectral and, to some extent, pitch sensitivity were all

linked to non-native speech-sound processing complements approaches that tend to

emphasize the role of rapid temporal auditory processing as the main sensory component

of language processing (Goswami et al., 2002; Tallal, 1980). While rapid temporal

information plays a greater role in the discrimination of consonants and in identifying

patterns of metrical stress, frequency information may be more important for the discrimination of vowels. Most likely, both types of information are crucial when it comes

to the processing of more complex segmental and suprasegmental aspects of speech,

such as lexical tones and prosodic contours.

Does sensitivity to acoustic features mediate the link between musical ability and

non-native speech-sound processing?

We found that the link between musicality and non-native speech-sound processing was partially mediated by spectral sensitivity and to some extent pitch sensitivity, but not

temporal sensitivity. This is in line with previous findings showing pitch and chord

processing to mediate observed links between musical ability and non-native

speech-sound processing (Milovanov et al., 2008; Posedel et al., 2011). The fact that

spectral sensitivity was the more consistent mediator suggests that the processing of

complex spectral information may be of greater relevance to both music and language

than the processing of pure tones.

As discussed above, the lack of a mediating effect of temporal sensitivity most likely reflects the fact that temporal changes in music take place on a slower time scale than

temporal changes in language. This finding is only partially compatible with claims that

musical and linguistic processing exploit different cues – with language mainly relying on

rapid temporal processing and music relying on processing of pitch and spectral

information (Zatorre, Belin, & Penhune, 2002). Instead, our findings support the idea of

shared mechanisms between music and language (Patel, 2003; Strait, Hornickel, & Kraus,

2011) by suggesting that processing of frequency information is one of the mechanisms

that may be shared across the two domains, while processing of rapid temporal information appears to be more important for language.

Finally, links with musical ability – direct and indirect ones – were found, not just for

the tonal contrast, but also for the /i/ - /y/ vowel contrast. This finding differs from studies

reporting links with musicality for non-native lexical tones but not for non-native

segmental contrasts (Delogu et al., 2006, 2010). The discrepancy between the results of


these earlier studies and ours may have arisen from the fact that we tested only one

particular vowel contrast, whereas Delogu and colleagues tested a variety of non-native

Mandarin vowels and consonants. It is conceivable that the inclusion of consonants,

which in many instances requires processing of rapid temporal information, for example, to detect voice onset times or frequency transitions, may have attenuated overall links

between processing of phonemes and musical ability in their studies.

The mediation analyses also revealed a direct link between musicality and non-native

speech-sound processing, which was not explained by shared variance with the

sensitivity to acoustic features. This finding corroborates very closely the results of a

similar study published at the time of submission: Perrachione, Fedorenko, Vinke, Gibson,

and Dilley (2013) also reported a link between the processing of linguistic prosody and

musical pitch contours, after controlling for basic pitch, temporal and visuo-spatialprocessing. While their results were obtained for the processing of suprasegmental

linguistic information, our study extends the finding to the processing of segmental

information, specifically, to the processing of vowel contrasts. A minor discrepancy

between the results of the Perrachione et al. study and ours arises only with respect to the

role of rapid temporal processing, which was linked to speech-sound processing in our

study but not in theirs. This discrepancy may be due to the different ways in which rapid

temporal processing was tested in the two studies: While in the present study, temporal

processing required detecting differences in amplitude onset rise time, in Perrachioneet al. (2013), it required detection of differences in interval durations of click trains,

which are akin to ‘acoustic flutter’. Nonetheless, the converging evidence for a direct link

between the processing of musical and linguistic stimuli from these two studies implies

that, in addition to basic pitch and spectral sensitivity, there may be other, as yet

unspecified, mechanisms that mediate processing of novel speech sounds and measures

of musical aptitude.

Although our study was not designed to explore these other mechanisms, we would

like to offer some speculations as to what they might be. Tests of musical aptitude and speech-sound processing rely on task-specific working memory capacity: Both the AMMA

and the AX task require participants to retain strings of sounds in auditory working

memory for purposes of comparison. At present, it is unclear to what extent holding

musical phrases in mind relies on general working memory capacity (Williamson et al.,

2010) or recruits different neural pathways, such as a phonological loop, supporting

rehearsal of phonological information, or a musical loop, supporting rehearsal of tonal and

rhythmic information (Schulze, Zysset, Müller, Friederici, & Koelsch, 2011). Neuro-imaging provides some evidence for distinct working memory mechanisms underlying skilled

individuals who have not received much musical training may initially recruit the

phonological loop to perform musical tasks, thereby engaging the same circuits that play a

role in speech processing. Given that none of our participants were professional

musicians, it is conceivable that they relied on the phonological loop when processing the

musical stimuli, which could account for the direct link with non-native speech-sound

processing. This suggestion is consistent with neuro-imaging studies showing a

substantial overlap of neural pathways underpinning verbal and tonal working memory in non-musicians, but a separation of these pathways in musicians, distinguished by

differences in the motor requirements for executing music and speech (Koelsch et al.,

2009; Schulze et al., 2011). Future research will have to determine whether the mediating

role of working memory is task-dependent or more general, and whether such potentially

shared mechanisms diverge with increasing musical expertise.


Musical aptitude versus musical expertise

A number of studies have conceptualized musicality as musical expertise, and suggested

that musical training may hone sensory and cognitive abilities, which, in turn, benefit

non-native speech-sound processing (Marie et al., 2011; Wong et al., 2007). Under this scenario, one would expect the amount of musical training to exert an effect on

non-native speech-sound processing. In our sample, there was a trend towards a

correlation between musical aptitude and reported years of musical training, which,

however, did not reach significance after Bonferroni correction, despite a range of years of

musical training (0–20 years) that is compatible with other studies (e.g., Kraus et al.,

2009). Perhaps the correlation was not very strong because participants reported years of

musical tuition in school, yet musical school programmes vary in the degree with which

they employ selection by musical aptitude for admission. Still, we did find that musical training was linked with non-native speech-sound processing through the mediating

effect of enhanced spectral processing. However, there was no residual direct link

between musical training and non-native speech-sound processing. Thus, unlike the

AMMA measure of musical aptitude, a simple self-report measure of musical expertise did

not seem to tap into the additional mechanisms shared between musical and linguistic

processing that might support non-native speech-sound discrimination over and above

spectral sensitivity. Such an interpretation is supported by findings that, compared to

non-musicians, musicians demonstrate superior performance in auditory sensory processing of speech sounds, presumably due to the strengthening of auditory attention

via cortico-fugal pathways (for an overview see Strait & Kraus, 2011), but do not show

superior performance on a whole range of executive function tasks (Schellenberg, 2011).

Future research will have to determine whether additional mechanisms captured by

musical aptitude tests, such as working memory capacity or executive functioning, reflect

joint task demands or whether musical aptitude – apparently the better predictor of

non-native speech-sound processing – is somewhat independent from the effects of

musical training.
In conclusion, our findings show that the link between musical ability and the ability to

discriminate unfamiliar non-native speech sounds is partially mediated by sensitivity to

certain acoustic features relevant for music and language. The present results suggest that

this mediating role is mainly fulfilled by sensitivity to spectral information, as contained in

complex tones, but not by temporal information, suggesting that the time scale that is

relevant to speech processing is different from the time scale relevant for processing

music. However, the observation that sensitivity to acoustic features only partially

mediated the link between musical ability and non-native speech-sound processing underscores the importance of studying the role of other potential mediating mechanisms.

Acknowledgement

We thank Christina Grenoble, Gina Martino, and Joseph Rivera for help in collecting data.

References

Besson, M., Schön, D., Moreno, S., Santos, A., & Magne, C. (2007). Influence of musical expertise and

musical training on pitch processing in music and language. Restorative Neurology and

Neuroscience, 25, 399–410.


Boersma, P., & Weenink, D. (2011). Praat: Doing phonetics by computer [Computer program]. Version 5.3.16. Available at: http://www.fon.hum.uva.nl/praat/

Bowles, A. R., Silbert, N. H., Jackson, S. R., & Doughy, C. J. (2011). Individual differences in working memory predict second language learning success. Poster presented at the 52nd Annual Meeting

of The Psychonomic Society, Seattle, WA.

Cattell, R. B., & Cattell, H. E. P. (1973). Measuring intelligence with the culture-fair tests.

Champaign, IL: Institute for Personality and Ability Testing.

Chandrasekaran, B., Sampath, P. D., & Wong, P. C. (2010). Individual variability in cue-weighting and

lexical tone learning. The Journal of the Acoustical Society of America, 128, 456–465. doi:10.1121/1.3445785

Cutting, J. E., & Rosner, B. S. (1974). Categories and boundaries in speech and music. Perception and Psychophysics, 16, 564–570. doi:10.3758/BF03198588
Deguchi, C., Boureux, M., Sarlo, M., Besson, M., Grassi, M., Schön, D., & Colombo, L. (2012).

Sentence pitch change detection in the native and unfamiliar language in musicians and

non-musicians: Behavioral, electrophysiological and psychoacoustic study. Brain Research, 1455, 75–89. doi:10.1016/j.brainres.2012.03.034
Delogu, F., Lampis, G., & Belardinelli, M. O. (2006). Music-to-language transfer effect: May melodic

ability improve learning of tonal languages by native nontonal speakers? Cognitive Processes, 7,

203–207. doi:10.1007/s10339-006-0146-7
Delogu, F., Lampis, G., & Belardinelli, M. O. (2010). From melody to lexical tone: Musical ability

enhances specific aspects of foreign language perception. European Journal of Cognitive

Psychology, 22, 46–61. doi:10.1080/09541440802708136
Golestani, N., & Zatorre, R. J. (2009). Individual differences in the acquisition of second language

phonology. Brain and Language, 109, 55–67. doi:10.1016/j.bandl.2008.01.005
Gordon, E. E. (1989). Advanced Measures of Music Audiation. Chicago, IL: GIA.

Goswami, U., Thompson, J., Richardson, U., Stainthorp, R., Hughes, D., Rosen, S., & Scott, S. K.

(2002). Amplitude envelope onsets and developmental dyslexia: A new hypothesis.

Proceedings of the National Academy of Sciences USA, 99, 10911–10916. doi:10.1073/pnas.122368599

Halliday, L. F., Moore, D. R., Taylor, J. L., & Amitay, S. (2011). Dimension-specific attention directs

learning and listening on auditory training tasks. Attention, Perception, and Psychophysics, 73,

1329–1335. doi:10.3758/s13414-011-0148-0
Hayes, A. F. (2013). Introduction to mediation, moderation and conditional process analysis: A

regression-based approach. New York: The Guilford Press.

Kempe, V., Thoresen, J., Kirk, N. W., Schaeffler, F., & Brooks, P. J. (2012). Individual differences in

the discrimination of novel speech sounds: Effects of sex, temporal processing, musical and

cognitive abilities. PLoS ONE, 7 (11), e48623. doi:10.1371/journal.pone.0048623

Koelsch, S., Schulze, K., Sammler, D., Fritz, T., Müller, K., & Gruber, O. (2009). Functional

architecture of verbal and tonal working memory: An fMRI study. Human Brain Mapping, 30,

859–873. doi:10.1002/hbm.20550

Kraus, N., Skoe, E., Parbery-Clark, A., & Ashley, R. (2009). Experience-induced malleability in neural encoding of pitch, timbre, and timing. Annals of the New York Academy of Sciences, 1169, 543–557. doi:10.1111/j.1749-6632.2009.04549.x

Kraus, N., Strait, D. L., & Parbery-Clark, A. (2012). Cognitive factors shape brain networks for

auditory skills: Spotlight on auditory working memory. Annals of the New York Academy of

Sciences, 1252, 100–107. doi:10.1111/j.1749-6632.2012.06463.x
Kuhl, P. K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., & Iverson, P. (2006). Infants show a

facilitation effect for native language phonetic perception between 6 and 12 months.

Developmental Science, 9, F13–F21. doi:10.1111/j.1467-7687.2006.00468.x
Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music

and language better than nonmusician children: Behavioral and electrophysiological

approaches. Journal of Cognitive Neuroscience, 18, 199–211. doi:10.1162/jocn.2006.18.2.199


Marie, C., Delogu, F., Lampis, G., Belardinelli, M. O., & Besson, M. (2011). Influence of musical

expertise on segmental and tonal processing in Mandarin Chinese. Journal of Cognitive

Neuroscience, 23, 2701–2715. doi:10.1162/jocn.2010.21585
Marques, C., Moreno, S., Castro, S. L., & Besson, M. (2007). Musicians detect pitch violation in a

foreign language better than nonmusicians: Behavioral and electrophysiological evidence.

Journal of Cognitive Neuroscience, 19, 1453–1463. doi:10.1162/jocn.2007.19.9.1453
Milovanov, R., Huotilainen, M., Välimäki, V., Esquef, P. A. A., & Tervaniemi, M. (2008).

Musical aptitude and second language pronunciation skills in school-aged children: Neural

and behavioural evidence. Brain Research, 1194, 81–89. doi:10.1016/j.brainres.2007.11.042

Musacchia, G., Sams, M., Skoe, E., & Kraus, N. (2007). Musicians have enhanced subcortical auditory
and audiovisual processing of speech and music. Proceedings of the National Academy of
Sciences, 104, 15894–15898. doi:10.1073/pnas.0701498104
Nardo, D., & Reiterer, S. M. (2009). Musicality and phonetic language aptitude. In G. Dogil & S. M.

Reiterer (Eds.), Language talent and brain activity. Trends in Applied Linguistics (pp. 213–255). Berlin, Germany: Mouton De Gruyter.

Parbery-Clark, A., Skoe, E., Lam, C., & Kraus, N. (2009). Musician enhancement for speech-in-noise.

Ear and Hearing, 30, 653–661. doi:10.1097/AUD.0b013e3181b412e9
Patel, A. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6, 674–681. doi:10.1038/nn1082

Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Frontiers in Psychology, 2, 142. doi:10.3389/fpsyg.2011.00142

Perrachione, T. K., Fedorenko, E. G., Vinke, L., Gibson, E., & Dilley, L. (2013). Evidence for shared

cognitive processing of pitch in music and language. PLoS ONE, 15 (8), e73372. doi:10.1371/journal.pone.0073372

Posedel, J., Emery, L., Souza, B., & Fountain, C. (2011). Pitch perception, working memory, and

second language phonological production. Psychology of Music, 40, 508–517. doi:10.1177/0305735611415145

Schellenberg, G. E. (2011). Examining the association between music lessons and intelligence.

British Journal of Psychology, 102, 283–302. doi:10.1111/j.2044-8295.2010.02000.x
Schulze, K., Zysset, S., Müller, K., Friederici, A., & Koelsch, S. (2011). Neuroarchitecture of verbal

and tonal working memory in nonmusicians and musicians. Human Brain Mapping, 32, 771–783. doi:10.1002/hbm.21060

Seashore, C. E. (1919). The measurement of musical talent. The Musical Quarterly, January (1),

129–148.
Seashore, C. E. (1939). Revision of the Seashore measures of musical talent. Music Educators’

Journal, 26, 31–33. doi:10.2307/3385627
Slevc, L. R., & Miyake, A. (2006). Individual differences in second-language proficiency: Does

musical ability matter? Psychological Science, 17, 675–681. doi:10.1111/j.1467-9280.2006.01765.x

Strait, D. L., Hornickel, J., & Kraus, N. (2011). Subcortical processing of speech regularities underlies

reading and music aptitude in children. Behavioral and Brain Functions, 7, 44. doi:10.1186/1744-9081-7-44

Strait, D., & Kraus, N. (2011). Playing music for a smarter ear: Cognitive, perceptual and

neurobiological evidence. Music Perception, 29, 133–146. doi:10.1525/mp.2011.29.2.133

Surprenant, A. M., & Watson, C. S. (2001). Individual differences in the processing of speech and

nonspeech sounds by normal-hearing listeners. The Journal of the Acoustical Society of

America, 110, 2085–2095. doi:10.1121/1.1404973
Tallal, P. (1980). Auditory temporal perception, phonics, and reading disabilities in children. Brain

and Language, 9, 182–198. doi:10.1016/0093-934X(80)90139-X
Toscano, J. C., & McMurray, B. (2010). Cue integration with categories: Weighting acoustic cues in

speech using unsupervised learning and distributional statistics. Cognitive Science, 34, 434–464. doi:10.1111/j.1551-6709.2009.01077.x


Williamson, V. J., Baddeley, A. D., & Hitch, G. J. (2010). Musicians’ and nonmusicians’ short-term

memory for verbal and musical sequences: Comparing phonological similarity and pitch

proximity. Memory and Cognition, 38, 163–175. doi:10.3758/MC.38.2.163

Wing, H. D. (1968). Tests of musical ability and appreciation: An investigation into the

measurement, distribution, and development of musical capacity (2nd ed.). London, UK:

Cambridge University Press.

Wong, P. C., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human

brainstem encoding of linguistic pitch patterns. Nature Neuroscience, 10, 420–422. doi:10.1038/nn1872

Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music and

speech. Trends in Cognitive Sciences, 6, 37–46. doi:10.1016/S1364-6613(00)01816-7
Zendel, B. R., & Alain, C. (2012). Musicians experience less age-related decline in central auditory

processing. Psychology and Aging, 27, 410. doi:10.1037/a0024816

Received 8 December 2013; revised version received 8 August 2014

Appendix

Norwegian words presented as non-native speech stimuli in the Experiment

Contrast   Short vowel                               Long vowel
Tonal      Bøtter [name] – bøtter [buckets]          bøter1 [fines] – bøter2 [to repent]
           gullet [gold] – gulle [to fluke]          gulet [wind] – gule [yellow]
           legget [ID (slang)] – legge [to put]      læget [state (slang)] – lege [GP]
           rakket [dog] – rakke [to botch]           raket [wreck] – rake [rake]
Vowel      rykk [pull] – rikk [budge]                ryk [to smoke] – rik [rich]
           syll [joist] – sild [herring]             syl [awl] – sil [sieve]
           mytt [molted] – mitt [mine]               myt [to moult] – mit [mite]
           lynn [to soften] – lind [lime tree]       lyn [lightning] – lin [linen]
