British Journal of Psychology (2014)
© 2014 The British Psychological Society
www.wileyonlinelibrary.com
Musical ability and non-native speech-sound processing are linked through sensitivity to pitch and spectral information
Vera Kempe1*, Dennis Bublitz2 and Patricia J. Brooks2
1Division of Psychology, Abertay University, Dundee, UK; 2Department of Psychology, College of Staten Island, City University of New York, USA
Is the observed link between musical ability and non-native speech-sound processing due
to enhanced sensitivity to acoustic features underlying both musical and linguistic
processing? To address this question, native English speakers (N = 118) discriminated
Norwegian tonal contrasts and Norwegian vowels. Short tones differing in temporal,
pitch, and spectral characteristics were used to measure sensitivity to the various acoustic
features implicated in musical and speech processing. Musical ability was measured using
Gordon’s Advanced Measures of Musical Audiation. Results showed that sensitivity to
specific acoustic features played a role in non-native speech-sound processing:
Controlling for non-verbal intelligence, prior foreign language-learning experience, and
sex, sensitivity to pitch and spectral information partially mediated the link between
musical ability and discrimination of non-native vowels and lexical tones. The findings
suggest that while sensitivity to certain acoustic features partially mediates the
relationship between musical ability and non-native speech-sound processing, complex
tests of musical ability also tap into other shared mechanisms.
Compared to children, adults show a reduced ability to distinguish and produce
non-native speech sounds, due to perceptual narrowing associated with tuning into the
ambient language during the first year of life (e.g., Kuhl et al., 2006). Despite this acknowledged disadvantage, adults still exhibit considerable individual differences in
their ability to distinguish subtle phonetic contrasts in a non-native language (Chandrasekaran, Sampath, & Wong, 2010; Golestani & Zatorre, 2009). Recently, researchers
have begun to explore whether sensitivity to non-native speech contrasts may be linked
to an individual’s musical ability, that is, their ability to process and produce musical
sounds. This ability relies on sensitivity to relevant acoustic features of musical sounds,
but may also include a wider range of associated component abilities, such as superior
short-term memory capacity that can facilitate learning of tonal and rhythmical patterns over time (Williamson, Baddeley, & Hitch, 2010). There are two ways in which a link
between musical ability and speech-sound processing could arise: On the one hand,
*Correspondence should be addressed to Vera Kempe, Division of Psychology, Abertay University, Bell Street, Dundee DD1 1HG, United Kingdom (email: [email protected]).
A subset of the findings was presented at the 2013 Eastern Psychological Association meeting, and at the 35th Annual Conference of the Cognitive Science Society.
DOI: 10.1111/bjop.12092
individual differences in aptitude might arise from genetically mediated differences in the
neuroanatomy of subcortical and cortical pathways that result in superior auditory
processing, which may benefit the processing of acoustic features shared between the
sounds of music and language. Individuals with greater aptitude may then be more likelyto take up pursuits that are compatible with these pre-dispositions, such as musical
training (Schellenberg, 2011), which can explain the well-attested observation that
musicians exhibit superior processing of speech sounds (e.g., Kraus, Skoe, Parbery-Clark, & Ashley, 2009; Musacchia, Sams, Skoe, & Kraus, 2007; Wong, Skoe, Russo,
Dees, & Kraus, 2007). However, since musical aptitude does not necessarily result in
musical training, a relationship between musical aptitude and speech-sound processing
can also be found in non-musicians (e.g., Delogu, Lampis, & Belardinelli, 2006; Slevc &
Miyake, 2006).

On the other hand, musical expertise, operationalized as the amount of musical
training, might exert a direct causal influence on an individual’s ability to discriminate
speech sounds because studying and playing music, among other things, requires and
gradually trains the very precise processing of certain acoustical features (Patel, 2011), as
well as the attentional control associated with auditory processing (Kraus, Strait, &
Parbery-Clark, 2012). Even though a causal relationship between musical expertise and
speech-sound processing cannot reliably be established using quasi-experimental designs
that compare musicians and non-musicians, there is evidence for a correlation between the amount of musical training and the processing of acoustical features relevant in speech
(Kraus et al., 2009), providing tentative evidence that musical expertise might indeed
play a causal role in improving speech processing.
Since native phonetic categories can often be identified by multiple, partially
redundant acoustic and contextual cues (Toscano & McMurray, 2010), the superior
auditory sensitivity associated with musical ability is unlikely to benefit native
speech-sound processing substantially (Surprenant & Watson, 2001) except when it
occurs under adverse conditions, such as age-related hearing loss (Zendel & Alain, 2012) or noise (Kraus et al., 2012; Parbery-Clark, Skoe, Lam, & Kraus, 2009).
However, musical ability should prove especially beneficial for the processing of
novel speech sounds in non-native languages. Indeed, a number of studies have
confirmed a positive relationship between musical ability and non-native speech-sound processing, be it in second language learners (Milovanov, Huotilainen,
Välimäki, Esquef, & Tervaniemi, 2008; Posedel, Emery, Souza, & Fountain, 2011;
Slevc & Miyake, 2006), or in laboratory studies testing discrimination of pitch
differences in non-native prosody (Besson, Schön, Moreno, Santos, & Magne, 2007; Deguchi et al., 2012; Magne, Schön, & Besson, 2006; Marques, Moreno, Castro, &
Besson, 2007) and of unfamiliar non-native speech sounds (Delogu et al., 2006;
Delogu, Lampis, & Belardinelli, 2010; Marie, Delogu, Lampis, Belardinelli, & Besson,
2011; Wong et al., 2007).
Regardless of whether musical ability benefits non-native speech-sound processing
because of innate aptitude, training, or both (Nardo & Reiterer, 2009), the central idea
linking these two domains involves shared processing of acoustic features, such that
increased sensitivity to specific features associated with musical ability might also benefit language. In other words, sensitivity to certain acoustic features might mediate the link
between musical ability and non-native speech-sound processing. However, very few
studies have explored the mediating effects of sensitivity to various acoustic features
directly, and those that have done so have typically focused mainly on the mediating role
of sensitivity to pitch (Deguchi et al., 2012; Milovanov et al., 2008; Posedel et al., 2011).
What is missing is a fuller picture of the range of acoustic features that might mediate the
links between musical ability and processing of specific non-native speech sounds. This
study aims to fill this gap.
Sensitivity to pitch is, of course, the most obvious candidate to mediate a link between musical ability and non-native speech processing (Deguchi et al., 2012; Milovanov et al.,
2008; Posedel et al., 2011), although the degree of precision in pitch processing required
for music is likely to surpass the precision required for processing phonological
information in speech (Patel, 2011). In addition, pitch sensitivity may be critical for the
processing of lexical tones or pitch accents in languages such as Mandarin, Thai or
Norwegian, wherein tonal information requires listeners to process pitch values along
with the direction of pitch change (Chandrasekaran et al., 2010). In line with this idea,
findings linking non-native speech processing with musical ability, operationalized either by tests of musical aptitude or by amount of musical training, have been more consistent
for tonal contrasts (Marie et al., 2011; Wong et al., 2007) than for phonological contrasts
(Delogu et al., 2006, 2010; Slevc & Miyake, 2006).
In addition to pitch processing, musical ability is also associated with processing of a
wide variety of other features of sound, such as harmony, timbre, loudness, duration,
tempo, and melodic contour (Patel, 2003), and the enhanced sensitivity to these features
may benefit discrimination of certain non-native phonological contrasts. For example,
high precision in identifying musical harmony or the timbre of musical instruments – both of which are features characterized by spectral information – may benefit the processing
of vowels, which are distinguished by the spectral information associated with the first
and second formants. Furthermore, sensitivity to temporal information associated with
beat and rhythm may benefit the perception of consonants, which are distinguished by
rapid temporal changes in various features, such as voice onset time or formant
transitions. This study aims to test these more specific predictions by exploring whether
sensitivity to several non-linguistic acoustic features, encompassing pitch, spectral and
temporal information, mediates links between musical ability and sensitivity to non-native phonological and tonal contrasts.
Method
Musical aptitude and musical experience were assessed separately to ascertain whether
they are differentially predictive of individual differences in non-native speech-sound processing.
Assessment of musical aptitude typically involves administration of complex
measures such as the Seashore test (Seashore, 1919, 1939), the Wing test (Wing,
1968) or Gordon’s Advanced Measures of Musical Audiation (AMMA; Gordon, 1989).
These tests require participants to compare musical phrases to detect subtle
differences, for example, in melody or rhythm, thereby tapping into sensory and
perceptual abilities associated with musical ability, but also into attention control and
working memory capacity required to remember and compare the phrases (Nardo & Reiterer, 2009). They do not, however, assess the motor skills attained with musical
practice, presumably a reflection of the fact that they are designed for selection of
individuals into musical training. In this study, we administered the AMMA because it is
considered to be independent of an individual’s level of musical expertise (Nardo &
Reiterer, 2009). Furthermore, we used AX discrimination tasks to assess the potential
mediating effects of sensitivity to three types of non-linguistic acoustic information –
temporal, pitch, and spectral – on sensitivity to two unfamiliar Norwegian speech
sounds – a vowel and a tonal contrast.1 The Norwegian vowel contrast was the /i/ - /y/
contrast, which does not exist in English. The tonal contrast was based on one of several
Norwegian dialects with lexical tones consisting of rising and falling-rising pitch accents, which distinguish minimal pairs of segmentally identical bi-syllabic words. For
example, ‘Hammer’, spoken with the rising tone, is a Norwegian proper noun while
‘hammer’, spoken with the falling-rising tone, denotes the tool. These contrasts
encompass temporal changes in fundamental frequency extending over several
hundreds of milliseconds. Testing Norwegian tonal contrasts provides an interesting
cross-linguistic extension to the more commonly studied non-native tonal contrasts in
languages like Mandarin or Thai. To ensure that it was the pitch contour contained in
these natural language stimuli, and not some other phonetic information, that was the critical feature used for discrimination, we also tested sensitivity to artificially
synthesized pitch contours which were non-linguistic, pure-tone analogues of the
Norwegian tonal contrasts.
Crucially, we examined to what extent temporal, pitch, and spectral sensitivity
mediated links between musical ability and non-native speech-sound processing. To
control for non-verbal intelligence, we administered Cattell’s Culture Fair Intelligence
Test (Cattell & Cattell, 1973). Music and language background questionnaires inquired
about participants’ length of musical training and number of languages learned at home or at school, and elicited self-ratings of proficiency in each language (L2 and
L3). Previous research (Bowles, Silbert, Jackson, & Doughty, 2011; Kempe et al., 2012)
had found a male advantage in processing of non-speech and speech sounds
encompassing rapid temporal changes; hence, participant sex was also considered in
the analyses.
Participants

One hundred and eighteen speakers of American English (59 women, 59 men), mean age
20 years, range 19–31 years, participated in the study. L3 proficiency self-ratings were
missing from three participants, and pitch discrimination data from one participant. These
participants were excluded from analyses including these variables.
Materials
Advanced Measures of Musical Audiation (AMMA)
Gordon (1989) developed the AMMA to measure the ability to detect subtle tonal and
rhythmical differences in music. It consists of 30 items, each comprising two musical
phrases – a short musical ‘statement’ followed after four seconds by an ‘answer’ of the same length. The duration of the musical phrases ranges from 4 to 11 s. Each item
contains either one or more tonal changes, or one or more rhythmic changes, but
never both. Participants have to decide whether the phrases are the same or different.
For ‘different’ phrases, participants have to decide whether the difference involves a
tonal or rhythm change. The test yields tone and rhythm scores, as well as a composite
score.
1 In a previous study (Kempe, Thoresen, Kirk, Schaeffler, & Brooks, 2012), we had also included an unfamiliar Norwegian consonant contrast, the /ç/ - /ʃ/ contrast, which proved to be too easy for native English participants, resulting in ceiling effects. We therefore did not include this contrast in this study.
AX (‘same-different’) tasks
All six AX tasks comprised 32 ‘same’ and 32 ‘different’ trials. The individual sound files
comprising the stimulus pairs are provided in the Supplementary Materials.
(a) Testing sensitivity to temporal information
We synthesized eight 250 Hz sinusoidal carrier waves with an overall duration of 600 ms
differing in amplitude envelope onset rise times, and otherwise devoid of segmental,
spectral, and pitch information. The onset of the amplitude envelope was faded in with a
depth of 100% so that the amplitude reached its maximum at 0, 10, 20, 30, 60, 70, 80
and 90 ms. ‘Different’ trials comprised pairs of sounds differing in rise times by 60 ms
(e.g., 0 ms vs. 60 ms, 10 ms vs. 70 ms, etc.), centred around 45 ms, a value which has been reported as the category boundary between ‘bowed’ and ‘plucked’ sounds (Cutting
& Rosner, 1974). All sounds had a fade-out period of 50 ms so that the overall duration of
the sound at maximum amplitude was comparable to the duration of sounds in the pitch
and spectral sensitivity tasks described below.
(b) Testing sensitivity to pitch
We created eight 500 ms pure tone sinusoidal carrier waves ranging from 100 to 3,000 Hz. The tones increased following a quadratic trend, resulting in tones of 100, 200,
400, 700, 1,100, 1,600, 2,200, and 3,000 Hz. Each tone was paired with a contrast tone
with a frequency exceeding its counterpart by 2%, resulting in tones of 102, 204, 408, 714,
1,122, 1,632, 2,244, and 3,060 Hz. The 2% difference was chosen to be slightly below
pitch discrimination thresholds established in previous studies examining untrained
listeners (e.g., Halliday, Moore, Taylor, & Amitay, 2011; Surprenant & Watson, 2001). The
cumulative increase was designed to create sound pairs that subjectively sampled the
pitch range at roughly similar intervals, taking into account the non-linearity of pitch perception. For the ‘different’ trials, each sound was paired with its corresponding
contrast sound, resulting in pairs of 100 Hz versus 102 Hz, 200 Hz versus 204 Hz, 400 Hz
versus 408 Hz etc.
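The frequency series and its 2% counterparts follow directly from the description; a brief sketch (the `pure_tone` helper and the 44.1 kHz sampling rate are assumptions, not part of the original materials):

```python
import numpy as np

SR = 44_100  # assumed sampling rate (Hz)

def pure_tone(freq_hz, dur=0.500):
    """500 ms pure sinusoid at the given frequency."""
    t = np.arange(int(SR * dur)) / SR
    return np.sin(2 * np.pi * freq_hz * t)

# Eight standards spaced along a quadratic-like trend, each paired with a
# contrast tone 2% higher in frequency.
standards = [100, 200, 400, 700, 1_100, 1_600, 2_200, 3_000]
contrasts = [f * 1.02 for f in standards]
different_trials = list(zip(standards, contrasts))
```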
(c) Testing sensitivity to spectral information
For this test, we combined the pure tones created for the pitch test into complex tones of
500 ms duration, with each complex tone comprising low (e.g., 100 Hz or 200 Hz), middle (e.g., 700 Hz or 1,100 Hz) and high (e.g., 2,200 Hz or 3,000 Hz) frequencies.
These frequencies were chosen to broadly mimic the fundamental frequency and the first
two formants of speech, which are crucial for vowel perception. For ‘different’ pairs, one
of the component tones was increased by 2%, and this change affected either the middle or
the high frequency, for example, a ‘different’ pair might include a complex tone
consisting of frequencies of 100, 1,100 and 3,000 Hz and a complex tone consisting of
frequencies of 100, 1,122 and 3,000 Hz.
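A sketch of how such three-component complex tones might be generated, with the middle component raised by 2% to form a ‘different’ pair (the sampling rate and peak normalization are assumptions):

```python
import numpy as np

SR = 44_100  # assumed sampling rate (Hz)

def complex_tone(freqs, dur=0.500):
    """Sum of equal-amplitude sinusoids, normalized to unit peak amplitude."""
    t = np.arange(int(SR * dur)) / SR
    wave = sum(np.sin(2 * np.pi * f * t) for f in freqs)
    return wave / np.max(np.abs(wave))

# Low, middle, and high components broadly mimic f0 and the first two formants.
same_a = complex_tone((100, 1_100, 3_000))
# 'Different' counterpart: the middle component raised by 2% (1,100 -> 1,122 Hz).
diff_b = complex_tone((100, 1_100 * 1.02, 3_000))
```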
(d) Testing sensitivity to the Norwegian tonal contrast
Recordings by a male native speaker of Norwegian of eight minimal pairs of Norwegian
words containing a contrast between rising and falling-rising tonal contours (see
Appendix) were taken from Kempe et al. (2012). Each stimulus word was recorded
twice, with ‘same’ trials using two within-category exemplars (e.g., ‘Hammer1’,
‘Hammer2’) and ‘different’ trials using two different-category exemplars (e.g., ‘Hammer1’,
‘hammer1’). Four word pairs contained short vowels in the first (stressed) syllable (mean
length 64 ms); the remaining four pairs contained long vowels (mean length 187 ms). Crucially, words with rising and with falling-rising tones did not differ in the length of the
first vowel (118 vs. 133 ms, p = .5), overall word length (396 vs. 417 ms, p = .2), or metric stress; thus, duration and stress could not be used as additional cues to tonal
contrasts. Corresponding short and long vowel pairs were matched for initial phonemes.
To ensure that a male advantage in discriminating the tonal contrasts (as reported in
Kempe et al., 2012) was not just an artifact of using a male voice in the recordings, we also
created a synthetic female voice version of the stimuli. Synthesizing the female voice was
necessary because our previous recordings by a female speaker from the same region of Norway had a slower speech rate, and retained greater articulatory clarity even after time
compression to match the male speech rate while maintaining the pitch characteristics of
the female speech. Pilot studies demonstrated that these differing features of the female
recordings affected the degree and range of sensitivity to the tonal contrast. To match for
articulatory clarity and indexical features, we therefore submitted the male voice stimuli
to the voice gender change algorithm in PRAAT (Boersma & Weenink, 2011) using a
fundamental frequency of 220 Hz and scaling the first formant up by 20%. All results
below are averaged over the male- and female-voiced versions of the AX task.
(e) Testing sensitivity to pitch contours comprising non-speech analogues of the tonal contrast
The non-speech analogues of the Norwegian tonal contrast comprised sine waves with
pitch contours extracted from the fundamental frequency modulation of both the male-
and female-voiced versions of the Norwegian tonal contrasts, resulting in contours in the
male pitch range and contours in the female pitch range. These stimuli contained no information other than the pitch contour of the Norwegian tones. The male-voiced and simulated female-voiced versions of the AX task were administered separately; sensitivity
measures were then averaged over these male- and female-voiced versions.
(f) Norwegian vowel contrast
We used eight minimal pairs of Norwegian monosyllabic words containing the vowel /i:/
or /I/ versus /y:/ or /Y/ to test discrimination of a contrast between high front unrounded
and rounded vowels that does not exist in English (see Appendix). Recordings of a male native speaker of Norwegian were taken from Kempe et al. (2012). Each stimulus word
was recorded twice, with ‘same’ trials using two within-category exemplars (e.g., ‘rykk1’, ‘rykk2’) and ‘different’ trials using two different-category exemplars (e.g., ‘rykk1’, ‘rikk1’).
Four word pairs contained the short vowels /I/ and /Y/ (mean length 67 ms), and the
remaining four word pairs contained the long vowels /i:/ and /y:/ (mean length 150 ms).
Again, on average both members of a minimal pair did not differ in vowel length (108 vs. 108 ms, p = .9) or overall word length (381 vs. 382 ms, p = .95); thus, duration could not serve as an additional cue.
Other measures
Participants also completed (1) Cattell’s Culture-Fair Test of Nonverbal Intelligence, Scale
3, Form A (Cattell & Cattell, 1973); (2) a music background questionnaire on which they
provided information about the extent of their musical training (if any); and (3) a language
background questionnaire on which they indicated the number of languages learned, and
rated their reading, writing, speaking and comprehension abilities in all languages on a
scale from 1 (very poor) to 6 (native-like). If participants listed no foreign language, proficiency was coded as 0.
Procedure
AX discrimination tasks were presented in three blocks, with the first block containing the
temporal, pitch and spectral AX tasks, the second block containing the male- and
female-voiced tonal AX tasks as well as the vowel AX task, and the third block containing
the two AX tasks presenting the extracted non-speech pitch contours of the male- and female-voiced Norwegian tonal stimuli. The fixed block sequence ensured that the
non-speech pitch contours could not prime the processing of the tonal contrast, and that
variance due to order effects was not confounded with participant variance, although task
order was counterbalanced within each block. The AMMA and Culture Fair Intelligence Test were interspersed between blocks, with their order counterbalanced as well. Informed
consent was obtained and background questionnaires were completed prior to any of the
tasks. The entire session lasted around 90 min.
In each of the AX tasks, participants received eight practice trials with feedback, randomly selected from the entire set of trials. These were followed by 64 test trials
presented without feedback, 32 ‘same’ and 32 ‘different’ trials. Each AX task was
constructed to test eight different instantiations of the contrast of interest. In the
‘different’ trials, each of the eight stimulus pairings representing the contrast was
repeated four times, twice in each order. In the ‘same’ trials, each of the 16 items was
paired with itself, with each identical pair presented twice. Note, however, that for the
Norwegian tonal and vowel contrasts, identical pairs comprised different within-category
exemplars, obtained by recording multiple examples of each word, as described above. Within each trial, sound stimuli were separated by an inter-stimulus interval of 200 ms,
and the inter-trial interval was 500 ms. Participants were asked to press the ‘s’ key if they
perceived the sounds to be the same and the ‘d’ key if they perceived them to be different.
Each AX task lasted approximately 5 min.
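The trial structure just described (eight contrast pairings, each repeated four times on ‘different’ trials, and each of the 16 stimuli paired with itself twice on ‘same’ trials) can be sketched as follows; the function name and placeholder labels are illustrative:

```python
import random

def build_ax_trials(different_pairs, seed=0):
    """Build the 64 test trials of one AX task from 8 (A, B) contrast pairs."""
    assert len(different_pairs) == 8
    different = []
    for a, b in different_pairs:
        different += [(a, b), (a, b), (b, a), (b, a)]  # 4 repeats, 2 per order
    stimuli = [s for pair in different_pairs for s in pair]  # 16 stimuli
    same = [(s, s) for s in stimuli for _ in range(2)]       # each twice
    trials = different + same                                # 32 + 32 trials
    random.Random(seed).shuffle(trials)
    return trials

trials = build_ax_trials([(f"std{i}", f"dev{i}") for i in range(8)])
```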
Results
Participants’ performance on the AX tasks was converted into A′, a sensitivity measure that corrects for differences in response bias; A′ scores range from 0 to 1, with 0.5 corresponding to chance. Table 1 presents the means, standard deviations, and ranges for musical training (years), foreign language background measures, AMMA tone, rhythm and
composite scores, as well as the auditory sensitivity and speech-sound processing tasks.
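The paper does not state which formula was used to compute A′; a common non-parametric formulation (often attributed to Grier, 1971), computed from hit and false-alarm rates, could look like this:

```python
def a_prime(hit_rate, fa_rate):
    """Non-parametric sensitivity index A' (0.5 = chance, 1.0 = perfect)."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))

# e.g., 28/32 'different' trials correct (hits) and 8/32 'same' trials
# answered 'different' (false alarms):
score = a_prime(28 / 32, 8 / 32)
```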
Comparing performance on different speech sounds
Since previous research had demonstrated a male advantage in discriminating some
non-native speech sounds (Bowles et al., 2011; Kempe et al., 2012), results for all
measures are presented for male and female participants separately. For the speech
processing tasks, a 3 (Sound Type: tonal, pitch contour, vowel) × 2 (Sex) mixed-design ANOVA with Sound Type as a within-subjects factor yielded a main effect of Sound Type,
F(2, 232) = 11.9, p < .001. Bonferroni-corrected post-hoc tests with a revised alpha-level
of .017 indicated that performance was superior for the vowels compared to the tonal and
pitch contour contrasts, all t’s > 3.9, all p’s < .001, and that tonal contrasts and pitch
contour contrasts did not differ from each other, p = .7. The interaction between Sex and
Sound Type fell short of significance, F(2, 202) = 2.05, p = .13, indicating that the
expected male advantage for processing the tonal contrasts (Bowles et al., 2011; Kempe
et al., 2012) could not be confirmed in this study.
Zero-order correlations
For a preliminary exploration of the links between linguistic background, musical
aptitude, musical expertise, sensitivity to acoustic features, and non-native speech-sound
processing, we computed zero-order correlations (see Table 2). After Bonferroni correction using a revised alpha-level of .0006, we found that all language background measures
were positively correlated. There was also a strong positive correlation between the
AMMA tonal and rhythm scores. Sensitivity to the tonal contrast and the non-linguistic pitch contour were also strongly correlated, suggesting that pitch contour was indeed the
dominant cue for discrimination of the Norwegian tonal stimuli. Note that the positive
correlation between years of musical training and musical aptitude fell short of
significance only after Bonferroni correction, which is a very conservative way to guard
against Type 1 error (p = .002 for AMMA tonal score, p = .006 for AMMA rhythm score).
Crucially, the zero-order correlations confirmed the link between musical aptitude and
non-native speech-sound processing: The AMMA tonal score was positively correlated
with discrimination of the tonal and vowel contrasts, and the AMMA rhythm score was positively correlated with discrimination of the tonal contrast. Note that musical training
failed to show any significant zero-order correlations with non-native speech-sound
processing.
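The .0006 alpha level corresponds to dividing .05 by the number of pairwise correlations among the 13 variables in Table 2; a quick arithmetic check:

```python
from math import comb

alpha = 0.05
n_vars = 13                # variables in the correlation matrix (Table 2)
n_tests = comb(n_vars, 2)  # 78 pairwise correlations
bonferroni_alpha = alpha / n_tests  # ~ .00064, i.e. the .0006 threshold
```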
Table 1. Means, standard deviations and ranges (the latter two in parentheses) for all variables, as well as results of t-tests comparing men and women. AMMA, Advanced Measures of Musical Audiation

                                 Men                   Women                 t (df), p
Culture fair intelligence        24.1 (4.5; 12–37)     23.8 (5.1; 14–36)     0.3 (116), n.s.
No. of languages (incl. Eng.)    2.5 (0.7; 1–4)        2.4 (0.6; 1–4)        0.7 (116), n.s.
L2-rating                        2.6 (1.4; 0–6)        2.5 (1.4; 0–6)        0.3 (114), n.s.
L3-rating                        0.8 (1.3; 0–5)        0.7 (1.1; 0–3.75)     0.5 (112), n.s.
Musical training (yrs)           1.7 (2.3; 0–8)        4.3 (5.5; 0–20)       −3.5 (116), <.001
AMMA tone                        23.2 (4.0; 14–32)     23.7 (4.5; 13–34)     −0.6 (116), n.s.
AMMA rhythm                      25.3 (3.5; 15–31)     25.4 (4.0; 15–36)     −0.2 (116), n.s.
AMMA composite                   48.5 (6.6; 35–63)     48.9 (8.6; 27–70)     −0.3 (116), n.s.
Auditory sensitivity measures (A′)
  Temporal sensitivity           0.64 (0.14)           0.65 (0.15)           −0.4 (116), n.s.
  Pitch sensitivity              0.67 (0.15)           0.61 (0.14)           2.1 (116), <.05
  Spectral sensitivity           0.61 (0.13)           0.59 (0.15)           0.9 (116), n.s.
Speech-sound processing (A′)
  Tonal contrast                 0.77 (0.11)           0.75 (0.10)           1.2 (116), n.s.
  Pitch contour                  0.78 (0.10)           0.74 (0.10)           1.9 (116), .06
  Vowel contrast                 0.80 (0.10)           0.80 (0.11)           −0.1 (116), n.s.
We also found a link between sensitivity to pitch and spectral information and
discrimination of the tonal contrast, and between sensitivity to temporal and spectral information and discrimination of the vowel contrast. The zero-order correlations
between musical aptitude and musical training on the one hand, and sensitivity to the
acoustical features on the other hand were not significant except for the correlation
between the AMMA rhythm score and sensitivity to pitch. However, these links need to be
revisited after controlling for the background variables (sex, non-verbal intelligence and
prior language experience), which is achieved by the regression and mediation analyses
reported below.
Regression analyses
Next, we performed regression analyses to examine whether musical aptitude and
musical training made independent contributions to explaining the variance in non-native
speech-sound processing, after sex, non-verbal intelligence and prior language experience were partialled out. Prior language experience comprised three variables: number of
known languages and self-rated proficiency in L2 and L3. Since the ANOVA presented
above had not revealed any differences in overall performance between linguistic and non-linguistic pitch contour discrimination, and because of the strong correlation
between these two measures, we omitted pitch contour discrimination as a criterion
variable from further analyses. Furthermore, because the AMMA tonal and rhythm scores
were highly correlated, we used the composite AMMA score in subsequent analyses to
avoid collinearity in the models.
For tonal contrasts as the criterion variable, a stepwise regression analysis revealed that
the AMMA composite score accounted for 12.5% of adjusted variance over and above the
background variables, partial F(1,106) = 17.13, p < .001, and years of musical training did not explain any additional adjusted variance (partial F < 1). In contrast, entering years
of musical training first after the background variables accounted for a marginally
Table 2. Zero-order Pearson correlations between predictor variables

                               2     3     4     5     6     7     8     9     10    11    12    13
1. Culture fair intelligence   .32   .02   .22   .11   .24   .28   .26   .13   .01   .23   .22   .14
2. No. of languages                  .44*  .78*  .08   .16   .24   .29   .18   .09   .19   .09   .16
3. L2-rating                               .56*  .15   .06   .05   .18   .08   .07   .18   .03   .17
4. L3-rating                                     .12   .21   .19   .24   .13   .08   .20   .06   .30
5. Musical training (years)                            .29   .25   .13   .12   .27   .18   .05   .12
6. AMMA tone                                                 .71*  .18   .27   .21   .39*  .24   .32*
7. AMMA rhythm                                                     .13   .33*  .26   .45*  .36*  .28
8. Temporal sensitivity                                                  .13   .05   .29   .23   .34*
9. Pitch sensitivity                                                           .50*  .49*  .29   .28
10. Spectral sensitivity                                                             .42*  .33*  .36*
11. Tonal contrast                                                                         .62*  .47*
12. Pitch contour                                                                                .42*
13. Vowel contrast

Note. N’s range from 112 to 118 depending on missing values. AMMA, Advanced Measures of Musical Audiation.
*Indicates significance after Bonferroni correction at p < .0006.
significant 2.0% of adjusted variance, partial F(1,106) = 3.21, p = .08, while the AMMA
composite score entered next accounted for another 10.3% of adjusted variance, partial F
(1,105) = 14.14, p < .001.
For vowel contrasts as criterion variable, the AMMA composite score, entered after the background variables, accounted for 8.0% of adjusted variance, partial F(1,106) = 10.94,
p < .01, while years of musical training did not explain any additional adjusted variance
(partial F < 1). When years of musical training were entered first after the background
variables, they accounted for no additional adjusted variance (partial F < 1) while the
AMMA composite score, entered next, accounted for another 7.5% of adjusted variance,
partial F(1,105) = 10.26, p < .01. This analysis indicates that musical aptitude, but not
musical experience, was related to non-native speech-sound processing.
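The partial F statistics reported above are standard R²-increment tests for hierarchical regression; a minimal sketch of the computation (using unadjusted R², whereas the text reports adjusted variance):

```python
def partial_f(r2_full, r2_reduced, n, k_full, k_added):
    """F-test for the R^2 increment when k_added predictors join the model.

    n is the sample size; k_full counts all predictors in the full model
    (intercept excluded). Degrees of freedom: (k_added, n - k_full - 1).
    """
    numerator = (r2_full - r2_reduced) / k_added
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator
```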
Mediation analyses
The next set of analyses explored whether the link between musical aptitude and
non-native speech-sound processing was mediated by sensitivity to temporal, pitch and
spectral information. To this end, we performed a series of mediation analyses, employing
bootstrapping to estimate the 95% confidence intervals of the indirect effect using a
procedure introduced by Hayes (2013) (the SPSS macro PROCESS, downloadable from http://afhayes.com/introduction-to-mediation-moderation-and-conditional-process-analysis.html). An indirect effect is deemed to be statistically significant at p = .05 if
the confidence interval does not include zero. The mediation analyses included Culture
Fair intelligence, sex, number of studied languages, and self-ratings in L2 and L3 as
covariates. For the tonal contrast (see Figure 1), the mediation analysis (based on 10,000
bootstrap samples) with temporal, pitch, and spectral sensitivity as mediators confirmed
an indirect effect of the AMMA score through pitch sensitivity (ab = .0011, 95% CI:
.0003–.0025) and through spectral sensitivity (ab = .0007, 95% CI: .0001–.0020) as well
as a direct effect of the AMMA score (c′ = .0031, 95% CI: .0008–.0053). For the vowel contrast (see Figure 2), the mediation analysis also confirmed an indirect effect through spectral sensitivity (ab = .0009, 95% CI: .0002–.0024) and a direct effect of the AMMA score (c′ = .0028, 95% CI: .0004–.0051). There was no indirect effect through temporal sensitivity in the mediation analyses, as temporal sensitivity was not linked to the AMMA in the first place. These analyses show that the link between musical aptitude and non-native
speech-sound processing is partially mediated by spectral sensitivity, and, for tonal
contrasts, it is also partially mediated by pitch sensitivity; however, the findings also show
that there remains a residual direct link between musical aptitude and sensitivity to the non-native speech-sound contrasts.
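The bootstrapped indirect-effect test can be sketched in a few lines: estimate the a-path (mediator on predictor) and the b-path (outcome on mediator, controlling for the predictor), take their product ab, and build a percentile CI from resampled estimates. This is a minimal stand-in for the PROCESS procedure, using simulated data with hypothetical effect sizes, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 118
# Illustrative variables: predictor X, mediator M, outcome Y
amma = rng.normal(size=n)                              # X: musical aptitude
pitch = 0.5 * amma + rng.normal(size=n)                # M: pitch sensitivity
tone = 0.3 * pitch + 0.2 * amma + rng.normal(size=n)   # Y: tonal contrast score

def indirect_effect(x, m, y):
    # a-path: slope of M regressed on X
    a = np.polyfit(x, m, 1)[0]
    # b-path: coefficient of M in a regression of Y on M and X
    X = np.column_stack([m, x, np.ones_like(x)])
    b = np.linalg.lstsq(X, y, rcond=None)[0][0]
    return a * b

# Percentile bootstrap of the indirect effect ab
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boot.append(indirect_effect(amma[idx], pitch[idx], tone[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect ab, 95% CI: [{lo:.4f}, {hi:.4f}]")
```

If the resulting CI excludes zero, the indirect effect is deemed significant, which is the decision rule applied to the ab estimates reported above.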
Even though there was no significant correlation between years of musical training and
sensitivity to the non-native speech contrasts, we fitted the same mediation model using
years of musical training as a predictor.2 This analysis revealed only indirect effects through spectral sensitivity on performance for each of the speech-sound tasks (ab values from .0016 to .0021, CIs between .0001 and .0053), but showed no direct effects of years of musical training (all CIs encompassed 0). Thus, the mediation analyses suggest that both musical expertise and musical aptitude are linked to non-native speech-sound processing via pitch and spectral processing.

2 We also obtained musicality self-ratings, which were highly correlated with years of musical training (r = .71, p < .001) but only weakly correlated with the AMMA scores (r = .33, p < .001), suggesting that participants based the appraisal of their own musical abilities predominantly on how much musical training they had received. The results of the mediation analyses are virtually identical when musicality self-ratings are used instead of years of musical training as a predictor.
Discussion
The aim of this study was to determine whether sensitivity to specific acoustic features
mediates the link between musical ability and non-native speech-sound processing, and
whether musical aptitude or musical expertise is the better predictor of non-native
speech-sound processing.
Links between musical ability and sensitivity to acoustic features
When effects of sex, intelligence, and language background were controlled for, both
musical aptitude and musical expertise were linked to the ability to process frequency
(pitch and spectral) information, but not temporal information in the range below 100 ms (see paths on the left in Figures 1 and 2). This is perhaps not surprising given that the temporal changes relevant to music involve a longer time scale in the order of hundreds or thousands of milliseconds, which is different from the rapid temporal changes in the order of tens of milliseconds that are present in segmental linguistic units, and were simulated in our temporal sensitivity test. While such sensitivity to very rapid temporal changes is
fundamental for language processing (e.g., Goswami et al., 2002), as indicated by the link
between the ability to discriminate subtle temporal changes in amplitude envelope onset
rise times and discrimination of the non-native tonal and vowel contrasts, it seems to be
less relevant for musical processing. This qualifies claims that music benefits speech due to requirements for more precise processing of auditory information (Patel, 2011) by suggesting that this does not apply as much to temporal processing as it applies to frequency-related information.

Figure 1. Results of mediation analysis testing the direct link, and the indirect link through temporal, pitch, and spectral sensitivity, between the composite Advanced Measures of Musical Audiation (AMMA) score and Norwegian tonal contrast processing, controlling for sex, Culture Fair Intelligence, number of previously learned languages, and proficiency self-ratings in L2 and L3. Only significant links and associated standardized coefficients are shown. Significant effects of the covariates are indicated by solid grey lines. ***p < .001; **p < .01; *p < .05.
Links between sensitivity to acoustic features and non-native speech-sound processing
We had hypothesized that the discrimination of tonal contrasts would be linked to temporal, pitch, and spectral sensitivity, whereas discrimination of vowel contrasts would mainly be linked to spectral sensitivity. Indeed, for tonal contrasts this prediction was borne out after controlling for sex, language background, and intelligence, especially with respect to the effects of temporal and spectral sensitivity (see the paths on the right in Figure 1).
Thus, the ability to process both rapid temporal changes and frequency information
proved important for the discrimination of the tonal contrasts. This confirms the
previously observed role of temporal information (Kempe et al., 2012), in addition to
pitch and spectral information, in the processing of pitch contours – a finding that is in line with other investigations of tonal contour discrimination, which have shown that when non-musicians were successful in processing non-native tonal contrasts they tended to rely on the direction of pitch change over time, whereas non-musicians who were unsuccessful in the same task tended to rely on pitch height discrimination (Chandrasekaran et al., 2010).
Figure 2. Results of mediation analyses testing the direct link, and the indirect link through temporal, pitch, and spectral sensitivity, between the composite Advanced Measures of Musical Audiation (AMMA) score and Norwegian /i/ - /y/ vowel discrimination, controlling for sex, Culture Fair Intelligence, number of previously learned languages, and proficiency self-ratings in L2 and L3. Only significant links and associated standardized coefficients are shown. Significant effects of the covariates are indicated by solid grey lines. The figure also lists the L2s which are most likely responsible for the link between L3-rating and vowel processing. ***p < .001; **p < .01; *p < .05.
For vowel contrasts, we found the expected link to spectral sensitivity as well as a link
to temporal sensitivity (see the paths on the right in Figure 2). The link between spectral
sensitivity and vowel processing is not surprising, as vowels are characterized by
differences in spectral information. However, the finding that temporal sensitivity was also a significant predictor for vowel processing was unexpected. Although we carefully
controlled for vowel length and metrical stress to exclude temporal information as a cue,
participants might have been sensitive to subtle durational differences in other segments
when trying to discriminate the Norwegian words that differed with respect to their
vowels. It should also be noted that performance on the vowel contrast was positively
related to self-ratings in an L3. Most likely this reflects the fact that some participants had
studied languages containing the /i/ - /y/ contrast (Albanian, French, and German), and that prior experience with this contrast may have benefitted vowel processing. Purely coincidentally, in our sample these languages had been learned more often as L3s rather than as L2s.
The observation that temporal, spectral and, to some extent, pitch sensitivity were all
linked to non-native speech-sound processing complements approaches that tend to
emphasize the role of rapid temporal auditory processing as the main sensory component
of language processing (Goswami et al., 2002; Tallal, 1980). While rapid temporal
information plays a greater role in the discrimination of consonants and in identifying
patterns of metrical stress, frequency information may be more important for the discrimination of vowels. Most likely, both types of information are crucial when it comes
to the processing of more complex segmental and suprasegmental aspects of speech,
such as lexical tones and prosodic contours.
Does sensitivity to acoustic features mediate the link between musical ability and
non-native speech-sound processing?
We found that the link between musicality and non-native speech-sound processing was partially mediated by spectral sensitivity and to some extent pitch sensitivity, but not
temporal sensitivity. This is in line with previous findings showing pitch and chord
processing to mediate observed links between musical ability and non-native
speech-sound processing (Milovanov et al., 2008; Posedel et al., 2011). The fact that
spectral sensitivity was the more consistent mediator suggests that the processing of
complex spectral information may be of greater relevance to both music and language
than the processing of pure tones.
As discussed above, the lack of a mediating effect of temporal sensitivity most likely reflects the fact that temporal changes in music take place on a slower time scale than
temporal changes in language. This finding is only partially compatible with claims that
musical and linguistic processing exploit different cues – with language mainly relying on
rapid temporal processing and music relying on processing of pitch and spectral
information (Zatorre, Belin, & Penhune, 2002). Instead, our findings support the idea of
shared mechanisms between music and language (Patel, 2003; Strait, Hornickel, & Kraus, 2011) by suggesting that processing of frequency information is one of the mechanisms that may be shared across the two domains, while processing of rapid temporal information appears to be more important for language.
Finally, links with musical ability – direct and indirect ones – were found, not just for
the tonal contrast, but also for the /i/ - /y/ vowel contrast. This finding differs from studies
reporting links with musicality for non-native lexical tones but not for non-native
segmental contrasts (Delogu et al., 2006, 2010). The discrepancy between the results of
these earlier studies and ours may have arisen from the fact that we tested only one
particular vowel contrast, whereas Delogu and colleagues tested a variety of non-native
Mandarin vowels and consonants. It is conceivable that the inclusion of consonants,
which in many instances requires processing of rapid temporal information, for example, to detect voice onset times or frequency transitions, may have attenuated overall links
between processing of phonemes and musical ability in their studies.
The mediation analyses also revealed a direct link between musicality and non-native
speech-sound processing, which was not explained by shared variance with the
sensitivity to acoustic features. This finding corroborates very closely the results of a similar study published at the time of submission: Perrachione, Fedorenko, Vinke, Gibson, and Dilley (2013) also reported a link between the processing of linguistic prosody and musical pitch contours, after controlling for basic pitch, temporal and visuo-spatial processing. While their results were obtained for the processing of suprasegmental
linguistic information, our study extends the finding to the processing of segmental
information, specifically, to the processing of vowel contrasts. A minor discrepancy
between the results of the Perrachione et al. study and ours arises only with respect to the
role of rapid temporal processing, which was linked to speech-sound processing in our
study but not in theirs. This discrepancy may be due to the different ways in which rapid
temporal processing was tested in the two studies: While in the present study, temporal processing required detecting differences in amplitude onset rise time, in Perrachione et al. (2013), it required detection of differences in interval durations of click trains,
which are akin to ‘acoustic flutter’. Nonetheless, the converging evidence for a direct link
between the processing of musical and linguistic stimuli from these two studies implies
that, in addition to basic pitch and spectral sensitivity, there may be other, as yet
unspecified, mechanisms that mediate processing of novel speech sounds and measures
of musical aptitude.
Although our study was not designed to explore these other mechanisms, we would
like to offer some speculations as to what they might be. Tests of musical aptitude and speech-sound processing rely on task-specific working memory capacity: Both the AMMA
and the AX task require participants to retain strings of sounds in auditory working
memory for purposes of comparison. At present, it is unclear to what extent holding
musical phrases in mind relies on general working memory capacity (Williamson et al.,
2010) or recruits different neural pathways, such as a phonological loop, supporting
rehearsal of phonological information, or a musical loop, supporting rehearsal of tonal and rhythmic information (Schulze, Zysset, Müller, Friederici, & Koelsch, 2011). Neuroimaging provides some evidence for distinct working memory mechanisms underlying skilled processing in the musical and linguistic domains (Schulze et al., 2011). However,
individuals who have not received much musical training may initially recruit the
phonological loop to perform musical tasks, thereby engaging the same circuits that play a
role in speech processing. Given that none of our participants were professional
musicians, it is conceivable that they relied on the phonological loop when processing the
musical stimuli, which could account for the direct link with non-native speech-sound
processing. This suggestion is consistent with neuro-imaging studies showing a
substantial overlap of neural pathways underpinning verbal and tonal working memory in non-musicians, but a separation of these pathways in musicians, distinguished by
differences in the motor requirements for executing music and speech (Koelsch et al.,
2009; Schulze et al., 2011). Future research will have to determine whether the mediating role of working memory is task-dependent or more general, and whether such potentially
shared mechanisms diverge with increasing musical expertise.
Musical aptitude versus musical expertise
A number of studies have conceptualized musicality as musical expertise, and suggested
that musical training may hone sensory and cognitive abilities, which, in turn, benefit
non-native speech-sound processing (Marie et al., 2011; Wong et al., 2007). Under this scenario, one would expect the amount of musical training to exert an effect on
non-native speech-sound processing. In our sample, there was a trend towards a
correlation between musical aptitude and reported years of musical training, which, however, did not reach significance after Bonferroni correction, despite a range of years of
musical training (0–20 years) that is compatible with other studies (e.g., Kraus et al.,
Perhaps the correlation was not very strong because participants reported years of musical tuition in school, yet music school programmes vary in the degree to which they employ selection by musical aptitude for admission. Still, we did find that musical training was linked with non-native speech-sound processing through the mediating
effect of enhanced spectral processing. However, there was no residual direct link
between musical training and non-native speech-sound processing. Thus, unlike the
AMMA measure of musical aptitude, a simple self-report measure of musical expertise did
not seem to tap into the additional mechanisms shared between musical and linguistic
processing that might support non-native speech-sound discrimination over and above
spectral sensitivity. Such an interpretation is supported by findings that, compared to
non-musicians, musicians demonstrate superior performance in auditory sensory processing of speech sounds, presumably due to the strengthening of auditory attention
via cortico-fugal pathways (for an overview see Strait & Kraus, 2011), but do not show
superior performance on a whole range of executive function tasks (Schellenberg, 2011).
Future research will have to determine whether additional mechanisms captured by
musical aptitude tests, such as working memory capacity or executive functioning, reflect
joint task demands or whether musical aptitude – apparently the better predictor of
non-native speech-sound processing – is somewhat independent from the effects of
musical training.

In conclusion, our findings show that the link between musical ability and the ability to
discriminate unfamiliar non-native speech sounds is partially mediated by sensitivity to
certain acoustic features relevant for music and language. The present results suggest that
this mediating role is mainly fulfilled by sensitivity to spectral information, as contained in
complex tones, but not by temporal information, suggesting that the time scale that is
relevant to speech processing is different from the time scale relevant for processing
music. However, the observation that sensitivity to acoustic features only partially
mediated the link between musical ability and non-native speech-sound processing underscores the importance of studying the role of other potential mediating mechanisms.
Acknowledgement
We thank Christina Grenoble, Gina Martino, and Joseph Rivera for help in collecting data.
References
Besson, M., Schön, D., Moreno, S., Santos, A., & Magne, C. (2007). Influence of musical expertise and musical training on pitch processing in music and language. Restorative Neurology and Neuroscience, 25, 399–410.
Boersma, P., & Weenink, D. (2011). Praat: Doing phonetics by computer [Computer program]. Version 5.3.16. Available at: http://www.fon.hum.uva.nl/praat/
Bowles, A. R., Silbert, N. H., Jackson, S. R., & Doughty, C. J. (2011). Individual differences in working
memorypredict second language learning success. Poster presented at the 52ndAnnualMeeting
of The Psychonomic Society, Seattle, WA.
Cattell, R. B., & Cattell, H. E. P. (1973). Measuring intelligence with the culture-fair tests.
Champaign, IL: Institute for Personality and Ability Testing.
Chandrasekaran, B., Sampath, P. D., & Wong, P. C. (2010). Individual variability in cue-weighting and lexical tone learning. The Journal of the Acoustical Society of America, 128, 456–465. doi:10.1121/1.3445785
Cutting, J. E., & Rosner, B. S. (1974). Categories and boundaries in speech and music. Perception and Psychophysics, 16, 564–570. doi:10.3758/BF03198588
Deguchi, C., Boureux, M., Sarlo, M., Besson, M., Grassi, M., Schön, D., & Colombo, L. (2012). Sentence pitch change detection in the native and unfamiliar language in musicians and non-musicians: Behavioral, electrophysiological and psychoacoustic study. Brain Research, 1455, 75–89. doi:10.1016/j.brainres.2012.03.034
Delogu, F., Lampis, G., & Belardinelli, M. O. (2006). Music-to-language transfer effect: May melodic
ability improve learning of tonal languages by native nontonal speakers? Cognitive Processing, 7, 203–207. doi:10.1007/s10339-006-0146-7
Delogu, F., Lampis, G., & Belardinelli, M. O. (2010). From melody to lexical tone: Musical ability
enhances specific aspects of foreign language perception. European Journal of Cognitive
Psychology, 22, 46–61. doi:10.1080/09541440802708136Golestani, N., & Zatorre, R. J. (2009). Individual differences in the acquisition of second language
phonology. Brain and Language, 109, 55–67. doi:10.1016/j.bandl.2008.01.005
Gordon, E. E. (1989). Advanced Measures of Music Audiation. Chicago, IL: GIA.
Goswami, U., Thompson, J., Richardson, U., Stainthorp, R., Hughes, D., Rosen, S., & Scott, S. K. (2002). Amplitude envelope onsets and developmental dyslexia: A new hypothesis. Proceedings of the National Academy of Sciences USA, 99, 10911–10916. doi:10.1073/pnas.122368599
Halliday, L. F., Moore, D. R., Taylor, J. L., & Amitay, S. (2011). Dimension-specific attention directs
learning and listening on auditory training tasks. Attention, Perception, and Psychophysics, 73,
1329–1335. doi:10.3758/s13414-011-0148-0
Hayes, A. F. (2013). Introduction to mediation, moderation and conditional process analysis: A
regression-based approach. New York: The Guilford Press.
Kempe, V., Thoresen, J., Kirk, N. W., Schaeffler, F., & Brooks, P. J. (2012). Individual differences in
the discrimination of novel speech sounds: Effects of sex, temporal processing, musical and
cognitive abilities. PLoS ONE, 7 (11), e48623. doi:10.1371/journal.pone.0048623
Koelsch, S., Schulze, K., Sammler, D., Fritz, T., M€uller, K., & Gruber, O. (2009). Functional
architecture of verbal and tonal working memory: An fMRI study. Human Brain Mapping, 30,
859–873. doi:10.1002/hbm.20550
Kraus, N., Skoe, E., Parbery-Clark, A., & Ashley, R. (2009). Experience-induced malleability in neural encoding of pitch, timbre, and timing. Annals of the New York Academy of Sciences, 1169, 543–557. doi:10.1111/j.1749-6632.2009.04549.x
Kraus, N., Strait, D. L., & Parbery-Clark, A. (2012). Cognitive factors shape brain networks for
auditory skills: Spotlight on auditory working memory. Annals of the New York Academy of
Sciences, 1252, 100–107. doi:10.1111/j.1749-6632.2012.06463.x
Kuhl, P. K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., & Iverson, P. (2006). Infants show a
facilitation effect for native language phonetic perception between 6 and 12 months.
Developmental Science, 9, F13–F21. doi:10.1111/j.1467-7687.2006.00468.x
Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music
and language better than nonmusician children: Behavioral and electrophysiological
approaches. Journal of CognitiveNeuroscience, 18, 199–211. doi:10.1162/jocn.2006.18.2.199
Marie, C., Delogu, F., Lampis, G., Belardinelli, M. O., & Besson, M. (2011). Influence of musical
expertise on segmental and tonal processing in Mandarin Chinese. Journal of Cognitive
Neuroscience, 23, 2701–2715. doi:10.1162/jocn.2010.21585
Marques, C., Moreno, S., Castro, S. L., & Besson, M. (2007). Musicians detect pitch violation in a
foreign language better than nonmusicians: Behavioral and electrophysiological evidence.
Journal of Cognitive Neuroscience, 19, 1453–1463. doi:10.1162/jocn.2007.19.9.1453
Milovanov, R., Huotilainen, M., Välimäki, V., Esquef, P. A. A., & Tervaniemi, M. (2008).
Musical aptitude and second language pronunciation skills in school-aged children: Neural
and behavioural evidence. Brain Research, 1194, 81–89. doi:10.1016/j.brainres.2007.11.042
Musacchia, G., Sams, M., Skoe, E., & Kraus, N. (2007). Musicians have enhanced subcortical auditory
and audiovisual processing of speech and music. Proceedings of the National Academy of
Sciences, 104, 15894–15898. doi:10.1073/pnas.0701498104
Nardo, D., & Reiterer, S. M. (2009). Musicality and phonetic language aptitude. In G. Dogil & S. M. Reiterer (Eds.), Language talent and brain activity. Trends in Applied Linguistics (pp. 213–255). Berlin, Germany: Mouton De Gruyter.
Parbery-Clark, A., Skoe, E., Lam, C., & Kraus, N. (2009). Musician enhancement for speech-in-noise.
Ear and Hearing, 30, 653–661. doi:10.1097/AUD.0b013e3181b412e9
Patel, A. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6, 674–681. doi:10.1038/nn1082
Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Frontiers in Psychology, 2, 142. doi:10.3389/fpsyg.2011.00142
Perrachione, T. K., Fedorenko, E. G., Vinke, L., Gibson, E., & Dilley, L. (2013). Evidence for shared cognitive processing of pitch in music and language. PLoS ONE, 8 (8), e73372. doi:10.1371/journal.pone.0073372
Posedel, J., Emery, L., Souza, B., & Fountain, C. (2011). Pitch perception, working memory, and second language phonological production. Psychology of Music, 40, 508–517. doi:10.1177/0305735611415145
Schellenberg, E. G. (2011). Examining the association between music lessons and intelligence. British Journal of Psychology, 102, 283–302. doi:10.1111/j.2044-8295.2010.02000.x
Schulze, K., Zysset, S., Müller, K., Friederici, A., & Koelsch, S. (2011). Neuroarchitecture of verbal and tonal working memory in nonmusicians and musicians. Human Brain Mapping, 32, 771–783. doi:10.1002/hbm.21060
Seashore, C. E. (1919). The measurement of musical talent. The Musical Quarterly, January (1), 129–148.
Seashore, C. E. (1939). Revision of the Seashore measures of musical talent. Music Educators’ Journal, 26, 31–33. doi:10.2307/3385627
Slevc, L. R., & Miyake, A. (2006). Individual differences in second-language proficiency: Does musical ability matter? Psychological Science, 17, 675–681. doi:10.1111/j.1467-9280.2006.01765.x
Strait, D. L., Hornickel, J., & Kraus, N. (2011). Subcortical processing of speech regularities underlies
reading and music aptitude in children. Behavioral and Brain Functions, 7, 44. doi:10.1186/
1744-9081-7-44
Strait, D., & Kraus, N. (2011). Playing music for a smarter ear: Cognitive, perceptual and
neurobiological evidence. Music Perception, 29, 133–146. doi:10.1525/mp.2011.29.2.133
Surprenant, A. M., & Watson, C. S. (2001). Individual differences in the processing of speech and nonspeech sounds by normal-hearing listeners. The Journal of the Acoustical Society of America, 110, 2085–2095. doi:10.1121/1.1404973
Tallal, P. (1980). Auditory temporal perception, phonics, and reading disabilities in children. Brain and Language, 9, 182–198. doi:10.1016/0093-934X(80)90139-X
Toscano, J. C., & McMurray, B. (2010). Cue integration with categories: Weighting acoustic cues in speech using unsupervised learning and distributional statistics. Cognitive Science, 34, 434–464. doi:10.1111/j.1551-6709.2009.01077.x
Williamson, V. J., Baddeley, A. D., & Hitch, G. J. (2010). Musicians’ and nonmusicians’ short-term
memory for verbal and musical sequences: Comparing phonological similarity and pitch
proximity. Memory and Cognition, 38, 163–175. doi:10.3758/MC.38.2.163
Wing, H. D. (1968). Tests of musical ability and appreciation: An investigation into the
measurement, distribution, and development of musical capacity (2nd ed.). London, UK:
Cambridge University Press.
Wong, P. C., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human
brainstem encoding of linguistic pitch patterns. Nature Neuroscience, 10, 420–422. doi:10.1038/nn1872
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music and speech. Trends in Cognitive Sciences, 6, 37–46. doi:10.1016/S1364-6613(00)01816-7
Zendel, B. R., & Alain, C. (2012). Musicians experience less age-related decline in central auditory processing. Psychology and Aging, 27, 410. doi:10.1037/a0024816
Received 8 December 2013; revised version received 8 August 2014
Appendix
Norwegian words presented as non-native speech stimuli in the Experiment
Contrast   Short vowel                             Long vowel
Tonal      Bøtter [name] – bøtter [buckets]        bøter1 [fines] – bøter2 [to repent]
           gullet [gold] – gulle [to fluke]        gulet [wind] – gule [yellow]
           legget [ID (slang)] – legge [to put]    læget [state (slang)] – lege [GP]
           rakket [dog] – rakke [to botch]         raket [wreck] – rake [rake]
Vowel      rykk [pull] – rikk [budge]              ryk [to smoke] – rik [rich]
           syll [joist] – sild [herring]           syl [awl] – sil [sieve]
           mytt [molted] – mitt [mine]             myt [to moult] – mit [mite]
           lynn [to soften] – lind [lime tree]     lyn [lightning] – lin [linen]