
Language Learning ISSN 0023-8333

L2 Learners’ Assessments of Accentedness, Fluency, and Comprehensibility of Native and Nonnative German Speech

Mary Grantham O’Brien

University of Calgary

In early stages of classroom language learning, many adult second language (L2) learners communicate primarily with one another, yet we know little about which speech stream characteristics learners tune into or the extent to which they understand this lingua franca communication. In the current study, 25 native English speakers learning German as an L2 with varying levels of German proficiency rated German speech produced by native speakers and fellow learners of German along three continua: accentedness, fluency, and comprehensibility. An examination of speech stream (i.e., phonological, fluency based, and lexical/grammatical) characteristics along with partial correlations indicates both that the raters distinguished among the three concepts and that they conflated the term fluency with proficiency. Self-reported proficiency in German and linguistic training were the best predictors of the ratings assigned.

Keywords accentedness; fluency; comprehensibility; L2 speech; German; L2 raters

Introduction

Adults choose to learn a second language (L2) for a variety of reasons. While some may wish to spend time in the target country and/or interact with native speakers, others may primarily be in contact with other L2 learners over the course of their language learning careers. Whether their ultimate goal in learning the L2 is to speak with native speakers or not, one thing is clear: It is crucial that learners understand the speech of fellow L2 speakers, including those in their L2 classrooms. In light of this, the current contribution situates itself within the literature on lingua franca communication (e.g., Jenkins, 2002, 2009, 2012; Jenks, 2013; Matsumoto, 2011). As such, it values the assessments of L2 learners, who are legitimate “international speakers” (Jenkins, 2002, p. 85) with judgments that deserve to be understood in the absence of those from native speakers.

I would like to thank Mike Ryszka, Lisa Parker, and Allison Bajt for their assistance in data collection and Tak Fung for his assistance in data analysis. I would also like to thank the three anonymous reviewers and Associate Editor Pavel Trofimovich for their helpful comments. Any errors that remain are my own.

Correspondence concerning this article should be addressed to Mary Grantham O’Brien, Department of Linguistics, Languages and Cultures, University of Calgary, C205 Craigie Hall, 2500 University Drive NW, Calgary, AB T2N 1N4 Canada. E-mail: [email protected]

Language Learning 64:4, December 2014, pp. 715–748. © 2014 Language Learning Research Club, University of Michigan. DOI: 10.1111/lang.12082

L2 learners’ ability to understand and appreciate the speech of fellow L2 learners is related to several factors including, among others, the accentedness, fluency, comprehensibility, and intelligibility of the speech that is produced. Accentedness has been defined as a listener’s perception of “how closely the pronunciation of an utterance approaches that of a native speaker” (Kennedy & Trofimovich, 2008, p. 461). Fluency “denotes listeners’ assessments of how smoothly and rapidly an utterance is spoken” (Isaacs & Trofimovich, 2011, p. 114), and comprehensibility is “the ease or difficulty with which a listener understands L2 accented speech” (Derwing, Munro, & Thomson, 2008, p. 360). Comprehensibility and intelligibility are both measures of understanding of speech, but they differ in their operationalization: Comprehensibility studies rely on raters’ judgments of perceived understanding on a scale, but intelligibility studies require listeners to provide evidence of understanding (e.g., through transcriptions or by answering questions about the content of a speech sample). Previous research has shown that, although the concepts of accentedness and comprehensibility may be related, a speech sample that is perceived to have a strong accent is not necessarily difficult to understand (e.g., Derwing & Munro, 1997; Munro & Derwing, 1995a, 1995b). It has also been shown that relatively disfluent speech may be quite comprehensible (Derwing et al., 2008). Levis (2005) therefore distinguishes between the principles of nativeness and intelligibility. That is, while we may expect the speech of adult L2 learners to be accented (i.e., nonnativelike in its pronunciation) and to contain disfluencies, it is still possible for interlocutors to understand this speech (for a critical review of first language [L1] listener biases toward L2 speech, see Lindemann & Subtirelu, 2013). Given the importance of listener judgments in communication, the concepts of accentedness, fluency, and comprehensibility¹ are central to the current study.

Research investigating the concepts of accentedness, fluency, and comprehensibility has flourished in the past two decades. In spite of this, there is a general tendency for studies to focus on L2 English speech. Investigations into other languages are primarily accentedness studies, and several of these focus on L2 German (e.g., Gut, 2009; Molnar, 2010; Moyer, 1999; O’Brien, 2004; Wilkerson, 2010). A few additional studies examine fluency among L2 learners of other languages, including L2 Dutch in Cucchiarini, Strik, and Boves (2002) and de Jong, Steinel, Florijn, Schoonen, and Hulstijn (2013) and L2 French in Freed (1995) and Prefontaine (2013). In addition, much of the work done to date relies on the judgments of native-speaker raters (but see Derwing & Munro, 2013; Kormos & Denes, 2004; Major, 2007; Munro, Derwing, & Morton, 2006; Neufeld, 1980; Rossiter, 2009). Derwing and Munro (2005) and Munro et al. (2006) therefore call for studies that involve L2 learners as raters. Previous research has found that the judgments of L2 learners are often correlated with those of other L2 learners and native speakers (e.g., Major, 2007; Munro et al., 2006), but L2 learners may be harsher (i.e., they may rate the speech of fellow L2 learners as more accented, less fluent, or less comprehensible) than native speakers in their ratings (e.g., Rossiter, 2009). These differences may be reduced among highly proficient L2 learners (e.g., Prefontaine, 2013; Wilkerson, 2013).

The current study has two main goals. The first is to determine the extent to which L2 learners of German in a nonimmersion setting rely upon speech stream characteristics falling into the broad categories of phonology, fluency, and a select set of linguistic resources when rating the speech of native speakers and fellow English–German L2 learners (native English speakers who learned L2 German) along the continua of accentedness, fluency, and comprehensibility. The second objective is to ascertain the extent to which individual rater characteristics including, among others, L2 proficiency, age of learning, and musical training, play a role in the ratings assigned.

Background to the Study

Accentedness

A considerable body of research on accentedness has investigated how listener characteristics and speech stream cues relate to listener perception of accent. Although Munro (1995) suggests that there is no need to provide raters with special training in the rating of accents because of the salient speech stream characteristics associated with accented speech (e.g., slower speaking rates along with different intonation and timing patterns), listeners may vary in the ratings they assign based on various experiences and abilities. For example, Kang (2012) found that teaching experience played a significant role in the accentedness ratings assigned to speech samples produced by international teaching assistants. Similarly, musically trained raters may be more sensitive to foreign accents than those without such training (Isaacs & Trofimovich, 2011). In Thompson (1991), raters with linguistic training who had also learned at least one L2 rated the speech of L2 learners to be less accented than those without such training, but the phonetically trained raters in Flege (1984) showed greater sensitivity to accented speech than those without such training. Neufeld (1980) found that nonnative listeners are able to detect foreign accents, although it has been shown that the sensitivity to speech stream cues may increase with greater proficiency in the L2 (Wilkerson, 2010). In fact, people are able to detect and reliably rate foreign accents in languages they do not know (Major, 2007). Munro, Derwing, and Burgess (2010) determined that even listeners rating utterances presented backwards could classify the samples as being spoken by a native or a nonnative speaker.

Researchers who have examined various speech stream cues to determine which are best correlated with ratings of foreign accentedness have found that listeners are tuned into cues in two main areas: individual speech segments and prosody (including syllable duration, reduction, speech rate, stress, rhythm, and intonation). Although some work investigating accentedness focused on the segment as the primary locus of nonnativelikeness (e.g., Bohn & Flege, 1992; Major, 1987; Moyer, 1999, 2011), much recent research seeks to determine the relative contributions of segmental and suprasegmental cues in marking L2 speech as accented. Syllable structure errors were a primary predictor of accentedness ratings in Trofimovich and Isaacs (2012). Baker, Trofimovich, and Mack (2004), Calla McDermott (1986), and Trofimovich and Isaacs (2012) found nonnativelike rhythm to be a salient characteristic of accented L2 speech, and Moyer (2011) and Kang (2010) point to the role played by stress errors. A common finding is that speech samples with a slower speech rate are often rated as more accented (e.g., Calla McDermott, 1986; Chen, 2010; Kang, Rubin, & Pickering, 2010; Munro, 1995; Munro et al., 2010; Riggenbach, 1991, 2000). Pausing is also a key predictor of accentedness ratings (e.g., Kang, 2010; Trofimovich & Baker, 2006), and Kang (2010) found that a reduced pitch range best predicted accentedness ratings. In spite of the wide variety of speech stream characteristics associated with accentedness, most studies have found that listeners do indeed focus on pronunciation in assigning their ratings.

Fluency

Lennon (1990) proposed two potential definitions of the term fluency. The broad definition is used as a “cover term for oral proficiency” (Lennon, 1990, p. 389) that corresponds to “smooth” and “effortless” language use (Crystal, 1987, p. 421). The narrow definition, on the other hand, which is adopted in this article, focuses on the fluidity of speech and is operationalized through temporal measures such as speech rate, hesitations, and pausing. Fluency of speech may have a profound effect on ease of communication, such that disfluent speakers may have difficulty keeping the attention of their interlocutors, which may in turn have a negative impact on communicative success (Rossiter, 2009). Moreover, as noted by Derwing et al. (2008), “fluency problems can be exasperating for L2 speakers” (p. 360). Disfluencies (i.e., nonpathological interruptions in the speech stream like pauses or fillers that affect the fluidity of utterances) are often used by speakers when they are looking for a word or deciding what to say, or to indicate a desire to keep or cede the floor (Clark & Fox Tree, 2002).

A variety of types of listeners have been found to reliably rate speech fluency. These range from relatively expert raters (i.e., linguists, teachers, speech therapists) in Cucchiarini et al. (2002) to untrained native speakers (Derwing, Rossiter, Munro, & Thomson, 2004) and L2 learners (Riggenbach, 1991; Rossiter, 2009). Raters from a variety of backgrounds may rely upon similar cues in the speech stream (e.g., Kormos & Denes, 2004). For example, Prefontaine (2013) found that French L2 learners’ self-perceptions of fluency were moderately correlated with native French listeners’ fluency ratings. Nonetheless, L2 learners may rate the speech of fellow L2 learners as less fluent than native speakers do (Rossiter, 2009).

Native and nonnative listeners tend to rely on temporal aspects of speech when assigning ratings. Most commonly these include (pruned) speech rate and pausing phenomena (e.g., Derwing et al., 2004; Lennon, 1990; Riggenbach, 1991; Rossiter, 2009). Cucchiarini et al. (2002) found that speech rate was the best predictor of fluency among beginner learners, that mean length of run best corresponded to fluency ratings among intermediate learners, and that the number (as opposed to length) of unfilled pauses affected perceived fluency. On the other hand, de Jong and Perfetti (2011) found that pause length along with phonation/time ratio and mean length of fluent run correlated with fluency ratings. In their study of L2 learners of Dutch, de Jong et al. (2013) found that the number of pauses along with repetitions and repairs were related to perceived fluency ratings. This differs from previous work by Kormos and Denes (2004), who found that listeners’ ratings of fluency were not affected by pause frequency. Instead, the raters in that study focused more on temporal characteristics including speech rate, mean length of utterance, phonation time ratio, and the number of stressed words/minute. The fluency ratings assigned in Rossiter’s (2009) study were also affected by nontemporal measures including pronunciation, grammar, and vocabulary. It may be that listeners in that study made use of the broad definition of fluency that equates fluency with proficiency. Although many studies investigating speech fluency have found that common disfluencies are often associated with ratings assigned to speech samples, the features associated with disfluent speech may also include those related to a more proficiency-oriented view of fluency.

Comprehensibility

Recent work investigating L2 speech production has set comprehensibility over nativelikeness as a more realistic and important goal for L2 learners in terms of the role it plays in “communicative success” (Derwing & Munro, 2009, p. 183), and it has sought to determine the effects of listener background on comprehensibility ratings. Some studies rely on naïve listeners to rate speech for comprehensibility, and it is generally reported that they provide ratings that do not differ quantitatively from those of more expert raters. Munro et al. (2006), in a study investigating the role of listener background, found that both L2 learners (whose L1 was Cantonese, Japanese, or Mandarin) and native English listeners showed similar levels of comprehension of L2 English speech. They thus argued that there may be a “basic underlying ability to comprehend accented speech that is shared by most people” (p. 128). Nonetheless, the Cantonese and Japanese listeners in that same study rated the speech samples produced by native speakers of their own L1s as easier to understand than those produced by other speaker groups, including native speakers of the target language. Comprehensibility studies performed to date have not investigated L2 listener characteristics (beyond L1 background) in listeners’ perceived comprehensibility of speech.

When rating speech for comprehensibility, listeners may focus on segmental and suprasegmental phenomena (e.g., Caspers, 2010; Derwing & Munro, 1997). Trofimovich and Isaacs (2012) pointed to the role of stress assignment, and Kang (2010) found that speed of delivery also affects comprehensibility ratings. As has been shown in some fluency studies, other aspects of speech, which typically fall outside the realm of pronunciation, have also been associated with comprehensibility. These include grammatical accuracy (Trofimovich & Isaacs, 2012; Varonis & Gass, 1982) and lexical richness (Trofimovich & Isaacs, 2012). In sum, previous accentedness, fluency, and comprehensibility studies have demonstrated that nonnative speakers make up a legitimate rater group. To date, only Derwing and Munro (2013) have investigated L2 learners’ judgments along all three continua, and their study was performed with L2 English listeners.



The Current Study

The current study extends previous research in the following ways. First, it examines the ratings of speech samples produced by learners who have learned an L2 other than English, in this case, L2 German. Second, it investigates the judgments made by nonnative listeners with the goal of determining which factors influence the ratings of L2 utterances produced by fellow L2 learners whose L1 is also English. Situated within a lingua franca context, that is, a context in which, as Jenkins (2012) puts it, the classroom L2 variety is a “rich variety” (p. 491) that exists “in its own right” (p. 488), this study seeks to determine which factors (both individual listener factors and speech stream characteristics) affect nonnative listener judgments of speech. Two main research questions guide this study:

1. Which speech sample characteristics are associated with the ratings assigned to native and nonnative German speech samples along the three continua of accentedness, fluency, and comprehensibility?

2. Which listener characteristics, if any, predict the ratings assigned to native and nonnative German speech?

Given the results of previous studies, it is expected that English–German L2 learners will be able to distinguish native from nonnative speech on all three continua and that they will rely on speech stream characteristics similar to those reported in studies on L2 English speech. In addition, it is expected that the L2 learners will rate the speech samples more harshly for accentedness and fluency than for comprehensibility. That is, they will provide comprehensibility ratings that indicate that they are able to understand despite rating the same speech samples as being quite accented and/or disfluent, as has been shown previously (e.g., Munro & Derwing, 1995a, 1995b). Finally, it is expected that more proficient learners and those with a musical background will provide accentedness ratings indicating that speech samples produced by L2 learners are heavily accented, based on the results of Wilkerson (2010) and Isaacs and Trofimovich (2011).

Method

Speech Samples

Forty-eight speech samples were used in the current study. Twenty-four of them were produced by native English learners of L2 German (12 males, 12 females), all of whom were residing in Germany at the time of recording. Most of them were taking part in a study-abroad program. These participants were between the ages of 20 and 31 years (M = 22.8 years), and they had been living in Germany for 17 months on average (range: 0–126 months). All of these participants completed a 30-point proficiency exam in German (Goethe Institut, 2004), and 12 of the participants received a score that placed them at the B2 level of proficiency (intermediate), based on Common European Framework of Reference for Languages (Council of Europe, 2001) scores. The remaining participants exhibited advanced (C1 or C2) proficiency on the exam.² The 24 native speakers (6 males, 18 females) who produced speech samples were between the ages of 21 and 30 years (M = 23.9 years). They were all from the greater Frankfurt area, and they were recorded at a university in central Germany.

All participants were asked to provide a description of the series of pictures known as the Suitcase Story, which has been used in a number of previous studies investigating the notions of accentedness, fluency, and comprehensibility (e.g., Derwing et al., 2008; Derwing & Munro, 2009; Trofimovich & Isaacs, 2012). The story was used with the permission of the original principal investigator, Tracey Derwing, and is also downloadable from the IRIS repository (http://www.iris-database.org/iris/app/home/index). It depicts a man and a woman who are carrying identical suitcases; the two bump into each other at a corner and drop their suitcases, and it is only later when they are unpacking that they both realize that they picked up the other’s suitcase. Participants told the story twice. The first time permitted them to practice telling the story so that they were familiar with the basic storyline. Only the second telling was recorded. As is customary, the first 20 seconds of participants’ second rendition of the story (excluding initial hesitations and disfluencies) were extracted from the recording and used (e.g., Derwing et al., 2008).

The speech samples were analyzed for a variety of speech stream characteristics, which fall into three basic categories that have been previously employed in the literature (Iwashita, Brown, McNamara, & O’Hagan, 2008; Trofimovich & Isaacs, 2012). These categories include the following: phonology, both segmental and suprasegmental; fluency, which includes both temporal measures and pausing/hesitation phenomena; and a selection of linguistic resources spanning grammar and lexis. All 48 speech samples were transcribed and coded for a total of 12 speech stream characteristics outlined below. The coding was performed independently by two phonetically trained raters, and initial agreement was 91%. All discrepancies were resolved through consensus.



Phonology

This category included four measures: two segmental (segmental and phonetic realization errors) and two suprasegmental (word stress errors and pitch range). Absolute numbers of errors were adjusted for speech sample length (i.e., number of errors/number of word tokens) for each of the first three variables. Examples provided come from the current database:

1. Segmental errors: phonemic substitutions (e.g., Koffer “suitcase” spoken as Koffen).

2. Phonetic realization errors: all errors resulting from L1 transfer in the realization of the following salient segments, which have been shown to cause difficulty for and be associated with accented speech in the productions of English–German L2 learners (e.g., Ducate & Lomicka, 2009; Moyer, 1999; O’Brien, 2004): [ʁ] realized as North American English [ɹ], syllable-final devoiced obstruents [p, t, k, f, s] realized as their voiced counterparts [b, d, g, v, z], and palatal and velar fricatives [ç, x] realized as [ʃ] or [k].

3. Word stress errors: any missing or misplaced primary stress assignment in words containing more than one syllable (e.g., AUF-ge-stan-den “stood up” spoken as auf-ge-STAN-den). This analysis was performed auditorily.

4. Pitch range: the difference between the highest and lowest fundamental frequency value in each speech sample, measured for the entirety of each 20-second speech sample in Praat (Boersma & Weenink, 2011). The output data (in Hz) were normalized to Equivalent Rectangular Bandwidth values along the lines of Barry (2007) and Daly and Warren (2001). It has been proposed that native speakers of American English learning German as an L2 produce German utterances with a wider pitch range than do German native speakers (Jilka, 2007).
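The pitch range measure in item 4 can be illustrated with a short Python sketch. It assumes the Glasberg and Moore ERB-rate formula, 21.4 · log10(0.00437 f + 1); the article does not state which ERB conversion was applied, so the formula choice, the function names, and the F0 values are illustrative assumptions and not the procedure actually followed in Praat.

```python
import math

def hz_to_erb(f_hz: float) -> float:
    """Convert a frequency in Hz to the ERB-rate scale.

    Assumption: Glasberg & Moore's formula. The article does not say
    which ERB conversion was used.
    """
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)

def pitch_range_erb(f0_values_hz):
    """Pitch range of one sample: highest minus lowest F0, expressed in ERB."""
    voiced = [f for f in f0_values_hz if f > 0]  # skip unvoiced frames coded as 0
    return hz_to_erb(max(voiced)) - hz_to_erb(min(voiced))

# Hypothetical F0 track (Hz) for one speech sample; not data from the study.
f0_track = [0.0, 110.0, 145.0, 230.0, 0.0, 180.0]
print(round(pitch_range_erb(f0_track), 2))
```

Converting the maximum and minimum separately before subtracting matters because the ERB scale is nonlinear: the same range in Hz corresponds to a smaller ERB range at higher frequencies.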

Fluency

There were five measures of fluency. The first three were adjusted for speech sample length as above:

1. Total number of filled pauses: every nonlexical pause in the data (e.g., zwei Leute uhm sind zusammen “two people uhm are together”).

2. Total number of unfilled pauses: every silent pause longer than 400 milliseconds, along the lines of Derwing et al. (2004).

3. Repetitions and self-corrections: the total number of syllables repeated either in the absence of a grammatical correction (e.g., und sie sie sind “and they they are”) or in the presence of a grammatical correction (e.g., sie haben zusammen ihre ihren Koffern “they have together their [targetlike accusative case] their [incorrect accusative case] suitcases”).

4. Speech rate: the total number of syllables divided by the total duration of the recording.

5. Articulation rate: the total number of syllables divided by the phonation time (i.e., with pauses removed).

Speech and articulation rates were extracted automatically in Praat (Boersma & Weenink, 2011) with a script by de Jong and Wempe (2008) that detects syllable nuclei and measures speech and articulation rate.
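The difference between speech rate and articulation rate, and the 400-millisecond criterion for unfilled pauses, can be sketched in a few lines of Python. This is only an illustration of the definitions given above: the study used a Praat script for these measures, and the function name, the input format (a list of silence durations), and the example values are assumptions made here.

```python
def fluency_rates(n_syllables, total_duration_s, silence_durations_s,
                  pause_threshold_s=0.4):
    """Compute speech rate, articulation rate, and unfilled pause count.

    Only silences longer than the threshold (400 ms by default) count as
    unfilled pauses; their summed duration is removed to obtain phonation time.
    """
    pauses = [d for d in silence_durations_s if d > pause_threshold_s]
    phonation_time_s = total_duration_s - sum(pauses)
    return {
        "unfilled_pauses": len(pauses),
        "speech_rate": n_syllables / total_duration_s,        # syllables per second
        "articulation_rate": n_syllables / phonation_time_s,  # pauses removed
    }

# Hypothetical sample: 60 syllables in 20 s, with three silent stretches.
print(fluency_rates(60, 20.0, [0.5, 0.3, 1.5]))
```

Because pause time is excluded from the denominator, the articulation rate of a sample is never lower than its speech rate.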

Linguistic resources

The category of linguistic resources included three measures, all of which were normalized for speech sample length:

1. Syntactic errors: errors involving word order. Each word that was misplaced in the utterance counted as an error (e.g., in eine große Stadt es gibt “in a big city there is,” which is realized by native speakers as in einer großen Stadt gibt es).

2. Morphological errors: errors involving inflectional morphology. Each incorrect morpheme was counted as an error (e.g., die gleichen Koffer “the same bags” spoken as die gleich_ Koffern contains two morphological errors: the missing adjective ending and the added -n on Koffern).

3. Lexical errors: incorrect German words (e.g., gefallen “fell” spoken as gefahren “traveled”³), or English words substituted for German words (e.g., verwirrt “confused” spoken as confused).

Although the category of linguistic resources often includes token and/or type frequency, the total number of words produced was taken into consideration in the normalization process. The lexical, morphological, and syntactic errors made by the native speakers were considered errors insofar as they deviated from the prescribed classroom standard. For example, swallowed case endings or word orders that are not verb-second have respectively been deemed morphological and syntactic errors in the speech of the native speakers in the current study. A summary of the extent to which the samples produced by the L2 speakers and the native speakers differ is provided in Table 1.
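The length normalization applied to the error counts (number of errors divided by number of word tokens) amounts to a single division per measure. The following Python sketch shows the arithmetic; the function name and the counts are hypothetical and are not taken from the study's data.

```python
def error_rate(n_errors: int, n_word_tokens: int) -> float:
    """Errors per word token, so that samples of different lengths are comparable."""
    if n_word_tokens <= 0:
        raise ValueError("a speech sample must contain at least one word token")
    return n_errors / n_word_tokens

# Hypothetical counts for one 20-second sample of 45 word tokens.
counts = {"syntactic": 1, "morphological": 4, "lexical": 2}
n_tokens = 45
rates = {name: round(error_rate(n, n_tokens), 2) for name, n in counts.items()}
print(rates)
```

Expressed this way, a value such as .09 means roughly nine errors of that type per 100 word tokens.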

Table 1 Means and standard deviations (in parentheses) for speech sample characteristics

| Characteristics | Native Speakers | English–German L2 Learners |
| --- | --- | --- |
| Segmental errors∗∗∗ | 0 | .01 (.02) |
| Phonetic errors∗∗∗ | 0 | .14 (.14) |
| Word stress errors∗∗∗ | 0 | .00 (.01) |
| Pitch range∗∗ | 7.81 ERB (1.38 ERB) | 8.25 ERB (1.20 ERB) |
| Filled pauses∗∗∗ | .01 (.02) | .10 (.10) |
| Unfilled pauses∗∗∗ | .01 (.02) | .04 (.05) |
| Self-corrections/Repetitions∗∗∗ | .02 (.03) | .02 (.03) |
| Speech Rate∗∗∗ | 3.54 syll/s (.53) | 2.92 syll/s (.57) |
| Articulation Rate∗∗∗ | 15.76 syll/phon. time (6.78) | 14.84 syll/phon. time (17.89) |
| Syntactic Errors∗∗∗ | .00 (.007) | .01 (.14) |
| Morphological Errors∗∗∗ | .00 (.008) | .09 (.09) |
| Lexical Errors∗∗∗ | .00 (.006) | .03 (.03) |

Note. ∗∗p < .01; ∗∗∗p < .001.

Listeners

Listeners in the current study were 25 English–German L2 learners (10 males) between the ages of 18 and 31 (M = 21.08 years) studying German in Canada at the time of the study. Although no attempt was made to control for immersion experience, none had spent time in a German-speaking area of the world. Proficiency scores are based on the results of the same 30-point proficiency test (Goethe Institut, 2004) completed by the speakers who produced the speech samples. Listeners self-assessed their German knowledge on a 4-point scale (4 = “I can speak German proficiently,” 3 = “I can get around comfortably in German,” 2 = “I can communicate at a basic level in German,” and 1 = “I have heard German before but I didn’t know what was being said”). Because familiarity with foreign accents may affect the ratings one assigns to speech (e.g., Calla McDermott, 1986; Flege, Frieda, & Nozawa, 1997) and because listeners in the current study interacted primarily with nonnative speakers of German, it was deemed appropriate to determine whether the frequency of interactions with nonnative speakers in listeners’ L1 (English) might affect their ratings of L2 speech. Listeners provided answers on the following scale: 0 = “I never interact with nonnative speakers,” 1 = “I interact with nonnative speakers about once a month,” 2 = “I interact with nonnative speakers about once a week,” 3 = “I interact with nonnative speakers about once a day,” 4 = “I interact with nonnative speakers more than once a day.” Relevant listener characteristics are provided in Table 2.



Table 2 Listener characteristics

| Rater Characteristic | Mean | Range |
| --- | --- | --- |
| Age of Learning | 19.36 | 15–26 years |
| Length of German Study | 2.08 | 1–7 years |
| Proficiency Score | 13.36 | 7–30 points (CEFR A2–C2) |
| Self-assessed German Proficiency | 2.28 | 2–3 |
| Additional Language Spoken | 1.24 | 0–3 |
| Interactions with L2 Speakers of English | 3.85 | between once a day and more than once a day |
| Interactions with L2 Speakers of German | 2.60 | a few times a week |
| Linguistic Training | 8 participants | |
| Musical Training | 16 participants | |

Note. CEFR = Common European Framework (Council of Europe, 2001), with six levels from basic (A1) through advanced (C2).

Task

Listeners sat in a quiet room, where they performed three separate rating tasks (accentedness, fluency, and comprehensibility) in Psyscope X on an iMac computer. This means that they rated each speech sample separately along each continuum and that they heard each sample a total of three times. The order of the three rating tasks was counterbalanced across participants.⁴ Each task began with a set of instructions that explained the task and presented the listeners with the 9-point rating scales, which have become “entrenched as a research convention” in research investigating the constructs of accentedness, fluency, and comprehensibility (Isaacs & Thomson, 2013, p. 135). The scales, which are similar to those used in previous studies investigating the constructs (e.g., Munro & Derwing, 1998; Munro et al., 2010; Rossiter, 2009), were as follows: foreign accentedness (from 1 = no accent to 9 = extremely strong accent), fluency (from 1 = extremely fluent to 9 = extremely disfluent), and comprehensibility (from 1 = very easy to understand to 9 = extremely difficult to understand). In order to make sure that listeners understood the task and each rating scale, they were asked to rate three 20-second practice sound files before they began rating along a given scale. During this familiarization session, listeners were informed that the accentedness scale was a scale of foreign accentedness and that they should, as such, be listening to speech characteristics that indicated the pronunciation of a speech sample “deviated from nativelike norms.” When performing the fluency ratings, they were given instructions to focus on disfluencies (e.g., hesitations, pauses). For comprehensibility, they were instructed to focus only on the extent to which they actually understood the speech samples when rating them. While this familiarization may have had an effect on the ratings assigned, acquainting raters with rating scales and associated speech stream characteristics is common (e.g., Derwing et al., 2008; Rossiter, 2009). The samples used for this familiarization phase were produced by speakers who were similar to those who produced the target samples. Listeners were given feedback on the ratings they provided to the sample sound files. They then proceeded to rate each speech sample, which was presented over loudspeakers. The presentation of speech samples was randomized. After listeners had completed one rating task they were given the opportunity to take a break before continuing on to the next task. Participants also completed the language background questionnaire and the proficiency test during the meeting. In total, the experiment took approximately 60 minutes per participant.

Data Analysis

Data were analyzed using an unbalanced repeated-measures design with interval outcome variables using a Generalized Estimating Equation (i.e., GEE under Genlin procedures in SPSS v.20) to determine whether, and the extent to which, English–German L2 learners were able to distinguish native from nonnative speech along each of the three continua. More specifically, though, the analyses were run to determine: (a) which speech sample properties predict the ratings assigned and (b) whether the same listener characteristics are related to the ratings assigned to native and nonnative speech. Because the speech samples were rated three times by the same raters, there are potential correlations among ratings assigned for accentedness, fluency, and comprehensibility. The data from this study met the basic assumptions of GEE analyses: The outcomes were correlated and unbalanced, and the distribution was normal. In addition, the model allows for the addition of predictor variables including listener characteristics such as age of learning, proficiency score, self-assessed German proficiency, length of German study, number of languages spoken, frequency of interactions with nonnative speakers of English in English, frequency of interactions with nonnative speakers of German in German, whether or not listeners had enrolled in a linguistics course, and musical training.



Partial correlations were run to determine the extent to which ratings along the three continua were related. Unlike a traditional correlational analysis, the partial correlation allows for an investigation of the relationship among variables (i.e., ratings along the three continua) while removing the effects of other variables (i.e., controlling for the effects of individual speakers). Because the analysis of speech stream characteristics relied on a large number of potential predictor variables, it was important to first determine the relationship among factors. Exploratory factor analysis, also run in SPSS v.20, was used to extract factors for the independent variables (speech stream characteristics) and for the dependent variables (ratings). This was done separately for native and nonnative speech samples, as the samples differed in the extent to which speakers realized various speech stream characteristics. The data were analyzed using the orthogonal rotation (Varimax) on the independent variables, and the .40 criterion was used as a cutoff for identifying the loadings on each factor.
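To make the idea of a partial correlation that controls for speaker concrete, the following is a minimal sketch and not the SPSS procedure used in the study. It assumes that speaker is treated as a categorical grouping variable, in which case removing its effect amounts to subtracting each speaker's mean rating before correlating. The function names and the example ratings are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def center_within_groups(values, groups):
    """Subtract each group's mean from its members (removes group-level effects)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    return [value - sums[group] / counts[group] for value, group in zip(values, groups)]


def pearson(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr_by_speaker(x, y, speakers):
    """Correlate two sets of ratings after removing each speaker's mean from both."""
    return pearson(center_within_groups(x, speakers),
                   center_within_groups(y, speakers))


# Hypothetical ratings on a 9-point scale; these are NOT data from the study.
speakers = ["s1", "s1", "s1", "s2", "s2", "s2"]
accent = [2, 3, 4, 6, 7, 8]
fluency = [3, 2, 4, 8, 7, 9]

raw_r = pearson(accent, fluency)
partial_r = partial_corr_by_speaker(accent, fluency, speakers)
print(f"raw r = {raw_r:.2f}, partial r = {partial_r:.2f}")
```

In this toy example the raw correlation is inflated by the difference between the two speakers, and the partial correlation is lower once that difference is removed.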

Results

Given that L2 learners rated each sample three times, it was essential to carry out an analysis that accounts for the interdependence of observations. Shek and Ma (2011) therefore propose an alternative to coefficients of internal consistency that are based on generalized linear models that violate the assumption of independence of observations. The resulting variable, the intraclass correlation coefficient (ICC), “describes the amount of variance in the outcome that is attributed to differences between individuals” (p. 50). It is calculated as the intercept variance [subject ID]/(intercept variance + residual). An ICC that is .25 or above is deemed acceptable. The L2 learners were relatively consistent in their ratings of native speech (ICCaccent = .57, ICCfluency = .22, ICCcomprehensibility = .39) but less consistent in their ratings of the speech of other L2 learners (ICCaccent = .15, ICCfluency = .08, ICCcomprehensibility = .22). Thus, L2 learners varied a great deal in the ratings they assigned to the speech samples. Therefore, it was essential to determine which speech stream characteristics and rater variables predict the ratings assigned.
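The ICC formula and the .25 criterion described above can be sketched as follows. The helper names and the example variance components are hypothetical; only the six ICC values in `reported` are taken from the text.

```python
def icc(intercept_variance: float, residual_variance: float) -> float:
    """ICC = intercept variance / (intercept variance + residual variance)."""
    return intercept_variance / (intercept_variance + residual_variance)


def meets_criterion(value: float, threshold: float = 0.25) -> bool:
    """Apply the '.25 or above' acceptability criterion."""
    return value >= threshold


# Hypothetical variance components, for illustration only.
example_icc = icc(intercept_variance=1.2, residual_variance=2.8)

# ICC values as reported in the text (speech type, rating scale).
reported = {
    ("native", "accent"): 0.57,
    ("native", "fluency"): 0.22,
    ("native", "comprehensibility"): 0.39,
    ("L2", "accent"): 0.15,
    ("L2", "fluency"): 0.08,
    ("L2", "comprehensibility"): 0.22,
}
flags = {key: meets_criterion(value) for key, value in reported.items()}

for (speech, scale), ok in flags.items():
    print(f"{speech:6s} {scale:18s} ICC = {reported[(speech, scale)]:.2f} meets .25: {ok}")
```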

The L2 learners who rated the speech samples distinguished the speech of L2 learners from that of native speakers on all three scales. The L2 learners judged the speech of fellow German language learners to be significantly more accented, B = −2.32, SE = .48, χ²(1) = 23.64, p < .001, less fluent, B = −2.92, SE = .12, χ²(1) = 621.2, p < .001, and less comprehensible, B = −1.66, SE = .30, χ²(1) = 30.91, p < .001, than that produced by native speakers of German. An analysis of partial correlations between each pair of ratings, while controlling for speaker (shown in Table 3), indicates that ratings along all three continua were significantly correlated for L2 learners’ ratings of German native speakers and for L2 speakers.

Table 3 Results of partial correlations controlling for speaker

            German Native Speakers          English-German L2 Learners
            Fluency   Comprehensibility     Fluency   Comprehensibility
Accent      .15∗      .31∗∗                 .17∗∗     .35∗∗
Fluency               .29∗∗                           .36∗∗

Note. ∗p < .01; ∗∗p < .001.

Table 4 Factor loadings for native speaker productions

Characteristics           Factor                          Loading
Speech Rate               Rate                            .94
Articulation Rate         Rate                            .90
Morphological Errors      Morphology/Disfluencies         .91
Corrections/Repetitions   Morphology/Disfluencies         .85
Filled Pauses             Filled Pauses/Syntax/Lexicon    .78
Syntactic Errors          Filled Pauses/Syntax/Lexicon    −.61
Lexical Errors            Filled Pauses/Syntax/Lexicon    .59
Pitch Range               Pitch/Unfilled Pauses           .88
Unfilled Pauses           Pitch/Unfilled Pauses           .62

Speech Sample Characteristics

Given the correlations among the ratings assigned along all three continua, it was important to determine whether the constructs of accentedness, fluency, and comprehensibility were indeed distinct concepts for the raters and whether the 12 potential speech stream characteristics differed in the extent to which they predicted the ratings assigned to native and nonnative speech. Native speakers did not make phonological, phonetic, or stress errors, so nine potential variables were included in the factor analysis. A four-factor solution that included all nine variables provided clean loadings, which are provided in Table 4. The factors have been named according to the shared characteristics.



Table 5 Factors predicting ratings assigned to native speech

                                Accentedness  Fluency  Comprehensibility
Rate                            ns            ∗∗∗      ∗
Morphology/Disfluencies         ∗∗∗           ∗∗∗      ∗∗∗
Filled Pauses/Syntax/Lexicon    ns            ∗∗∗      ∗
Pitch/Unfilled Pauses           ns            ns       ns

Note. ∗p < .05; ∗∗p < .01; ∗∗∗p < .001.

These factors were then used to determine which bundles of speech stream characteristics the listeners considered when they rated the speech of the native speakers. For accent, only the factor morphology/disfluencies significantly predicted the ratings. Listeners rated native speech samples that contained a combination of morphological errors and disfluencies as more accented than those with fewer such errors and disfluencies, B = 6.96, SE = 1.63, χ²(1) = 18.23, p < .001. For fluency, three factors significantly predicted the outcome: Speech samples spoken more slowly, B = −.07, SE = .01, χ²(1) = 81.91, p < .001, those that contained a combination of morphological errors and disfluencies, B = 6.41, SE = 1.33, χ²(1) = 23.18, p < .001, and those with a combination of lexical and syntactic errors and filled pauses, B = 9.99, SE = 2.43, χ²(1) = 16.90, p < .001, were rated as less fluent. Comprehensibility scores patterned with fluency scores: The combined factors of rate, B = −.03, SE = .01, χ²(1) = 5.72, p = .017, corrections/repetitions/morphological errors, B = 6.44, SE = 1.74, χ²(1) = 13.76, p < .001, as well as lexical and syntactic errors and filled pauses, B = 6.03, SE = 2.92, χ²(1) = 4.27, p = .039, were associated with less comprehensible speech. A summary of these findings is provided in Table 5. The final factor (pitch range and unfilled pauses) did not significantly predict any of the ratings assigned.
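For readers who wish to check the reported statistics, the sketch below assumes that the chi-square value given with each coefficient is a one-degree-of-freedom Wald statistic, that is, the squared ratio of B to its standard error. This is an assumption about how the values were produced, not a statement from the text. Because B and SE are printed in rounded form, the recomputed value will not always match the reported one (for example, B = −.07 and SE = .01 give 49, not 81.91). The function names are hypothetical.

```python
from math import erfc, sqrt


def wald_chi_square(b: float, se: float) -> float:
    """One-df Wald statistic: the squared ratio of a coefficient to its standard error."""
    return (b / se) ** 2


def p_value_df1(chi_square: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return erfc(sqrt(chi_square / 2))


# Accent ratings of native speech, morphology/disfluencies factor (values from the text).
accent_wald = wald_chi_square(6.96, 1.63)

# Comprehensibility ratings of native speech, rate factor: reported chi-square of 5.72.
rate_p = p_value_df1(5.72)

print(f"Wald chi-square = {accent_wald:.2f}")
print(f"p for chi-square of 5.72 = {rate_p:.3f}")
```

Under this assumption, the first value agrees with the reported 18.23 and the second with the reported p = .017.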

The factor loadings of the 12 potential variables for ratings of L2 learners’ speech are provided in Table 6. A five-factor solution was obtained, and all of the variables loaded onto the model. The resulting factors have been named according to the shared characteristics.

These factors were then used to determine which combinations of speech stream characteristics the listeners relied on when they rated the speech of fellow L2 learners. For accent, all of the combined factors except articulation rate/morphology in Table 6 predicted higher accent ratings, indicating that listeners perceived the speech samples spoken more slowly and those containing more errors, pauses, and corrections and repetitions to be more foreign accented. The combined phonetic errors/speech rate/filled pauses factor predicted higher accentedness ratings, B = −.99, SE = .20, χ²(1) = 24.98, p < .001. In addition, the combined phonological/syntactic errors/unfilled pauses factor predicted higher accentedness ratings, B = .11, SE = .04, χ²(1) = 6.31, p = .012. High accentedness ratings were also predicted by the factor made up of lexical and stress errors, corrections and repetitions, and disfluencies, B = 9.34, SE = 1.94, χ²(1) = 23.33, p < .001. Finally, those speech samples containing a wider pitch range were rated as more accented than those with a narrower pitch range, B = .15, SE = .06, χ²(1) = 5.85, p = .016.

Table 6 Factor loadings for English–German L2 speaker productions

Characteristics           Factor                          Loading
Phonetic Errors           Phonetics/Fluency               .88
Speech Rate               Phonetics/Fluency               −.76
Filled Pauses             Phonetics/Fluency               .66
Phonological Errors       Phonology/Syntax/Pauses         .83
Syntactic Errors          Phonology/Syntax/Pauses         .83
Unfilled Pauses           Phonology/Syntax/Pauses         −.57
Corrections/Repetitions   Grammar                         .72
Stress Errors             Grammar                         −.63
Lexical Errors            Grammar                         .54
Articulation Rate         Articulation Rate/Morphology    .84
Morphological Errors      Articulation Rate/Morphology    −.59
Pitch Range               Pitch Range                     .91

For fluency, higher ratings (indicating a perception of less fluent speech) could be predicted by three of the five factors: phonetics/fluency, phonology/syntax/pauses, and grammar. Higher fluency ratings could be predicted by the combined factor that included phonetic errors, filled pauses, and speech rate, B = 1.92, SE = .10, χ²(1) = 350.77, p < .001. The combined factors of phonology/syntax/unfilled pauses, B = .10, SE = .03, χ²(1) = 10.51, p = .001, and stress/lexical errors and corrections/repetitions, B = 7.69, SE = .99, χ²(1) = 60.80, p < .001, predicted higher fluency ratings.

Finally, for comprehensibility, all of the factors except for the factor composed of phonological and syntactic errors and unfilled pauses predicted listeners’ ratings. Samples with more phonetic, lexical, morphological, and stress errors, those that were spoken more slowly, and those with fewer explicit disfluency markers were rated as less comprehensible. The phonetics/fluency factor was a significant predictor of higher comprehensibility ratings, B = 1.03, SE = .19, χ²(1) = 28.42, p < .001, and the same was true of the grammar factor, B = 6.09, SE = 1.53, χ²(1) = 15.76, p < .001. Higher comprehensibility ratings (indicating that samples were more difficult to understand) were also predicted by the factor that included morphological errors and articulation rate, B = .017, SE = .0004, χ²(1) = 15.78, p < .001. Speech samples with a wider pitch range were perceived to be more comprehensible, B = −.20, SE = .05, χ²(1) = 16.75, p < .001. A summary of these findings is provided in Table 7.

Table 7 Factors predicting ratings assigned to English–German L2 learner speech

                                Accentedness  Fluency  Comprehensibility
Phonetics/Fluency               ∗∗∗           ∗∗∗      ∗∗∗
Phonology/Syntax/Pauses         ∗             ∗∗       ns
Grammar                         ∗∗∗           ∗∗∗      ∗∗∗
Articulation Rate/Morphology    ns            ns       ∗∗∗
Pitch Range                     ∗             ns       ∗∗∗

Note. ∗p < .05; ∗∗p < .01; ∗∗∗p < .001.

In summary, listeners were affected by corrections and repetitions in their ratings of speech produced by both native and nonnative speakers. These played a role in the ratings assigned along all three continua for both native and nonnative speakers. That is, listeners rated speech (regardless of whether it was produced by native or nonnative speakers) containing more corrections and repetitions as being more accented, less fluent, and less comprehensible. It is the combined factor of corrections/repetitions and morphological errors alone that predicted ratings assigned to speech produced by native speakers along all three continua. Two combined factors predicted ratings along all three continua assigned to all speech produced by L2 learners. They were the factor that includes phonetic realizations, filled pauses, and speech rate along with the factor comprising stress assignment, corrections/repetitions, and lexical errors. The extent to which listeners relied upon various cues in the speech of native and nonnative speakers did vary, indicating that they used different criteria in their assessment of native versus nonnative speech. In addition, the factors that predicted the ratings along the continua of accentedness, fluency, and comprehensibility differed, suggesting that the L2 learners distinguished among the three constructs.



Listener Effects

The listener characteristics associated with L2 learners’ ratings of speech samples produced by native speakers varied depending on the continuum along which speech was rated. The listener characteristics that significantly predicted listeners’ accent ratings assigned to native speech included musical training, self-reported (but not measured) proficiency, and experience with nonnative speakers of German speaking German. The L2 learners without musical training rated the speech of native speakers to be more accented than those who reported having received musical training, B = −1.08, SE = .53, χ²(1) = 4.11, p = .04. The L2 learners who reported higher levels of German proficiency rated the speech of German native speakers to be less accented than did those who reported lower levels of proficiency, B = −1.9, SE = .73, χ²(1) = 6.78, p = .009. In addition, those L2 learners who reported spending less time speaking German with nonnative speakers of German also assigned less nativelike accent ratings to the speech of native speakers, B = −.70, SE = .28, χ²(1) = 6.13, p = .013. The listener characteristics that predicted the fluency ratings assigned by L2 learners to native speech were self-rated proficiency and completion of a linguistics course. That is, L2 learners who assessed their German proficiency to be higher rated the speech of native speakers to be more fluent than those who reported lower proficiency levels, B = −.90, SE = .40, χ²(1) = 5.14, p = .023. Listeners who reported having taken a linguistics course also rated the speech of native speakers to be more fluent than those who did not, B = −.73, SE = .25, χ²(1) = 8.46, p = .004. Finally, self-reported proficiency and interactions with nonnative speakers of English in English were significant predictors of comprehensibility ratings of native speech. L2 learners who rated their proficiency in German as being higher rated native speech as more comprehensible than those who reported lower German proficiency, B = −1.15, SE = .54, χ²(1) = 4.48, p = .034. On the other hand, those L2 learners who spent more time with nonnative speakers of English speaking English rated the speech of native German speakers to be less comprehensible than those who reported spending less time speaking English with nonnative speakers, B = .45, SE = .21, χ²(1) = 4.39, p = .036.

The listener characteristics associated with the ratings of L2 speech differ from those associated with the ratings of native speech. Accent ratings were associated with the number of languages the English–German L2 learners spoke and enrollment in a linguistics course. Listeners who spoke more languages rated the speech of fellow L2 learners to be more accented than those who spoke fewer languages, B = .34, SE = .15, χ²(1) = 5.12, p = .024. Participants who had enrolled in a linguistics course rated the speech of fellow L2 learners to be less accented than those who had not, B = −1.58, SE = .27, χ²(1) = 34.63, p < .001. The only listener characteristic associated with fluency ratings was enrollment in a linguistics course, such that participants who had taken a linguistics course rated the speech samples of L2 learners to be more fluent than those who had not taken one, B = −.80, SE = .24, χ²(1) = 10.86, p = .001. Three listener characteristics predicted comprehensibility ratings. They included number of languages spoken, proficiency score, and enrollment in a linguistics course. Participants who spoke more languages rated the speech of fellow L2 learners as less comprehensible than those who spoke fewer languages, B = .47, SE = .11, χ²(1) = 18.44, p < .001. L2 learners who received higher scores on the proficiency test rated the speech samples produced by fellow L2 learners to be more comprehensible than those who received lower scores, B = −.17, SE = .06, χ²(1) = 6.66, p = .010. Finally, participants who had enrolled in a linguistics course rated the speech samples of L2 learners to be more comprehensible than those who had not, B = −1.18, SE = .34, χ²(1) = 12.30, p < .001. A summary of these results is provided in Table 8.

Taken together, the results of the analyses investigating the role of various listener characteristics indicated two findings. First, L2 learners who assessed their German proficiency to be higher rated the speech of German native speakers as less accented, more fluent, and more comprehensible. Second, those L2 learners who had taken a linguistics course rated the speech samples of fellow L2 learners to be less accented, more fluent, and more comprehensible than those who had not ever taken a linguistics course.

Discussion

The L2 learners in the current study demonstrated (a) their ability to distinguish native from nonnative speech according to a number of speech stream characteristics and (b) their capacity to apply the constructs of accentedness, fluency, and comprehensibility differentially when assessing speech samples. As such, the results support previous research that has highlighted the ability of L2 learners to effectively rate the speech of fellow L2 learners and distinguish it from that of native speakers. As is evident in a comparison of the factor loadings in Tables 4 and 6, listeners relied upon different speech stream characteristics in assigning ratings to native versus nonnative speech.

Accentedness

Learners of L2 German who rated accentedness found the speech of fellow L2 learners to be more accented than that of native speakers, which supports previous findings showing that L2 learners are able to detect accents (e.g., Major, 2007; Neufeld, 1980). A more surprising result is that the listeners in the current study rated the speech of native speakers to be accented. Although it is not obvious why L2 learners rated some native speakers as speaking with an accent, the most plausible cause may have to do with experience with the target language and its speakers. The realities of the university in which these participants were studying were such that they may have never interacted with a native speaker of German in German. For example, most of the program’s language courses are taught by nonnative speakers, and there is no study abroad requirement. These participants may not have had enough exposure to native speakers to know whether variation from classroom norms was due to foreign accent or whether the variability was merely due to other factors like a speaker’s region of origin.

Table 8 Listener characteristics predicting ratings of speech produced by German native speakers and by English–German L2 learners

                                                  German native speech                  English-German L2 speech
Rater Characteristics                             Accent  Fluency  Comprehensibility    Accent  Fluency  Comprehensibility
Age of Learning                                   ns      ns       ns                   ns      ns       ns
Length of German Study                            ns      ns       ns                   ns      ns       ns
Proficiency Score                                 ns      ns       ns                   ns      ns       ∗
Self-assessed German Proficiency                  ∗∗      ∗        ∗                    ns      ns       ns
Additional Language Spoken                        ns      ns       ns                   ∗       ns       ∗∗∗
Interactions with Nonnative Speakers of English   ns      ns       ∗                    ns      ns       ns
Interactions with Nonnative Speakers of German    ∗       ns       ns                   ns      ns       ns
Linguistic Training                               ns      ∗∗       ns                   ∗∗∗     ∗∗∗      ∗∗∗
Musical Training                                  ∗       ns       ns                   ns      ns       ns

Note. ∗p < .05; ∗∗p < .01; ∗∗∗p < .001.

A combined factor of morphological errors and corrections or repetitions predicted higher accentedness ratings. While it is difficult to determine the role of morphological errors in listeners’ assessments, given their relative infrequency in the speech of native speakers in the current study (see Table 1), it is likely that the listeners were affected more by the corrections and repetitions. These may have similar effects on listeners as other disfluencies, which have been associated with higher accentedness ratings in previous studies (e.g., Kang, 2010).

The L2 learners in the current study relied on a number of factors when they determined that the speech of fellow L2 learners was accented. As in previous studies, a number of stream characteristics including speech rate, phonetic errors, filled and unfilled pauses, phonological category errors, lexical stress errors, and pitch range affected the accentedness ratings assigned to L2 speech (e.g., Calla McDermott, 1986; Chakraborty & Goffman, 2011; Ducate & Lomicka, 2009; Kang, 2010; Kang et al., 2010; Moyer, 2011; Munro, 1995; O’Brien, 2004; Riggenbach, 1991, 2000; Trofimovich & Baker, 2006). Unlike in previous studies, however, the accentedness ratings assigned in the current study were also predicted by lexical and syntactic (but not morphological) errors. While it is difficult to determine the cause, it may be that grammatical errors were part of a holistic impression associated with accented speech.

The listener characteristics associated with the ratings assigned to the native speech included musical training, self-assessed ratings of German proficiency, and extent of interactions with nonnative speakers of German speaking German. The findings regarding musical training may at first seem to stand counter to those of Isaacs and Trofimovich (2011), who found that musically trained listeners rated L2 English speech as more accented than did untrained native listeners. The difference between the finding in the current study and that of Isaacs and Trofimovich (2011) is that, in the latter, musically trained listeners only rated the speech of L2 learners. Thus, the current finding that listeners who reported musical training rated the speech of native speakers to be less accented is quite similar: The listeners showed the ability to distinguish between speech samples produced by native and nonnative speakers. That L2 learners in the current study who reported more advanced L2 proficiency also rated the speech samples of native speakers to be less accented aligns well with the finding of Wilkerson (2010), who showed that the ability of L2 learners to detect accents improves with increased L2 proficiency. Finally, listener experience played an important role in the accentedness ratings in the current study. Namely, participants who reported interacting more with nonnative speakers of German speaking German rated the speech of native speakers to be less accented. Although no previous study has reported a similar finding, experience has been reported to affect accentedness ratings in previous research (e.g., Isaacs & Thomson, 2013; Kennedy & Trofimovich, 2008): L2 learners who spend more time listening to the speech of other L2 learners may be better able to distinguish it from native speech and therefore rate native speech to be less accented.

The accentedness ratings assigned to fellow L2 learners in the current study were also related to experience: English–German L2 learners who had enrolled in at least one linguistics course rated the speech of fellow L2 learners as being less accented, and those who spoke more languages rated the L2 speech samples to be more accented. Taken together, these findings may partially contradict the results obtained by Thompson (1991), who found that linguistically trained native speaker listeners who spoke at least one foreign language fluently rated the English speech of L2 learners to be less accented. The difference may be due to two possible factors. The first is that the raters in Thompson’s study were rating L2 speakers of their native language and raters in the current study were rating fellow L2 learners speaking their L2 German. Second, Thompson did not independently investigate the respective roles of linguistic training and L2 experience, which may involve two different types of knowledge. Experience with additional languages may play an independent role in accentedness ratings.

Fluency

The English–German L2 learners in the current study were able to distinguish native from nonnative speech in terms of perceived fluency of the speech samples. This is not surprising, given that the speech samples of native and nonnative speakers differed significantly in terms of all of the common disfluency markers associated with a narrow definition of perceived fluency. Self-corrections and repetitions, filled and unfilled pauses, and speech rate all predicted the assigned ratings, in line with previous research (e.g., de Jong et al., 2013; Derwing et al., 2004; Lennon, 1990; Riggenbach, 1991; Rossiter, 2009).

The ratings assigned to speech samples produced by native speakers in the current study were not only predicted by most of the common disfluency markers (excluding unfilled pauses), but also by additional factors that included lexical, syntactic, and morphological errors. Because native speakers produced so few of these errors, however, it is difficult to determine their effect. Nonetheless, this finding is not unrelated to the result regarding the predictors of fluency ratings assigned to the speech samples of fellow English–German L2 speakers. In addition to being predicted by the factors associated with fluency of speech (i.e., speech rate, unfilled and filled pauses, self-corrections, and repetitions), the fluency ratings assigned to L2 learners’ speech samples were also predicted by grammatical (lexical and syntactic) and pronunciation errors (phonetic realization and stress errors). Thus, in rating the speech samples produced by both native and nonnative speakers, the listeners in the current study relied on speech stream characteristics that tend to align with the broad definition of fluency that equates the term with oral proficiency (Lennon, 1990, p. 389). This is in spite of the fact that the raters participated in the familiarization session in which the researcher clearly defined fluency as the “fluidity and flow of speech” and pointed out common disfluencies in three speech samples before the experiment began. That listeners applied this broad proficiency-oriented definition of fluency as they completed their ratings is further supported by the finding that they assigned numerically the lowest of the three ratings (indicating the highest assessment: 1 = extremely fluent) to native speakers for fluency, thereby indicating that the L2 learners evaluated the speech of native speakers as being highly proficient.

Listeners in the current study who reported higher levels of L2 proficiency as well as those who reported linguistic training rated the speech samples of native speakers to be more fluent than those who reported lower levels of L2 proficiency or those who had not enrolled in linguistics courses. In addition, listeners who had taken a linguistics course rated the speech samples of fellow L2 learners to be more fluent than those who had not. Insight into all of these findings may come from Prefontaine (2013), whose most advanced English–French L2 learners’ self-assigned fluency ratings were also best correlated with speech sample characteristics and native speaker ratings of fluency. Although Prefontaine does not couch her findings in these terms, there is some evidence that the advanced learners in her study also exhibited greater levels of metalinguistic awareness that enabled them to classify disfluencies better than those who were less advanced. Linguistically trained individuals and those who provide higher self-assessments of their proficiency in German may possess a similarly high level of metalinguistic awareness.

Comprehensibility

Listeners in the current study assigned the lowest ratings to fellow L2 learners for comprehensibility (1 = very easy to understand), indicating that the speech samples were more comprehensible than they were fluent or nativelike in their pronunciation. As such, the listeners indicated an ability to understand the speech samples of fellow L2 learners, in spite of their being accented and disfluent, thus supporting similar claims made previously (e.g., Derwing & Munro, 2009). Nonetheless, the L2 learners distinguished the samples produced by native speakers from those of the L2 learners along the comprehensibility continuum, thereby supporting the results of Munro et al. (2006). The comprehensibility ratings assigned to native speech were predicted by the combined speech rate factor, the factor composed of morphological errors and corrections/repetitions, and the factor consisting of filled pauses and lexical and syntactic errors. Thus, listeners rated those speech samples that deviated from what they had learned as the classroom standard as being less comprehensible.

When they rated the comprehensibility of fellow L2 learners, the listeners in the current study relied on different factors. The only factor not related to comprehensibility ratings assigned to the speech of L2 learners was the combined factor of phonological and syntactic errors and unfilled pauses. The fact that morphological, lexical, and stress assignment errors also played a role in comprehensibility assessments supports the findings of Isaacs and Trofimovich (2011) and Varonis and Gass (1982), who found that grammatical errors were linked to comprehensibility judgments. Caspers (2010) and Isaacs and Trofimovich also found that stress assignment errors negatively affected comprehensibility judgments. These findings are thus in line with previous research that has made use of native speaker raters assessing the speech of L2 learners.

Similar to the raters in Kang (2010), listeners in the current study rated as more comprehensible the speech of fellow L2 learners that was faster. Speech samples containing fewer infelicitous phonetic realizations were rated as more comprehensible, thereby supporting the findings of a number of studies (e.g., Braun, Lemhofer, & Mani, 2011; Caspers, 2010; Hahn, 2004). One unexpected finding has to do with the contribution of pitch range. The speech of the L2 learners in this study who produced utterances with a wider pitch range was perceived to be more comprehensible. To date, there appears to be no study that has reported a similar finding, but this may have to do with the language pairing in question. Jilka (2007) reported that American English–German L2 learners produce their German utterances with a wider pitch range than native German speakers. While the results of the current study indicate that this may have a negative influence on accentedness ratings, this wider range may provide a positive contribution in terms of comprehensibility. Finally, listeners in this study rated speech samples (whether produced by L2 learners or native speakers) containing fewer corrections and repetitions and filled pauses as more comprehensible. Although this finding has not been reported explicitly in previous studies, it may be that the effort required to process speech samples with disfluencies affects perceived comprehensibility.

The listener characteristics associated with comprehensibility ratings assigned to native speech included self-assessed proficiency in German and frequency of interactions with nonnative speakers of English speaking English. Those listener characteristics associated with the ratings assigned to fellow L2 learners’ speech samples included number of languages spoken, proficiency test score, and enrollment in a linguistics course. As was the case in the ratings assigned to native speech samples along the accent and fluency continua, participants who rated themselves to be more proficient in German were more likely to perceive the speech of native speakers as being more comprehensible, and those who received a higher score on the proficiency test as well as those who had taken a linguistics course rated the speech of fellow L2 learners to be more comprehensible. These more proficient learners and those who had linguistics training may be more metalinguistically aware, and they may therefore be better able to distinguish salient speech stream characteristics that may result in a foreign accent from those that affect the message being transmitted.

Two somewhat surprising findings were that listeners who reported spending more time with nonnative speakers of English speaking English rated the speech samples of native speakers of German as less comprehensible, and those who reported speaking more languages rated the speech of fellow L2 learners as less comprehensible. Similar findings have not been reported in previous studies, and one can only conjecture that both may be the result of experience with L2 speech. In the case of ratings assigned to native German speech, those participants, none of whom had spent time in a German-speaking country and many of whom had never interacted with a native German speaker, frequently interacted with nonnative speakers of a variety of languages speaking English. They may thus have been accustomed to the salient features of nonnative speech and may therefore have found it less difficult to comprehend than the speech produced by native speakers of German. That L2 learners with more experience with a variety of L2s rated the speech samples of fellow L2 learners as being less comprehensible aligns with the current study’s findings regarding accentedness. That is to say, these raters rated speech samples to be both more accented and less comprehensible. A heightened sensitivity to linguistic features resulting from having learned additional languages along with nonnative rater status may therefore result in the assignment of less favorable ratings, along the lines of Rossiter’s (2009) finding for fluency.

Conclusion

The current study has addressed the call by Derwing and Munro (2005) and Munro et al. (2006) to make use of L2 learner raters in investigations into the accentedness, fluency, and comprehensibility of L2 speech. Overall, the listeners in the current study distinguished native from nonnative speech samples and applied relatively distinct criteria as they rated speech samples according to each of the three constructs. Nonetheless, they showed a general lack of consistency in their ratings, especially those they assigned to fellow L2 learners, as indicated by the ICCs. Although it is unwise to compare these values to Cronbach’s alpha values reported in previous studies, the lack of consistency in the ratings points to the importance of including analyses of speech stream characteristics and listener variables in order to make sense of the variability in ratings. A general lack of consistency among naïve raters has been reported in previous studies that make use of native listeners as raters (e.g., Isaacs & Thomson, 2013). Future studies that require raters, regardless of whether they are native or nonnative speakers, to judge speech samples more than once should consider making use of the ICC statistic and GEE analyses in order to account for the correlated nature of the outcomes.

Returning to the ratings themselves, it is important to highlight the numerical ratings assigned by fellow English–German L2 learners to the speech samples in the study. The mean comprehensibility rating assigned to the speech of fellow L2 learners was 5 on a 9-point scale, where a rating of 9 indicates that a speech sample was extremely difficult to understand. Although it is the case that listeners were more lenient in assigning the ratings for comprehensibility, as opposed to those assigned for accentedness, a mean of 5 on the scale is indicative of a relative lack of understanding of the speech samples. If we divide the 9-point scale into percentages, a rating of 5 would roughly correspond to an understanding of 45% of the speech to which listeners were exposed. While listeners in the current study most likely actually understood more than 45% of the L2 speech that they heard, it is perhaps problematic that these listeners assigned such high ratings to the speech of fellow L2 learners. It may be that they underestimated their ability to understand the accented, disfluent speech produced by fellow L2 learners. Because this study relied on comprehensibility ratings as opposed to intelligibility measures, we cannot know for certain the extent to which listeners truly understood the content of the utterances they rated. It would be wise to make use of an intelligibility measure in future studies investigating lingua franca communication.
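
The conversion behind the 45% figure is not spelled out in the text. One plausible reading, offered here only as a sketch, treats each of the nine scale points as an equal share of speech that was not understood, so that a rating of 5 leaves (9 − 5)/9, or roughly 44%, understood. The function name and the equal-share mapping below are illustrative assumptions rather than the procedure used in the study.

```python
def rating_to_percent_understood(rating: float, scale_max: int = 9) -> float:
    """Map a difficulty rating to an approximate percentage understood.

    Assumption (not stated in the study): the scale is split into
    scale_max equal shares, and each scale point stands for one share
    of the speech that was NOT understood.
    """
    if not 1 <= rating <= scale_max:
        raise ValueError("rating must lie between 1 and scale_max")
    return 100.0 * (scale_max - rating) / scale_max


# Mean comprehensibility rating reported for fellow L2 learners' speech.
print(round(rating_to_percent_understood(5), 1))  # 44.4
```

Under this reading, the most favorable rating of 1 corresponds to about 89% rather than 100%, which is one reason to treat the percentage as a rough guide rather than as a measure of intelligibility.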

The L2 learners in the current study were similar to the L2 learner raters in Rossiter (2009), who also relied on grammatical and segmental errors when they rated the fluency of fellow L2 learners. Given the overextension of the term fluency (to oral proficiency) in studies such as that one, the results of the current study support Kennedy, Foote, and Buss’s (2014) call to rely upon another term such as “fluidity” or “flow” in studies that require listeners to rate “fluency” according to its narrow definition. It may be wise to determine whether similar groups of listeners differ in their ratings of fluency according to the name of the scale (i.e., fluency vs. fluidity or flow).

Given that the L2 learners in the current study also rated the speech of native speakers as accented, future studies should consider making use of think-aloud protocols as in Kennedy et al. (2014) or other types of qualitative analyses such as those proposed by Isaacs and Thomson (2013), performed during ratings sessions, in order to determine what listeners think they are listening to as they rate speech samples. This may be especially helpful in studies that rely on L2 learners as raters.

The results of the current study have important implications for classroom practice. The first is one that is repeatedly uttered by those working in the field of lingua franca communication: Mutual intelligibility among L2 speakers should be the ultimate goal of classroom instruction (e.g., Jenks, 2013). Returning to the speech sample characteristics that most negatively affected the comprehensibility of L2 speech samples, we are reminded that it was factors related to phonetic realizations, speed and fluency of speech, stress assignment, and lexis and syntax that predicted which speech samples produced by L2 learners were easiest to understand. Thus, the findings of the current study point to an oft-repeated call made by researchers working in the field of oral language development. That is, in addition to focusing on formal aspects of grammar, classroom speaking practice should concentrate on the following aspects: (a) segments that carry a high functional load (e.g., Munro & Derwing, 2006), (b) global pronunciation (e.g., Derwing, Munro, & Wiebe, 1998), and (c) the development of oral fluency skills (e.g., Derwing et al., 2008). The value of attending to these aspects should not be underestimated, given their overall importance for successful lingua franca communication.

A second implication of the current study also relates to the importance of comprehensibility in the classroom setting. In everyday interactions, L2 learners are confronted with accented speech in their classrooms. Based on the results of the current study, it is clear both that they are able to distinguish native speakers from L2 learners and that they find the speech samples of fellow L2 learners to be more understandable than nativelike in their pronunciation. Nonetheless, listeners in the current study rated the speech samples of fellow L2 learners as being only moderately comprehensible (i.e., a mean rating of 5 on a 9-point scale). Admittedly, only an intelligibility study in combination with a think-aloud protocol could provide the type of data necessary to determine (a) the extent to which L2 learners truly understand the content of speech samples and (b) which speech stream characteristics they consciously notice when they assign their ratings. Nevertheless, the current results point to the importance of learners themselves investing in making comprehensibility a central goal in today’s communicative classrooms. Learners have a shared responsibility for successful classroom lingua franca communication. As speakers, L2 learners should work to address the needs of their listeners, and as listeners, they should clearly indicate when they have not understood the speech of fellow learners.

Final revised version accepted 5 March 2014

Notes

1 Because the current study is a first step toward determining how L2 learners of German react to the speech of fellow L2 learners, and because the learners were required to carry out three separate tasks, this study relies on measures of comprehensibility as opposed to those of intelligibility.

2 The score on a proficiency exam such as that used in the current study is no indication of a speaker’s L2 pronunciation. Speakers at the various levels were used in the hopes that they would exhibit a range of speech sample characteristics in their L2 German speech.

3 Although there are two segmental differences between the two words, this was counted as a lexical error because gefahren is a separate lexical item.

4 Counterbalancing in the current study was carried out to ensure that effects of rating order were equal. As such, all of the potential rating sequences were carried out an equal number of times.

References

Baker, W., Trofimovich, P., & Mack, M. (2004). Learning second-language intonation: Are children better than adults? In A. Brugos, L. Micciulla, & C. E. Smith (Eds.), BUCLD 28 proceedings (pp. 50–61). Somerville, MA: Cascadilla Press.

Barry, A. S. (2007). The form, function and distribution of high rising intonation in Southern Californian and Southern British English. Unpublished doctoral dissertation, University of Sheffield, UK.

Boersma, P., & Weenink, D. (2011). Praat: doing phonetics by computer (Version 5.2.19) [Computer software]. Retrieved from http://www.praat.org

Bohn, O.-S., & Flege, J. (1992). The production of new and similar vowels by adult German learners of English. Studies in Second Language Acquisition, 14, 131–158.

Braun, B., Lemhöfer, K., & Mani, N. (2011). Perceiving unstressed vowels in foreign-accented English. Journal of the Acoustical Society of America, 129, 376–387.

Calla McDermott, W. L. (1986). The scalability of degrees of foreign accent. Unpublished doctoral dissertation, Cornell University.

Caspers, J. (2010). The influence of erroneous stress position and segmental errors on intelligibility, comprehensibility and foreign accent in Dutch as a second language. Linguistics in the Netherlands, 27, 17–29.

Chakraborty, R., & Goffman, L. (2011). Production of lexical stress in non-native speakers of American English: Kinematic correlates of stress and transfer. Journal of Speech, Language, and Hearing Research, 54, 821–835.

Chen, H. (2010). Second language timing patterns and their effects on native listeners’ perceptions. Concentric: Studies in Linguistics, 36, 183–212.

Clark, H. H., & Fox Tree, J. E. (2002). Using uh and um in spontaneous speaking. Cognition, 84, 73–111.

Crystal, D. (1987). The Cambridge encyclopedia of language. Cambridge, UK: Cambridge University Press.

Council of Europe. (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge, UK: Cambridge University Press.

Cucchiarini, C., Strik, H., & Boves, L. (2002). Quantitative assessment of second language learners’ fluency: Comparisons between read and spontaneous speech. Journal of the Acoustical Society of America, 111, 2862–2873.

Daly, N., & Warren, P. (2001). Pitching it differently in New Zealand English. Journal of Sociolinguistics, 5, 85–96.



de Jong, N., & Perfetti, C. A. (2011). Fluency training in the ESL classroom: An experimental study of fluency development and proceduralization. Language Learning, 61, 533–568.

de Jong, N. H., Steinel, M. P., Florijn, A., Schoonen, R., & Hulstijn, J. H. (2013). Linguistic skills and speaking fluency in a second language. Applied Psycholinguistics, 34, 893–916.

de Jong, N. H., & Wempe, T. (2008). Praat Script Syllable Nuclei v2. Retrieved from https://sites.google.com/site/speechrate/Home/praat-script-syllable-nuclei-v2

Derwing, T. M., & Munro, M. J. (1997). Accent, intelligibility, and comprehensibility: Evidence from four L1s. Studies in Second Language Acquisition, 20, 1–16.

Derwing, T. M., & Munro, M. J. (2005). Second language accent and pronunciation teaching: A research-based approach. TESOL Quarterly, 39, 379–397.

Derwing, T. M., & Munro, M. J. (2009). Comprehensibility as a factor in listener interaction preferences: Implications for the workplace. Canadian Modern Language Review, 66, 181–202.

Derwing, T. M., & Munro, M. J. (2013). The development of L2 oral language skills in two L1 groups: A 7-year study. Language Learning, 63, 163–185.

Derwing, T. M., Munro, M. J., & Thomson, R. I. (2008). A longitudinal study of ESL learners’ fluency and comprehensibility development. Applied Linguistics, 29, 359–380.

Derwing, T. M., Munro, M. J., & Wiebe, G. (1998). Evidence in favor of a broad framework for pronunciation instruction. Language Learning, 48, 393–410.

Derwing, T. M., Rossiter, M. J., Munro, M. J., & Thomson, R. I. (2004). Second language fluency: Judgment on different tasks. Language Learning, 54, 655–679.

Ducate, L., & Lomicka, L. (2009). Podcasting: An effective tool for honing language students’ pronunciation? Language Learning & Technology, 13(3), 66–86.

Flege, J. E. (1984). The detection of French accent by American listeners. Journal of the Acoustical Society of America, 76, 692–707.

Flege, J. E., Frieda, E. M., & Nozawa, T. (1997). Amount of native-language (L1) use affects the pronunciation of an L2. Journal of Phonetics, 25, 169–186.

Freed, B. F. (1995). What makes us think that students who study abroad become fluent? In B. F. Freed (Ed.), Second language acquisition in a study abroad context (pp. 123–148). Amsterdam: John Benjamins.

Goethe-Institut. (2004). Einstufungstest [Placement test]. Retrieved from http://www.goethe.de/cgi-bin/einstufungstest/einstufungstest.pl

Gut, U. (2009). Non-native speech: A corpus-based analysis of phonological and phonetic properties of L2 English and German. Frankfurt, Germany: Peter Lang.

Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quarterly, 38, 201–223.

Isaacs, T., & Thomson, R. I. (2013). Rater experience, rating scale length, and judgments of L2 pronunciation: Revisiting research conventions. Language Assessment Quarterly, 10, 135–159.



Isaacs, T., & Trofimovich, P. (2011). Phonological memory, attention control, and musical ability: Effects of individual differences on rater judgments of second language speech. Applied Psycholinguistics, 32, 113–140.

Iwashita, N., Brown, A., McNamara, T., & O’Hagan, S. (2008). Assessed levels of second language speaking proficiency: How distinct? Applied Linguistics, 29, 24–49.

Jenkins, J. (2002). A sociolinguistically based, empirically researched pronunciation syllabus for English as an International Language. Applied Linguistics, 23, 83–103.

Jenkins, J. (2009). English as a lingua franca: Interpretations and attitudes. World Englishes, 28, 200–207.

Jenkins, J. (2012). English as a Lingua Franca from the classroom to the classroom. ELT Journal, 66, 486–494.

Jenks, C. (2013). “Your pronunciation and your accent is very excellent”: Orientations of identity during compliment sequences in English as a lingua franca encounters. Language and Intercultural Communication, 13, 165–181.

Jilka, M. (2007). Different manifestations and perceptions of foreign accent in intonation. In J. Trouvain & U. Gut (Eds.), Non-native prosody: Phonetic description and teaching practice (pp. 77–96). Berlin, Germany: Mouton de Gruyter.

Kang, O. (2010). Relative salience of suprasegmental features on judgments of L2 comprehensibility and accentedness. System, 38, 301–315.

Kang, O. (2012). Impact of rater characteristics and prosodic features of speaker accentedness on ratings of international teaching assistants’ oral performance. Language Assessment Quarterly, 9, 249–269.

Kang, O., Rubin, D., & Pickering, L. (2010). Suprasegmental measures of accentedness and judgments of language learner proficiency in oral English. Modern Language Journal, 94, 554–566.

Kennedy, S., Foote, J., & Buss, L. (2014). L2 speakers at university: Longitudinal development and rater behaviour. Manuscript submitted for publication.

Kennedy, S., & Trofimovich, P. (2008). Intelligibility, comprehensibility and accentedness of L2 speech: The role of listener experience and semantic context. Canadian Modern Language Review, 64, 459–489.

Kormos, J., & Dénes, M. (2004). Exploring measures and perceptions of fluency in the speech of second language learners. System, 32, 145–164.

Lennon, P. (1990). Investigating fluency in EFL: A quantitative approach. Language Learning, 40, 387–417.

Levis, J. (2005). Changing contexts and shifting paradigms in pronunciation teaching. TESOL Quarterly, 39, 369–377.

Lindemann, S., & Subtirelu, N. (2013). Reliably biased: The role of listener expectation in the perception of second language speech. Language Learning, 63, 567–594.

Major, R. C. (1987). Phonological similarity, markedness, and rate of L2 acquisition. Studies in Second Language Acquisition, 9, 63–82.



Major, R. C. (2007). Identifying a foreign accent in an unfamiliar language. Studies in Second Language Acquisition, 29, 539–556.

Molnar, H. (2010). The influence of the age factor on pronunciation competence in a second language: Results of a pilot study with learners of German as a second language. [Der Einfluss des Faktors Alter auf die Aussprachekompetenz in der L2: Ergebnisse einer Pilotstudie mit DaZ-Lernern]. Zeitschrift für Interkulturellen Fremdsprachenunterricht, 15, 42–60.

Matsumoto, Y. (2011). Successful ELF communications and implications for ELT: Sequential analysis of ELF pronunciation negotiation strategies. Modern Language Journal, 95, 97–114.

Moyer, A. (1999). Ultimate attainment in L2 phonology. Studies in Second Language Acquisition, 21, 81–108.

Moyer, A. (2011). An investigation of experience in L2 phonology: Does quality matter more than quantity? Canadian Modern Language Review, 67, 191–216.

Munro, M. (1995). Nonsegmental factors in foreign accent: Rating of foreign speech. Studies in Second Language Acquisition, 17, 17–33.

Munro, M., & Derwing, T. (1995a). Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning, 45, 73–97.

Munro, M., & Derwing, T. (1995b). Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech. Language and Speech, 38, 289–306.

Munro, M. J., & Derwing, T. M. (1998). The effects of speaking rate on listener evaluations of native and foreign-accented speech. Language Learning, 48, 159–182.

Munro, M. J., & Derwing, T. M. (2006). The functional load principle in ESL pronunciation instruction: An exploratory study. System, 34, 520–531.

Munro, M. J., Derwing, T. M., & Burgess, C. S. (2010). Detection of nonnative speaker status from content-masked speech. Speech Communication, 52, 626–637.

Munro, M. J., Derwing, T. M., & Morton, S. L. (2006). The mutual intelligibility of L2 speech. Studies in Second Language Acquisition, 28, 111–131.

Neufeld, G. G. (1980). On the adult’s ability to acquire phonology. TESOL Quarterly, 14, 255–298.

O’Brien, M. G. (2004). Pronunciation matters. Die Unterrichtspraxis, 37, 1–9.

Préfontaine, Y. (2013). Perceptions of French fluency in second language speech production. Canadian Modern Language Review, 69, 324–348.

Riggenbach, H. (1991). Towards an understanding of fluency: A microanalysis of nonnative speaker conversations. Discourse Processes, 14, 423–441.

Riggenbach, H. (2000). Perspectives on fluency. Ann Arbor: University of Michigan Press.

Rossiter, M. J. (2009). Perceptions of L2 fluency by native and non-native speakers of English. Canadian Modern Language Review, 65, 395–412.



Shek, D. T. L., & Ma, C. M. S. (2011). Longitudinal data analyses using linear mixed models in SPSS: Concepts, procedures and illustrations. The Scientific World Journal, 11, 42–76.

Thompson, I. (1991). Foreign accents revisited: The English pronunciation of Russian immigrants. Language Learning, 41, 177–203.

Trofimovich, P., & Baker, W. (2006). Learning second language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition, 28, 1–30.

Trofimovich, P., & Isaacs, T. (2012). Disentangling accent from comprehensibility. Bilingualism: Language and Cognition, 15, 905–916.

Varonis, E. M., & Gass, S. (1982). The comprehensibility of non-native speech. Studies in Second Language Acquisition, 4, 114–136.

Wilkerson, M. (2010). Identifying accent in German: A comparison of native and non-native listeners. Die Unterrichtspraxis, 43, 144–153.

Wilkerson, M. (2013). The sound of German: Descriptions of accent by native and non-native listeners. Die Unterrichtspraxis, 46, 106–118.

