römer: a corpus perspective on the development of verb … · 2019-09-04 · of the verb in a...

23
A corpus perspective on the development of verb constructions in second language learners Ute Römer Georgia State University is article reports initial findings from a study that uses written data from second language (L2) learners of English at different proficiency levels (CEFR A1 to C1) in a large-scale investigation of verb-argument construc- tion (VAC) emergence. e findings provide insights into first VACs in L2 learner production, changes in the learners’ VAC repertoire from low to high proficiency levels, and changes in learners’ dominant verb-VAC associ- ations from low to high proficiency levels. e article also addresses the question what role formulaic sequences play in the L2 acquisition of VACs. Data analyses indicate that, from lowest to highest proficiency levels, the VAC repertoire of L2 English learners shows an increase in VAC types, growth in VAC productivity and complexity, and a development from pre- dominantly fixed sequences to more flexible and productive ones. e find- ings help to expand our understanding of the processes that underlie construction acquisition in an L2 context. Keywords: verb-argument constructions, learner corpus, second language acquisition, proficiency development, German learners 1. Introduction Research which has studied language acquisition from a usage-based perspective has demonstrated that we acquire language by learning constructions. is applies to both first language (L1) learners (Ambridge & Lieven, 2015; Goldberg et al., 2004; Tomasello, 2003) as well as second language (L2) learners (Ellis, 2002, 2003; Ellis & Cadierno, 2009; Ellis et al., 2013; Ellis et al., 2016). Constructions have been described as the building blocks of language, and defined as conventionalized pairings of form and meaning that are entrenched in the speaker’s mind (Bybee, 2010; Goldberg, 1995, 2003, 2006; Trousdale & Hoffmann, 2013). Constructions https://doi.org/10.1075/ijcl.00013.roe International Journal of Corpus Linguistics 24:3 (2019), pp. 270–292. issn 1384-6655 | eissn 1569-9811 © John Benjamins Publishing Company

Upload: others

Post on 16-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

A corpus perspective on the developmentof verb constructions in secondlanguage learners

Ute RömerGeorgia State University

This article reports initial findings from a study that uses written data fromsecond language (L2) learners of English at different proficiency levels(CEFR A1 to C1) in a large-scale investigation of verb-argument construc-tion (VAC) emergence. The findings provide insights into first VACs in L2learner production, changes in the learners’ VAC repertoire from low tohigh proficiency levels, and changes in learners’ dominant verb-VAC associ-ations from low to high proficiency levels. The article also addresses thequestion what role formulaic sequences play in the L2 acquisition of VACs.Data analyses indicate that, from lowest to highest proficiency levels, theVAC repertoire of L2 English learners shows an increase in VAC types,growth in VAC productivity and complexity, and a development from pre-dominantly fixed sequences to more flexible and productive ones. The find-ings help to expand our understanding of the processes that underlieconstruction acquisition in an L2 context.

Keywords: verb-argument constructions, learner corpus, second languageacquisition, proficiency development, German learners

1. Introduction

Research which has studied language acquisition from a usage-based perspectivehas demonstrated that we acquire language by learning constructions. This appliesto both first language (L1) learners (Ambridge & Lieven, 2015; Goldberg et al.,2004; Tomasello, 2003) as well as second language (L2) learners (Ellis, 2002, 2003;Ellis & Cadierno, 2009; Ellis et al., 2013; Ellis et al., 2016). Constructions have beendescribed as the building blocks of language, and defined as conventionalizedpairings of form and meaning that are entrenched in the speaker’s mind (Bybee,2010; Goldberg, 1995, 2003, 2006; Trousdale & Hoffmann, 2013). Constructions

https://doi.org/10.1075/ijcl.00013.roeInternational Journal of Corpus Linguistics 24:3 (2019), pp. 270–292. issn 1384-6655 | e‑issn 1569-9811© John Benjamins Publishing Company

Page 2: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

exist at various levels of complexity which means that not only morphemes andwords but also combinations of words and abstract syntactic frames carry mean-ing as a whole (Goldberg, 1995). Since they form the core of sentences, verb-argument constructions (VACs) constitute a particularly important set among thethousands of constructions that a speaker knows. Examples of VACs include the“V n” or transitive construction (consisting of a verb, the central element, followedby a noun or noun phrase) as in she ate a piece of chocolate, and the “V across n”construction (a verb, followed by the preposition across, followed by a noun ornoun phrase) as in I swam across the entire lake). Both constructions constituteform-meaning pairings: in the case of “V n” one thing acts upon another, while “Vacross n” captures a directed motion event.

English L1 acquisition studies based on dense longitudinal corpora have con-tributed immensely to our better understanding of VAC development in child lan-guage (Behrens, 2009; Lieven et al., 1997; Ninio, 1999, 2006; Perfors et al., 2010;Tomasello, 1992, 2003). They have shown how children pick up on recurring pat-terns in the input they receive, how they use this input to build a mental construc-ticon, and how “[l]anguage structure emerges from language use” (Tomasello,2003: 327). Early simple, item-based (and often fixed and formulaic) constructionslike see baby and go away are followed by later more complex (and more variableand abstract) ones such as X broke Y and she’s V-ing it (Tomasello, 2003). Alsolooking at child language production, Goldberg et al. (2004) showed, based on asmall sample of VACs, that there is a strong tendency for one verb in each VAC tooccur with particularly high frequency, and that the distributional profile of verbsin VACs produced by L1 learners largely mirrors the profile found in the caretak-ers’ speech (i.e. the input the child receives). Similar distributions and input effectshave been described for L2 VAC acquisition as well (Ellis & Ferreira-Junior, 2009;Ellis & Larsen-Freeman, 2009). L2 VAC acquisition studies also indicate that alearner’s inventory of constructions starts from a small set of fixed sequencesbefore growing in complexity, abstractness, and productivity (Eskildsen, 2009;Eskildsen, & Cadierno, 2007; Li et al., 2014). Usage of and exposure to languageshape acquisition.

Compared to L1 acquisition, our understanding of the VAC emergence in L2English learners is still rather limited. Existing corpus-based studies of L2 develop-ment which explicitly focus on constructions have been based on data produced bysmall numbers of learners, and have focused on small sets of construction types.Eskildsen (2009), for instance, used a longitudinal corpus of oral language pro-duced in classroom interactions by one Spanish L1 learner of English to study theuse of the modal verb can and the emergence of its constructions. The corpus cov-ered a time span of about three and a half years. Eskildsen found that, in the learnerhe focused on, “can – patterns become increasingly varied” over time but did not

Development of verb constructions in second language learners 271

uroemer
Sticky Note
Would it be possible to change to running head to "The development of verb constructions in second language learners"?
Page 3: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

find evidence in his data for a “movement towards fully abstract constructions”(Eskildsen, 2009:350). In a study of negation constructions in longitudinal speechdata from the same learner, Eskildsen & Cadierno (2007) observed a similar trendof emergence of increasingly abstract patterns from the initially dominant fixed Idon’t know (see also Eskildsen, 2012). Again focusing on the same longitudinal L2dataset, Li et al. (2014) examined developing realizations of motion constructionsaround verbs such as come and go. They found that the learner’s inventory of VACsthat express directed motion grew from a small set of fixed constructions to a largerset of more productive ones. They also found that frequent motion verbs are used inmore varied expressions while other less commonly used verbs occur in rather lim-ited contexts. In a follow-up study by the same group of authors, data from four L2English learners (two L1 Spanish, two L1 Chinese) was used to trace changes in thelearners’ inventory of motion constructions over the span of three years (Eskildsenet al., 2015). Findings from this study confirm trends reported in Li et al. (2014) andhighlight differences between the Chinese and Spanish learners that are likely dueto crosslinguistic transfer from the learners’ L1s. Working with longitudinal spo-ken learner data from the European Science Foundation corpus, Ellis & Ferreira-Junior (2009) focus on how three different VAC types (verb locative, verb objectlocative, verb object object) develop in seven ESL learners (L1s Italian and Pun-jabi). The authors found that learners first predominantly used semantically gen-eral, high-frequency verbs in each VAC (e.g. give, go, put) before expanding theirrepertoire to more specific, lower-frequency verbs (e.g. explain, turn, drop).

Given the limitations of these studies in terms of scale (based on small corpora)and scope (small numbers of learners and VACs covered), we need larger-scalecorpus-based studies which may provide us with more representative evidence onL2 VAC development. The present article reports on such a study. This study usesmethods from Corpus Linguistics and Natural Language Processing (NLP) to con-siderably expand the scope of existing usage-based studies of L2 VAC acquisitionto hundreds of constructions. It studies the patterns of over 700,000 verb tokensextracted from a large corpus of more than 68,000 texts written by L1 German andL1 Spanish learners of English at low-beginner through advanced proficiency lev-els. Unlike earlier learner corpora which often somewhat arbitrarily operationalizeproficiency based on students’ year of study, the corpus used here has the advantageof being calibrated by learner proficiency level. It uses the CEFR (Common Euro-pean Framework of Reference for Languages; Council of Europe, 2001; Hawkins &Buttery, 2010) as a standardizing measure, which can be considered an importantrecent development in learner corpus research (Díez-Bedmar, 2012; Tono & Díez-Bedmar, 2014). The overarching goal of the larger study is to investigate how verb-argument constructions emerge in second language learners of English. Morespecifically, its goals are to identify the VACs that are first produced by beginning

272 Ute Römer

Page 4: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

learners, to trace how learners’ VAC repertoire and their use of verbs in VACs devel-ops from low to high proficiency levels, and to examine what role fixed formulaicsequences play in L2 VAC acquisition. In order to describe both universal trends ofL2 VAC emergence as well as L1 specific aspects and potential crosslinguistic influ-ence, the study includes comparisons of data produced by learners of different L1backgrounds. The larger study is guided by the following five research questions:

RQ 1: What are the first VACs acquired by beginning L2 learners of English?RQ 2: How does the VAC repertoire of learners develop across proficiency lev-

els?RQ 3: How does the distribution of verbs in VACs in learner production develop

across proficiency levels?RQ 4: What role do formulaic sequences play in the L2 acquisition of VACs?RQ 5: Are there significant observable differences in the acquisition of VACs and

verbs in VACs between L1 German and L1 Spanish learners of English?

The present article will report on first results from the study in response to RQs 1through 4 and comment on next steps in the ongoing analysis and evaluation oflearner data.

2. Data and methods

The study is based on data from the Education First-Cambridge Open LanguageDatabase (EFCAMDAT; Alexopoulou et al., 2015; Geertzen et al., 2013), an exten-sive collection of texts written by students completing coursework on Englishtown,Education First’s online school (www.englishtown.com; now English Live) whichenrolls millions of students worldwide. The texts in EFCAMDAT come fromlearners placed into 16 proficiency levels which correspond to the six levels of theCEFR: A1 “beginner” (Englishtown levels 1–3), A2 “elementary” (levels 4–6), B1“intermediate” (levels 7–9), B2 “upper intermediate” (levels 10–12), C1 “advanced”(levels 13–15), and C2 “proficiency” (level 16). At each level, learners respond toeight prompts for a total of 128 tasks. Learners are often asked to write a letter oran email, summarize information provided in a text, or compose a short argumen-tative text in response to a topic prompt (Alexopoulou et al., 2015). Even thoughsome learners move through all 16 levels, thus delivering longitudinal produc-tion data, others drop out before they reach the highest level of proficiency orenter the course at an intermediate or advanced level. EFCAMDAT is hence nei-ther a cross-sectional nor a truly longitudinal corpus but is perhaps best consid-ered “quasilongitudinal” or “pseudolongitudinal” (Jarvis & Pavlenko, 2008). Thefirst release of EFCAMDAT contained about half a million “scripts” (a script is

Development of verb constructions in second language learners 273

Page 5: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

“a writing piece a learner submits as an answer to a writing task”; Alexopoulouet al., 2015: 98) produced by learners from 172 nationalities, making up over 33million words. The database does not include information on each learner’s lan-guage background, but for countries with one dominant official language, learnernationality in EFCAMDAT has been shown to be a reliable proxy for L1 (Alex-opoulou et al., 2015; Murakami, 2013; Nisioi, 2015).

The text selection from EFCAMDAT for the larger study included all scriptsproduced by Englishtown students in Germany (dominant L1 German) and inMexico (dominant L1 Spanish) from proficiency levels A1 through C1 butexcluded texts at level 16 (CEFR C2) because of low numbers in both nationalitysubsets. Overall, our EFCAMDAT subsets consist of over 28,000 texts (2.8 millionwords) produced by German and over 40,000 texts (3.2 million words) producedby Mexican learners. Particularly large numbers of texts were available from learn-ers at the lower levels (A1 and A2) while learner and text numbers are smaller(though still robust) at levels B1 to C1 (see Table 1 for an overview of the EFCAM-DAT subsets relevant for the larger study; the focus in this paper will be on resultsretrieved from L1 German learner data).

Table 1. EFCAMDAT subsets used in this studyLearner group Number of texts Number of learners Number of words

German A1 10,721  2,072  728,275

German A2  8,507  1,580  811,842

German B1  5,222  1,240  631,338

German B2  3,092   926  488,431

German C1   930   202  186,176

German all levels 28,472  5,112 2,846,062

Mexican A1 24,275  4,043 1,533,012

Mexican A2 10,572  1,596 1,012,049

Mexican B1  3,903   808   471,543

Mexican B2  1,158   273   178,907

Mexican C1   186    34    37,225

Mexican all levels 40,094  5,905 3,232,736

Overall 68,566 11,017 6,078,798

Two copies of these EFCAMDAT subsets were saved to be annotated andprocessed in different ways. One copy was used in an exhaustive retrieval of verbsand the constructions they occur in in the learner texts using NLP technology.The retrieval method was modeled on a procedure described in Kyle (2016) and

274 Ute Römer

Page 6: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

Kyle & Crossley (2017) and carried out by Kristopher Kyle. All EFCAMDAT textswere first syntactically annotated with the Stanford Neural Network DependencyParser (Chen & Manning, 2014) and then manipulated with a Python script whichidentified the main verbs in each sentence and extracted their direct dependents.Adjustments were made to the Stanford dependency scheme to include copulabe in the verb definition. Different from Kyle (2016), modal auxiliaries and aux-iliaries be and have were included as separate components of a VAC to ensurethat I go home, I have gone home, and I might go home are counted as examplesof three distinct constructions. Other Stanford dependents (advcl, advmod, tmod,neg, discourse; see de Marneffe & Manning, 2008) were not considered argumentsof the verb in a construction and were therefore excluded from the VAC defin-ition used in this study. From these syntactically annotated files, frequency pro-files were compiled for main verb lemmas, VACs, and verb-VAC combinations.For verb-VAC combinations, association strength measures, including Collexemescores (Gries & Stefanowitsch, 2004; Stefanowitsch & Gries, 2003), Delta-P (Allan,1980), and Faithfulness (Ellis et al., 2016), were calculated. In a final step, a VACdatabase (or constructicon) was generated for each of the ten L1-level combina-tions (e.g. German A1, Spanish B2).

The resulting databases make it possible to retrieve level-specific frequency-sorted lists of VACs, of verbs in VACs, and of verb-VAC combinations. TheVAC lists help us to identify which constructions emerge first in L2 learner Eng-lish (RQ 1) and how the VAC repertoire develops across levels (RQ 2). Lists ofverb types and tokens used in a VAC allow use to determine the productivityor predictability of each construction across levels (RQ 3). This was done bycalculating normalized entropy scores for individual VACs at each learner level.Normalized entropy or Hnorm (Kumar et al., 1986) is a measure of how uncertaina probability distribution is. In this study, Hnorm is used to assess the distributionof verbs in a VAC. Hnorm values range from 0 to 1 with values closer to 1 indi-cating a more even distribution which makes it hard to predict what the verbin a new token of a VAC might be. Hnorm values closer to 0 indicate an increas-ingly uneven and more predictable distribution of items (here verbs), potentiallywith one or two types making up the majority of all tokens. Entropy has beenshown to be less sensitive to Zipfian frequency patterns than type-token ratio(Eeg-Olofsson & Altenberg, 1994; Ellis & O’Donnell, 2014; Gries & Ellis, 2015).For the selected VACs, normalized entropy scores were also calculated for the80-million word academic component of the Corpus of Contemporary Amer-ican English (COCA academic; Davies, 2008). COCA academic is referencedas an L1 usage benchmark. VAC indices derived from this corpus have beenshown to be significant predictors of L2 writing quality (Kyle & Crossley, 2017).Frequency-sorted lists of verbs in a VAC are also used in correlation analyses

Development of verb constructions in second language learners 275

Page 7: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

which help us to systematically compare the preferred verb-VAC associations atdifferent learner levels and see how strongly a VAC’s verb distributions overlapor differ across groups (further addressing RQ 3). Pearson correlation coeffi-cients (r) were calculated in R (R Development Core Team, 2017) for each pos-sible comparison (e.g. German A1 vs. German A2). Calculations were based onthe log10 transformations of the verb token frequencies. Verb distributions werevisualized with the help of scatterplots. COCA academic was again used as anL1 usage reference point.

The second copy of the EFCAMDAT file sets was part-of-speech (POS) taggedwith the help of TagAnt (Anthony, 2014b) for use in a verb cluster analysis. The goalof this analysis is to provide insights into the role formulaic sequences play in the L2acquisition of VACs (RQ 4). The study focuses on recurring multi-word sequencesaround the 50 most frequent verbs in EFCAMDAT (all ten subsets combined). Alist of all verb types in the corpus (with token frequencies) was created by searchingfor all words tagged as verbs (VV*, VB*, VH*), followed by extracting a frequency-sorted list of the immediate left-hand collocates of the verb tags (i.e. the actualverbs). Table 2 displays the top 50 verbs ranked in descending order of frequency inEFCAMDAT. Using Collocate (Barlow, 2015), 3-word, 4-word, and 5-word clustersaround all lemma forms of each of these verbs (213 forms altogether) were extractedfrom the EFCAMDAT subsets together with frequency information and informa-tion on cluster association strength (Mutual Information, MI). A total of 6,390searches were carried out (213 verb forms, 3 cluster spans, 10 sub-corpora) andthe resulting cluster lists were saved for manual analysis (still ongoing). EFCAM-DAT concordance searches in AntConc (Anthony, 2014a) complement the manualinspection of verb cluster lists and provide a more contextualized view of the clus-ters, insights into verb semantics, and idiomaticity of use.

Table 2. Fifty most frequent verb lemmas in EFCAMDAT subsets, selected for verbcluster analysis1. be 11. play 21. wear 31. learn 41. talk

2. have 12. take 22. know 32. tell 42. read

3. do 13. think 23. love 33. let 43. call

4. go 14. hope 24. buy 34. write 44. listen

5. like 15. want 25. help 35. speak 45. say

6. see 16. look 26. give 36. use 46. bring

7. live 17. come 27. find 37. thank 47. apply

8. work 18. eat 28. feed 38. walk 48. ask

9. make 19. need 29. start 39. study 49. feel

10. get 20. meet 30. watch 40. wash 50. visit

276 Ute Römer

Page 8: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

3. Results

This section summarizes initial results from the study in response to RQs 1 to 4,listed at the end of Section 1.

3.1 First VACs in German L2 learners of English

The construction databases for the lowest EFCAMDAT proficiency level (A1) thatresulted from the VAC extraction procedure described above allow us access tofrequency-ordered lists of first VACs for both L1 groups (German and Spanish).In this paper, we will only refer to data extracted from the EFCAMDAT subsetsfor L1 German learners of English. Table 3 lists the ten most frequent VACs in theGerman A1 subset together with their raw token frequencies and examples fromlearner texts produced at this level. Three of these ten most frequent early VACs are:

i. copula be constructions: a nominal subject (nsubj), followed by a form of cop-ula be (vcop), followed by a nominal complement (ncomp; rank 1);

ii. nsubj, followed by vcop, followed by an adjectival complement (acomp; rank2); and

iii. existential “there” (expl), followed by vcop, followed by nsubj (rank 6).

Among the top-10 early VACs, there are also:

i. simple transitive constructions (nsubj, followed by a verb (v), followed by adirect object (dobj; rank 3), and v followed by dobj (rank 4));

ii. simple intransitive constructions (nsubj, followed by v, followed by a clausalcomplement (ccomp; rank 7), nsubj, followed by v, followed by an openclausal complement (xcomp), i.e. a clausal complement without its own sub-ject; rank 8), and nsubj followed by v (rank 9)); and

iii. a construction in which the verb is followed by a prepositional phrase startingwith in (prep_in; rank 5).

A longer version of this ranked VAC type list contains additional simple copula beVACs and transitive constructions containing prepositional phrases, for examplensubj, followed by vcop, followed by prep_from (rank 11) and nsubj, followed by v,followed by dobj, followed by prep_at (rank 17). Longer, more complex VACs areeither rare at A1 level or do not occur at all.

Development of verb constructions in second language learners 277

Page 9: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

Table 3. Ten most frequent VAC types in EFCAMDAT subset German A1

Rank VACVAC

frequency Example

1 nsubj-vcop-ncomp

7,002 My name is Anna

2 nsubj-vcop-acomp 6,298 I’m happy

3 nsubj-v-dobj 4,952 I have on Wensday sport

4 v-dobj 4,867 Have free time 20:00 pm

5 nsubj-v-prep_in 2,379 They live in Cologne

6 expl-vcop-nsubj 1,307 There are many things near at my house

7 nsubj-v-ccomp 1,251 I think we should buy her some flowers and abook

8 nsubj-v-xcomp 1,235 I go to work at 9 o clock

9 nsubj-v 1,139 You work, right now

10 nsubj-v-dobj-dobj 1,128 I have a bedroom, a small badroom and a kitchen

3.2 Changes in German L2 learners’ English VAC repertoire across levels

Figure 1 plots the VAC type frequencies for L1 German learners at five proficiencylevels (A1 to C1). The numbers are based on lists of all VAC types (no minimumfrequency) in the construction databases created for each level. Since the EFCAM-DAT subsets vary in size, frequencies have been normalized (per 100,000 words).The graph shows a fairly even increase in VAC types from lowest (A1) to highestproficiency level (C1), indicating steady growth in the German learners’ verb con-struction repertoire as their language proficiency increases.

Comparisons of frequency-sorted VAC type lists across levels allow us toidentify specific changes in L1 German learners’ verb construction repertoire.Table 4 shows the ten most frequent VAC types in EFCAMDAT subsets GermanA1 and German B2 side by side. Instead of advanced level (C1), upper-intermediate (B2) data was selected for comparison because the B2 dataset isconsiderably larger than the C1 one. For both datasets, frequencies have been nor-malized (per 100,000 words) to facilitate comparison. Of the ten most frequentVACs, seven are shared between the two levels (highlighted in italics). This is per-haps not surprising given that we are dealing with core elements of verb grammar,including intransitive, transitive, and copula be constructions. For these sharedVACs, however, rank order and normed frequencies differ across levels. The toptwo VACs at level A1 (nsubj-vcop-ncomp and nsubj-vcop-acomp) are compar-atively more frequent at this level than in the B2 list, while the remaining fiveshared VACs are more frequent in the B2 corpus. Learners at the lowest EFCAM-

278 Ute Römer

Page 10: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

Figure 1. VAC types across EFCAMDAT levels (German A1 to German C1)

DAT proficiency level rely heavily on these two simple copula be constructionsand use other VACs much less often. Two of the ten most frequent B2 level VACsthat are not shared with the A1 list are construction that contain modal verbs fol-lowed either by copula be or by a lexical verb and a direct object. If we look beyondthe top ten in the A1 and B2 frequency VAC lists, we notice that both of thesemodal verb constructions also occur in the A1 data, but with considerably lowerfrequencies than at B2 level (141.8 times per 100k words compared to 284.0 fornsubj-modal-v-dobj and 17.6 compared to 150.1 for nsubj-modal-vcop-acomp).

Table 4. Ten most frequent VAC types in EFCAMDAT subsets German A1 and GermanB2 (frequencies normalized per 100,000 words; shared VACs are highlighted in italics)

German A1 German B2

Rank VAC Frequency VAC Frequency

1 nsubj-vcop-ncomp 961.5 v-dobj 1164.5

2 nsubj-vcop-acomp 864.8 nsubj-v-dobj  834.3

3 nsubj-v-dobj 680.0 nsubj-vcop-acomp  655.8

4 v-dobj 668.3 nsubj-vcop-ncomp  493.8

5 nsubj-v-prep_in 326.7 nsubj-v-ccomp  452.3

6 expl-vcop-nsubj 179.5 nsubj-v-xcomp  365.3

7 nsubj-v-ccomp 171.8 v-ccomp  301.4

8 nsubj-v-xcomp 169.6 nsubj-v  286.8

9 nsubj-v 156.4 nsubj-modal-v-dobj  284.0

10 nsubj-v-dobj-dobj 154.9 nsubj-modal-vcop-acomp  150.1

Development of verb constructions in second language learners 279

Page 11: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

To gain further insights into learners’ VAC preferences across proficiency lev-els, we can also look at frequency information for repeatedly occurring com-binations of constructions and specific verbs. Table 5 displays the ten most fre-quent lemma-VAC combinations in the German A1 and German B2 subsets ofEFCAMDAT. The two most frequent lemma-VAC combinations (both copula beconstructions) are the same at the two levels but both used with higher frequen-cies at the A1 than the B2 level. The only other VAC realization that is sharedamong the top ten is the transitive construction with have, as exemplified in(1). In addition to those three, the list of most frequent lemma-VAC combina-tions at the low beginner level includes VAC realizations with high-frequencyverbs (nsubj-HAVE-dobj-dobj; expl-BE-nsubj) and VACs that contain preposi-tional phrases, illustrated by Examples (2) and (3). At the upper-intermediate level(B2), we instead find VAC realizations with lower frequency verbs such as applyand observe, exemplified in (4) and (5).

(1) (German B2)… you have enough space for the complete family

(2) (German A1)My aunts and uncles lives in Gelsenkirchen

(3) (German A1)I’m from Berlin, Germany

(4) (German B2)Let me encourage you to apply for this job

(5) (German B2)They offer regular trips to Florida to observe wild crocs

Table 5. Ten most frequent lemma-VAC combinations in EFCAMDAT subsets GermanA1 and German B2 (frequencies normalized per 100,000 words)

German A1 German B2

Rank Lemma-VAC combination Frequency Lemma-VAC combination Frequency

1 nsubj-BE-ncomp 961.3 nsubj-BE-acomp 655.8

2 nsubj-BE-acomp 864.8 nsubj-BE-ncomp 493.2

3 nsubj-LIVE-prep_in 260.5 nsubj-HAVE-dobj 185.3

4 expl-BE-nsubj 179.5 nsubj-modal-BE-acomp 150.1

5 nsubj-HAVE-dobj 171.4 nsubj-THINK-ccomp 132.3

6 nsubj-BE-prep_from 153.2 LET-ccomp 118.7

7 MEET-dobj 122.8 APPLY-prep_for 101.5

8 nsubj-LIKE-dobj 110.9 nsubj-aux_have-FIND-dobj  96.8

9 SEE-dobj  85.7 nsubj-WANT-xcomp  96.4

10 nsubj-HAVE-dobj-dobj  81.0 OBSERVE-dobj  92.3

280 Ute Römer

Page 12: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

3.3 Changes in verb-VAC associations across levels

In order to address RQ 3 and trace changes in verb-VAC associations from lowto high proficiency levels, verb frequency lists were retrieved for selected VACsand for each learner level. For these VACs, the distribution of verbs was thencompared across proficiency levels, using two analytic steps: normalized entropyand correlation analyses. Table 6 lists the normalized entropy (Hnorm) scores fortwo high-frequency VACs, nsubj-v-dobj (henceforth ‘transitive VAC’) and nsubj-modal-v-dobj (henceforth ‘modal VAC’), in the German A1 through German C1datasets and in COCA academic (as an L1 usage reference). A higher Hnorm scoreindicates a less predictable distribution of verbs in the construction which tendsto go with a lower degree of semantic cohesion among the verbs used in the VAC(Ellis & O’Donnell, 2014). Based on the scores in Table 6, the transitive VAC ismore predictable in its verb occupancy than the modal VAC. For both VACs,Hnorm values increase with learner proficiency from A1 to C1 level. Entropy scoresfor the same VACs in COCA academic are higher than or the same as those forGerman B2 but slightly lower than the C1 scores. For learners, moving towards ausage norm means moving towards higher Hnorm values. This trend is in opposi-tion to what we observed for more specific, less frequent VACs such as “V withn” (a verb followed by with, followed by a noun or noun phrase) for which Hnormscores were lowest in L1 usage and decreased in spoken learner production fromproficiency level B1, via B2, to C1/C2 (Römer, & Garner, forthcoming). A possiblereason for this is that the sequences nsubj-v-dobj and nsubj-modal-v-dobj mayeach actually subsume a set of sub-constructions which share the same surfaceform in terms of arguments but have different functions and hence contain differ-ent sets of verbs.

Table 6. Normalized entropy scores for two focus VACs across datasetsDataset Transitive VAC Modal VAC

German A1 0.59 0.72

German A2 0.61 0.74

German B1 0.64 0.79

German B2 0.71 0.77

German C1 0.76 0.83

COCA academic 0.71 0.78

For the same two focus VACs (transitive and modal), verb frequency lists werecorrelated in two datasets at a time. We compared the learner levels against eachother and against COCA academic. Table 7 provides Pearson correlation coeffi-cients (r) for all possible comparisons for both VACs. All correlations are significant

Development of verb constructions in second language learners 281

Page 13: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

at p< .001. R-values can range from 0 to 1, with 0 indicating no and 1 indicatinga perfect correlation. In other words, the closer an r-value is to 1, the stronger thecorrelation between the compared datasets (here two lists of verbs). For the firsttype of comparison (learner groups across levels), r-values are highest for adja-cent levels (e.g., German A1 vs. German A2; German B1 vs. German B2) and lowerfor non-adjacent levels (e.g. German A1 vs. German B2). This indicates that thereis, for instance, more overlap between the verbs used by German A1 and GermanA2 learners than there is between the verbs used by German A1 and German B2learners, suggesting development towards more variety in verb use at higher profi-ciency. This pattern can be observed for both VACs, although correlations are lowerthroughout for the modal VAC, potentially because, as indicated by our normalizedentropy analysis, this construction allows for more variation in the verbs it accom-modates. For the comparisons of learner and L1 usage data, all r-values are fairlylow, to a large extent because the COCA-derived verb lists for both VACs are signif-icantly longer than those retrieved from EFCAMDAT subsets and we thus includeda large number of verbs (data points) in the correlation analysis which are not usedin the learner production data. Among the results for this comparison type, how-ever, r-values increase from German A1 (via A2 and B1) to B2 level. This indicatesthat the production of verbs in the transitive and modal VACs moves closer to L1usage as learner proficiency increases. Correlations for German C1 compared toCOCA academic are lower than for German B1 and B2, likely because there is lessdata available at C1 than at the four other levels (resulting in shorter verb lists).

To visualize these correlations and see which verbs contribute to the overallscores, verb scatterplots were generated for each of the comparisons in Table 7.Figure 2 shows plots comparing the verbs used in the modal VAC in the GermanA1 vs. German B2 and in the German B1 vs. German B2 datasets. Perfect overlapin verbs between two datasets (i.e. an r-value of 1) would place all verb labelsneatly along the diagonal that runs through the middle of each plot. Verbs that arecomparatively more frequent in the dataset shown on the x-axis (German A1 inthe top and German B1 in the bottom plot) are plotted below the diagonal; verbsthat are comparatively more frequent in the dataset shown on the y-axis (GermanB2 in both plots) appear above the diagonal. The comparison of the modal VACverbs at low-beginner and upper-intermediate levels (top plot in Figure 2) showsa moderate r-value of 0.55, a small number of verbs along the diagonal (e.g. enjoy,help, like), and the majority of verbs appearing above the diagonal (e.g. believe,change, understand). This indicates that this VAC is more productive at B2 andthat learners at this level use the VAC with a larger variety of verbs. If we com-pare the verb lists for the same VAC for B1 and B2, two adjacent levels, we observea higher correlation (r= .70) and a larger number of verbs plotted along or closeto the diagonal (see bottom plot in Figure 2). This more balanced distribution of

282 Ute Römer

Page 14: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

Table 7. Correlation values (Pearson’s r) for verb usage in two focus VACs across datasetsComparison Transitive VAC Modal VAC

Learner groups across levels

German A1 vs. German A2 0.76 0.63

German A1 vs. German B1 0.72 0.60

German A1 vs. German B2 0.61 0.55

German A1 vs. German C1 0.63 0.52

German A2 vs. German B1 0.75 0.69

German A2 vs. German B2 0.67 0.62

German A2 vs. German C1 0.62 0.59

German B1 vs. German B2 0.75 0.70

German B1 vs. German C1 0.70 0.63

German B2 vs. German C1 0.74 0.69

Learner groups vs. L1 usage

German A1 vs. COCA academic 0.30 0.28

German A2 vs. COCA academic 0.35 0.33

German B1 vs. COCA academic 0.41 0.40

German B2 vs. COCA academic 0.44 0.41

German C1 vs. COCA academic 0.38 0.37

verbs indicates that learners at these two levels show comparable VAC productiv-ity and that there is considerable overlap between the sets of verbs they use in thisVAC (both in terms of type and token frequencies).

3.4 The role of formulaic sequences in L2 VAC acquisition

Verb cluster lists for the verbs listed in Table 2 were analyzed manually to accessformulaic sequences in the learner production data at different levels of profi-ciency and determine what role those might play in the acquisition of VACs(RQ 4). In the verb cluster lists we looked for evidence of repeatedly used fixedsequences and of variable patterns based on those fixed sequences. Variationwithin a sequence would constitute evidence for the emergence of a construction.In this paper, we will only focus on clusters of different lengths around forms ofthe verb make (make, makes, made, making). make was selected because it is ahigh-frequency verb which has been shown to be difficult for even advanced L2learners (Altenberg & Granger, 2001; Nesselhauf, 2005). Table 8 displays the fivemost frequent 3-word and 4-word clusters around make in L1 German learnerdata across proficiency levels (A1 to C1). An analysis of longer versions of these

Development of verb constructions in second language learners 283

Page 15: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

Figure 2. Correlations of verbs in the modal VAC in learner data across proficiency levels(top plot: German A1 vs. B2; bottom plot: German B1 vs. B2)

284 Ute Römer

Page 16: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

lists and corresponding lists for the other lemma forms indicates that learners atbeginning levels (A1 and A2) only use a small number of make clusters repeatedly(e.g. make an appointment, make the beds), while the repertoire of recurring makeclusters is larger at levels B1 to C1. Perhaps more interestingly, the analysis alsoshows a development from fixed sequences at levels A1 and A2 to more varied andflexible patterns at levels B1 to C1. To give a few examples, while A2 level learn-ers repeatedly produce make me happy, B2 learners use the same phrase but alsorepeatedly produce make you happy, would make you happy, and will make youhappy. Learners at B2 level do not only repeatedly use make you feel, but also makethem feel, and make my clients feel. Similarly, these upper-intermediate learners donot only produce make a decision, but also make difficult decisions, make certaindecisions, and make the right decisions. This indicates that in the written produc-tion of higher proficiency learners, the phrase make you feel has become the con-struction “make pronoun/noun feel”, while the phrase make a decision has becomethe construction “make noun phrase (containing the noun decision)”. Concretesequences of words develop into more abstract constructions.

Not only do the initial lists of mostly fixed make sequences exhibit more vari-ability in form at higher proficiency levels, make sequences also become morediverse in the meanings that the verb expresses. At the lowest level (A1), make iseither used in the sense of creating something tangible (make a meal) or intangi-ble (make an appointment), or it appears in the fixed phrase (to) make sure that. Atlevel A2, learners express these same functions but also use make clusters in whichthe verb expresses the meanings to cause (e.g. make me happy, make you want)and to obtain (e.g. make new friends). Starting at this level, make also appearsin support verb constructions (e.g. make a plan), a use that becomes more pro-nounced at levels B2 and C1 (e.g. make a decision, to make changes, make the rightchoice). Lastly, we noticed several unidiomatic or ungrammatical uses among theverb clusters at levels A1 and A2. Examples include make a party, make holidayin, make a break, make the laundry, make the dishes, and make the gardening. Inthese phrases, learners overextend make to contexts in which do or have wouldconstitute more appropriate choices, likely because of existing parallel structuresin their L1. In German, the verb machen, the most common translation equiv-alent of make, is used in all of these phrases (e.g. Urlaub machen, “make holi-day”; die Wäsche machen, “make the laundry”). None of these unidiomatic phrasesare attested in intermediate or advanced L1 German learner production in ourdatasets.

Development of verb constructions in second language learners 285

Page 17: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

Table 8. Five most frequent make clusters (spans 3 and 4) in L1 German corpora acrosslevelsRank German A1 German A2 German B1 German B2 German C1

3-word clusters

1 make anappointment

make the beds make a tea to make a we can make

2 I will make I make the to make a to make it can make a

3 to make a we can make you can make able to make I will make

4 like to make and make the like to make can’t make it to make a

5 make sure that can makemusic

can make a we can’t make will make sure

4-word clusters

1 to make anappointment

we can makemusic

make a teafrom

we can’t makeit

we can make a

2 I will make a and make thebeds

I want tomake

able to make it I will make sure

3 to make sure that the dishes andmake

would like tomake

be able tomake

can make adifference

4 I’d like to make dishes andmake the

you can makea

to make acareer

will make surethat

5 would like tomake

make the bedson

like to make a afraid we can’tmake

believe we canmake

4. Conclusion and outlook

This paper has shared initial results from a large-scale study that combines dataand methods from Corpus Linguistics and NLP to investigate how verb-argumentconstructions develop in L2 learners of English. Data retrieved from a pseudo-longitudinal corpus of learner writing (subsets of EFCAMDAT) has allowed us toaddress the first four research questions listed at the end of Section 1. In responseto RQ 1, a frequency-sorted list of VACs in L1 German A1 learner production dataindicated that copula be constructions, simple transitive, and simple intransitiveconstructions are among the first VACs acquired by low-beginning L2 learners.Some of the frequent early transitive VACs contain prepositional phrases which,however, do not appear to be productive but are restricted to a small number ofconcrete realizations in terms of verb-preposition combinations (e.g. nsubj live in,nsubj be from). A comparison of VAC type lists from the different proficiency lev-els served to address RQ 2 and showed a steady growth in learners’ VAC reper-toire from lowest to highest levels. A comparison of normalized frequencies of

286 Ute Römer

Page 18: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

the most dominant VACs at A1 and B2 levels emphasized the heavy reliance of A1level learners on copula be constructions and showed a preference of VACs whichinclude modal verbs by B2 level writers. Comparing the most frequent lemma-VAC combinations between the same two levels indicated that A1 level learnersuse mostly VAC realizations with high-frequency verbs whereas at B2 level thetop-10 VAC realizations also include lower frequency verbs.

Results from normalized entropy and correlation analyses helped us addressRQ 3 and highlighted changes in dominant verb-VAC associations across learnerlevels. Normalized entropy values for two focus VACs (the transitive constructionand modal verb construction) indicated a development closer to an L1 usage normand towards less predictable verb distributions from low to high proficiency lev-els. For the same two VACs, the correlation analysis showed more overlap in verbchoices between adjacent than non-adjacent learner levels, higher productivity ofVACs at higher proficiency levels, and increasing overlap between verbs in VACsin L1 usage and learner production from low to high proficiency levels. Lastly(RQ4), lists of verb clusters for different forms of the verb make used at variousproficiency levels provided evidence for a development from small sets of fixedsequences at beginning levels (A1 and A2) to a wider repertoire of more variablesequences at intermediate and advanced levels (B1, B2, C1), confirming findingsfrom earlier studies on individual constructions based on data from individuallearners (e.g. Eskildsen, 2009; Eskildsen, & Cadierno, 2007; Eskildsen et al., 2015).Taken together with the expansion of functions of make observed at the higherlevels, this variability in form suggests that initial formulaic sequences give way tomore abstract types of constructions.

Despite the insights it has provided into L2 learner VAC development, thispaper has a number of limitations that open up opportunities for future research.It only reported findings based on a subset of constructions in the L1 Germandatasets. Next steps will include further evaluation of the construction databasesextracted from each EFCAMDAT subset (for two learner groups and five levels),both quantitatively and qualitatively. The quantitative analysis will include mea-suring the productivity and predictability of, and generating correlation plots fora larger set of VACs. The qualitative analysis will involve zooming in on the setsof verbs used in VACs which are frequent across proficiency levels to see howeach VAC develops in terms of dominant verb associations and semantics. For theverb cluster analysis, we only presented results for one focus verb, the verb make.This analysis will be extended to the remaining 49 high-frequency verbs for whichcluster lists have already been retrieved from each of the ten EFCAMDAT subsets.In order to address the fifth research question of the larger study focusing on L1differences in learners’ VAC emergence (not included in the present paper), VAClists, lemma-VAC lists, and verb-association information for individual VACs will

Development of verb constructions in second language learners 287

Page 19: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

be compared between the L1 German and L1 Spanish datasets, separately for eachproficiency level (e.g. German A1 vs. Spanish A1). In the comparisons of datasetsacross learner L1 backgrounds, we will consider potential effects of crosslinguis-tic transfer (Gass & Selinker, 1983; Jarvis, 2011, 2013; Jarvis & Pavlenko, 2008). Anadditional expansion of our work might involve the detailed longitudinal analysisof language produced by individual learners who move through all (or most) lev-els, provided that sufficient relevant data can be extracted from EFCAMDAT. Aremaining limitation of the present paper is that it did not discuss potential effectsof writing tasks on learner VAC production. We have recently received access tolists of the Englishtown task prompts for the majority of the 16 EFCAMDAT lev-els. This allows us to study the VACs in these prompts as well as the verbs used inthem and consider this information in the interpretation of the learner VAC pro-duction data.

As noted in the introduction to this article, our knowledge of how VACsemerge in L2 English learners is still quite limited, especially when compared towhat we know about L1 acquisition. By drawing on data on a wide range of VACsfrom a large database of L2 learner data across proficiency levels, we hope to havemade a small contribution to a better understanding of L2 learners’ emerging con-struction knowledge, while at the same time illustrating the potential of a usage-based or “usage-inspired” (Tyler & Ortega, 2018) approach to second languageacquisition research.

Acknowledgements

I would like to thank the Language Learning Small Research Grants Program for sponsoring theproject “The emergence of verb-argument constructions in second language learners of Eng-lish” (2016–2017). I am also extremely grateful to the external reviewers for helpful commentson an earlier draft of this article, to Kristopher Kyle for his work on the VAC retrieval fromEFCAMDAT, and to James Garner, Olivia Higgins, and Kristina Sherrill for their help with theverb cluster analysis.

References

Alexopoulou, T., Geertzen, J., Korhonen, A., & Meurers, D. (2015). Exploring big educationallearner corpora for SLA research: Perspectives on relative clauses. International Journal ofLearner Corpus Research, 1(1), 96–129. https://doi.org/10.1075/ijlcr.1.1.04ale

Allan, L.G. (1980). A note on measurement of contingency between two binary variables injudgment tasks. Bulletin of the Psychonomic Society, 15, 147–149.https://doi.org/10.3758/BF03334492

288 Ute Römer

uroemer
Cross-Out
uroemer
Inserted Text
I
uroemer
Cross-Out
uroemer
Inserted Text
me
uroemer
Cross-Out
uroemer
Inserted Text
I
Page 20: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

Altenberg, B., & Granger, S. (2001). The grammatical and lexical patterning of MAKE in nativeand non-native student writing. Applied Linguistics, 22(2), 173–195.https://doi.org/10.1093/applin/22.2.173

Ambridge, B., & Lieven, E. (2015). A constructivist account of child language acquisition. InB. MacWhinney & W. O’Grady (Eds.), The Handbook of Language Emergence (pp.478–510). Malden, MA: Wiley-Blackwell.

Anthony, L. (2014a). AntConc (Version 3.4.2) [Computer Software]. Tokyo, Japan: WasedaUniversity. Available from http://www.laurenceanthony.net/software

Anthony, L. (2014b). TagAnt (Version 1.1.0) [Computer Software]. Tokyo, Japan: WasedaUniversity. Available from http://www.laurenceanthony.net/software

Barlow, M. (2015). Collocate (Version 2.0) [Computer Software]. Houston, TX: Athelstan.Behrens, H. (2009). Usage-based and emergentist approaches to language acquisition.

Linguistics, 47(2), 383–411. https://doi.org/10.1515/LING.2009.014

Bybee, J. (2010). Language, Usage and Cognition. Cambridge: Cambridge University Press.https://doi.org/10.1017/CBO9780511750526

Chen, D., & Manning, C.D. (2014). A fast and accurate dependency parser using neuralnetworks. In Proceedings of the 2014 Conference on Empirical Methods on NaturalLanguage Processing (EMNLP) (pp. 740–750). Retrieved from https://cs.stanford.edu/~danqi/papers/emnlp2014.pdf (last accessed April 2019).https://doi.org/10.3115/v1/D14‑1082

Council of Europe. (2001). Common European Framework of Reference for Languages:Learning, teaching, assessment. Cambridge: Cambridge University Press.

Davies, M. (2008–). The Corpus of Contemporary American English (COCA): 560 million words,1990-present. Available online at https://www.english-corpora.org/coca/ (last accessedApril 2019).

de Marneffe, M.-C., & Manning, C.D. (2008). Stanford typed dependencies manual (revisedfor the Stanford Parser v. 3.5.2 in April 2015). Retrieved from https://nlp.stanford.edu/software/dependencies_manual.pdf (last accessed April 2019).

Díez-Bedmar, M. B. (2012). The use of the Common European Framework of Reference forLanguages to evaluate compositions in the English exam section of the universityadmissions examination. Revista de Educación, 357, 55–79.

Eeg-Olofsson, M., & Altenberg, B. (1994). Discontinuous recurrent word combinations in theLondon-Lund Corpus. In U. Fries, G. Tottie, & P. Schneider (Eds.), Creating and UsingEnglish Language Corpora: Papers from the Fourteenth International Conference on EnglishLanguage Research on Computerized Corpora. Amsterdam: Rodopi, 63–77.

Ellis, N. C. (2002). Frequency effects in language processing: A review with implications fortheories of implicit and explicit language acquisition. Studies in Second LanguageAcquisition, 24(2), 143–188. https://doi.org/10.1017/S0272263102002024

Ellis, N. C. (2003). Constructions, chunking, and connectionism: The emergence of secondlanguage structure. In C. J. Doughty & M. H. Long (Eds.), The Handbook of SecondLanguage Acquisition (pp. 63–103). Malden, MA: Blackwell.https://doi.org/10.1002/9780470756492.ch4

Ellis, N. C., & Cadierno, T. (2009). Constructing a second language. Annual Review ofCognitive Linguistics, 7 (Special section), 111–290. https://doi.org/10.1075/arcl.7.05ell

Ellis, N. C., & Ferreira-Junior, F. (2009). Constructions and their acquisition: Islands and thedistinctiveness of their occupancy. Annual Review of Cognitive Linguistics, 7, 188–221.https://doi.org/10.1075/arcl.7.08ell

Development of verb constructions in second language learners 289

Page 21: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

Ellis, N. C., & Larsen-Freeman, D. (2009). Language emergence: Implications for appliedlinguistics – Introduction to the special issue. Applied Linguistics, 27(4), 558–589.https://doi.org/10.1093/applin/aml028

Ellis, N. C., & O’Donnell, M.B. (2014). Construction learning as category learning: A cognitiveanalysis. In T. Herbst, H. J. Schmid, & S. Faulhaber (Eds.), Constructions CollocationsPatterns. Berlin: De Gruyter Mouton, 71–97.

Ellis, N. C., O’Donnell, M.B., & Römer, U. (2013). Usage-based language: Investigating thelatent structures that underpin acquisition. Language Learning, 63(Supp. 1), 25–51.https://doi.org/10.1111/j.1467‑9922.2012.00736.x

Ellis, N. C., Römer, U., & O’Donnell, M.B. (2016). Usage-based Approaches to LanguageAcquisition and Processing: Cognitive and Corpus Investigations of Construction Grammar.Malden, MA: Wiley.

Eskildsen, S. W. (2009). Constructing another language: Usage-based linguistics in secondlanguage acquisition. Applied Linguistics, 30(3), 335–357.https://doi.org/10.1093/applin/amn037

Eskildsen, S. W. (2012). L2 negation constructions at work. Language Learning, 62(2), 335–372.https://doi.org/10.1111/j.1467‑9922.2012.00698.x

Eskildsen, S. W., & Cadierno, T. (2007). Are recurring multi-word expressions really syntacticfreezes? Second language acquisition from the perspective of usage-based linguistics. InM. Nenonen & Niemi, S. (Eds.). Collocations and Idioms 1. Papers from the First NordicConference on Syntactic Freezes (Vol. 41, pp. 86–99). Joensuu: Joensuu University Press.

Eskildsen, S. W., Cadierno, T., & Li, P. (2015). On the development of motion constructions infour learners of L2 English. In T. Cadierno & S. W. Eskildsen (Eds.), Usage-BasedPerspectives on Second Language Learning (pp. 207–232). Berlin: Walter de Gruyter.

Gass, S., & Selinker, L. (Eds.). (1983). Language Transfer in Language Learning. Rowley, MA:Newbury House.

Geertzen, J., Alexopoulou, T., & A. Korhonen. (2013). Automatic linguistic annotation of largescale L2 databases: The EF-Cambridge Open Language Database (EFCAMDAT).Proceedings of the 31st Second Language Research Forum (SLRF). Carnegie MellonUniversity: Cascadilla Press.

Goldberg, A. E. (1995). Constructions. A Construction Grammar Approach to ArgumentStructure. Chicago: University of Chicago Press.

Goldberg, A. E. (2003). Constructions: A new theoretical approach to language. Trends inCognitive Science, 7, 219–224. https://doi.org/10.1016/S1364‑6613(03)00080‑9

Goldberg, A. E. (2006). Constructions at Work. The Nature of Generalization in Language.Oxford: Oxford University Press.

Goldberg, A. E., Casenhiser, D. M., & Sethuraman, N. (2004). Learning argument structuregeneralizations. Cognitive Linguistics, 15(3), 289–316. https://doi.org/10.1515/cogl.2004.011

Gries, S. T. & Ellis, N. C. (2015). Statistical measures for usage-based linguistics. LanguageLearning, 65(S1), 228–255. https://doi.org/10.1111/lang.12119

Gries, S. T., & Stefanowitsch, A. (2004). Extending collostructional analysis: A corpus-basedperspective on alternations. International Journal of Corpus Linguistics, 9(1), 97–129.https://doi.org/10.1075/ijcl.9.1.06gri

Hawkins, J.A., & P. Buttery. (2010). Criterial features in learner corpora: Theory andillustrations. English Profile Journal, 1(1), 1–23. https://doi.org/10.1017/S2041536210000036

290 Ute Römer

Page 22: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

Jarvis, S. (2011). Conceptual transfer: Crosslinguistic effects in categorization and construal.Bilingualism: Language and Cognition, 14(Special Issue), 1–8.https://doi.org/10.1017/S1366728910000155

Jarvis, S. (2013). Crosslinguistic influence and multilingualism. In C.A. Chapelle (Ed.), TheEncyclopedia of Applied Linguistics. Malden, MA: Wiley-Blackwell. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/9781405198431.wbeal0291/full (last accessed April2019).

Jarvis, S., & Pavlenko, A. (2008). Crosslinguistic Influence in Language and Cognition. NewYork, NY: Routledge. https://doi.org/10.4324/9780203935927

Kumar, U., Kumar, V., & Kapur, J. N. (1986). Normalized measures of entropy. InternationalJournal of General Systems, 12(1), 55–69. https://doi.org/10.1080/03081078608934927

Kyle, K. (2016). Measuring Syntactic Development in L2 Writing: Fine Grained Indices ofSyntactic Complexity and Usage-Based Indices of Syntactic Sophistication (Unpublisheddoctoral dissertation). Georgia State University, Atlanta, GA.

Kyle, K., & Crossley, S. (2017). Assessing syntactic sophistication in L2 writing: A usage-basedapproach. Language Testing, 34(4), 513–535. https://doi.org/10.1177/0265532217712554

Li, P., Eskildsen, S.W., & Cadierno, T. (2014). Tracing an L2 learner’s motion constructionsover time: A usage-based classroom investigation. The Modern Language Journal, 98(2),612–628. https://doi.org/10.1111/modl.12091

Lieven, E., Pine, J.M., & Baldwin, G. (1997). Lexically-based learning and early grammaticaldevelopment. Journal of Child Language, 24(1), 187–219.https://doi.org/10.1017/S0305000996002930

Murakami, A. (2013). L1 Influence and Individual Variation in the L2 Accuracy Developmentof Grammatical Morphemes: Insights from Learner Corpora (Unpublished doctoraldissertation). University of Cambridge, UK.

Nesselhauf, N. (2005). Collocations in a Learner Corpus. Amsterdam/Philadelphia, PA: JohnBenjamins. https://doi.org/10.1075/scl.14

Ninio, A. (1999). Pathbreaking verbs in syntactic development and the question of prototypicaltransitivity. Journal of Child Language, 26(3), 619–653.https://doi.org/10.1017/S0305000999003931

Ninio, A. (2006). Language and the Learning Curve. A New Theory of Syntactic Development.Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199299829.001.0001

Nisioi, S. (2015). Feature analysis for native language identification. In A. Gelbukh (Ed.),Computational Linguistics and Intelligent Text Processing. CICLing 2015. Cham: Springer.https://doi.org/10.1007/978‑3‑319‑18111‑0_49

Perfors, A., Tenenbaum, J.B., & Wonnacott, E. (2010). Variability, negative evidence, and theacquisition of verb argument constructions. Journal of Child Language, 37(3), 607–642.https://doi.org/10.1017/S0305000910000012

R Development Core Team. (2017). R: A Language and Environment for Statistical Computing.Vienna: R Foundation for Statistical Computing.

Römer, U., & Garner, J. R. (forthcoming). The development of verb constructions in spokenlearner English: Tracing effects of usage and proficiency. International Journal of LearnerCorpus Research.

Stefanowitsch, A., & Gries, S. T. (2003). Collostructions: Investigating the interaction betweenwords and constructions. International Journal of Corpus Linguistics, 8(2), 209–243.https://doi.org/10.1075/ijcl.8.2.03ste

Development of verb constructions in second language learners 291

Page 23: Römer: A corpus perspective on the development of verb … · 2019-09-04 · of the verb in a construction and were therefore excluded from the VAC defin-ition used in this study

Tomasello, M. (1992). First Verbs. A Case Study of Early Grammatical Development of Cognitionand Action. Cambridge, MA: MIT Press. https://doi.org/10.1017/CBO9780511527678

Tomasello, M. (2003). Constructing a Language. A Usage-based Theory of Language Acquisition.Cambridge, MA: Harvard University Press.

Tono, Y., & Díez-Bedmar, M.B. (2014). Focus on learner writing at the beginning andintermediate stages: The ICCI corpus. International Journal of Corpus Linguistics, 19(2),163–177. https://doi.org/10.1075/ijcl.19.2.01ton

Trousdale, G., & Hoffmann, T. (Eds.). (2013). Oxford Handbook of Construction Grammar.Oxford: Oxford University Press.

Tyler, A. E., & Ortega, L. (2018). Usage-inspired L2 instruction: An emergent, researchedpedagogy. In Tyler, A. E., L. Ortega, M. Uno, & H.I. Park (Eds.), Usage-Inspired L2Instruction: Researched Pedagogy (pp. 3–26). Amsterdam/Philadelphia, PA: JohnBenjamins. https://doi.org/10.1075/lllt.49.01tyl

Address for correspondence

Ute RömerDepartment of Applied Linguistics and ESLGeorgia State University25 Park Place NE, Suite 1500Atlanta, GA [email protected]

292 Ute Römer