This paper was presented by Chinger Zapata. It appears in Proceedings of VenTESOL’s 30th National Convention “An Affirmation of Excellence,” World Trade Center, Valencia, Venezuela, 2012, pp. 8-11.
Corpus Linguistics in TEFL: Uses and Benefits
Summary
Corpus Linguistics (CL) has made an impact on language learning and teaching in recent decades. The literature shows the benefits of CL both as a theoretical framework and as a methodological approach. In Venezuela, CL has a consolidated 40-year tradition in linguistics and applied linguistics concerning L1. When it comes to L2, however, it has unfortunately seen many springs pass by without blossoming. One reason that might explain this situation is EFL teachers’ lack of information about CL. A review of undergraduate and graduate EFL curricula in Venezuela shows that, until 2010, CL had not been included as a subject. Consequently, many teachers are not aware of CL. This gap explains why, although some research has been done within CL, most EFL research in our country relies on traditional research methods. This paper is based on the experience of the TEFL Master Program at UPEL-Barquisimeto, whose curriculum has included CL since 2010. It aims to offer undergraduate and graduate students, first, a brief theoretical overview of the topic and, second, an account of the benefits of CL as a tool both to describe learning problems and to develop teaching materials.
Keywords: Corpus Linguistics, EFL, materials development
Introduction
In the area of language learning and teaching, Corpus Linguistics (CL) has built up
a reputation as both a theory and a methodology in recent decades (McEnery &
Wilson, 1996; Biber, Conrad, & Reppen, 1998; Tognini-Bonelli, 2001; Meyer, 2004;
Lüdeling & Kytö, 2008). As a consequence, much current research in linguistics and
applied linguistics concerning EFL around the world draws on CL.
In Venezuela, CL has been used for around 40 years, since Bentivoglio and a
group of researchers from the Universidad Central de Venezuela (UCV) started building
the Spanish Spoken Corpus of Caracas in 1976, which they updated in 1987-88. Since
then, many Venezuelan researchers have devoted themselves to corpus design and
construction¹ (Zapata & Márquez, 2010).
¹ For more information on corpus construction in Venezuela, see Chela-Flores & Gelman (1986); Mora & Domínguez (1990-1994); Navarro (1980s); Mora (2006); Velásquez (1993-1996).
As far as Spanish is concerned, the use of CL in Venezuela is much the same as its
use with many other languages around the world. In EFL, however, the situation is
different.
A general review of the undergraduate and graduate T/EFL programs in
Venezuelan universities shows that CL is mentioned as a research methodology,
but students receive no specialized training to master this approach. This explains,
on the one hand, why only a few researchers use CL in their studies and, on the
other, why most researchers rely on traditional methods. Although traditional research
methods remain valid, specialized training in CL provides undergraduate and graduate
students with a powerful tool both for doing research and for developing teaching
materials.
In view of the above, the purpose of this paper is to provide information on CL
from a theoretical and a methodological perspective, as well as on its uses and
benefits in the TEFL field. To reach this goal, the following objectives are proposed:
1. to briefly review the definitions of CL as a theory and as a methodology; and 2. to
describe examples of how undergraduate and graduate students from the Department of
Modern Languages are using CL in their study of the language and in their teaching
practice, together with the benefits of such use.
Literature Review
What is Corpus Linguistics as a theory? Halliday (1993, p. 24) claims that
“…corpus linguistics re-unites the activities of data gathering and theorizing […] this
is leading to a qualitative change in our understanding of language…” This claim
suggests that CL is not only a means of processing data but also a basis from which
hypotheses and postulates about language behavior can be drawn.
What is Corpus Linguistics as a methodology? McEnery & Wilson (1996, p. 1)
describe CL as “…the study of language based on examples of ‘real
life’ language use and a methodology rather than an aspect of language requiring
explanation or description.”
McEnery & Wilson’s point of view has gained ground in recent years. Meyer
(2004, p. xi) reinforces it when he states that “…it becomes quite evident that
corpus linguistics is more a way of doing linguistics […] than a separate paradigm
within linguistics.”
CL and TEFL in the Department of Modern Languages at UPEL-Barquisimeto
In view of the role of Corpus Linguistics in teacher training, CL was included as a
subject in the TEFL graduate program in 2010. A CL methodological approach was also
introduced into the research-related courses of the undergraduate program of the
Department of Modern Languages (DML) at UPEL-Barquisimeto. Program participants are
first instructed in the principles underlying CL as both theory and methodology.
Second, they are introduced to two tools: the Corpus of Contemporary American English
(COCA), by Mark Davies, and AntConc, by Laurence Anthony. Third, they are asked to
select topics of interest to them (related to language learning or teaching) in fields
such as lexicography, morphology, language teaching, phonetics and phonology, in order
to put what they have learned in theory into practice. Finally, participants report
their research results in scientific papers, which they submit for final assessment.
The Corpus of Contemporary American English
The Corpus of Contemporary American English (COCA) is the largest freely available
corpus of English, and the only large and balanced corpus of American English. It was
created by Mark Davies of Brigham Young University. At the time of writing, the corpus
contains more than 425 million words of text, divided equally among five genres:
spoken, fiction, popular magazines, newspapers, and academic texts.
The interface allows you to search for exact words or phrases, wildcards, lemmas,
parts of speech, or any combination of these. You can also search for surrounding
words (collocates) within a ten-word window (e.g., all nouns near faint, all
adjectives near woman, or all verbs near feelings), which often gives good insight
into the meaning and use of a word.
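The collocate search described above amounts to counting the words that fall within a fixed number of positions on either side of a node word. As a minimal sketch of that counting logic only (it does not query COCA, which runs such searches, with part-of-speech filters, on its own servers), the Python code below works on a tiny invented text. The function names, the naive tokenizer and the span of two words on each side are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def window_collocates(tokens: list[str], node: str, span: int = 4) -> Counter:
    """Count every token within `span` positions to the left or right of each hit of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]       # up to `span` words before the node
        right = tokens[i + 1:i + 1 + span]      # up to `span` words after the node
        counts.update(left + right)
    return counts


# Invented toy text, not corpus data.
text = ("She felt a faint hope. A faint smell of smoke drifted in. "
        "He gave a faint smile and a faint nod.")
print(window_collocates(tokenize(text), "faint", span=2).most_common(3))
# → [('a', 5), ('hope', 2), ('and', 2)]
```

In this toy output the article *a* tops the list, which illustrates why restricting collocates by part of speech (as in the "all nouns near faint" example) makes such searches more informative.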
The corpus also allows you to easily limit searches by frequency and compare the
frequency of words, phrases, and grammatical constructions, in at least two main
ways:
By genre: comparisons between spoken, fiction, popular magazines,
newspapers, and academic texts, or even between sub-genres (or domains), such as
movie scripts, sports magazines, newspaper editorials, or scientific journals
Over time: comparisons between different years from 1990 to the present
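Because sub-genres and individual years do not contain the same number of words, such comparisons are normally made on normalized rather than raw frequencies; COCA's chart display, for instance, reports frequencies per million words. The sketch below shows only that arithmetic (raw count × 1,000,000 ÷ section size). The function name and the figures are invented for illustration and are not COCA data.

```python
def per_million(raw_count: int, section_size: int) -> float:
    """Normalize a raw frequency to occurrences per million words."""
    if section_size <= 0:
        raise ValueError("section_size must be a positive number of words")
    return raw_count * 1_000_000 / section_size


# Invented toy figures, NOT real COCA data:
# section -> (hits for the search term, total words in that section)
toy_sections = {
    "spoken": (120, 80_000_000),
    "academic": (900, 75_000_000),
}

for section, (hits, size) in toy_sections.items():
    print(f"{section}: {per_million(hits, size):.2f} per million words")
# prints "spoken: 1.50 per million words", then "academic: 12.00 per million words"
```

With these toy numbers the term is eight times as frequent per million words in the academic section as in the spoken one, a comparison that raw counts from sections of different sizes would distort.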
You can also easily carry out semantically based queries of the corpus. For
example, you can compare the collocates of two related words
(little/small, democrats/republicans, men/women) to determine how they differ in
meaning or use. You can find the frequency and distribution of
synonyms for nearly 60,000 words, compare their frequency in different
genres, and use these word lists as part of other queries. Finally, you can easily
create your own lists of semantically related words and use them directly as part
of a query.
AntConc
AntConc was created by Laurence Anthony in 2002. It is a freeware,
multiplatform tool for carrying out corpus linguistics research and data-driven
learning. It contains seven tools:
Concordance Tool: This tool shows search results in KWIC (Key Word In Context)
format. This allows you to see how words and phrases are commonly used in a corpus of texts.
Concordance Plot Tool: This tool shows search results plotted as a 'barcode'
format. This allows you to see the position where search results appear in target
texts.
File View Tool: This tool shows the text of individual files. This allows you to
investigate in more detail the results generated in other tools of AntConc.
Clusters/N-Grams: The Clusters Tool shows clusters based on the search condition.
In effect, it summarizes the results generated in the Concordance Tool or
Concordance Plot Tool. The N-Grams Tool, on the other hand, scans the entire
corpus for clusters of length N (e.g., 1 word, 2 words, …). This allows you to find
common expressions in a corpus.
Collocates: This tool shows the collocates of a search term. This allows you to
investigate non-sequential patterns in language.
Word List: This tool counts all the words in the corpus and presents them in an
ordered list. This allows you to quickly find which words are the most frequent in a
corpus.
Keyword List: This tool shows which words are unusually frequent (or
infrequent) in the corpus in comparison with the words in a reference corpus. This
allows you to identify characteristic words in the corpus, for example, as part of a
genre or ESP study.
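Several of these tools rest on simple counting operations. The following Python sketch re-creates three of them on a tiny invented corpus: a KWIC concordance, an N-gram list, and a keyword comparison against a reference corpus. It is not AntConc's code. For keyness it uses the log-likelihood (G²) statistic, one measure widely used for keyword lists; the function names, the naive tokenizer and the context width are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lower-case word tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens: list[str], node: str, width: int = 3) -> list[str]:
    """Return one Key Word In Context line per hit, with `width` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>25} [{tok}] {right}")  # right-align left context
    return lines


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count every contiguous sequence of n tokens across the whole corpus."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 keyness for a word seen `a` times in a target corpus of `c` tokens
    and `b` times in a reference corpus of `d` tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # the term 0 * ln(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(target: list[str], reference: list[str], top: int = 5) -> list[tuple[str, float]]:
    """Rank words that are relatively more frequent in the target than in the reference."""
    t, r = Counter(target), Counter(reference)
    c, d = len(target), len(reference)
    scores = {}
    for w in t:
        if t[w] / c > r[w] / d:  # keep positive keywords only
            scores[w] = log_likelihood(t[w], r[w], c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Invented toy corpora.
target = tokenize("the lab results show the cell count rose; the cell samples were stable")
reference = tokenize("the game was fun and the team won the cup at the end of the day")
print("\n".join(kwic(target, "cell", width=2)))
print(ngrams(target, 2).most_common(2))
print(keywords(target, reference, top=3))
```

Note that *the*, although the most frequent word in the toy target, is not a keyword, because it is proportionally even more frequent in the reference corpus; this is the kind of contrast the Keyword List Tool is designed to reveal.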
The Experience
At the end of terms 2-2011 (undergraduate program) and 3-2011 (graduate
program), an interesting range of papers was available for further analysis. Let us
consider three sample papers presented by students in the two programs. The first is
titled The Infinitive Phrase as a Subject (IPS) in the Sentence of English Written
Discourse. To carry out this study, students used COCA. The participants highlighted
four important findings: 1. the low frequency of the IPS in written discourse;
2. the genres in which the IPS is most common; 3. how IPS use has changed over the
last twenty years; and 4. the different grammatical patterns the IPS can take in the
sentence. Students discussed the fact that the IPS is a restricted, complex form of the
language, used mostly by expert writers in academic papers and research articles.
They concluded that EFL teachers should be aware of this and should not
expect students at beginning levels of English to produce compositions that
include the IPS. These findings support the idea that the complexity of the IPS demands
more careful instruction and practice with respect to pragmatics.
The second paper, produced by graduate students, is titled Oral Production of /s/
Phoneme in High-School Students. In this paper, the graduate students examined the
problems high-school students face in producing the /s/ phoneme. To document the
nature of the problem, they recorded the students’ oral discourse and transcribed it.
The transcripts were digitized and loaded into AntConc. The tool allowed the
researchers to determine in which gender mispronunciation of the /s/ phoneme was more
frequent, as well as the percentage of errors. They were also able to identify the
phonic contexts in which mispronunciation occurred. This study gave the researchers a
better and more precise description and understanding of the nature of the problem
and allowed them to consider possible ways to solve it.
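The paper does not say how errors were coded in the transcripts. As a minimal sketch, assuming each occurrence of /s/ has already been coded by hand with the speaker's gender, a phonic-context label and a correct/incorrect judgement (a hypothetical scheme, not necessarily the one the students used), the error percentages by gender and by context could be computed as follows. The class name, field names and all data are invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class SToken(NamedTuple):
    """One coded occurrence of /s/ in a transcript (hypothetical coding scheme)."""
    gender: str    # speaker's gender as recorded by the researchers
    word: str      # transcribed word containing /s/
    context: str   # phonic-context label chosen by the coder
    correct: bool  # True if /s/ was judged to be produced correctly


def error_percentages(rows: list[SToken], field: str) -> dict[str, float]:
    """Percentage of mispronounced /s/ tokens, grouped by `field` ('gender' or 'context')."""
    totals: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    for row in rows:
        key = getattr(row, field)
        totals[key] += 1
        if not row.correct:
            errors[key] += 1
    return {key: 100 * errors[key] / totals[key] for key in totals}


# Invented toy data, NOT the study's results.
sample = [
    SToken("F", "school", "word-initial before consonant", False),
    SToken("F", "bus", "word-final", True),
    SToken("M", "student", "word-initial before consonant", False),
    SToken("M", "sun", "word-initial before vowel", True),
    SToken("M", "books", "word-final", False),
]
print(error_percentages(sample, "gender"))
print(error_percentages(sample, "context"))
```

Grouping by a single field keeps the function reusable: the same code gives the by-gender and the by-context view of the coded data.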
Finally, let us consider the third paper, titled The Lexis of Hip-Hop Songs. Hip-Hop
was chosen for analysis because of its popularity among students. The program
participants’ objective was to produce a glossary of the most frequent words used in
this musical genre. First, a set of Hip-Hop lyrics was collected and digitized. The
files were then loaded into AntConc and processed. The most frequent words in the
genre were listed, compiled into a glossary, and presented to the class as a reference
for unfamiliar words.
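A frequency-based glossary like this one starts from a word list. The sketch below, which assumes the lyrics are stored as plain-text files in a hypothetical `lyrics/` folder, counts words across all files, drops a small hand-made stoplist of function words and keeps words above a minimum frequency. The folder name, the stoplist and the threshold are illustrative assumptions, not details reported in the paper.

```python
import re
from collections import Counter
from pathlib import Path

# Small illustrative stoplist of function words (an assumption, not the students' list).
STOPLIST = {"the", "a", "an", "and", "i", "you", "to", "it", "in", "of", "my", "is", "on"}


def glossary_candidates(texts: list[str], min_freq: int = 2) -> list[tuple[str, int]]:
    """Return (word, frequency) pairs, most frequent first, excluding stoplisted words."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return [(w, n) for w, n in counts.most_common()
            if w not in STOPLIST and n >= min_freq]


if __name__ == "__main__":
    # Hypothetical folder of digitized lyrics, one .txt file per song.
    folder = Path("lyrics")
    texts = [p.read_text(encoding="utf-8") for p in sorted(folder.glob("*.txt"))]
    for word, freq in glossary_candidates(texts)[:50]:
        print(f"{word}\t{freq}")
```

Filtering function words matters here because the glossary was meant as a reference for unfamiliar vocabulary, and items such as *the* or *and* would otherwise dominate the top of any raw frequency list.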
Final Comments
This experience shows that CL is a tool that can be exploited in various
ways. It allows pre-service and in-service EFL teachers to prepare teaching materials
based on sound research, to detect learning problems, and to include language
descriptions in the learning process.
References
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Lüdeling, A., & Kytö, M. (Eds.). (2008). Corpus Linguistics: An International Handbook. Berlin: Walter de Gruyter.
McEnery, T., & Wilson, A. (1996). Corpus Linguistics. Edinburgh: Edinburgh University Press.
Meyer, C. (2004). English Corpus Linguistics: An Introduction. Cambridge: Cambridge University Press.
Tognini-Bonelli, E. (2001). Corpus Linguistics at Work (Studies in Corpus Linguistics 6). Amsterdam/Atlanta, GA: John Benjamins.
Zapata, C., & Márquez, Y. (Comps.). (2010). Corpus Linguistics: An Insight. Selection of Readings. Núcleo de Investigación para el Estudio y Enseñanza de Lenguas, Departamento de Idiomas Modernos de la UPEL-IPB. Barquisimeto, Venezuela.