
This paper was presented by Chinger Zapata. It appears in Proceedings of VenTESOL’s 30th National Convention “An Affirmation of Excellence,” World Trade Center, Valencia – Venezuela, 2012, pp. 8-11.

Corpus Linguistics in TEFL: Uses and Benefits

Summary

Corpus Linguistics (CL) has made an impact on language learning and teaching in the last decades. The literature shows the benefits of CL as a theoretical framework as well as a methodological approach. In Venezuela, CL has a consolidated 40-year tradition in linguistics and applied linguistics regarding L1. When it comes to L2, however, it has unfortunately seen many springs pass by without blossoming. One reason that might explain this situation is EFL teachers’ lack of information about CL. A review of the curricula of undergraduate and graduate EFL programs in Venezuela shows that until 2010 CL had not been included as a subject. Consequently, many teachers are not aware of CL. This gap explains why, although some research has been done under CL, most EFL research in our country relies on traditional research methods. This paper is based on the experience of the TEFL Master Program at UPEL-Barquisimeto, in which CL has been included in the curriculum since 2010. It aims to give undergraduate and graduate students, first, a brief theoretical overview of the topic and, second, an account of the benefits of CL as a tool to describe learning problems, on the one hand, and to develop teaching materials, on the other.

Keywords: Corpus Linguistics, EFL, materials development

Introduction

In the area of language learning and teaching, Corpus Linguistics (CL) has built up a reputation as a theory and as a methodology in the last decades (McEnery & Wilson, 1996; Biber, Conrad, & Reppen, 1998; Tognini-Bonelli, 2001; Meyer, 2004; Lüdeling & Kytö, 2008). As a consequence, most current research in linguistics and applied linguistics regarding EFL around the world is based on CL.

In Venezuela, CL has been used for around 40 years, since Bentivoglio and a group of researchers from Universidad Central de Venezuela (UCV) started building the Spanish Spoken Corpus of Caracas in 1976 and updated it in 1987-88. Ever since, many Venezuelan researchers have devoted themselves to corpus construction and design¹ (Zapata & Márquez, 2010).

¹ For more information on corpus construction in Venezuela, see Chela-Flores & Gelman (1986); Mora & Domínguez (1990-1994); Navarro (1980s); Mora (2006); Velásquez (1993-1996).

The use of CL in Venezuela, as far as Spanish is concerned, is much the same as that for many other languages in the rest of the world. In EFL, however, the situation is different.

A general review of the undergraduate and graduate T/EFL programs in Venezuelan universities shows that CL is referred to as a methodology to conduct research, but students do not receive any specialized training to master such an approach. This explains, on the one hand, why only a small number of researchers use CL in their studies and, on the other, why most researchers rely on traditional methods. Although traditional research methods are valid for investigation, specialized training in CL provides undergraduate and graduate students with a powerful tool to do research as well as to develop teaching materials.

Considering the above, the purpose of this paper is to provide information regarding CL from a theoretical and a methodological perspective, as well as its uses and benefits in the TEFL field. To reach this goal, the following objectives are proposed: 1. to review briefly the definitions of CL as a theory and as a methodology; 2. to describe examples of how undergraduate and graduate students from the Department of Modern Languages are using CL in their study of the language and in their teaching practice, and the benefits of such use.

Literature Review

What is Corpus Linguistics as a theory? Halliday (1993, p. 24) claims that “…corpus linguistics re-unites the activities of data gathering and theorizing […] this is leading to a qualitative change in our understanding of language…” This claim implies that CL not only processes data but also allows hypotheses and postulates that define language behavior to be drawn.

What is Corpus Linguistics as a methodology? McEnery & Wilson (1996, p. 1) point out that CL is more about “…the study of language based on examples of ‘real life’ language use and a methodology rather than an aspect of language requiring explanation or description.”

McEnery & Wilson’s point of view has gained strength in recent years. Meyer (2004, p. xi) emphasizes this view when he declares, “…it becomes quite evident that corpus linguistics is more a way of doing linguistics […] than a separate paradigm within linguistics.”

CL and TEFL in the Department of Modern Languages at UPEL-Barquisimeto

In view of the role of Corpus Linguistics in teacher training, in 2010 CL was included as a subject in the TEFL graduate program at UPEL-Barquisimeto, and it was introduced as a methodological approach in the research-related subjects of the undergraduate program of the Department of Modern Languages (DML). Program participants are first instructed in the principles underlying both theory and methodology. Second, they are introduced to two tools: the Corpus of Contemporary American English (COCA) by Davies and AntConc by Anthony. Third, they are asked to select topics of their interest (related to language learning or teaching) in various fields such as lexicography, morphology, language teaching, phonetics and phonology, in order to practice what they have learned in theory. In the end, program participants outline their research results in scientific papers and submit them for final assessment.

The Corpus of Contemporary American English

The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English. The corpus was created by Mark Davies of Brigham Young University. It contains more than 425 million words of text and is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts.

The interface allows you to search for exact words or phrases, wildcards, lemmas, part of speech, or any combinations of these. You can search for surrounding words (collocates) within a ten-word window (e.g. all nouns somewhere near faint, all adjectives near woman, or all verbs near feelings), which often gives you good insight into the meaning and use of a word.
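The idea of a collocate window can be illustrated with a minimal Python sketch. It does not use or reproduce the COCA interface: the `tokenize` and `collocates` helpers, the window size, and the sample sentences are all assumptions made for illustration, and no part-of-speech filtering is attempted.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, node, window=4):
    """Count the words found within `window` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        counts.update(left + right)
    return counts

# Invented sample text, used only to exercise the function.
sample = (
    "She felt faint after the race. A faint smile crossed his face. "
    "There was a faint smell of smoke and a faint sound in the distance."
)
tokens = tokenize(sample)
top = collocates(tokens, "faint", window=2)
print(top.most_common(5))
```

With a real corpus, the same counts would normally be compared against each word's overall frequency, since very common words such as articles appear near almost every node word.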

The corpus also allows you to easily limit searches by frequency and compare the frequency of words, phrases, and grammatical constructions, in at least two main ways:

By genre: comparisons between spoken, fiction, popular magazines, newspapers, and academic, or even between sub-genres (or domains), such as movie scripts, sports magazines, newspaper editorial, or scientific journals

Over time: compare different years from 1990 to the present time
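Comparisons across genres or years are only meaningful when the sections being compared are put on the same scale, which is usually done by expressing frequency per million words. The sketch below shows that arithmetic in Python. The `per_million` helper and every number in it are hypothetical and are not taken from COCA.

```python
def per_million(raw_count, section_size):
    """Normalize a raw hit count to a frequency per million words."""
    return raw_count * 1_000_000 / section_size

# Hypothetical hit counts and section sizes (in words), for illustration only.
hits = {"spoken": 150, "academic": 200}
sizes = {"spoken": 50_000_000, "academic": 100_000_000}

normalized = {genre: per_million(hits[genre], sizes[genre]) for genre in hits}
for genre, freq in normalized.items():
    print(f"{genre}: {hits[genre]} hits, {freq:.1f} per million words")
```

In this invented case the academic section has more raw hits, but the form is relatively more frequent in the smaller spoken section once the counts are normalized.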

You can also easily carry out semantically-based queries of the corpus. For example, you can contrast and compare the collocates of two related words (little/small, democrats/republicans, men/women) to determine the difference in meaning or use between these words. You can find the frequency and distribution of synonyms for nearly 60,000 words, compare their frequency in different genres, and use these word lists as part of other queries. Finally, you can easily create your own lists of semantically-related words and then use them directly as part of the query.

AntConc

AntConc was created by Laurence Anthony in 2002. It is a freeware, multiplatform tool for carrying out corpus linguistics research and data-driven learning. It contains seven tools:

Concordance Tool: This tool shows search results in a 'KWIC' (key word in context) format. This allows you to see how words and phrases are commonly used in a corpus of texts.

Concordance Plot Tool: This tool shows search results plotted in a 'barcode' format. This allows you to see the position where search results appear in target texts.

File View Tool: This tool shows the text of individual files. This allows you to investigate in more detail the results generated in other tools of AntConc.

Clusters/N-Grams: The Clusters Tool shows clusters based on the search condition. In effect it summarizes the results generated in the Concordance Tool or Concordance Plot Tool. The N-Grams Tool, on the other hand, scans the entire corpus for 'N' (e.g. 1 word, 2 words…) length clusters. This allows you to find common expressions in a corpus.

Collocates: This tool shows the collocates of a search term. This allows you to investigate non-sequential patterns in language.

Word List: This tool counts all the words in the corpus and presents them in an ordered list. This allows you to quickly find which words are the most frequent in a corpus.

Keyword List: This tool shows which words are unusually frequent (or infrequent) in the corpus in comparison with the words in a reference corpus. This allows you to identify characteristic words in the corpus, for example, as part of a genre or ESP study.
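To make two of the tools above more concrete, the following Python sketch imitates what a KWIC concordance and an n-gram count do. It is not AntConc's code or output format: the `tokenize`, `kwic`, and `ngrams` helpers, the column width, and the sample text are assumptions introduced for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, node, width=3):
    """Return one key-word-in-context line per occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the node words line up in a column.
            lines.append(f"{left:>25}  [{tok}]  {right}")
    return lines

def ngrams(tokens, n=2):
    """Count every run of n consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

# Invented sample text, used only to exercise the functions.
text = "It is easy to learn. It is hard to teach. It is good to try."
tokens = tokenize(text)

for line in kwic(tokens, "to", width=2):
    print(line)

bigrams = ngrams(tokens, 2)
print(bigrams.most_common(3))
```

Aligning the node word in a fixed column is what makes recurring patterns to its left and right easy to spot by eye, which is the main point of the KWIC display.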

The Experience

At the end of terms 2-2011 (undergraduate program) and 3-2011 (graduate program), an interesting range of papers was available for further analysis. Let us consider three samples presented by students in both programs. The first paper is titled The Infinitive Phrase as a Subject (IPS) in the Sentence of English Written Discourse. To write the paper, students used COCA. The participants highlighted four important findings: 1. the low frequency of the IPS in written discourse, 2. the genres in which the IPS is most common, 3. the changes in the use of the IPS over the last twenty years, and 4. the different grammatical patterns the IPS can take in the sentence. Students discussed the fact that the IPS is a restricted, complex form of the language mostly used by expert writers for papers and research articles in academic writing. They concluded that EFL teachers should be aware of this situation and not expect students in the beginning levels of English to produce compositions that include the IPS. These findings support the idea that the complexity of the IPS demands more careful instruction and practice with respect to pragmatics.

The second paper, produced by graduate students, is titled Oral Production of /s/ Phoneme in High-School Students. In this paper, graduate students identified the problems that high-school students face in producing the /s/ phoneme. To determine the nature of the problem, they recorded students’ oral discourse and transcribed it. These data were digitized and loaded into AntConc. The tool allowed the researchers to determine in which gender mispronunciation of the /s/ phoneme is more frequent, as well as the percentages of error frequency. They were also able to identify the phonic contexts in which mispronunciation occurs. This study gave researchers a better and more precise description and understanding of the nature of the problem and allowed them to think of ways to potentially solve it.

Finally, let us consider the third paper, titled The Lexis of Hip-Hop Songs. Hip-Hop was chosen for analysis because of its popularity among students. The objective for the program participants was to produce a glossary of the most frequent words used in this musical genre. First, a number of Hip-Hop lyrics were collected and digitized. Later, the documents were loaded and processed with AntConc. The most frequent words used in this genre were listed, included in a glossary, and presented to the class in the form of a reference for unknown words.
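The frequency step behind a glossary of this kind can be sketched in a few lines of Python. This is not the participants' procedure or AntConc itself: the `glossary_candidates` helper, the small stopword list, and the sample lines are invented for illustration, and filtering out function words is an assumption that the paper does not mention.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption; a real study would use a fuller one).
STOPWORDS = {"the", "a", "and", "i", "you", "to", "in", "it", "my", "on", "is"}

def glossary_candidates(texts, top_n=5, stopwords=STOPWORDS):
    """Return the `top_n` most frequent non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(top_n)

# Invented placeholder lines, not real song lyrics.
lyrics = [
    "The beat drops and the crowd moves",
    "Feel the beat in the street",
    "The crowd feels the beat",
]
candidates = glossary_candidates(lyrics, top_n=2)
print(candidates)
```

Note that feel and feels are counted separately here. Grouping inflected forms under one headword would require lemmatization, which this sketch leaves out.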

Final Comments

The experience has demonstrated that CL is a tool that can be exploited in various ways. It allows pre-service and in-service EFL teachers to prepare teaching materials based on serious research, to detect learning problems, and to include language descriptions in the process of learning.

References

Biber, D., Conrad, S., & Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.

Lüdeling, A., & Kytö, M. (Eds.). (2008). Corpus Linguistics: An International Handbook. Berlin: Walter de Gruyter.

McEnery, T., & Wilson, A. (1996). Corpus Linguistics. Edinburgh: Edinburgh University Press.

Meyer, C. (2004). English Corpus Linguistics: An Introduction. Cambridge: Cambridge University Press.

Tognini-Bonelli, E. (2001). Corpus Linguistics at Work (Studies in Corpus Linguistics, 6). Amsterdam/Atlanta, GA: John Benjamins.

Zapata, C., & Márquez, Y. (Comps.). (2010). Corpus Linguistics: An Insight. Selection of Readings. Núcleo de Investigación para el Estudio y Enseñanza de Lenguas. Departamento de Idiomas Modernos de la UPEL-IPB. Barquisimeto – Venezuela.
