Steven Saffels
April 2014
An exploratory corpus study of the AP Spanish Exam
• The impression of the language used on the AP Spanish Exam is that it primarily consists of lexically rich but grammatically simple text.
• Vocabulary is relatively specialized for specific topics.
• The text consists mostly of simple sentences and relies on noun phrase modification.
• 86% of all verbs are in present, past, and infinitive forms.
• Recurrent formulaic expressions are used to introduce source texts.
Introduction: What is a corpus study?
Corpus-based methodologies
• (Anthony, 2011)
• Corpus research:
  • Uses a computer program called a concordancer
  • to analyze key words, phrases & parts of words
  • in a large, representative, computerized collection of texts, called a corpus (O’Keefe, McCarthy & Carter 2007)
• Allow very extensive, systematic and descriptive data (De Kock 2001)
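What a concordancer does can be illustrated with a minimal keyword-in-context (KWIC) sketch in Python. This is not AntConc's implementation; the function names `tokenize` and `kwic`, the context width, and the sample sentence are all assumptions made for illustration.

```python
import re
import unicodedata

def tokenize(text):
    """Lowercase the text and return its word tokens (letters only, accents kept)."""
    normalized = unicodedata.normalize("NFC", text.lower())
    return re.findall(r"[^\W\d_]+", normalized)

def kwic(tokens, keyword, width=3):
    """Return one (left context, keyword, right context) triple per occurrence."""
    keyword = unicodedata.normalize("NFC", keyword.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

# Hypothetical sample text, not taken from the exam.
sample = "Este artículo apareció en un diario. El artículo trata de la bicicleta."
concordance = kwic(tokenize(sample), "artículo", width=2)

for left, keyword, right in concordance:
    print(f"{left:>20}  [{keyword}]  {right}")
```

Each output line shows one occurrence of the search word with a few words of context on either side, which is the basic view a concordancer offers for a whole corpus.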
Research gap
• Relatively few corpus studies in languages other than English (Parodi, 2007)
• “Gap” between corpus-based research results and pedagogical practice (Cortes 2013)
Goal of this study
The present study aspires to help redress both the lack of corpus research in Spanish and the gap between research and practice by applying corpus methodologies to a pedagogical problem from a Spanish L2 classroom:
How to best prepare high school students for success on a high-stakes, skills-based exam of proficiency in Spanish.
Research Questions
• RQ1: How representative is the AP Spanish Exam of broader usage of Spanish? Specifically, in terms of:
  • Vocabulary
  • Parts of speech
  • Verb forms
• RQ2: What are the most frequent recurrent word combinations?
  • What are the salient 3-, 4-, 5-, or 6-grams used on the exam?
  • Are there any salient tendencies in n-gram use?
• RQ3: Are the “transition phrases” suggested by a popular test-prep book used frequently on the exam?
Advanced Placement Spanish Exam
• Year-end, skills-based exam
• No vocabulary or grammar specifications
• Students must use information from authentic texts to:
  • Write a personal letter
  • Compose a synthesis essay
  • Respond orally to a simulated conversation
  • Make an oral synthesis presentation
(College Board, 2008-2013)
The AP Corpus
• Total of 10 texts
• 18,333 word tokens
• Most of the text is from the articles and radio reports used as sources for the presentational writing and speaking exercises.
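Corpus size figures like these come from counting word tokens per text. The sketch below shows one way to do that and to build a frequency list at the same time. The tokenization rule (runs of letters) and the two sample texts are assumptions, so counts produced this way need not match the study's exactly.

```python
import re
from collections import Counter

def word_tokens(text):
    """Return lowercase word tokens; digits and punctuation are ignored."""
    return re.findall(r"[^\W\d_]+", text.lower())

# Hypothetical stand-ins for the exam texts.
texts = {
    "text_01": "El cambio climático afecta a todo el mundo.",
    "text_02": "La música y el arte son parte de la educación.",
}

# Tokens per text, overall frequency list, and corpus totals.
per_text = {name: len(word_tokens(body)) for name, body in texts.items()}
frequencies = Counter(tok for body in texts.values() for tok in word_tokens(body))
total_tokens = sum(per_text.values())
total_types = len(frequencies)

print(per_text, total_tokens, total_types)
print(frequencies.most_common(3))
```

The `frequencies` counter is the kind of word list that can then be compared against a reference frequency dictionary.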
A Frequency Dictionary of Spanish (Davies 2006)
• List of the 5,000 most frequent words in Spanish
• Based on a subset of the 100-million-word Corpus del Español (CDE) (Davies 2002-)
• Balanced, representative corpus:
  • Spoken/Written
  • Latin America/Spain
Lexical Analysis
Lexical Analysis: “Absent” Verbs
• 30% of the top 300 words in Davies (2006) do not appear on the AP word list.
• Of those, 41% are verbs, including many core vocabulary items for lower-level Spanish classes:
PONER (PUT), LLAMAR (CALL), VENIR (COME), SALIR (LEAVE), VOLVER (RETURN), VIVIR (LIVE), MIRAR (LOOK), EMPEZAR (BEGIN), ENTRAR (ENTER), ENTENDER (UNDERSTAND), PEDIR (REQUEST), RECIBIR (RECEIVE), TERMINAR (FINISH), SACAR (TAKE OUT), NECESITAR (NEED), LEER (READ), ABRIR (OPEN)
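The comparison behind these percentages can be sketched as a set difference between a ranked reference list and the corpus word list. The sketch assumes both lists are already lemmatized and tagged for part of speech. The helper name `absent_items` and the miniature data are hypothetical and are not drawn from Davies (2006) or the AP Corpus.

```python
def absent_items(reference_ranked, corpus_items, top_n):
    """Return the top-N reference items that never occur in the corpus, in rank order."""
    seen = set(corpus_items)
    return [item for item in reference_ranked[:top_n] if item not in seen]

# Hypothetical miniature data: (lemma, part of speech), ranked by frequency.
reference = [
    ("de", "prep"), ("ser", "verb"), ("poner", "verb"),
    ("cosa", "noun"), ("país", "noun"), ("venir", "verb"),
]
corpus = {("de", "prep"), ("ser", "verb"), ("país", "noun")}

top_n = 6
missing = absent_items(reference, corpus, top_n)

# Share of the top-N list that is absent, and share of verbs among the absent items.
share_absent = len(missing) / top_n
verb_share = sum(1 for _, pos in missing if pos == "verb") / len(missing)

print(missing)
print(f"{share_absent:.0%} absent, {verb_share:.0%} of those are verbs")
```

With the real lists, `top_n=300` would correspond to the cut-off used on this slide.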
Lexical Analysis: “Absent” Nouns
• General Nouns: COSA (THING), HOMBRE (MAN), MUJER (WOMAN), MODO (WAY), RELACIÓN (RELATIONSHIP)
• Body Parts: MANO (HAND), OJO (EYE)
• Human Relations: HIJO (SON), SEÑOR (MISTER), MADRE (MOTHER), NOSOTROS (WE), NADIE (NOBODY)
• Religion: VERDAD (TRUTH), SANTO (HOLY), DIOS (GOD)
• Time/Space: PUNTO (POINT), LADO (SIDE), NOCHE (NIGHT), PRINCIPIO (BEGINNING), PUEBLO (TOWN)
Lexical Analysis: “Absent” Adjectives
• Several of the generally common adjectives that are missing from the AP Corpus frequency list are typically pre-modifiers.
  • AQUEL (THAT): desde aquel día (from that day)
  • TAL (SUCH): hacerlo de tal manera (to do it in such a way)
  • PROPIO (OWN): tiene su propio estilo (has his own style)
  • NINGÚN (NONE): no hay ningún problema (there’s no problem)
  • CUALQUIER (ANY): puede hacer cualquier cosa (can do anything)
  • ÚNICO (ONLY): ¿Usted es el único hijo? (You are the only son?)
Lexical Analysis: Unusually frequent terms
• Terms used to introduce source texts for the presentational writing and speaking activities
• Not extremely salient for the student taking the exam: this is referential information that is not necessary for interpreting the texts
• May be helpful to guide students in quickly selecting appropriate strategies to make the most efficient use of time
FUENTE (SOURCE), DIARIO (NEWSPAPER), INFORME (REPORT), ARTÍCULO (ARTICLE), APARECER (APPEAR), RADIO (RADIO), EMITIR (BROADCAST), SIGUIENTE (FOLLOWING), TITULADO (TITLED), TEXTO (TEXT), CONVERSACIÓN (CONVERSATION), IMPRESO (PRINTED), PERIÓDICO (NEWSPAPER), ADAPTACIÓN (ADAPTATION), GRABACIÓN (RECORDING)
Lexical Analysis: Example of bicicleta
Lexical Analysis: Thematic Vocabulary
• Geography: país (country), mundo (world), ciudad (city), español (Spanish), lengua (language), mundial (worldwide), idioma (language), estado (state)
• Environment: cambio (change), climático (climate), invierno (winter), oso (bear), ave (bird), combustible (fuel), nieve (snow), calentamiento (warming)
• Wellbeing: agua (water), físico (physique), salud (health), organismo (body), risa (laughter), peso (weight), alimento (food), kilómetros (kilometers)
• Technology: computadora (computer), internet (Internet), digital (digital), electrónico (electronic), red (network), tecnología (technology), virtual (virtual)
• Fine arts: arte (art), música (music), orquesta (orchestra), artista (artist), producción (production), pintura (painting), músico (musician), lienzo (canvas)
• Education: educación (education), niño (child), joven (young person), calidad (quality), escuela (school), estudio (study), alumno (student), clase (class)
Grammatical Analysis: Word Class
WORD CLASS      AP TOKENS     AP %    CDE TOKENS    CDE %
PREPOSITIONS        3,079   22.79%     5,553,520   24.72%
ARTICLES            2,544   18.83%     4,643,039   20.67%
CONJUNCTIONS        1,371   10.15%     3,781,609   16.83%
PRONOUNS              608    4.50%     2,046,356    9.11%
VERBS               1,213    8.98%     1,928,260    8.58%
ADVERBS               566    4.19%     1,764,952    7.86%
COMMON NOUNS        2,435   18.02%     1,459,968    6.50%
ADJECTIVES          1,294    9.58%       614,069    2.73%
PROPER NOUNS          296    2.19%       365,057    1.62%
NUMERALS              100    0.74%       246,519    1.10%
INTERJECTIONS           5    0.04%        64,277    0.29%
TOTAL              13,511     100%    22,467,626     100%
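The percentages in the word-class table can be recomputed from the token counts, and the two corpora compared class by class. The sketch below uses the counts from the table. The percentage-point difference is only one simple comparison and is an assumption of this sketch, not necessarily the measure used in the study.

```python
# Token counts per word class, taken from the table above.
ap = {
    "PREPOSITIONS": 3079, "ARTICLES": 2544, "CONJUNCTIONS": 1371,
    "PRONOUNS": 608, "VERBS": 1213, "ADVERBS": 566, "COMMON NOUNS": 2435,
    "ADJECTIVES": 1294, "PROPER NOUNS": 296, "NUMERALS": 100, "INTERJECTIONS": 5,
}
cde = {
    "PREPOSITIONS": 5553520, "ARTICLES": 4643039, "CONJUNCTIONS": 3781609,
    "PRONOUNS": 2046356, "VERBS": 1928260, "ADVERBS": 1764952,
    "COMMON NOUNS": 1459968, "ADJECTIVES": 614069, "PROPER NOUNS": 365057,
    "NUMERALS": 246519, "INTERJECTIONS": 64277,
}

def shares(counts):
    """Convert raw counts to percentages of the column total."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}

ap_pct, cde_pct = shares(ap), shares(cde)

# Percentage-point difference (AP minus CDE); positive means over-represented on the exam.
diff = {name: ap_pct[name] - cde_pct[name] for name in ap}

for name in sorted(diff, key=diff.get, reverse=True):
    print(f"{name:<14} {ap_pct[name]:6.2f}% {cde_pct[name]:6.2f}% {diff[name]:+6.2f}")
```

Sorted this way, common nouns and adjectives sit at the top of the list and conjunctions and pronouns at the bottom.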
Grammatical Analysis: Register in CDE
WORD CLASS      ACADEMIC      NEWS   FICTION      ORAL
COMMON NOUNS     241,116   222,729   209,619   169,680
PREPOSITIONS     162,221   156,788   132,089   118,329
ARTICLES         153,504   139,484   124,272   105,910
VERBS            116,308   136,359   187,788   183,306
ADJECTIVES        90,953    72,177    58,305    50,667
CONJUNCTIONS      72,917    76,856    97,745   116,953
PROPER NOUNS      53,161    57,932    22,147    28,177
ADVERBS           27,754    37,160    55,152    79,902
PRONOUNS          24,821    32,464    65,805    73,150
NUMERALS           6,125     8,705     5,426     9,434
INTERJECTIONS         93       286       818     8,134
Grammatical Analysis: Verb Forms
VERB FORM                 AP     AP %          CDE    CDE %
PRESENT                  747   61.58%    1,190,971   37.52%
INFINITIVE               211   17.39%      459,890   14.49%
PRETERITE                126   10.39%      386,218   12.17%
IMPERFECT                 27    2.23%      443,182   13.96%
PAST PARTICIPLE           53    4.37%      259,488    8.18%
GERUND                     4    0.33%      107,727    3.39%
CONDITIONAL               13    1.07%       57,225    1.80%
FUTURE                    12    0.99%       67,040    2.11%
SUBJUNCTIVE-PRESENT       17    1.40%      126,093    3.97%
SUBJUNCTIVE-PAST           3    0.25%       73,073    2.30%
SUBJUNCTIVE-FUTURE         0    0.00%        3,141    0.10%
TOTALS                 1,213     100%    3,174,048     100%
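A combined share for a group of verb forms can be computed directly from the counts in the verb-form table. The sketch below does this for the present, preterite, imperfect and infinitive. The helper name `combined_share` and the choice of which forms to group are assumptions made for illustration.

```python
# Verb-form counts, taken from the table above.
ap_verbs = {
    "PRESENT": 747, "INFINITIVE": 211, "PRETERITE": 126, "IMPERFECT": 27,
    "PAST PARTICIPLE": 53, "GERUND": 4, "CONDITIONAL": 13, "FUTURE": 12,
    "SUBJUNCTIVE-PRESENT": 17, "SUBJUNCTIVE-PAST": 3, "SUBJUNCTIVE-FUTURE": 0,
}
cde_verbs = {
    "PRESENT": 1190971, "INFINITIVE": 459890, "PRETERITE": 386218,
    "IMPERFECT": 443182, "PAST PARTICIPLE": 259488, "GERUND": 107727,
    "CONDITIONAL": 57225, "FUTURE": 67040, "SUBJUNCTIVE-PRESENT": 126093,
    "SUBJUNCTIVE-PAST": 73073, "SUBJUNCTIVE-FUTURE": 3141,
}

def combined_share(counts, forms):
    """Percentage of all verb tokens that belong to the given forms."""
    return 100 * sum(counts[form] for form in forms) / sum(counts.values())

core_forms = ["PRESENT", "PRETERITE", "IMPERFECT", "INFINITIVE"]
ap_core = combined_share(ap_verbs, core_forms)
cde_core = combined_share(cde_verbs, core_forms)

print(f"AP:  {ap_core:.1f}%")
print(f"CDE: {cde_core:.1f}%")
```

On the counts as transcribed here, the CDE share comes to about 78%. The AP share comes out higher than the 86% quoted elsewhere in these slides, so that figure may rest on a different grouping or count.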
Beyond the word: Lexical Bundles & N-grams
Beyond the word: N-grams Structure & Function
N  Freq  Range  N-Gram                            English                   Structure             Function     Subcategory
6    10      5  apareció en el sitio de internet  appeared on the website   Verb Phrase fragment  Referential  Intangible framing
4    15      8  este artículo apareció en         this article appeared in  Verb Phrase fragment  Referential  Intangible framing
3    20     10  artículo apareció en              article appeared in       Verb Phrase fragment  Referential  Intangible framing
3    19      7  apareció en el                    appeared on the           Verb Phrase fragment  Referential  Intangible framing
3    14      5  el sitio de                       the site of               Noun Phrase fragment  Referential  Intangible framing
3    14      5  en el sitio                       on the site               Prep Phrase fragment  Referential  Identification/Focus
3    14      5  sitio de internet                 internet site             Noun Phrase fragment  Referential  Identification/Focus
3    11     10  informe de la                     report from the           Noun Phrase fragment  Referential  Identification/Focus
3    11      5  se presentó en                    was presented on          Verb Phrase fragment  Referential  Intangible framing
Beyond the word: Lexical bundle
• Lexical Bundle – an N-gram that occurs a certain number of times across a certain number of texts in a corpus
  • Cut-off numbers determined by the type of corpus and the length of N-gram
• Based on these criteria, the six-word expression apareció en el sitio de internet (appeared on the website) can be considered a lexical bundle for this corpus.
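The definition above combines two thresholds: frequency (how often an n-gram occurs) and range (how many texts it occurs in). A minimal sketch of that filter follows. The slides do not state the actual cut-offs, so the values `min_freq=2` and `min_range=2` and the three sample texts are hypothetical.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Return every contiguous n-token sequence as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def lexical_bundles(texts, n, min_freq, min_range):
    """Map each qualifying n-gram to its (frequency, range) pair."""
    freq = Counter()
    found_in = defaultdict(set)
    for text_id, tokens in texts.items():
        for gram in ngrams(tokens, n):
            freq[gram] += 1
            found_in[gram].add(text_id)
    return {
        " ".join(gram): (freq[gram], len(found_in[gram]))
        for gram in freq
        if freq[gram] >= min_freq and len(found_in[gram]) >= min_range
    }

# Hypothetical, already tokenized texts (not the exam texts).
texts = {
    "t1": "este artículo apareció en el diario".split(),
    "t2": "este artículo apareció en la revista".split(),
    "t3": "el informe apareció en la radio".split(),
}

bundles = lexical_bundles(texts, n=3, min_freq=2, min_range=2)
for gram, (frequency, text_range) in sorted(bundles.items()):
    print(f"{gram:<25} freq={frequency} range={text_range}")
```

The range requirement keeps an expression that is repeated many times within a single text from being counted as a bundle.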
Beyond the word: Salient N-Grams
• Empirically identified, frequency-based expressions which could be salient for the examinee and therefore useful for interpreting the texts:
  • todo el mundo (the whole world)
  • a través de (through)
  • por ciento de (percent of)
  • una de las (one of the)
  • se trata de (is about)
  • cuál es el (what is the…?)
  • de enero de (of January of)
  • de noviembre de (of November of)
  • en la ciudad (in the city)
Transitions
Transitions: AP Spanish: Preparing for the Language Examination
• One of the most popular textbooks for the AP Spanish course.
• Contains an exhaustive list of transition words and phrases
• Very few of these appear in the AP Corpus
Transitions: Frequency
TRANSITION    English        FREQ    TRANSITION      English         FREQ
que           that            565    entonces        then               8
y             and             482    sin embargo     however            8
como          like, as         84    mientras        while              7
o             or               57    o sea           that is            7
pero          but              49    ya que          since              7
también       also             34    al + inf        upon + -ing        5
si            if               33    sino            but rather         5
cuando        when             30    a partir de     as of              4
porque        because          26    como si         as if              4
durante       during           18    luego           later, then        4
según         according to     17    primero         first              4
además        in addition      14    sino que        but rather         3
para que      so that          12    tampoco         neither            3
por ejemplo   for example      11    una vez que     once               3
sobre todo    above all        11    tanto… como…    as much… as…       3
aunque        although          9
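Frequencies like those in the table can be obtained by searching the corpus for each transition as a whole word or whole phrase. The sketch below shows one way to do this with regular expressions. The function name `count_phrase` and the sample sentence are assumptions, and the study's own counting procedure may differ.

```python
import re

def count_phrase(text, phrase):
    """Count whole-word occurrences of a word or multi-word phrase, ignoring case."""
    pattern = r"(?<!\w)" + re.escape(phrase.lower()) + r"(?!\w)"
    return len(re.findall(pattern, text.lower()))

# A few transitions from the table above; the sample sentence is hypothetical.
transitions = ["que", "sin embargo", "porque", "por ejemplo", "sino", "sino que", "ya que"]
sample = (
    "Sin embargo, no vino porque llovía; por ejemplo, dijo que no era pereza "
    "sino que estaba enfermo, ya que tenía fiebre."
)

counts = {phrase: count_phrase(sample, phrase) for phrase in transitions}
for phrase, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(f"{phrase:<12} {n}")
```

The word-boundary checks keep que from matching inside porque, but the count for que still includes its occurrences inside sino que and ya que, so an analysis has to decide whether to subtract the longer phrases. Patterned entries such as al + inf and tanto… como… are not fixed strings and would need part-of-speech information or a looser pattern.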
As we have seen…
• The impression of the language used on the AP Spanish Exam is that it primarily consists of lexically rich but grammatically simple text.
• High frequency of relatively obscure & specific vocabulary items
• Many common “general” vocabulary items are missing
• Texts consist mostly of simple sentences with few conjunctions
• Communication relies on noun phrase modification (academic register)
• 83% of all verbs in present, infinitive or preterite forms
• Recurrent word combinations are primarily used to introduce source texts.
Pedagogical implications: Vocabulary
• In order to successfully interpret the tasks on the AP Spanish Exam, students must possess a broad vocabulary that is strongly rooted in, but extends well beyond, the most frequent lexical items in the language.
• An AP student’s vocabulary should include a variety of synonyms, especially a wide range of nouns related to specific themes that express concrete entities and abstract concepts.
Pedagogical implications: Grammar & Discourse
• Present, Preterite & Imperfect tenses along with the Infinitive account for:
  • 86% of all verbs in the AP Corpus
  • 78% of all verbs in the Corpus del Español
• The most important grammatical focus for the AP class might well be that of the noun phrase.
• Complex verb tenses should not be the organizing factor for an upper-level Spanish curriculum
Selected References:
• Anderson, N. J. (2014). Developing Engaged Second Language Readers. In M. Celce-Murcia, D. M. Brinton, & M. A. Snow (Eds.), Teaching English as a Second or Foreign Language. 4th ed. (pp. 170-188). Boston: Heinle Cengage.
• Anthony, L. (2011). AntConc (Version 3.2.4w) [Computer Software]. Tokyo, Japan: Waseda University. Available from http://www.antlab.sci.waseda.ac.jp/
• Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman Grammar of Spoken and Written English. Essex, England: Longman.
• College Board. (2008-2013). AP Spanish Language Exam: Free-Response Questions. Retrieved from http://apcentral.collegeboard.com/apc/public/courses/teachers_corner/221848.html.
• Cortes, V. (2013, January). Waiting for the revolution. Plenary talk presented at the Conference for the American Association of Corpus Linguistics (AACL), San Diego, California, USA.
• Davies, M. (2002-). Corpus del Español: 100 million words, 1200s-1900s. Available online at http://corpusdelespanol.org.
• Davies, M. (2006). A Frequency dictionary of Spanish: Core vocabulary for learners. New York: Routledge.
• De Kock, J. (2001). [Preface]. In J. De Kock (Ed.), Gramática española: Enseñanza e investigación (Vol. 7. Lingüística con corpus). (pp. 7-8). Salamanca: Ediciones Universidad de Salamanca.
• Díaz, J. M. (2014). AP Spanish: Preparing for the Language and Culture Examination. Boston: Pearson Education.
• Parodi, G. (2007). Catching up with corpus linguistics: Register-diversified studies from different corpora in different Spanish-speaking countries. In G. Parodi (Ed.), Working with Spanish Corpora. (pp. 1-10). New York: Continuum.
• Tracy-Ventura, N., Cortes, V., & Biber, D. (2007). Lexical bundles in speech and writing. In G. Parodi (Ed.), Working with Spanish Corpora. (pp. 217-231). New York: Continuum.