An annotated Spanish corpus for Corpus-based CALL in professional contexts
María Sánchez-Tornel
Pascual Pérez-Paredes
José M. Alcaraz Calero
Authenticating Language Learning: Web Collaboration Meets Pedagogic Corpora February 17-19, 2011. University of Tübingen
OUTLINE:
1. Corpora in FLT
• Background
• Our proposal
2. The Backbone project
3. The Spanish subcorpus
• Features
• Our approach
• Corpus compilation
• Pedagogic enrichment
• Corpus exploitation
4. Conclusion
1. Corpora in FLT
Background
1. Scant scholarly attention
[Bar chart: number of empirical studies (1991-2010) by context. Tertiary education: 67; non-tertiary education: 9]
Source: Boulton (2010)
CAUTION!
2. Different contexts: different students with different objectives, abilities and needs
Advanced corpus users
Research purposes
Translation
L2 learning at university
Novice corpus users
L2 learning at high school
CLIL
CLIL – Content and Language Integrated Learning
CLIL
Using the language to learn
Learning to use the language
Coyle (2007)
CLIL – Content and Language Integrated Learning
Eurydice (2006:51)
OBSTACLES
Legislation
Qualified staff
Financial restrictions
Materials
Content
Language
Our proposal
1. Scant attention
2. Learners’ profile
3. CLIL obstacles (materials)
PEDAGOGIC CORPORA
Why pedagogical?
Large, general corpora
Novice users
Photo: Kordite, Flickr
Homogeneous and systematic
Thematic relevance
Recontextualisation Authentication
Easy-to-use query tools and search options
Braun (2006)
Past initiatives
ELISA
English Language Interview Corpus as a Second-Language Application
• 2003 – 2004
• 25 video interviews in English
• 5-15 minutes per interview
• 60,000 words
• Search interface
• Learning materials
• 2005 – 2008
• Video interviews in 7 EU languages - Teen talk
• Corpus compilation and exploitation tools
• Learning materials
• Corpora + tools freely available
2. The Backbone project
Lesser taught languages
DIY approach – small spoken corpora
Non-standard regional varieties
Non-native varieties of ELF
CLIL settings: vocational training, secondary education, university
BLENDED LEARNING PRINCIPLES
Learner centeredness
Relevant topics
Connection to CEF progression
ICT implementation
Free online corpora + materials
Moodle integration
Language authentication
Small homogeneous corpora
Pedagogically relevant topics
Multimodality
Full texts – sections – concordances
Data-driven learning but… pedagogically selected, annotated and enriched data
3. The Spanish subcorpus
Features
Size
25 interviews
53,000 words
300 minutes of video recordings
Speakers from 9 different provinces
Regional varieties
Northern and southern accents
Cultural issues
Topics
Science and research
The environment
World of work
Social issues
Economic issues
Healthcare and social security
Government and Politics
Education
Urban and rural life
Our approach
COLLABORATIVE
ANNOTATION
TRANSCRIPTION
MATERIALS DEVELOPMENT
Corpus compilation
Speaker selection
Age range: 18 - 83
9 provinces
Diverse professional fields
doctor
teacher
archaeologist
sportswoman
ex-lawyer
confectioner
bio-farmer
top researcher
entrepreneur
Transcription
Orthographic
TEI-compliant markup
<trunc> </trunc>
<unclear> </unclear>
<break/>
<foreign> </foreign>
<alternative> </alternative>
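To show how markup of this kind can be processed, here is a minimal Python sketch. It assumes a transcript fragment that uses the tags listed above. The sample utterance, the `<u>` wrapper element and the function names `plain_text` and `tag_counts` are all invented for illustration and are not taken from the Backbone tools.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical utterance: the wording and the <u> wrapper are assumptions.
SAMPLE = (
    "<u>Trabajo en un <trunc>labo</trunc> laboratorio <break/> "
    "de <foreign>software</foreign> y <unclear>biología</unclear></u>"
)

def plain_text(xml_fragment, skip=("trunc",)):
    """Return the running text, leaving out the content of tags in `skip`."""
    root = ET.fromstring(xml_fragment)
    parts = []

    def walk(node):
        if node.tag not in skip:
            if node.text:
                parts.append(node.text)
            for child in node:
                walk(child)
                # Text after a child belongs to the parent, so it is kept
                # even when the child itself is skipped.
                if child.tail:
                    parts.append(child.tail)

    walk(root)
    # Collapse the double spaces left behind by removed elements.
    return " ".join("".join(parts).split())

def tag_counts(xml_fragment):
    """Count how often each markup tag occurs inside the fragment."""
    root = ET.fromstring(xml_fragment)
    return Counter(el.tag for el in root.iter() if el is not root)

print(plain_text(SAMPLE))
print(tag_counts(SAMPLE))
```

Dropping truncated words while keeping unclear and foreign items is only one possible choice; a teacher preparing materials might prefer to keep them visible.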
Backbone Transcriptor
Transcribing and sectioning
Supports metadata information
Video formats: DIVX, XVID, AVI, MPEG, QuickTime, RM
Audio formats: MP3, WAV, ASF
Timestamping audio-text
Pedagogic enrichment
Step 1: Annotation
Photo: J_O_I_D
Pedagogic annotation
Unit-bound, not text-bound
Pérez-Paredes (2010)
Pedagogic annotation
Teacher-driven & learner-oriented
Pérez-Paredes (2010)
Pedagogic annotation
Backbone Annotator
TEI-compliant XML
Drag & drop
Edit options in XML
Manages several corpora
Integrated with Transcriptor and Search Tool
Backbone annotator
Corpus Management Tool
Collaborative annotation
Corpus Management Tool (CMT)
Annotator 1: Bob
Annotator 2: David
Annotator 3: Helen
Annotator 4: Hugh
Annotator 5: Jane
Sánchez-Tornel et al. (Forthcoming)
CORPORA OUTPUT
Backbone Search Tool
Step 2: Materials development
Photo: the waving cat
Thematic relevance
Two types of materials:
- Learning modules
- Corpus-based communicative and exploratory activities
The Virtual Resource Pool
Integration in Search Tool
Learning modules
19 modules – 107 activities
Telos Language Partner
1 section – 1 module – several activities
Comprehension & focus on form
Sample module: Science and society
Learning modules
Comprehension activities
Learning modules
Multiple choice: comprehension
Learning modules
Fill in the gaps
Learning modules
Multiple choice: vocabulary
Learning modules
Matching: idiomatic expressions
Learning modules
Production: word order
Learning modules
Exploratory and communicative activities
10 packages – 93 activities
Corpus exploration:
- lexis-grammar
- communication
Lexis and grammar
Communication
Online integration of learning materials
Learning modules and C&E packages linked to interview sections
The Virtual Resource Pool
Corpus exploitation
The learning space
Access to corpora + learning materials
Four search modes:
- Browse
- Section search
- Concordances
- Co-occurrences
Word lists
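As an illustration of what a concordance search returns, the following Python sketch builds keyword-in-context (KWIC) lines from a short text. It is not the Backbone Search Tool's implementation: the sample sentence, the simple regex tokenizer and the function names are assumptions made for this example.

```python
import re

# Invented sample text, not taken from the corpus.
TEXT = (
    "La investigación es importante. "
    "Sin investigación no hay futuro para la ciencia."
)

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

TOKENS = tokenize(TEXT)
for left, kw, right in concordance(TOKENS, "investigación", width=2):
    # Right-align the left context so the keyword forms a column.
    print(f"{left:>20}  [{kw}]  {right}")
```

Aligning the keyword in a central column is what lets learners scan the patterns to its left and right.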
The Search Tool
The browse mode
The section search mode
The co-occurrences search mode
The concordances search mode
The word lists view
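The co-occurrence and word list views can be pictured with a small Python sketch. This is a simplified illustration under stated assumptions, not the Search Tool's own logic: the sample text, the tokenizer, the window size `span` and the function names are all hypothetical.

```python
import re
from collections import Counter

# Invented sample text, not taken from the corpus.
TEXT = (
    "La investigación es importante. "
    "Sin investigación no hay futuro para la ciencia."
)

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def word_list(tokens):
    """Frequency-ranked word list as (word, count) pairs."""
    return Counter(tokens).most_common()

def cooccurrences(tokens, node, span=2):
    """Count the words found within `span` tokens of each hit of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

TOKENS = tokenize(TEXT)
print(word_list(TOKENS)[:3])
print(cooccurrences(TOKENS, "investigación").most_common(3))
```

Raw counts are enough to convey the idea; a fuller treatment would typically filter out function words or use an association measure.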
Moodle integration
4. Conclusion
Previously on...
Scant scholarly attention in non-tertiary education settings
CL methods and tools
Pedagogic mediation!
Where we are now…
From the possibilities scenario to the feasibility scenario
Corpora can be exploited in the language classroom in different ways
FLT
Language research-oriented paradigm
CL methods
Sampling
Representativeness
Morphological tagging
The possibilities scenario
Alcaraz & Pérez-Paredes (2008)
Corpora are devised to be exploited in the language classroom by language learners in different ways
FLT
Language learning-oriented paradigm
Mediation role
Theory-informed tagging
Parametric framework-aware
Learner-oriented tagging
Representative of the world of the learner
The feasibility scenario
Alcaraz & Pérez-Paredes (2008)
References
• Alcaraz, J.M. & Pérez-Paredes, P. (2008). What do annotators annotate? An analysis of language teachers' corpus pedagogical annotation. In A. Frankenburg-Garcia (Ed.), Proceedings of the 8th Teaching and Language Corpora Conference (pp. 27-37). Lisbon, Portugal: Associação de Estudos e de Investigação Científica do ISLA-Lisboa.
• Boulton, A. (2010). Learning outcomes from corpus consultation. In M. Moreno Jaén, F. Serrano Valverde & M. Calzada Pérez (Eds.), Exploring New Paths in Language Pedagogy: Lexis and corpus-based language teaching. London: Equinox. Expanded web supplement available at http://arche.univ-nancy2.fr/file.php/967/DDL_empirical_list.pdf
• Braun, S. (2006). ELISA – a pedagogically enriched corpus for language learning purposes. In S. Braun, K. Kohn & J. Mukherjee (Eds.), Corpus Technology and Language Pedagogy. New Resources, New Tools, New Methods (pp. 25-47). Frankfurt/M.: Peter Lang.
• Coyle, D. (2007). Content and Language Integrated Learning: Towards a connected research agenda for CLIL pedagogies. The International Journal of Bilingual Education and Bilingualism 10(5), 543-562.
• Eurydice Report. (2006). Content and Language Integrated Learning (CLIL) at School in Europe. Retrieved February 2011 from http://eacea.ec.europa.eu/eurydice/ressources/eurydice/pdf/0_integral/071EN.pdf
• Pérez-Paredes, P. (2010). Appropriation and integration issues in corpus methods and mainstream language education. In T. Harris & C. Pérez Basanta (Eds.), Corpus Linguistics and Language Teaching. Linguistic Insights Series. Berlin: Peter Lang.
• Sánchez-Tornel, M., Alcaraz Calero, J.M. & Pérez-Paredes, P. (in press). Collaborative annotation in implementing corpora for content and language integrated learning web services. Proceedings of the 2009 Eurocall Conference: New trends in CALL – Working together. Madrid: Macmillan ELT.
Acknowledgements
The authors gratefully acknowledge the funding provided by the European Commission. This publication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.
BACKBONE 143502-2008-LLP-DE-KA2-KA2MP
María Sánchez-Tornel gratefully acknowledges the support provided by the University of Murcia under its PhD Scholarships Programme (R.-549/2009).
María Sánchez-Tornel: [email protected]
Pascual Pérez-Paredes: [email protected]
José M. Alcaraz Calero: [email protected]
Thank you!
www.um.es/backbone
http://u-002-segsv001.uni-tuebingen.de/backbone/moodle/