Presentation slides: French Learner Language Oral Corpora (FLLOC), Myles and Mitchell corpus

French Learner Language Oral Corpora (FLLOC) Fall 2016 | SLAT 596O JOAN PALMITER BAJOREK



Overview

9 Corpora

Spoken French Learner Data

3 Million Words

Compiled in the UK

Dr. Florence Myles

University of Essex

Languages of Interest: French and Spanish

Director of the Centre for Research in Linguistics and Language Sciences

Previously Worked at Newcastle University

Dr. Rosamond Mitchell

University of Southampton

Languages of Interest: French and Irish

Founder: Centre for Applied Language Research

Objectives and Introduction

Goal:

“Our long term goal is to promote research relating to the acquisition of French as a second/foreign language, by providing access to a growing database of French Learner Language Oral Corpora” (Myles & Mitchell, 2016a).

Variety/Genre: Spoken/Oral Corpora, Language Learner Data

Register: Academic/Informal

Conventions

Transcription Procedures: CHILDES project (MacWhinney, 2000)

Analysis Software Programs:

CHAT transcription format and its analysis software, CLAN

Part-of-Speech Tagging for French: CLAN MOR program (Parisse, 2000; Parisse & Le Normand, 1997)

*P81: euh # c' est pas facile .
%mor: co|euh pro|ce/ces&SING v:exist|être&PRES&3SV adv:neg|pas adj|facile .
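The tagged utterance above can be read programmatically. The following is a minimal Python sketch, not the CLAN tooling itself: it assumes each `%mor` token has the shape `category|stem&FEATURE&FEATURE`, as in this one example, and the function names `parse_main_line` and `parse_mor_tier` are hypothetical helpers invented for illustration. Real FLLOC transcripts may use conventions this sketch does not handle.

```python
def parse_main_line(line):
    """Split a CHAT main line such as '*P81: ...' into (speaker, text)."""
    speaker, text = line.split(":", 1)
    return speaker.lstrip("*"), text.strip()


def parse_mor_tier(line):
    """Split a '%mor:' tier into one dict per tagged token.

    Assumption: tokens look like 'category|stem&FEAT1&FEAT2'.
    Tokens without '|' (for example the final '.') are skipped.
    """
    body = line.split(":", 1)[1].strip()
    tokens = []
    for item in body.split():
        if "|" not in item:
            continue
        pos, rest = item.split("|", 1)
        stem, *features = rest.split("&")
        tokens.append({"pos": pos, "stem": stem, "features": features})
    return tokens


main = "*P81: euh # c' est pas facile ."
mor = "%mor: co|euh pro|ce/ces&SING v:exist|être&PRES&3SV adv:neg|pas adj|facile ."

speaker, text = parse_main_line(main)
tokens = parse_mor_tier(mor)

print(speaker, "said:", text)
for tok in tokens:
    print(tok["pos"], tok["stem"], tok["features"])
```

Splitting on the first colon only keeps categories such as `v:exist` and `adv:neg` intact, since their colons appear later in the line.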

Content & Descriptions

Contents: (Myles & Mitchell, 2016a)

· Corpus owners and the learners

· Information about the tasks used

· Transcription conventions used, explanations

· Transcript header descriptions

· Summary of the files contained in the database

· Organizing principles used for each dataset

LANGSNAP Corpus: study abroad longitudinal study (?)

Young Learners Corpus: n=72, ages 5-11, 6 elicitation tasks

Newcastle Corpus: n=45, ages 17-18, 6 elicitation tasks

Linguistic Development Corpus: n=60, ages 13-15, 4 elicitation tasks

Progression Corpus: n=60, ages 11-14, 2-3 elicitation tasks, varied

Brussels Corpus: n=150, aged 18, story narration

Reading Corpus: n=60, aged 16, oral interviews

Salford Corpus: n=12, ages 19-23, 4-10 elicitation tasks

UEA Corpus: n=32, ages 19-23, 58, 1 elicitation task

Publications 2002-2011

31 Journal/Book Publications from Corpora

90 Conference Presentations

2 Specifically Research Vocabulary:

● David, A. (2008). Vocabulary breadth in French L2 learners. Language Learning Journal, 36(2), 167-180.

● Marsden, E., & David, A. (2008). Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 amongst learners of Spanish and French. Language Learning Journal, 36(2), 181-198.

All data is free to access and download. No registration. No subscription. No ads.

Wave, MP3, Transcription, Tagged, XML

What’s the catch? → Weaknesses

Weaknesses: Disorganized Website

Weaknesses: Poor Search Tools

Weaknesses: Variation in tasks & collection methodologies

Salford Corpus

Weaknesses: Ultimately, users need to feel comfortable with data analysis

What’s really awesome? → Strengths

Strengths: Large, free, and well-documented database of corpora

Strengths: Many research and pedagogical implications

Strengths: Comparisons to the Spanish Learner Language Oral Corpus (SPLLOC)

FIN

Citations:

MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk: Volume I: Transcription format and programs, volume II: The database. Computational Linguistics, 26(4), 657-657.

Mitchell, R. F., Domínguez, L., Arche, M. J., Myles, F., & Marsden, E. (2008). Linguistic development in L2 Spanish: creation and analysis of a learner corpus. EuroSLA Yearbook, 287-304.

Myles, F., & Mitchell, R. (2016a). French Learner Language Oral Corpora (FLLOC). Retrieved from http://www.flloc.soton.ac.uk/index.html

Myles, F., & Mitchell, R. (2016b). Spanish Learner Language Oral Corpus: SPLLOC. Retrieved from http://www.splloc.soton.ac.uk

Parisse, C. (2000). Automatic disambiguation of morphosyntax in spoken language corpora. Behavior Research Methods, Instruments, & Computers, 32(3), 468-481.

Parisse, C., & Le Normand, M.-T. (1997). Etude des catégories lexicales chez le jeune enfant à partir de deux ans à l'aide d'un traitement automatique de la morphosyntaxe. Paper presented at the Bulletin d'audiophonologie. Annales scientifiques de l'Université de Franche-Comté. Médecine & pharmacie.