SPinTX Corpus-to-Classroom: A Teacher-Centered Pedagogical Interface for the Spanish in Texas Corpus Barbara E. Bullock, Almeida Jacqueline Toribio, Rachael Gilg, Martí Quixal & Arthur Wendorf

Upload: spanish-in-texas-project

Post on 13-Jun-2015

DESCRIPTION

Presentation at CALICO 2013: Corpora provide a promising way of creating language learning materials that accurately depict languages, but corpus search interfaces typically aren't designed with this goal in mind. The SPinTX Corpus-to-Classroom project is developing a website for educators to search and adapt authentic video for the teaching of Spanish. This presentation will describe the main results to date: (1) a pedagogically friendly interface to search over 300 tagged video clips from the Spanish in Texas Corpus; (2) tools for educators to easily create lessons and activities based on the videos; (3) an open source model for developing video corpora for language learning.

TRANSCRIPT

  • 1. SPinTX Corpus-to-Classroom: A Teacher-Centered Pedagogical Interface for the Spanish in Texas Corpus. Barbara E. Bullock, Almeida Jacqueline Toribio, Rachael Gilg, Martí Quixal & Arthur Wendorf

  • 2. Who we are
      - Barbara E. Bullock & Almeida Jacqueline Toribio: Project Directors / Sociolinguistics Researchers
      - Rachael Gilg: Project Manager / Web Developer
      - Arthur Wendorf: Corpus Linguist / Developer
      - Martí Quixal: Computational Linguist / Developer
      - Carl Blyth: Director of COERLL

  • 3. Agenda
      - Part 1: Introduction to the Corpus-to-Classroom Project
      - Part 2: Project Results
          - The SpinTX Video Archive: a pedagogically-friendly interface to the Spanish in Texas Corpus
          - Involving teachers in the development of open educational resources
          - A model for open source corpus development

  • 4. Corpus-to-Classroom

  • 5. Corpora in the Classroom: the promise
      - Corpus: a large, structured collection of language
      - Benefits:
          - Naturalistic language use
          - Motivation
          - Real language
          - Discovery learning
      - Examples: [images on slide]

  • 6. Corpora in the Classroom: the reality
      - Large linguistic corpora are of limited utility to untrained end users.
          - Designed for researchers, not educators.
      - Collections such as YouTube are popular for language classes, but can present problems.
          - Searching for appropriate content is time-consuming using available search methods.
          - Content is not necessarily openly-licensed and can disappear without warning.

  • 7. Our two-pronged approach
      - Spanish in Texas Corpus Project
          - A project of COERLL, a National Foreign Language Resource Center (2010-2014)
          - Video interviews provide rich content
      - SpinTX: Corpus-to-Classroom Project
          - Grant from the University of Texas Longhorn Innovation Fund for Technology (2012-2013)
          - Collection of pre-selected, corrected, annotated clips from the larger corpus
          - Open-source, pedagogically-friendly search and authoring tools

  • 8. Spanish in Texas Corpus: Goals
      - To make publicly available authentic data about variation in Spanish as spoken in Texas
          - for education
          - for research
      - Encourage teachers/students/public to view local varieties as a resource

  • 9. Corpus-to-Classroom: Goals
      - Develop a pedagogically friendly interface for using the Spanish in Texas corpus
      - Involve teachers and learners, via crowd-sourcing, social networking, and workshops, in the development of open educational resources
      - Create a model for using open source tools and a pedagogical interface that can be adapted for any language corpus collection

  • 10. Corpus Overview
      - Spanish in Texas corpus
          - Approx. 92 videos of sociolinguistic interviews (avg. 30-45 min)
          - Transcribed (approx. 600,000 words)
          - Time-synced video caption files
          - Tagged for linguistic features
      - SpinTX Video Archive corpus
          - Approx. 327 video clips from 33 speakers (avg. 1-4 min)
          - Transcribed (approx. 80,000 words)
          - Time-synced video caption files
          - Tagged for linguistic and pedagogical features
          - Completely open (no registration required, open CC license)
          - Teacher-friendly interface

  • 11. Corpus Tagging: Basic
      - Time-synced captions
      - Part-of-speech tags (dual language): POS; POS, simplified; Gender; Tense; Aspect; Mood
      - Speaker identification: Age; Gender; Region

  • 12. Corpus Tagging: Pedagogical
      - Topics (manually added)
      - Automatic tags using custom rulesets
          - Grammatical: aggregated from textbooks
          - Pragmatics: discourse markers, place holders (este), attenuators
          - Vocabulary: concept words
          - Functional (planned): greetings, ask for help, express opinions
          - Bilingual forms (planned): CS, loans, loan translations

  • 13. [no text on slide]

  • 14. Interview Metadata

  • 15. Original Transcript (from Automatic Sync)

  • 16. Upload Video and Transcript to YouTube

  • 17. Review Transcript in Google Docs

  • 18. Download SRT file

  • 19. Prepare Transcript for TreeTagger

  • 20. Run through TreeTagger

  • 21. Combine Data from SRT File and TreeTagger File, and add additional Tags

  • 22. Divide CSV Files and Videos into Clips and adjust Timings and Numberings

  • 23. The SpinTX Video Archive: a pedagogically-friendly interface to the Spanish in Texas Corpus

  • 24. Needs assessment: teacher interviews
      - How do you use authentic video in your teaching?
      - Describe searches you have done in the past for video content. What were you looking for and were you able to find it?
      - How can you imagine using clips from the Spanish in Texas video corpus in your classes?

  • 25. Needs assessment results: primary goals
      - Enable teachers to easily find videos that suit the curriculum/work plan
          - Search by grammar, theme, vocabulary, etc.
      - Provide open, non-ephemeral content
          - Downloadable from open site with a license enabling remixing
      - Curating sets of videos for comparison and study
          - Favoriting and tagging videos
      - Provide access to supporting materials
          - Creating a community of practice around the videos so materials can be shared among educators

  • 26. Needs assessment results: secondary goals
      - Materials for teacher trainers
          - Teachers of heritage learners can learn about local variation
      - Video recording as a cross-competence task
          - Interviews collected by students can be contributed to the corpus

  • 27. [no text on slide]

  • 28. Ideas for future development
      - Advanced search capability
          - support for wildcards
          - improved phrase searching
          - improved keyword in context result view
      - Data visualizations
          - word and/or tag clouds
          - language maps
      - Enhanced word-level annotations
          - hover over a word in a transcript and see all annotations

  • 29. Formative evaluation of Beta version
      - Data collection methods:
          - Online user survey
          - Web analytics (navigation patterns, popular content)
          - Search analytics
          - User observation and feedback through ongoing workshops and focus groups
      - Results will drive future development of the interface.

  • 30. Involving Teachers in the Development of OER

  • 31. Workshops with Educators
      - Summer 2012 Workshop: ~100 secondary and college Spanish teachers
      - Fall 2012 Working Group: ~10 Univ. of Texas Spanish teachers
      - Spring 2013 Workshops: multiple conferences & Univ. of Texas Spanish teachers
      - Summer 2013 Working Group: ~10 secondary and college Spanish teachers

  • 32. Sample materials from the community (1)

  • 33. [no text on slide]

  • 34. Sample materials from the community (2)
      - Idea from teacher workshop: Use videos for grammar lessons to develop the students' metalinguistic and critical thinking skills as they pertain to language.
      - Searched and selected clips for lesson on por vs. para.
      - Lesson tested in heritage learners' class.
      - Anecdotal evidence that video lessons were effective and motivating to students.

  • 35. Template development ideas
      - Using video clips from the SpinTX video archive, create an activity for classroom use (at any level).
          - Focus on Topics: Familia, Idioma, Identidad
          - Focus on Grammar: Por vs. Para, Gustar, Ser vs. Estar
      - Four steps
          - Predict: Before watching
          - Observe: While watching
          - Discuss: After watching
          - Produce: Follow-up activity

  • 36. Publication of OER
      - Community-developed lesson plans will be available on the SpinTX website by August 2013.
      - We encourage the publication of videos on third-party platforms for remixing educational content, such as TedEd (http://www.ed.ted.com).

  • 37. A Model for Open Source Corpus Development

  • 38. Open source development
      - Open Source Software
          - TreeTagger (part-of-speech tagger)
          - Drupal
      - Open APIs
          - YouTube Captioning API
          - Google Fusion Tables API
      - Custom code developed for the project
          - Freely available in our GitHub repository: http://github.com/coerll

  • 39. Enable sharing of content and data
      - With educators:
          - SpinTX interface allows embedding, downloading, & social sharing of videos and transcripts.
      - With researchers:
          - Source tagged data in our GitHub repository: https://github.com/coerll/SpinTXCorpusData
          - Documentation of data in our GitHub wiki: https://github.com/coerll/SpinTXCorpusData/wiki

  • 40. Open content licenses
      - Creative Commons provides licenses for Open Educational Resources
      - We use CC BY-NC-SA (Attribution, Non-Commercial, Share-Alike)

  • 41. Open Project Documentation
      - Research protocols, development processes and methodologies, and other project documentation publicly available:
          - Corpus-to-Classroom Blog: http://sites.la.utexas.edu/corpus-to-classroom/
          - For Researchers page on spanishintexas.org: http://spanishintexas.org/for-researchers/

  • 42. Questions

  • 43. Links
      - SpinTX Video Archive: http://www.spintx.org
      - Spanish in Texas Corpus: http://www.spanishintexas.org
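
Editorial illustration (not part of the slides): slide 22 of the transcript mentions dividing the interview material into clips and adjusting timings and numberings. The slides do not show how the project does this, so the Python sketch below is only one plausible way to do that step for an SRT caption file. It keeps the cues that fall entirely inside a clip window, shifts their timestamps so the clip starts at zero, and renumbers them from 1. All names (Cue, parse_srt, cut_clip, format_srt) and the sample captions are hypothetical and invented for illustration; the project's actual code is the one published in its GitHub repository.

```python
import re
from dataclasses import dataclass
from typing import List

# SRT timestamps look like 00:01:02,500 (hours:minutes:seconds,milliseconds).
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    index: int      # running caption number within the file
    start_ms: int   # cue start, in milliseconds from the start of the video
    end_ms: int     # cue end, in milliseconds from the start of the video
    text: str       # caption text (may span several lines)


def to_ms(stamp: str) -> int:
    """Convert an SRT timestamp string to milliseconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def to_stamp(ms: int) -> str:
    """Convert milliseconds back to an SRT timestamp string."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def parse_srt(srt_text: str) -> List[Cue]:
    """Parse SRT text into cues; blocks are separated by blank lines."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        cues.append(Cue(int(lines[0]), to_ms(start), to_ms(end), "\n".join(lines[2:])))
    return cues


def cut_clip(cues: List[Cue], clip_start_ms: int, clip_end_ms: int) -> List[Cue]:
    """Keep cues fully inside the window, re-based to the clip start and renumbered."""
    clip = []
    for cue in cues:
        if cue.start_ms >= clip_start_ms and cue.end_ms <= clip_end_ms:
            clip.append(Cue(len(clip) + 1,
                            cue.start_ms - clip_start_ms,
                            cue.end_ms - clip_start_ms,
                            cue.text))
    return clip


def format_srt(cues: List[Cue]) -> str:
    """Serialize cues back to SRT text."""
    blocks = (f"{c.index}\n{to_stamp(c.start_ms)} --> {to_stamp(c.end_ms)}\n{c.text}"
              for c in cues)
    return "\n\n".join(blocks) + "\n"


# Made-up captions, for illustration only.
SAMPLE = """1
00:00:58,000 --> 00:01:01,500
Bueno, este, yo nací en San Antonio.

2
00:01:02,000 --> 00:01:05,250
Mi familia es de Laredo.

3
00:01:06,000 --> 00:01:09,000
Hablamos español en casa.
"""

# Cut a clip running from 1:00 to 1:10 of the full interview.
clip = cut_clip(parse_srt(SAMPLE), to_ms("00:01:00,000"), to_ms("00:01:10,000"))
print(format_srt(clip))
```

In this sketch a cue that straddles the clip boundary (the first sample cue) is dropped rather than trimmed. That is a simplifying assumption; a real workflow might instead choose clip boundaries that coincide with cue boundaries.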