1
Tom Cobb
Université du Québec à Montréal
Didactique des langues
DDL for French learners: A resource wish-list
8-10 Sept 2011
Association for French Language Studies
Colloque annuel - Nancy
2
OR: What we’ve got + what we still need to make DDL 1.0 in French
3
Some personal stuff
Is my presence here a bit fraudulent?
> Not a corpus linguist
> Not a French linguist or teacher
> Not « au courant » about French educ.
> Not a v. good French speaker
Just an applied linguist with amateur programming ability and a belief that corpus is a useful tool in learning and teaching
- Whose class notes became a best-seller
- Whose class website became must-click
4
5
6
7
A huge pile of tutorial & language analysis tools
> But at least all in one place
> And do not require “administrative privileges” to access in a lab
> Works on most of the various browser x platform combinations
Trying to keep it all together is like riding a whirlwind
But one does make an effort…
8
9
10
So, good « concordanciers » all,
We instantly extract the pattern :
1. Pretty much a Nordic sport
2. When Europe sleeps > N. America lextutors
3. China never sleeps
11
12
But what do all those energetic folks “do” on Lextutor?
Site Stats are pretty bare
But we get some idea from their itineraries
15
PATHWAYS 3
16
PATHWAYS 4
17
PATHWAYS 5
18
PATHWAYS 6
19
PATHWAYS 7
20
PATHWAYS 8
21
PATHWAYS 9
22
PATHWAYS 9a
23
PATHWAYS 10
24
PATHWAYS 11
25
PATHWAYS 12
26
PATHWAYS 13
27
PATHWAYS 14
28
PATHWAYS 15
29
PATHWAYS 16
30
PATHWAYS 17
31
32
> Some of it random
> Some of it inexplicable
> Some of it makes sense
So we combine the well-motivated, well-worn paths
33
34
My only experience of French education was…
35
Young Americans at the Sorbonne learned…
le subjonctif du passé historique « que j’eusse » … etc
… who could not order breakfast
… who did not know 300 basic words
… who had never said or understood a full sentence
A STRONG ARGUMENT FOR SOME “COMMUNICATIVE METHOD”! + some small awareness of frequency
36
But as we now know…
COMMUNICATIVE METHOD had its own problems
Top end:
> Tendency to plateau
Lower end:
> Missing HF vocab + verb tenses
> Non-grammaticized multiword units
> No sense of “language as object”
ENTER FOCUS ON FORM and LANGUAGE AWARENESS
37
I have always wondered why…
TEXT COMPUTING + FonF / LA have not seemed a more obvious link
The computer since about 2005
- commandeered as “just another means of communication”
With all the limitations of the earlier communicative era?
My job has become to make and sell this link
38
Some underlying assumptions
Much interesting research in applied linguistics makes extensive use of a language technology
Few learners at any level will have any major experience of this
Most language technologies used in research are never “pedagogicalized”
Many easily could be
39
Some underlying assumptions (2)
Many interesting language technologies are hard to get your hands on
Most can be reverse engineered
A corpus is not just something techy to keep the stronger students busy
Rather it is a necessary tool in SLA
40
Some underlying assumptions (3)
Unless Chomsky was right
Language acquisition depends on input
At least in L2 for post-adolescents
But patterns in natural input are
- fragmentary, distributed, imperceptible
Requiring 15 years
- to form via osmosis
Successful SLA can only occur
- with some sort of data assembly + compression
The best form of this is ~ a corpus + way to query it (concordance)
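The "corpus + way to query it" idea can be sketched in a few lines. What follows is a minimal keyword-in-context (KWIC) concordancer, not Lextutor's actual implementation; the function name `kwic`, the `width` parameter and the three-sentence sample text are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`."""
    lines = []
    # Whole-word, case-insensitive match; re.escape guards against regex metacharacters.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Tiny illustrative "corpus" (an assumption, not real corpus data).
corpus = (
    "Of course the window was open. She looked out of the window "
    "and saw the garden. A window of opportunity had opened."
)

for line in kwic(corpus, "window", width=25):
    print(line)
```

Stacking the scattered occurrences in one aligned column is the "data assembly + compression" step: patterns that are distributed across natural input become visible at a glance.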
41
Some underlying assumptions (4)
But a pedagogical corpus is not necessarily the same as the computational linguist’s
Corpus as a learning tool ~
Need not be enormous
As in language pedagogy generally, “Do more with less”
WITH PROGRAMMING + IMAGINATION + USER fB
Need not be tagged
It’s the learner’s task to parse surface structure input
42
Some underlying assumptions (4)
Corpus as a learning tool ~
Need not follow a sampling principle
Frequency may be more useful
(“This corpus is 95% first thousand level words…”)
Or “Our Course Materials”
Or “The words of Author X” etc
Need not pass through the education establishment
With the Web can reach learners directly
43
Some underlying assumptions (5)
Corpus as a learning tool ~
Is probably the only ground between the extremes of Hot Potatoes exercises and pricey R+D in “intelligent” iCALL …
that can serve as a framework for interesting, practical, real-world CALL development
44
Some underlying assumptions (6)
Corpus as a learning tool ~
Is probably the only way of developing a truly multilingual CALL
Just a question of finding or making comparable corpora in different languages
- and passing them through the same algorithms
45
Is there any proof for any of this?
46
Some research findings for pedagogical concordancing
Deep vocabulary knowledge can result from multi-contextual encounters with words in a concordancing activity
Breadth & depth of vocabulary learning can be reconciled with concordancing
47
Some research findings for pedagogical concordancing (2)
A plausible case for constructivism can be constructed within a DDL/ concordancing framework
The value of collaborative learning in SLA can be shown in co-constructed learner concordances
48
Some research findings for pedagogical concordancing (3)
Concordance feedback to learner writing via embedded links (a) works well, (b) helps some learners, (c) hurts none
Many functions of an NS “reading buddy” can be realized by “resource-assisted reading” esp. text-linked concordancing
49
Some research findings for pedagogical concordancing (4)
…the reading buddy
AND, LINKED TO GOOGLE BOOKS, CAN DELIVER “THE TEACHER WITH THE TEXT” so that African universities need not build $$$ libraries $$$
50
BUT the multi-languages idea?
51
52
53
What can we do for our learners with all these corpora?
54
55
56
57
58
59
60
61
62
63
64
65
So what else is needed for a French DDL?
66
Wish List 1 A resident 3-5 million word general French corpus
67
68
Some of this may be a “familization” problem
The word exists in a 1 million word corpus, just not in a particular form
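A minimal sketch of what "familization" buys at query time: the learner's query is expanded to the whole word family before the corpus is searched, so a form that is absent can still return its relatives. The `FAMILIES` table, the function names and the sample sentences are hypothetical; a real family list would be far larger and built by hand or from a lemmatizer.

```python
import re

# Hypothetical family table: headword -> member forms (illustrative only).
FAMILIES = {
    "manger": {"manger", "mange", "manges", "mangeons", "mangez", "mangent", "mangé"},
}

def family_of(form, families=FAMILIES):
    """Return (headword, members) for the family containing `form`.

    A form not found in any family is treated as a family of one.
    """
    form = form.lower()
    for head, members in families.items():
        if form in members:
            return head, members
    return form, {form}

def find_family_hits(text, query, families=FAMILIES):
    """Return the corpus tokens that belong to the same family as `query`."""
    _, members = family_of(query, families)
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of Unicode letters
    return [t for t in tokens if t in members]

corpus = "Nous mangeons tard. Ils mangent vite. Elle a mangé une pomme."

# "mangez" itself never occurs in the corpus, but three relatives do.
print(find_family_hits(corpus, "mangez"))
```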
69
Following Nation’s heroic cracking of the 100 million-word BNC into 14 k-family units…
Many things became possible on Lextutor
Especially with smallish corpora
70
71
From Wang & Nation (2004), based on Bauer & Nation (1993)
72
73
1k 80.17%
2k 5.65%
AWL 1.68%
Off 12.50%
GSL + AWL
BNC LISTS (N, 2006)
BNC + proper noun auto-extract (C+L, 09)
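A profile of the kind shown on this slide (1k / 2k / AWL / Off percentages) can be approximated as below. This is a sketch under stated assumptions, not the Lextutor profiler: the `BANDS` lists are toy stand-ins for real frequency lists, tokens are matched by surface form rather than by family, and the band names are only labels.

```python
import re
from collections import Counter

# Hypothetical miniature band lists; real lists hold about 1,000 families each.
BANDS = {
    "1k": {"the", "cat", "sat", "on", "a", "and", "was"},
    "2k": {"mat", "window"},
}

def profile(text, bands=BANDS):
    """Return the percentage of tokens in each band, plus 'Off' for the rest."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of Unicode letters
    names = [*bands, "Off"]
    if not tokens:
        return {name: 0.0 for name in names}
    counts = Counter()
    for tok in tokens:
        for name, words in bands.items():
            if tok in words:
                counts[name] += 1
                break
        else:
            # No band claimed the token: it is off-list.
            counts["Off"] += 1
    return {name: 100 * counts[name] / len(tokens) for name in names}

print(profile("The cat sat on the mat and the window shattered"))
```

With a real family list plugged in, the same loop would look up each token's headword first, which is where the familized lists discussed earlier come in.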
74
75
76
77
78
Wish List 2. A complete, familized French frequency list from a big corpus
Or at least lemmatized
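As a rough illustration of what such a list involves, the sketch below counts tokens by headword rather than by surface form and then cuts the ranked list into bands. The `HEADWORD` table is a hypothetical stand-in for a real lemmatizer or family list, and `band_size` would be 1000 for the usual 1k, 2k, 3k levels.

```python
import re
from collections import Counter

# Hypothetical form -> headword table (illustrative only).
HEADWORD = {
    "suis": "être", "es": "être", "est": "être", "sont": "être",
    "chat": "chat", "chats": "chat",
}

def family_frequency(text, headword=HEADWORD):
    """Count tokens by headword; forms missing from the table count as themselves."""
    counts = Counter()
    for tok in re.findall(r"[^\W\d_]+", text.lower()):
        counts[headword.get(tok, tok)] += 1
    return counts

def bands(counts, band_size=1000):
    """Map each headword to its band (1 = the `band_size` most frequent headwords)."""
    ranked = [word for word, _ in counts.most_common()]
    return {word: rank // band_size + 1 for rank, word in enumerate(ranked)}

counts = family_frequency("Le chat est noir. Les chats sont noirs. Je suis là.")
print(counts.most_common(2))
print(bands(counts, band_size=2))
```

Note how "est", "sont" and "suis" pool into one entry: counted separately, none of them would outrank "chat".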
79
With such a list we can offer learners…
A summary of the whole lexicon of a language
+ a plan for getting acquainted with it
80
81
82
83
Of course, for learning basic vocabulary items (1k, 2k, 3k)
Contextual learning from a “real” corpus will not work
84
Build comprehensible corpus
85
Wish List 3. Graded French corpora
Of at least 1 million words
Ideally at 1k, 2k, 3k levels
So we can say, “For this learner, 95% of items in this corpus are comprehensible”
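One way to approximate a graded corpus, pending a purpose-built one, is to filter an existing corpus sentence by sentence against the learner's known-word list. The sketch below assumes a hypothetical `known` set standing in for a 1k list and uses the 95% figure from the slide as its default threshold; it matches surface forms only.

```python
import re

def coverage(sentence, known):
    """Return the fraction of tokens in `sentence` that appear in `known`."""
    tokens = re.findall(r"[^\W\d_]+", sentence.lower())
    if not tokens:
        return 0.0
    return sum(tok in known for tok in tokens) / len(tokens)

def graded_subcorpus(sentences, known, threshold=0.95):
    """Keep only the sentences whose known-word coverage meets the threshold."""
    return [s for s in sentences if coverage(s, known) >= threshold]

# Hypothetical known-word list for a beginner (a stand-in for a real 1k list).
known = {"le", "chat", "mange", "la", "souris", "dort"}

sentences = [
    "Le chat mange la souris.",
    "Le chat dort.",
    "Le chat contemple la voûte céleste.",
]

print(graded_subcorpus(sentences, known))
```

Filtering by sentence keeps concordance lines readable for the learner, at the cost of discarding much of the source corpus, which is one reason the wish is for at least a million words.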
86
And… so far all this only deals with
individual words
87
Some Multi-Word Units with independent, non-compositional meaning are so frequent that… they are actually 1st and 2nd 1000 items
E.g., learners will meet “of course” more frequently than 2k item “window”
505 of these belong in the most frequent 5,000 (>10%)
Schmitt & Alvarez, Nottingham
However…
We have now discovered *the V.H.F Multi-Word*
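The "of course" versus "window" comparison can be made concrete by counting listed multi-word units on the same scale as single words. This is a minimal sketch: `unit_frequencies` and the sample text are assumptions, the multi-word list must be supplied by hand, and the component words ("of", "course") are still counted individually as well.

```python
import re
from collections import Counter

def unit_frequencies(text, multiwords):
    """Count single words and the listed multi-word units in `text`.

    `multiwords` is a set of lowercase, space-separated expressions.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    # Slide an n-token window over the text for each expression length present.
    for n in {len(m.split()) for m in multiwords}:
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in multiwords:
                counts[gram] += 1
    return counts

text = "Of course she came. Of course he opened the window. Of course not."
counts = unit_frequencies(text, {"of course"})
print(counts["of course"], counts["window"])
```

Once the units sit in the same counter as the words, they can be ranked and banded together, which is what including them in a frequency list amounts to.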
88
89
90
Wish List 4. A list of the VHF idiomatic multi-words in French
(to incorporate in Wish-List item 3)
91
And when all that is done ~
The frequency list can be adjusted for homographs
« Les poules du couvent couvent »
« Tu as l’as de pique »
« Les vers marchent vers Avignon »
A combined frequency rating based on word form?
English wish list: contextually sensitive word lists (DDL 2.0)
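The homograph problem shows up as soon as counts by word form are compared with counts by form plus part of speech. In the sketch below the three example sentences are tagged by hand; the tags and their labels are an illustrative assumption, not the output of any tagger.

```python
from collections import Counter

# Hand-tagged (form, part of speech) pairs for the three example sentences.
# The tags are an assumption made for illustration only.
TAGGED = [
    ("les", "DET"), ("poules", "NOUN"), ("du", "DET"),
    ("couvent", "NOUN"), ("couvent", "VERB"),
    ("tu", "PRON"), ("as", "VERB"), ("l'", "DET"),
    ("as", "NOUN"), ("de", "PREP"), ("pique", "NOUN"),
    ("les", "DET"), ("vers", "NOUN"), ("marchent", "VERB"),
    ("vers", "PREP"), ("avignon", "PROPN"),
]

# A form-based list conflates the homographs; a (form, POS) list separates them.
by_form = Counter(form for form, _ in TAGGED)
by_form_and_pos = Counter(TAGGED)

def homographs(tagged):
    """Return the forms that occur with more than one part of speech."""
    seen = {}
    for form, pos in tagged:
        seen.setdefault(form, set()).add(pos)
    return {form: sorted(tags) for form, tags in seen.items() if len(tags) > 1}

print(by_form["couvent"], by_form_and_pos[("couvent", "NOUN")])
print(homographs(TAGGED))
```

A list built on word form alone gives "couvent" a single combined rating; splitting it requires exactly the contextual information that an untagged pedagogical corpus does not carry.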
92
Wish List Résumé
1. A mid-size general corpus of French for DDL web work (3 million-ish)
2. A graded corpus for DDL web work
3. A complete, familized frequency list from a big corpus
4. Identification of HF multiwords for eventual inclusion in 3
93
Suite…
Software
lextutor.ca/
Papers
lextutor.ca/cv/
Anything else
> Find me here
> Write me at
94
95
Technology and Language Testing
Corpus-Based Testing
Parallel Concordancing
Analyzing Speech Corpora
Concordancing
Language Teacher Training in Technology
Language Trainer Training in Technology
Technology and Listening
Computer-Assisted Language Learning Effectiveness Research
Learner Modeling in Intelligent Computer-Assisted Language Learning
Intelligent Computer-Assisted Language Learning
Natural Language Processing in Intelligent Computer-Assisted Language Learning
Learner Corpora
Automated Speech Recognition
Technology and Discourse Intonation
Computer-Assisted Vocabulary Load Analysis
Technology and Usage-Based Teaching Applications
Information Retrieval for Reading Tutors
Online Communities of Practice
Emerging Technologies for Language Learning
Lexical Bundles
Technology and Phrases
Technology and Teaching Writing
Latent Semantic Analysis
96
Mobile Assisted Language Learning
Technology and Phonetics
Computer-Mediated Communication and Second Language Use
Computer-Mediated Communication and Second Language Development
Multimodal Computer-Mediated Communication and Distance Education
Distance Language Learning
Massively Multiplayer Online Games
Digital Divide
Technology and Teaching Vocabulary
Exporting Applied Linguistics Technology
Monolingual Lexicography
Bilingual Lexicography
Lexicography Across Languages
Internet and World English
Searchlinguistics
Internationalization and Localization
Translation Terminology
Technology and Literacy
Corpora and Literature
Keyword Analysis
Connectionism
Text-to-Speech Synthesis Development
Text-to-Speech Synthesis Research
Lexical Priming
Technology and Culture
Computer-Assisted Language Learning and Machine Translation