Tools and resources (not only) for French, Italian and Spanish
Thomas Koller
NCLT seminar series, 22.11.2005
Overview
Plurilingual learning
Existing resources
Created resources
Developed tools
Software architecture
Plurilingual learning
• Exploits learners’ knowledge of similar languages
• Raises language awareness by showing similar properties in several languages
• Aims to avoid learners’ typical errors related to transfer processes
Plurilingual learning: Fields of similarity
• Pan-Romance vocabulary (dormir, sang, vin)
– 39 words in all languages
– 141 words in 8-9 languages
– 227 words in 5-7 languages
• Sound correspondences
– Sp. ñ → Fr. / It. gn, n:
señor, campaña → seigneur, campagne / signore, campagna
año → an / anno
• Morphosyntactic elements
Plurilingual learning: Example
Spanish / Italian / French (related known forms)
El / Il / Le
padre / padre / père (paternal, Pater)
habla / parla / parle (parl-)
con / con / avec
su / suo / son (su-)
hijo / figlio / fils (fil-)
de la / della / de l’ (de la)
escuela / scuola / école (Schule, school)
Existing resources: Linguistic tools
• POS tagger
– TreeTagger
– SVMTool (Spanish, English, Catalan)
• IBM JFrost lemmatiser
– provides possible base forms + POS
– morphological information (no POS tagging)
• Verb conjugator
– English, German, French, Italian and Spanish
– generates all forms for all tenses
Existing plurilingual resources
• Pan-Romance wordlist: 840 words (eau / agua / acqua -- utiliser / utilizar / utilizzare)
• Profile words: 340 words (avec / con / con -- presque / casi / quasi)
• Sound correspondences:
– Italian → Spanish: 19, e.g. chi- → ll- (chiamare → llamar)
– Italian → French: 19, e.g. -ott- → -uit- (notte → nuit)
– Spanish → Italian: 23, e.g. -ue- → -uo- (bueno → buono)
– Spanish → French: 31, e.g. ll- → pl- (llorar → pleurer)
– French → Italian: 17, e.g. qu- → ch- (que → che)
– French → Spanish: 27, e.g. -ein → -eno (plein → lleno)
Existing resources
• Bilingual wordlists
– wordlists can easily be converted into
  • different XML formats
  • relational databases
– used to create multilingual XML lexicons
• Plurilingual lexicon
– French, Italian, Spanish (Portuguese, Romanian)
– 1800 entries
Existing resources: Plurilingual lexicon
– [1]actuar, [2]tratarse
agir [v] {1 intransitif, 2 pronominal impers.}
[1]agire, [2]trattarsi
– [caldo->'bouillon'], caliente
chaud [adj]
caldo
– contar [+'raconter']
compter [v]
contare [+'raconter']
Created resources
• Multilingual XML lexicon
– 43 topics
– French: 11,500 lemmas / 14,900 entries
– Italian: 13,400 lemmas / 17,800 entries
– Spanish: 14,600 lemmas / 19,700 entries
– English: 17,600 lemmas / 25,900 entries
– German: 5,200 lemmas / 7,300 entries
– POS: nouns (m, n, f), verbs, adverbs, adjectives, conjunctions, articles, pronouns, prepositions, interjections, numerals
– Language levels: 1 - 4
Multilingual XML lexicon: sample entry
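The sample entry itself is not preserved in this transcript. As a rough illustration only, the sketch below assumes a hypothetical entry layout that carries the fields listed for the lexicon (language, POS, gender, topic, level). The element and attribute names (`entry`, `form`, `lang`, `pos`, `gender`, `topic`, `level`) and the helper `translations` are invented for this example and are not taken from the original lexicon.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry layout: all element and attribute names are
# illustrative assumptions, not the format used in the talk.
SAMPLE = """
<entry topic="family" level="1">
  <form lang="fr" pos="n" gender="m">père</form>
  <form lang="it" pos="n" gender="m">padre</form>
  <form lang="es" pos="n" gender="m">padre</form>
  <form lang="en" pos="n">father</form>
  <form lang="de" pos="n" gender="m">Vater</form>
</entry>
"""


def translations(entry_xml, source_lang, target_lang):
    """Return (source form, target form) pairs found in one entry."""
    entry = ET.fromstring(entry_xml)
    sources = [f.text for f in entry.findall(f"form[@lang='{source_lang}']")]
    targets = [f.text for f in entry.findall(f"form[@lang='{target_lang}']")]
    return [(s, t) for s in sources for t in targets]


print(translations(SAMPLE, "es", "fr"))
```

Keeping all languages inside one entry is only one possible design; separate per-language files linked by an identifier would work as well.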
Created resources: verb lexicons
Verb lexicons with 500 verbs for each language containing verb pattern information
accepter <vt> <v pron>[de + INF][de faire qch][par][qch de qn][que]
Created resources: verb lexicons
Full-form verb lexicons for 1500 – 1700 verbs
échappe
échapper:pres:1s
échapper:pres:3s
échapper:subj_pres:1s
échapper:subj_pres:3s
échapper:impe:2s

abandonner
1s_abandonne
2s_abandonnes
3s_abandonne
1p_abandonnons
2p_abandonnez
3p_abandonnent
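A minimal sketch of how the two formats above might be read, assuming the first lists analyses of a word form as `lemma:tense:person` and the second lists paradigm cells as `person_form`. The function names `parse_analysis` and `index_paradigm` are hypothetical; the data is copied from the slide.

```python
from collections import defaultdict

# Paradigm cells for one verb, copied from the slide's second example.
PARADIGM = (
    "abandonner",
    ["1s_abandonne", "2s_abandonnes", "3s_abandonne",
     "1p_abandonnons", "2p_abandonnez", "3p_abandonnent"],
)


def parse_analysis(code):
    """Split an analysis such as 'échapper:subj_pres:3s' into its parts."""
    lemma, tense, person = code.split(":")
    return {"lemma": lemma, "tense": tense, "person": person}


def index_paradigm(lemma, cells):
    """Invert 'person_form' cells into a form -> [(lemma, person)] lookup."""
    index = defaultdict(list)
    for cell in cells:
        person, form = cell.split("_", 1)
        index[form].append((lemma, person))
    return dict(index)


index = index_paradigm(*PARADIGM)
print(index["abandonne"])
print(parse_analysis("échapper:subj_pres:3s"))
```

Inverting the paradigm makes ambiguous forms explicit: one surface form can map to several person values, which is what a dictionary tool needs when it annotates verb forms in running text.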
Overview
Developed tools
Animated grammar presentations
Dictionary tools
Plurilingual analysis module
Animated grammar presentations
• Dynamic representation of grammatical properties / processes
• Tailor-made presentations
– Replacing indications of place
– Emphasising the subject
– Irregular verb conjugations
– Spatial prepositions and movements
• Authoring tool for creation of slide-based learning materials with animated content
– produces slide-based learning materials
– animated and/or static text can be included
Authoring tool: Presenter
• Can be embedded in web page or used as standalone tool in Windows
• XML data can be created automatically and then fed into the presenter → suitable for flexible feedback
• Several XML files can be provided for use in one page and then e.g. chosen via PHP or JavaScript
Dictionary tools
• Input: any text in French, Italian or Spanish
• Provide word-by-word translations
• Multilingual dictionary tool
– Tense, number, person for verb forms
– POS
– Topic
• Plurilingual dictionary tool
– Similar word forms
– Profile words
Multilingual dictionary: Resources
• Resources used
– Multilingual XML lexicons, multilingual MySQL database
– Full-form verb lexicons
• Dictionary tool can easily be used with any other database
– special language dictionaries
– monolingual definition dictionaries
Plurilingual dictionary: Tools and resources
• TreeTagger provides most likely POS
• Pan-Romance wordlist and list of profile words
• Tool makes use of
– sound correspondences
– Levenshtein string similarity measure
– multilingual MySQL database
to automatically detect graphically similar words with the same meaning
Plurilingual dictionary: Word detection
• In principle, all target-language words with a “distance” ≤ 2 are displayed
• Sp. posibilidad -- Fr. possibilité → Normal distance: 4
• Sound correspondence: Sp. -dad -- Fr. -té → Intermediate form: posibilité
• Distance between intermediate form and French form is now only 1
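The detection step above can be sketched as follows. This is a minimal illustration, not the tool's actual code: it assumes sound correspondences are applied as simple suffix rewrites before the Levenshtein distance is computed. The rule table `SUFFIX_RULES` and the helpers `intermediate_form` and `is_candidate` are hypothetical names; only the -dad → -té rule and the threshold of 2 come from the slide.

```python
import unicodedata


def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions cost 1)."""
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Assumed representation: (source, target) language pair -> suffix rewrites.
SUFFIX_RULES = {("es", "fr"): [("dad", "té")]}


def intermediate_form(word, source, target):
    """Apply the first matching suffix correspondence, if any."""
    for old, new in SUFFIX_RULES.get((source, target), []):
        if word.endswith(old):
            return word[: -len(old)] + new
    return word


def is_candidate(word, target_word, source, target, max_distance=2):
    """True if the rewritten word is within the distance threshold."""
    rewritten = intermediate_form(word, source, target)
    return levenshtein(rewritten, target_word) <= max_distance


print(levenshtein("posibilidad", "possibilité"))
print(intermediate_form("posibilidad", "es", "fr"))
print(is_candidate("posibilidad", "possibilité", "es", "fr"))
```

Real correspondences also affect word-initial and word-internal segments (for example ll- → pl-), so a fuller version would need positional rules rather than suffix rewrites alone.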
Plurilingual analysis module
• Exploits similar sentence structures in Romance languages
• Can analyse learner input ranging up to paragraphs of simple sentences and give detailed feedback
Resources
• JFrost:
– possible lemmas + POS
– (extended morphological information)
• Verb lexicons
• Hand-crafted grammar
Parser type
Robust island parser
Hoy la madre no ha vuelto a hablar con su hijo.
Verb group (ha vuelto a hablar): V V P V
• has a fixed position and extension in the sentence
• only contains verbs and certain POS
Sentence layout: subject | verb group | object
• sentence is split at potential verb groups
• only the parts before and after the verb group are actually parsed
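The splitting step can be sketched roughly as below, assuming the input is already POS-tagged and that a verb group is a run starting and ending with a verb that may contain prepositions in between. The tag names, the allowed set `VERB_GROUP_TAGS` and the function `split_at_verb_group` are assumptions for illustration; the sketch only handles the first potential verb group.

```python
# Assumed tag inventory; "P" (preposition) is allowed inside a verb group
# only when another verb follows, as in "ha vuelto a hablar" (V V P V).
VERB_GROUP_TAGS = {"V", "P"}


def split_at_verb_group(tagged):
    """Return (before, verb_group, after) for a list of (token, tag) pairs."""
    tags = [tag for _, tag in tagged]
    start = next((i for i, t in enumerate(tags) if t == "V"), None)
    if start is None:
        return tagged, [], []
    end = start + 1
    i = start + 1
    while i < len(tags) and tags[i] in VERB_GROUP_TAGS:
        if tags[i] == "V":
            end = i + 1  # extend the group only up to the last verb seen
        i += 1
    return tagged[:start], tagged[start:end], tagged[end:]


SENTENCE = [("Hoy", "ADV"), ("la", "DET"), ("madre", "N"), ("no", "NEG"),
            ("ha", "V"), ("vuelto", "V"), ("a", "P"), ("hablar", "V"),
            ("con", "P"), ("su", "DET"), ("hijo", "N")]

before, group, after = split_at_verb_group(SENTENCE)
print([tok for tok, _ in before])
print([tok for tok, _ in group])
print([tok for tok, _ in after])
```

Because only the chunks on either side of the verb group are parsed, a tagging error inside one chunk does not block analysis of the other, which is one reason such a parser can be called robust.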
Analysis module: Recognised errors
• Agreement errors
– inside NPs
– between sentence components
• Subcategorisation errors
– too many/few sentence components
– wrong preposition
– wrong non-finite verb form
• Position errors
– Negation
– Adverbs
• ...
Error recognition
• Constraint relaxation
– no constraints during parsing
– suite of tests after parsing
  • Agreement
  • Position of adverbs
  • Correctness of verb group
• Error rules
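One way to picture constraint relaxation: the parser accepts a phrase regardless of feature values, and an agreement test runs on the result afterwards. The sketch below assumes each token carries `gender` and `number` features (for instance from a lemmatiser); the data layout and the function `agreement_errors` are hypothetical, not the module's actual interface.

```python
def agreement_errors(phrase, features=("gender", "number")):
    """List the features on which the tokens of a parsed phrase disagree."""
    errors = []
    for feature in features:
        # Tokens without a value for this feature are ignored.
        values = {tok[feature] for tok in phrase if tok.get(feature) is not None}
        if len(values) > 1:
            errors.append(feature)
    return errors


# "la padre": accepted by the unconstrained parser, flagged afterwards.
wrong_np = [{"form": "la", "gender": "f", "number": "sg"},
            {"form": "padre", "gender": "m", "number": "sg"}]
right_np = [{"form": "el", "gender": "m", "number": "sg"},
            {"form": "padre", "gender": "m", "number": "sg"}]

print(agreement_errors(wrong_np))
print(agreement_errors(right_np))
```

Running the tests after parsing means an ill-formed learner sentence still receives a structure, so the feedback can name the feature that failed instead of reporting a parse failure.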
Modules
• Grammar reader
– Reads in grammar file
– Extrapolates phrase structure rules: NP -> (det) n (AP)
– Provides direct access to subparts of the grammar: “give me all NP rules for Spanish”
• Verb group divider
– Divides sentence at its verbal group
– Returns the sentence chunks before and after the VG
• NP finder
– Finds all possible NP occurrences in sentence chunks
– Returns positions of NPs in sentence chunks
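A minimal sketch of two of these modules, assuming that parentheses in a rule mark optional symbols and that the NP finder matches expanded rules against a chunk's POS tags. The functions `expand_rule` and `find_spans` are hypothetical, and the second rule below replaces AP with a plain `adj` tag so it can be matched without expanding AP itself.

```python
from itertools import product


def expand_rule(rule):
    """Expand optional symbols: 'NP -> (det) n (AP)' yields four sequences."""
    lhs, rhs = (part.strip() for part in rule.split("->"))
    choices = []
    for symbol in rhs.split():
        if symbol.startswith("(") and symbol.endswith(")"):
            choices.append([[symbol[1:-1]], []])  # present or absent
        else:
            choices.append([[symbol]])
    expansions = [sum(combo, []) for combo in product(*choices)]
    return lhs, expansions


def find_spans(tags, expansions):
    """Return sorted (start, end) positions where any expansion matches."""
    spans = set()
    for seq in expansions:
        n = len(seq)
        if n == 0:
            continue  # an all-optional rule would otherwise match everywhere
        for start in range(len(tags) - n + 1):
            if tags[start:start + n] == seq:
                spans.add((start, start + n))
    return sorted(spans)


print(expand_rule("NP -> (det) n (AP)"))

# Chunk after the verb group: "con su hijo", tagged p det n (assumed tags).
_, flat_np = expand_rule("NP -> (det) n (adj)")
print(find_spans(["p", "det", "n"], flat_np))
```

The finder deliberately returns every match, including the shorter span nested inside the longer one, in line with "all possible NP occurrences"; choosing among them is left to later steps.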
Interaction of software components
[Diagram of client and server components. Labels: Client, Server, Web page, Flash, Shared Object, XML, MySQL, NLP, PHP, Perl, Java]
Software architecture: Pros
• Uniform representation on several platforms, browser-independent
• Easy integration of different media types (audio, video, images, animation)
• Embed fonts for many character sets (Cyrillic, Hebrew, Arabic, Chinese, Japanese, Korean)
• Flash Remoting: sending complex data structures (Java objects, arrays, hashes) to and from server
Software architecture: Pros
• Flash files can interact mutually via JavaScript, LocalConnection class or using the same Local Shared Objects
• Local Shared Objects provide the opportunity to save structured data (e.g. XML data) on the client side
• No reload necessary for incoming server data
• Can read XML files; XPath and regular expressions can be used
Software architecture: Cons
• (Requires browser plug-in)
• Steep learning curve at the beginning
• Contents cannot be read by search engines
• Software is not free of charge