Grammar for Fun: IT-based Grammar Teaching with VISL

Eckhard Bick, 2004


Page 1: Grammar for Fun: IT-based Grammar Teaching with VISL

Eckhard Bick, 2004

Page 2: Talk outline

• Background: VISL project activities

• A unified approach to grammar teaching

• Internet based teaching tools

• Grammar Games

• TextPainter: Visualising grammatical text properties

• Research corpora: A resource for teaching

• Slot filler exercises: Towards evaluation

Page 3: Teaching projects

• CTU 1996-99: Internet based grammar teaching software (research and development)

• ELU1 1998-2000: VISL tools for Danish universities and teacher seminaries

• VISL-HHX 2001-03: VISL tools for Danish business schools

• VISL-GYM 2001-02: VISL tools for Danish gymnasiums

• PaNoLa, GREI 2002-2004: Major Nordic languages

• VISL-SEM 2004-05: VISL didactics for teacher training colleges

• URKAS 2004-05: “Almen sprogforståelse” (general language understanding, 1.g)

Page 4: Unity in diversity: A unified approach for 22 languages

Page 5: VISL research languages

Columns: revised syntactic trees (nodes) | morphological analysis | syntactic analysis | semantics

200.000*, 4 subcorpora | lexicon and rule based analyzer + CG | CG + tree-generator | semantic prototypes, Po->Da MT
40.400, 13 subcorpora | integrated TWOL/CG (lingsoft) + add-on | CG + PSG | WordNet based tagging
200.000*, 10 subcorpora | lexicon and rule based analyzer + CG | CG + PSG + topological | semantic prototypes, Da->Esp MT
8.400, 3 subcorpora | lexicon and rule based analyzer + CG | CG + tree-generator | -
16.000, 3 subcorpora | integrated TWOL/CG (lingsoft) + add-on | CG + PSG | -
20.000, 3 subcorpora | Decision Tree Tagger (H.Schmid & A.Stein) | CG + PSG | -
1.000, 2 subcorpora | Decision Tree Tagger (H.Schmid & A.Stein) | - | -
- | morpheme based analyzer + CG | CG (experimental) | Da->Esp MT

Page 6: The VISL teaching network

Page 7: Complexity progression

Page 8: Grammy i Klostermølleskoven

Story-line about grammar

Interactive exercises Book = IT

Comments for teachers

Explanations for students

Page 9: The Paintbox game

Page 10: ShootingGallery: Hit a noun!

Page 11: WordFall - Tetris for grammarians

Page 12: Labyrinth - a word class maze

Page 13: Post office - stamping syntactic function

Page 14: Syntris - syntax brick by brick

Page 15: SpaceRescue: Alien syntax

Page 16: Constituent trees

Page 17: Interactive syntactic trees

Page 18

Choose tool e.g. inspection, build tree or label tree

Choose complexity e.g. minor (dynamic sentence dependent reduction in category complexity) or major

Choose notation e.g. symbols or abbreviations and/or colors

Choose teaching environment e.g. latinate Danish gymnasium

Choose meta-language e.g. English

Choose visualisation e.g. graphical trees or field analysis

Choose level e.g. VISL-lite (for schools)

Choose subcorpus e.g. VISL-HHX (business gymnasium)

Choose target language e.g. German or Swedish

Teaching corpora of analyzed sentences

Page 19: Function categories

Page 20: BuildTree: Drag & drop constituents

Page 21: LabelTree: Drag & drop syntactic function

Page 22: Cross-language problems: Infinitive marker

Page 23: Cross-language problems: Participial clauses

Page 24: Cross-language problems: Discontinuity

Page 25: VISL source notation

VISL lite vertical tree (non-graphical notation, filtered)

UTT:cl
S:prop VISL
P:v er
Cs:g
=D:art et
=H:n forskningsprojekt
=D:cl
==S:pron der
==P:v involverer
==Od:g
===D:pron mange
===D:adj forskellige
===H:n sprog

VISL vertical tree (non-graphical notation, incl. morphology)

STA:fcl
S:prop("VISL") VISL
P:v-fin("være",pr,akt) er
Cs:np
=DN:art("en",neu,sg,idf) et
=H:n("forskningsprojekt",neu,sg,idf,nom) forskningsprojekt
=DN:fcl
==S:pron-rel("der",nG,nN,nom) der
==P:v-fin("involvere",pr,akt) involverer
==Od:np
===DN:pron-indef("mange",nG,pl,nom) mange
===DN:adj("forskellig",nG,pl,nD,nom) forskellige
===H:n("sprog",neu,pl,idf,nom) sprog
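In the vertical notation, each node is written as FUNCTION:form, terminals are followed by the word itself, and nesting depth appears to be given by the number of leading "=" signs. A minimal Python sketch of reading the filtered (VISL lite) example into a nested structure follows. The names `Node`, `parse_vertical` and `words` are hypothetical and not part of any VISL tool, and the one-node-per-line layout is an assumption based on the example sentence.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The VISL-lite example sentence, one node per line (assumed layout).
VISL_LITE = """\
UTT:cl
S:prop VISL
P:v er
Cs:g
=D:art et
=H:n forskningsprojekt
=D:cl
==S:pron der
==P:v involverer
==Od:g
===D:pron mange
===D:adj forskellige
===H:n sprog
"""


@dataclass
class Node:
    function: str                 # e.g. "S", "P", "Cs", "Od"
    form: str                     # e.g. "prop", "v", "g", "cl"
    word: Optional[str] = None    # only terminals carry a word
    children: List["Node"] = field(default_factory=list)


def parse_vertical(text: str) -> Node:
    """Build a tree from '='-indented FUNCTION:form lines."""
    root: Optional[Node] = None
    stack = []  # (depth, node) pairs for the currently open branches
    for line in text.splitlines():
        if not line.strip():
            continue
        stripped = line.lstrip("=")
        depth = len(line) - len(stripped)
        label, _, word = stripped.partition(" ")
        function, _, form = label.partition(":")
        node = Node(function, form, word.strip() or None)
        if root is None:
            # The first line is the root; its children have depth 0,
            # so the root itself is stored at depth -1.
            root = node
            stack = [(-1, node)]
            continue
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((depth, node))
    if root is None:
        raise ValueError("empty input")
    return root


def words(node: Node) -> List[str]:
    """Collect terminal words in left-to-right order."""
    own = [node.word] if node.word else []
    return own + [w for child in node.children for w in words(child)]


tree = parse_vertical(VISL_LITE)
print(" ".join(words(tree)))
```

The same function also accepts the variant with morphology, since the bracketed tag lists contain no spaces and therefore stay attached to the form label.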

Page 26: CG source notation (function/dependency)

Page 27: Supported xml-formats

• TIGER-xml (constituents)

• TIGER-xml (dependency)

• MALT-xml

• VISL data file markers: pedagogical topic and chaptering attributes for dynamic html-layout

Page 28: Search interfaces for annotated corpora

Page 29: Menu-based searches

Page 30: Statistical tools

Page 31: Corpus annotation

Page 32: Annotated corpora

Morphosyntactically tagged

• Korpus90 and Korpus2000, mixed genre, 56M words

• DFK, mainly transcribed parliamentary discussions, 7M words

• CETEMPúblico, European Portuguese, news text, 180M words

• Folha de São Paulo, Brazilian news text, 90M words

• CORDIAL-SIN, dialectal Portuguese, 30K words

• NURC, transcribed Brazilian speech, 100K words

• Tycho Brahe, historical Portuguese, 50K words

Valency tagged

• NILC corpus, Brazilian Portuguese, journalistic and essays, 39M words

Treebanks

• Floresta Sintá(c)tica, European Portuguese, 1M words (35K revised)

• Arboretum, Danish, 50K words revised

Page 33: Integrating live NLP and language awareness teaching

Page 34: KillerFiller: Towards evaluation

Page 35: Performance statistics

Page 36: VISL

http://visl.sdu.dk

Eckhard Bick, [email protected]

**************

Page 37: The most common syntactic categories

@SUBJ  subject
@ACC  direct (accusative) object
@DAT  indirect (dative) object
@PIV  prepositional object
@SC  subject complement
@OC  object complement
@SA  subject related adverbial argument
@OA  object related adverbial argument
@MV  main verb
@AUX  auxiliary
@ADVL  free (adjunct) adverbial
@PRED  free (adjunct) predicative
@APP  apposition
@>N  prenominal dependent
@N<  postnominal dependent
@>A  adverbial pre-dependent
@A<  adverbial post-dependent
@P<  argument of preposition
@INFM  infinitive marker
@VOK  vocative
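The category list on this slide can be kept as a simple lookup table. The Python sketch below does so and adds one convenience that is an assumption rather than something the slide states: clause-level tags that carry an extra "<" or ">" (as in "@<ACC" or "@SUBJ>" in the running annotation example at the end of the deck) are treated as the same category with a direction marker. The names `SYNTACTIC_TAGS` and `describe` are hypothetical.

```python
from typing import Optional

# Tag inventory copied from the slide.
SYNTACTIC_TAGS = {
    "@SUBJ": "subject",
    "@ACC": "direct (accusative) object",
    "@DAT": "indirect (dative) object",
    "@PIV": "prepositional object",
    "@SC": "subject complement",
    "@OC": "object complement",
    "@SA": "subject related adverbial argument",
    "@OA": "object related adverbial argument",
    "@MV": "main verb",
    "@AUX": "auxiliary",
    "@ADVL": "free (adjunct) adverbial",
    "@PRED": "free (adjunct) predicative",
    "@APP": "apposition",
    "@>N": "prenominal dependent",
    "@N<": "postnominal dependent",
    "@>A": "adverbial pre-dependent",
    "@A<": "adverbial post-dependent",
    "@P<": "argument of preposition",
    "@INFM": "infinitive marker",
    "@VOK": "vocative",
}


def describe(tag: str) -> Optional[str]:
    """Return the slide's gloss for a tag, or None if it is not listed."""
    if tag in SYNTACTIC_TAGS:
        return SYNTACTIC_TAGS[tag]
    # Assumption: a leading '<' or trailing '>' on a clause-level tag
    # only marks direction, so it is stripped before the lookup.
    base = "@" + tag[1:].strip("<>")
    return SYNTACTIC_TAGS.get(base)


for tag in ("@SUBJ", "@<ACC", "@>N", "@FS-ADVL>"):
    print(tag, "->", describe(tag))
```

Tags outside the "most common" list, such as "@FS-ADVL>", simply return None here.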

Page 38

Page 39: The DanGram system in current numbers

Lexemes in morphological base lexicon: 146.342 (equals about 1.000.000 full forms), of these:
  proper names: 44.839 (experimental)
  polylexicals: 460 (+ names and certain number expressions)

Lexemes in the valency and semantic prototype lexicon: 95.308

Lexemes in the bilingual lexicon (Danish-Esperanto): 36.001

Danish CG-rules, in all: 6.233
  morphological CG disambiguation rules: 2.678
  syntactic mapping-rules: 1.701
  syntactic CG disambiguation rules: 1.854
  (plus 429 bilingual rules in separate MT grammars, and a smaller number of semantic case-role and proper name-rules in the semantics and name grammars)

Danish PSG-rules: 490 (for generating syntactic tree structures)

Performance: At full disambiguation (i.e., maximal precision), the system has an average correctness of 99% for word class (PoS), and about 96% for syntactic tags (depending on how fine-grained an annotation scheme is used)

Speed:
  full CG-parse: ca. 400 words/sec for larger texts (start up time 3-6 sec)
  morphological analysis alone: ca. 1000 words/sec
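The figures on this slide can be cross-checked with a little arithmetic. The Python sketch below confirms that the three rule classes add up to the stated total and turns the quoted throughput into a rough running-time estimate. The function name `estimated_parse_seconds` is hypothetical, and the estimate assumes throughput stays constant at the quoted rate, which the slide only claims for larger texts.

```python
# Rule counts as given on the slide (dots are thousands separators).
MORPH_DISAMBIGUATION_RULES = 2678
SYNTACTIC_MAPPING_RULES = 1701
SYNTACTIC_DISAMBIGUATION_RULES = 1854
TOTAL_CG_RULES = 6233

rule_sum = (MORPH_DISAMBIGUATION_RULES
            + SYNTACTIC_MAPPING_RULES
            + SYNTACTIC_DISAMBIGUATION_RULES)
print("rule classes sum to total:", rule_sum == TOTAL_CG_RULES)


def estimated_parse_seconds(n_words: int,
                            words_per_sec: float = 400.0,
                            startup_range: tuple = (3.0, 6.0)) -> tuple:
    """Rough (low, high) estimate of a full CG parse in seconds.

    Assumes the quoted 400 words/sec holds for the whole text and adds
    the quoted 3-6 second start-up time.
    """
    parse_time = n_words / words_per_sec
    return (startup_range[0] + parse_time, startup_range[1] + parse_time)


low, high = estimated_parse_seconds(100_000)
print(f"100.000 words: roughly {low:.0f}-{high:.0f} seconds")
```

Under these assumptions a 100.000-word text would take a little over four minutes for a full CG parse.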

Page 40: VISL parsing tools

• Preprocessing: word and sentence boundaries, polylexicals

• Lexicon and rule based morphological analysis: inflexion, derivation, compound recognition

• Postprocessing: valency and semantic potential

• Morphological contextual disambiguation (CG)

• Syntactic mapping and disambiguation (CG)

• Names CG, feature propagation CG, case role CG

• PSG superstructure: Teaching, Arboretum, Floresta
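The bullet list reads as a sequence of processing stages applied one after another. The Python sketch below only illustrates that chaining; it is not the VISL implementation. The preprocessing step is a naive regular-expression tokenizer, every later stage is a stub that merely records its name, and all function names are hypothetical.

```python
import re
from typing import Callable, Dict, List

Analysis = Dict[str, list]


def preprocess(text: str) -> Analysis:
    """Naive tokenization: runs of word characters or single punctuation marks."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return {"tokens": tokens, "stages": ["preprocessing"]}


def make_stub(name: str) -> Callable[[Analysis], Analysis]:
    """Create a placeholder stage that only logs that it ran."""
    def stage(analysis: Analysis) -> Analysis:
        updated = dict(analysis)
        updated["stages"] = analysis["stages"] + [name]
        return updated
    return stage


# Stage order follows the bullet list on the slide.
STAGES: List[Callable[[Analysis], Analysis]] = [
    make_stub("morphological analysis"),
    make_stub("valency and semantic potential"),
    make_stub("morphological disambiguation (CG)"),
    make_stub("syntactic mapping and disambiguation (CG)"),
    make_stub("names, feature propagation and case role CG"),
    make_stub("PSG tree generation"),
]


def run_pipeline(text: str) -> Analysis:
    analysis = preprocess(text)
    for stage in STAGES:
        analysis = stage(analysis)
    return analysis


result = run_pipeline("Da den gamle sælger kørte hjem, så han dyr.")
print(result["tokens"])
print(" -> ".join(result["stages"]))
```

Keeping each stage as a function from analysis to analysis makes it easy to stop after any step, for example to inspect the output of morphological analysis alone.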

Page 41: Research projects

• SHF 1999-2001: CG, syntax & semantics (da,en,po)

• AC/DC 1999-?: Portuguese CG-corpora

• Floresta 2000-?: Portuguese treebank

• DSL 2001-?: Korpus90/2000 (Danish CG-corpora)

• Arboretum 2002-?: Danish treebank

• PaNoLa 2002-2003: Integration of Nordic CG research

• Nomen Nescio: Automatic named entity recognition

Page 42

Da [da] KS @SUB
den [den] ART UTR S DEF @>N
gamle [gammel] ADJ nG S DEF NOM @>N
sælger [sælger] N UTR S IDF NOM @SUBJ>
kørte [køre] <mv> V IMPF AKT @FS-ADVL>
hjem [hjem] N NEU P IDF NOM @<ACC
i [i] PRP @<ADVL
sin [sin] <poss> <refl> DET UTR S @>N
bil [bil] N UTR S IDF NOM @P<
,
så [se] <mv> V IMPF AKT @FMV
han [han] PERS UTR 3S NOM @<SUBJ
mange [mange] <quant> DET nG P NOM @>N
små [lille] ADJ nG P nD NOM @>N
dyr [dyr] N NEU P IDF NOM &ACI-SUBJ @<ACC
på [på] PRP @<OA
de [den] ART nG P DEF @>N
våde [våd] ADJ nG P nD NOM @>N
veje [vej] N UTR P IDF NOM @P<

Running CG-annotation
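Each token in the running annotation follows the pattern word form, [lemma], optional secondary tags in angle brackets, morphological tags, and one or more @-prefixed syntactic tags. The Python sketch below splits such a token line into those fields. It assumes whitespace-separated fields and lemmas without internal spaces, which holds for this example but may not hold in general; the names `CGToken` and `parse_cg_line` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CGToken:
    word: str
    lemma: Optional[str] = None
    secondary: List[str] = field(default_factory=list)   # <mv>, <poss>, ...
    morphology: List[str] = field(default_factory=list)  # N, UTR, S, ...
    syntax: List[str] = field(default_factory=list)      # @SUBJ>, @<ACC, ...
    other: List[str] = field(default_factory=list)       # e.g. &ACI-SUBJ


def parse_cg_line(line: str) -> CGToken:
    """Split one running-annotation token line into its fields."""
    parts = line.split()
    token = CGToken(word=parts[0])
    for part in parts[1:]:
        if part.startswith("[") and part.endswith("]"):
            token.lemma = part[1:-1]
        elif part.startswith("<") and part.endswith(">"):
            token.secondary.append(part[1:-1])
        elif part.startswith("@"):
            token.syntax.append(part)
        elif part.startswith("&"):
            token.other.append(part)
        else:
            token.morphology.append(part)
    return token


# Three token lines taken from the slide's example sentence.
SAMPLE = [
    "sælger [sælger] N UTR S IDF NOM @SUBJ>",
    "kørte [køre] <mv> V IMPF AKT @FS-ADVL>",
    "dyr [dyr] N NEU P IDF NOM &ACI-SUBJ @<ACC",
]

tokens = [parse_cg_line(line) for line in SAMPLE]
for tok in tokens:
    print(tok.word, tok.lemma, tok.morphology, tok.syntax)
```

Punctuation tokens such as the comma carry no tags in the example, so they come back with only the word field set.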