Towards the New Czech Grammar-checker
Introduction: Goal
Goal
- New grammar-checker of Czech
- Web-based application
- Using new and existing tools developed at MU
V.Mrkývka · Towards the New Czech Grammar-checker · December 7, 2018 2 / 13
Introduction: Motivation
Motivation
- There are tools existing or in development at MU
- The current best Czech GC is part of a proprietary system
- Create an alternative to applications like Grammarly, but for Czech
Current version: Interface
The current interface
- Based on the online rich-text editor TinyMCE
- Implemented mostly in JavaScript as TinyMCE modules
- Asynchronous processing
- Communication with the backend via AJAX
Current version: Processing diagram
Processing diagram
[Diagram: tokenization; lemmatization & tagging; four boxes labelled "some module"; correction displaying]
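The stages named in the diagram can be sketched as a small pipeline. This is only an illustration under stated assumptions: the stage order (tokenization, then the modules, then one merged list of corrections to display) is inferred from the labels, the lemmatization & tagging stage is left out, and every name below (`tokenize`, `multiple_whitespace`, `space_before_punctuation`, `run_pipeline`) is hypothetical rather than taken from the actual system.

```python
import re
from typing import Callable, List, Tuple

# (token index, original token, suggested replacement) -- hypothetical format
Correction = Tuple[int, str, str]
Module = Callable[[List[str]], List[Correction]]


def tokenize(text: str) -> List[str]:
    """Split text into words, whitespace runs, and single punctuation marks."""
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def multiple_whitespace(tokens: List[str]) -> List[Correction]:
    """Flag runs of two or more spaces and suggest a single space."""
    return [(i, t, " ") for i, t in enumerate(tokens)
            if len(t) > 1 and set(t) == {" "}]


def space_before_punctuation(tokens: List[str]) -> List[Correction]:
    """Flag a whitespace token that directly precedes a punctuation mark."""
    marks = {",", ".", ";", ":", "!", "?"}
    return [(i, t, "") for i, t in enumerate(tokens[:-1])
            if t.isspace() and tokens[i + 1] in marks]


def run_pipeline(text: str, modules: List[Module]) -> List[Correction]:
    """Tokenize once, let every module inspect the tokens, merge the results."""
    tokens = tokenize(text)
    # Lemmatization & tagging would happen here; omitted in this sketch.
    corrections: List[Correction] = []
    for module in modules:
        corrections.extend(module(tokens))
    return sorted(corrections)


found = run_pipeline("The  dog is running .",
                     [multiple_whitespace, space_before_punctuation])
print(found)
```

Each module only reads the shared token list and returns its own corrections, so in this sketch modules can be added or removed without touching the others.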
Current version: Correction displaying
Correction displaying
The dog is runing .
0 1 2 3 4 5 6 7
Tokens to display mistake at: 6
Correction: 6/runing/running
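A minimal sketch of how a correction record such as `6/runing/running` could be interpreted. It assumes that whitespace runs count as tokens, which is one reading under which "runing" lands at index 6 and the sentence has eight tokens (0 to 7); the record layout `index/original/replacement` is read off the slide, while the function names are hypothetical.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Words, whitespace runs, and punctuation marks are all tokens (assumed)."""
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def parse_correction(record: str) -> Tuple[int, str, str]:
    """Split 'index/original/replacement' into its three parts."""
    index, original, replacement = record.split("/")
    return int(index), original, replacement


def apply_correction(tokens: List[str], record: str) -> List[str]:
    """Return a copy of the tokens with the flagged token replaced."""
    index, original, replacement = parse_correction(record)
    if tokens[index] != original:
        raise ValueError(f"token {index} is {tokens[index]!r}, not {original!r}")
    fixed = list(tokens)
    fixed[index] = replacement
    return fixed


tokens = tokenize("The dog is runing.")
corrected = "".join(apply_correction(tokens, "6/runing/running"))
print(corrected)
```

Carrying the original token in the record lets the front end verify that the text has not changed since it was sent for checking before it highlights anything.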
Current version: Implemented modules
Implemented modules
Correction                          TP  FP   TN  FN  Precision  Recall
Misspellings (excl. proper nouns)   24   0  487  16      1.000   0.600
Misspellings (incl. proper nouns)    7  17  497   6      0.292   0.538
Vocalisation of prepositions         4   0    8   0      1.000   1.000
Multiple whitespaces                 4   0  515   0      1.000   1.000
Whitespace around punctuation        7   0  119   0      1.000   1.000
Conditionals                         2   0    1   0      1.000   1.000
Commas in a sentence                 3   0    0   4      1.000   0.429
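The precision (pre) and recall (rec) columns follow from the counts by the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The short sketch below recomputes them for three rows of the table; the function and variable names are illustrative only.

```python
def precision(tp: int, fp: int) -> float:
    """Share of reported corrections that were real mistakes."""
    return tp / (tp + fp) if tp + fp else float("nan")


def recall(tp: int, fn: int) -> float:
    """Share of real mistakes that were reported."""
    return tp / (tp + fn) if tp + fn else float("nan")


# (TP, FP, TN, FN) copied from the table
rows = {
    "Misspellings (excl. proper nouns)": (24, 0, 487, 16),
    "Misspellings (incl. proper nouns)": (7, 17, 497, 6),
    "Commas in a sentence": (3, 0, 0, 4),
}

for name, (tp, fp, tn, fn) in rows.items():
    print(f"{name}: precision={precision(tp, fp):.3f} recall={recall(tp, fn):.3f}")
```

The true negatives (TN) do not enter either measure, which is why a module with very different TN counts can still show identical precision and recall.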
Proximate issues: Genuine testing
A problem with testing
- The testing data set was too small
- The mistakes were artificial
⇒ Need for an API and a collection of correctly annotated genuine texts
Proximate issues: Spell-checking
A problem with spell-checking
- Precision is low
- The dictionary is not updated often
⇒ A method of adding new words, using a different lexicon, ...
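One way to picture the effect of adding new words is a lookup against a base lexicon merged with a supplementary word list. The sketch below is purely illustrative: the lexicons, the example words, and the function `find_misspellings` are made up, and the real spell-checker may work quite differently.

```python
from typing import List, Set, Tuple


def find_misspellings(tokens: List[str], lexicon: Set[str]) -> List[Tuple[int, str]]:
    """Report (index, token) for alphabetic tokens missing from the lexicon."""
    return [(i, t) for i, t in enumerate(tokens)
            if t.isalpha() and t.lower() not in lexicon]


# Hypothetical word lists, lower-cased for case-insensitive lookup.
base_lexicon = {"the", "dog", "is", "in", "running"}
added_words = {"brno"}  # e.g. a proper noun missing from the base lexicon

tokens = ["The", " ", "dog", " ", "in", " ", "Brno", " ", "is", " ", "runing"]

before = find_misspellings(tokens, base_lexicon)
after = find_misspellings(tokens, base_lexicon | added_words)
print(before)
print(after)
```

With only the base lexicon the proper noun is reported as a misspelling (a false positive); once it is added, only the genuine typo remains.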
Proximate issues: Error reporting
A problem with error reporting
- Allow users to flag miscorrections
- How to avoid displaying a flagged miscorrection afterwards?
⇒ Probably module-dependent
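As one possible answer for simple modules, flagged miscorrections could be remembered and filtered out before display. The sketch below is an assumption, not the planned design: it keys a flag on the module name, the original token, and the suggested replacement (ignoring position), and the names `Correction` and `MiscorrectionFilter` are hypothetical. Other modules may need a different key, which is the module-dependent part.

```python
from typing import List, NamedTuple, Set, Tuple


class Correction(NamedTuple):
    module: str       # name of the module that proposed the correction
    index: int        # token index in the text
    original: str     # token as written by the user
    replacement: str  # suggested replacement


class MiscorrectionFilter:
    """Remembers corrections the user flagged as wrong and hides them later."""

    def __init__(self) -> None:
        self._flagged: Set[Tuple[str, str, str]] = set()

    @staticmethod
    def _key(c: Correction) -> Tuple[str, str, str]:
        # Position is ignored so the same miscorrection stays hidden elsewhere.
        return (c.module, c.original, c.replacement)

    def flag(self, c: Correction) -> None:
        self._flagged.add(self._key(c))

    def visible(self, corrections: List[Correction]) -> List[Correction]:
        return [c for c in corrections if self._key(c) not in self._flagged]


proposals = [
    Correction("spelling", 0, "Brno", "Brzo"),
    Correction("spelling", 4, "runing", "running"),
]
flt = MiscorrectionFilter()
flt.flag(proposals[0])
shown = flt.visible(proposals)
print(shown)
```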
Thank you for your attention!
This work was supported by the specific research project Čeština v jednotě synchronie a diachronie (Czech language in unity of synchrony and diachrony; project no. MUNI/A/0862/2017).