The Landscape of Irish Language Technology
Teresa Lynn
ADAPT Centre, Dublin City University
The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
www.adaptcentre.ie
Outline
o Irish Language
o Status of Irish language technology
o Minority languages and social media
o Current Irish LT projects at DCU
o Conclusion
Irish – a minority language
National language; first official language
Census 2016:
o Population: 4,761,865
o Ability to speak Irish: 1,761,420 people
o Daily usage: 73,803 people
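To put the Census 2016 figures in proportion, the short calculation below uses only the three numbers on the slide. It suggests that roughly 37% of the population report being able to speak Irish, while daily speakers make up only about 1.55% of the population (about 4.2% of those able to speak it). The helper name `pct` is purely illustrative.

```python
# Census 2016 figures as given on the slide
population = 4_761_865
can_speak = 1_761_420
daily_speakers = 73_803


def pct(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


print(f"Able to speak Irish: {pct(can_speak, population):.1f}% of population")
print(f"Daily usage: {pct(daily_speakers, population):.2f}% of population")
print(f"Daily usage among those able to speak it: {pct(daily_speakers, can_speak):.1f}%")
```

The gap between the first and second figures is what makes "minority language" apt even though more than a third of the population reports some ability.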
Irish Language Features
Word order = Verb Subject Object
English: ‘I saw the boy’
Irish: Chonaic mé an buachaill
Gloss: Saw I the boy
Irish Language Features
Vowel Harmony
Caithim – ‘I spend’
Casaim – ‘I turn’
Rithfinn – ‘I would run’
D’íosfainn – ‘I would eat’
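Assuming the slide's "vowel harmony" refers to the spelling convention usually summarised as "caol le caol agus leathan le leathan" (a consonant or consonant cluster between two vowels is flanked by two slender vowels, e, i, or two broad vowels, a, o, u), the examples above can be checked mechanically. The sketch below is a simplified illustration, not a tool from the talk: the function names `follows_caol_le_caol` and `add_suffix` are hypothetical, the check ignores compounds and known exceptions (e.g. "anseo"), and suffix selection only covers regular stems such as caith, cas and rith. D’íosfainn is built on an irregular stem, so only the check is applied to it.

```python
import re

# Slender (caol) vowels; every other vowel in VOWELS is broad (leathan)
SLENDER = set("eiéí")
VOWELS = "aeiouáéíóú"

# A vowel, one or more consonants, then a vowel. The lookahead lets
# overlapping matches through, so every interior cluster gets checked.
CLUSTER = re.compile(rf"(?=([{VOWELS}])([^{VOWELS}]+)([{VOWELS}]))")


def is_slender(vowel: str) -> bool:
    return vowel in SLENDER


def follows_caol_le_caol(word: str) -> bool:
    """True if each consonant cluster sits between vowels of the same class."""
    letters = re.sub(r"[^a-záéíóú]", "", word.lower())  # drop apostrophes etc.
    for m in CLUSTER.finditer(letters):
        before, after = m.group(1), m.group(3)
        if is_slender(before) != is_slender(after):
            return False
    return True


def add_suffix(stem: str, slender_form: str, broad_form: str) -> str:
    """Attach the suffix spelling that agrees with the stem's last vowel."""
    last_vowel = [c for c in stem.lower() if c in VOWELS][-1]
    return stem + (slender_form if is_slender(last_vowel) else broad_form)


for w in ["Caithim", "Casaim", "Rithfinn", "D’íosfainn", "anseo"]:
    print(w, follows_caol_le_caol(w))
print(add_suffix("caith", "im", "aim"), add_suffix("cas", "im", "aim"))
```

A rule like this is the kind of regularity a morphological analyser or spell-checker for Irish has to encode, alongside the exceptions it cannot capture.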
Some Terminology Issues
Irish = minority language (spoken by a minority of the population)
Irish = low-resourced language (lacking language tools and resources)
BUT
Does “low-resourced” always mean “minority”?
Tagalog (Philippines)
• 21 million L1 speakers
• 50 million L2 speakers
Not a minority language…
…but is considered low-resourced
Some Terminology Issues
Irish = a minority European language
Irish = a low-resourced European language
Irish language technology survey
META-NET White Paper Series (Judge et al., 2012)
o EU-led study
o Survey of 31 European languages
o Language resources and technologies
Level of support per language, by area (no language was rated “excellent” in any area):
Machine Translation (MT)
o Good: English
o Moderate: French, Spanish
o Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
o Weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh
Speech
o Good: English
o Moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
o Fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
o Weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh
Text Analysis
o Good: English
o Moderate: Dutch, French, German, Italian, Spanish
o Fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
o Weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
Resources
o Good: English
o Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
o Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
o Weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
Examples of existing resources
o Speech synthesizer / screen reader
o Multiple electronic dictionaries, terminology DBs
o POS tagger / morphological analyser / stemmer
o POS-tagged corpus, dependency treebank, spoken corpus, parallel data, monolingual corpus (30 million words), Vicipéid (43k articles), DBpedia
o POS-tagged Twitter corpus, POS tagger for Irish tweets
o Chunking parser, statistical parser
o Basic CALL systems
o 2 × machine translation systems (one in use by Government translators)
Examples of unfunded contributions (Kevin Scannell)
o Spell-checker for Irish
o Grammar checker for Irish
o Localisation of: GNU/Linux, Mozilla, OpenOffice, Gmail, Facebook, Twitter
o Web corpus collection
o English–Irish SMT / Irish–Scots Gaelic SMT
o Indigenous Tweets site
o Irish web crawler
o WordNet for Irish
o Code.org in Irish
o Predictive text tool for Irish
Language at Risk – in the Digital Age
“Printing press resulted in the extinction of many minority and regional languages”
Will technology have the same impact on Irish?
Language at Risk – in the Digital Age
Need to ensure continuing language usage… through technology
o Edutainment packages
o Word processing tools
o Web page translation
o Search engines
o Games
o Social media
o Sociolinguistic study
o Track misuse
Source: http://www.leuphana.de/institute/ies/llt2015.html
Digital Strategy for the Irish Language 2017
Contributors:
o Teresa Lynn, Dublin City University
o John Judge, Dublin City University
o Elaine Uí Dhonnchadha, Trinity College Dublin
o Neasa Ní Chiaráin, Trinity College Dublin
o Ailbhe Ní Chasaide, Trinity College Dublin
Digital Strategy for the Irish Language 2017
Topics covered:
o Linguistic Resources
o Corpora
o Knowledge Bases
o NLP Tools
o NLG Tools
o Speech Models
o Speech Synthesis
o Speech Recognition
o Spoken Dialogue Systems
o Machine Translation
o Information Retrieval
o State and Public Use
o CALL
o Disability and Access
o Synergies (Industry and Public)
Map image: "Uk map England" by UKPhoenix79 – Image: British Isles United Kingdom.svg.
Irish on Twitter
2 million Irish-language tweets
Source: indigenoustweets.com
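Original tweet: an t-am seo an t7ain seo chugainn bei 2 ag partyáil le muintir Ráth Daingin! Hope youre not too scared #upthevillage
Normalised: an t-am seo an tseachtain seo chugainn, beidh tú ag partyáil le muintir Ráth Daingin! Hope youre not too scared #upthevillage
(Irish portion in English: “This time next week, you will be partying with the people of Ráth Daingin!”)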
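The tweet pair above shows the kind of normalisation user-generated Irish needs: "7" stands in for "seacht" (seven) inside "tseachtain", "2" stands for "tú", and "bei" is a shortened "beidh". A minimal token-lookup sketch is shown below, assuming a hypothetical hand-built table (`NORMALISATION_TABLE`) derived only from this one example; it is not the method used in the DCU work, and a realistic normaliser would need a much larger lexicon, context for ambiguous tokens, and handling of capitalisation and punctuation.

```python
# Hypothetical lookup table built from the single example tweet above.
NORMALISATION_TABLE = {
    "t7ain": "tseachtain",  # "7" replaces "seacht" (seven)
    "bei": "beidh",         # shortened form of "beidh" (will be)
    "2": "tú",              # "2" used for "tú" (you)
}


def normalise(tweet: str) -> str:
    """Replace known non-standard tokens; everything else passes through."""
    tokens = tweet.split()
    return " ".join(NORMALISATION_TABLE.get(tok, tok) for tok in tokens)


print(normalise("an t-am seo an t7ain seo chugainn bei 2 ag partyáil"))
```

English tokens such as "Hope youre not too scared" pass through unchanged, which is why code-switched tweets typically also need per-token language identification before any Irish-specific normalisation is applied.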
Tweets per language (March 2017):
o Basque: 10,490,641
o Kapampangan: 2,182,515
o Kiswahili: 8,187,127
o Welsh: 5,602,170
o Irish: 1,718,687
o Frisian: 905,259
o Setswana: 787,990
o Asturianu: 559,652
o Hausa: 436,244
o Yorùbá: 288,513
o Ikinyarwanda: 355,397
Current Irish LT Projects at DCU
o Tapadóir SMT project (PhD student, Meghan Dowling)
o European Language Resource Coordination
o Code-switching in Irish tweets
o Universal Dependencies for Irish
Current Irish LT Projects at DCU
GaelTech Project (2017–2021)
o Automatic Identification of Multiword Expressions (PhD student, Abigail Walsh)
o Irish User-Generated Content
o Dependency Treebank(s) expansion
Conclusion
The landscape of Irish language technology has improved. How?
Influenced Government policy through:
o online use
o demand for technology
o empirically demonstrating the evolution of the language
o starting with pilot systems that demonstrate the benefits of LT
o teaming up with other (similar) minority languages
o engaging with larger NLP projects (e.g. UD, COST Action)
o organising workshops for knowledge sharing, collaboration and networking
#GRMA
Go raibh maith agaibh
Thank you (pl.)