![Page 1: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/1.jpg)
Celtic Language Technologies in the Digital Age
John Judge, ADAPT Centre, DCU
![Page 2: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/2.jpg)
www.adaptcentre.ie
Background on Me
• Background: Computational Linguist – research and real world
• Interests in: Natural Language Processing, Text Analytics, Machine Translation, …
• National Centre for Language Technology
• Research Integration Coordinator for the ADAPT Centre of Excellence for Digital Content and Media Innovation
• Focus on EU collaborations
  • META-NET
  • QT LaunchPad
  • LT Web
  • FALCON
  • Mli
  • QT21
  • TraMOOC
  • EXPERT
![Page 3: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/3.jpg)
ADAPT Centre
• ADAPT Science Foundation Ireland Direct Funding over six years (until 2020)
• Academic/Industry partnership built on top of CNGL
• Five research themes
• Six application areas
• TCD and DCU co-leads; UCD and DIT partners
• Open-ended number of industry partners
![Page 4: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/4.jpg)
ADAPT Centre
E-Commerce Financial
E-Learning Life Sciences ICT Localisation Content & Media Entertainment
Industry Partners
![Page 5: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/5.jpg)
Global Digital Content: Platform Research
![Page 6: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/6.jpg)
Ambitious Metrics for Success
• 13 Spin Out Companies
• €5m Commercialisation Awards
• 1,650 Top Quality Publications
• €110m Won in Total Competitive Research
• 500 Jobs
• €9m From Commercial Sources
• 60 Major EU Initiatives
• 200 Postgraduate Students
• 88 Licence Agreements
![Page 7: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/7.jpg)
Agus Gaeilge…? (And Irish…?)
How much of all of this relates to Irish?
![Page 8: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/8.jpg)
Language Technology
![Page 9: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/9.jpg)
Language Technology and Applications
![Page 10: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/10.jpg)
LT is not…
• Localised Software
• A website in your language
• A static online dictionary
But these are all VERY valuable resources for a language!
…and can form part of a healthy LT ecosystem
![Page 11: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/11.jpg)
What is LT – Where I’m coming from
• Technology for processing information (speech, text, gestures, …) in a given language
• An enabling technology
• Added intelligence to both content (creation, management, etc.) and HCI
• Set of tools and resources – part of a bigger picture and a larger ecosystem
• Interactive
• Not monolithic resources
![Page 12: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/12.jpg)
It’s already right under your noses
• These concepts (and some others) are already being used for a wide range of applications
  • Marketing/Brand awareness
  • Customer Sentiment Analysis
  • Political barometers (Obama)
  • Information analysis and extraction (IBM Watson)
  • Offensive content filtering
  • Security applications
![Page 13: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/13.jpg)
A look at the Irish LT perspective
![Page 14: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/14.jpg)
LT Landscape in Ireland
• Historically strong in Translation and Localisation industry
• Home to several internationally recognised research centres
  • NCLT
  • DERI
  • CNGL >>> ADAPT
  • INSIGHT
• Government funding for research has been consistent despite worsening economic conditions
![Page 15: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/15.jpg)
LT for Irish
• Many of the basics are covered
  • Spell checker
  • Grammar checking
  • T9 predictive text, smartphone predictive text (through additional software)
  • Localisation of open source software, and many major applications
• Some of the more advanced stuff
  • Speech synthesiser
  • Part-of-Speech Tagger
  • (Dependency Parser)
![Page 16: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/16.jpg)
LT for Irish
• But there’s not much else
  • Availability of text corpora, speech corpora, parallel texts, wordnets and other LT building blocks is limited or poor
  • Some resources exist – small, narrow coverage, restricted availability
  • Lack of basic linguistic resources is stifling development of modern language processing technologies for Irish
  • Yet our own research centres are producing world-leading LT for other languages
![Page 17: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/17.jpg)
State of LT Support for Irish
Source: META-NET Whitepaper Series The Irish Language in the Digital Age
![Page 18: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/18.jpg)
MT
• Excellent: (none)
• Good: English
• Moderate: French, Spanish
• Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
• Weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh

Speech
• Excellent: (none)
• Good: English
• Moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
• Fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
• Weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh

Text Analysis
• Excellent: (none)
• Good: English
• Moderate: Dutch, French, German, Italian, Spanish
• Fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
• Weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh

Resources
• Excellent: (none)
• Good: English
• Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
• Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
• Weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
![Page 19: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/19.jpg)
Europe’s Languages and LT
Language clusters, ordered from good support through Language Technology down to weak or no support:
• English
• Dutch, French, German, Italian, Spanish
• Catalan, Czech, Finnish, Hungarian, Polish, Portuguese, Swedish
• Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian, Slovak, Slovene
• Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
No Surprises Here!
![Page 20: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/20.jpg)
So What?
• Take a closer look at the least equipped languages
  • Only 3 compete with English in their native countries
  • Maltese native fluency ~100% (Eurobarometer)
  • Irish and Welsh are at risk
• So too are other regional and minority languages (RMLs) which compete with any better-resourced language on a day-to-day basis
[Figure: the two least-supported clusters]
• Weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
• Next cluster up: Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian, Slovak, Slovene
![Page 21: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/21.jpg)
Languages at risk in the pre-digital age
![Page 22: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/22.jpg)
Languages at risk in the print age
• Invention of the moveable type printing press
  • Improved literacy
  • Standardisation
  • The Reformation
  • The Renaissance
  • The Enlightenment
• Death of hundreds of European RMLs that never made it into print
![Page 23: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/23.jpg)
Languages in the Digital Age
• The leap into the digital age has had profound effects
  • Need to equip all languages with digital resources to ensure survival
  • Otherwise they are doomed to history
• The Celtic Languages need to address under-resourcing
![Page 24: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/24.jpg)
A High Level Solution - Europe
![Page 25: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/25.jpg)
European Level Action
• Multilingual Europe Technology Alliance
  • Bring together Language Technology stakeholders
  • Concerted effort to influence EU research programmes for LT
  • Strategic Research Agenda for Multilingual Europe
• Success in H2020 Funding calls – specifically in ICT 17 “Cracking the Language Barrier”
  • “… to facilitate multilingual online communication for the benefit of the digital single market which is still fragmented by language barriers that hamper a wide penetration of cross-border commerce, social communication and exchange of cultural content.”
  • “Special focus is on the 21 EU languages (both as source and target languages) that have ‘fragmentary’ or ‘weak/no’ machine translation support according to the META-NET language white papers.”
![Page 26: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/26.jpg)
Addressing the Gap – CRACKER Project
• CRACKER (Feb 2015) – follow up to META-NET. Stated goals:
  • Initiating a programme of ground-breaking actions that will deliver, by 2025, an online EU internal market free of language barriers, delivering automated translation quality, equal to currently best performing language pair/direction, in most relevant use situations and for at least 90% of the EU official languages.
  • Significantly improving the quality, coverage and technical maturity of automatic translation for at least half of the 21 EU languages that currently have "weak or no support" or "fragmentary support" of machine translation solutions, according to the META-NET Language White Papers referenced before.
  • Attracting a community of hundreds of contributors of language resources and language technology tools (from all EU Member States and Associated Countries) to adopt and support a single platform for sharing, maintaining and making use of language resources and tools; establishing widely agreed benchmarks for machine translation quality and stimulating competition between methods and systems.
![Page 27: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/27.jpg)
EU Actions Recap
• The EU is calling for improved resources for our languages
• The big players (industry and research) are organising to do something about it
• Celtic languages can be part of this if we position ourselves to be there
![Page 28: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/28.jpg)
EU Actions – Getting on board
• Riga Summit 2015, April 27-29
  • http://www.rigasummit2015.eu
• Venue for META-FORUM
  • Multilingual Technologies for the Digital Single Market
  • Language Technologies for the Big Data Challenge and Data Economy
  • High-Quality Machine Translation
  • Towards European Language Technology Platforms
  • Strategic Agenda for the Multilingual Digital Single Market
![Page 29: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/29.jpg)
Summit Agenda
Opening addresses
H.E. Andris Bērziņš, President of the Republic of Latvia
First session: Setting the Strategic Agenda for the Multilingual Digital Single Market
Coffee break
Second session: Breaking the Language Barrier for Cross-Border Public Services
Lunch
Third session: Language Technology: Enabling European Business
Coffee break
Fourth session: Empowering the Multilingual Data Economy
Closing session: EU Innovation Excellence to Address Multilingual Challenges
![Page 30: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/30.jpg)
National Policy/Funding Agency Round Table
• Roundtable session to discuss where languages and language technologies currently stand in the different countries and regions and how to improve the situation
• Goal: Shape a Strategic Research and Innovation Agenda with input (and buy in) directly from those responsible for our languages at a regional level
![Page 31: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/31.jpg)
Towards a Celtic Language Technology Community
![Page 32: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/32.jpg)
Languages in the Digital Age
• Not all doom and gloom!
• Significant opportunity: LT and language promotion/rejuvenation
• Community effort can provide the basic building blocks
• Techniques can do more with less
• Policy makers can be hard to convince
• We have to start somewhere – Celtic Language Technology Community Workshop
![Page 33: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/33.jpg)
Celtic Language Technology Workshop
“The Celtic Language Technology Workshop (CLTW) series of workshops provides a forum for researchers interested in developing NLP (Natural Language Processing) resources and technologies for Celtic languages.
As Celtic languages are under-resourced, our goal is to encourage collaboration and communication between researchers working on language technologies and resources for Celtic languages.”
![Page 34: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/34.jpg)
First CLTW at COLING 2014
• Held in association with COLING 2014 (top tier CL/LT conference)
• Full day of research presentations (papers and posters)
• Attended by about 30 people
• Published 12 papers
• Representing work on: Irish, Welsh, Scots Gaelic, Breton (and an invited talk that covered aspects of Manx)
• Including an open forum session to discuss how to move the area forward
• Endorsed by Irish Government, Ofis Publik ar Brezhoneg (among others)
![Page 35: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/35.jpg)
CLTW Topics of Interest
• Language resources
• Syntax, semantics, grammar, lexicons
• Phonology / morphology, tagging
• Morphological analysis
• Part-of-speech taggers
• Computer-Assisted Language Learning (CALL)
• Translation memory
• Machine translation
• Parsing / chunking
• Ontologies, terminology and knowledge representation
• Speech processing / generation
• Digital humanities
• Corpus development / analysis
• Treebanking
• Evaluation methods
• Ontology-lexica
• Metadata
• Linked data resources
• Linguistic linked data resources
• Semantic annotation
• Information Extraction
![Page 36: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/36.jpg)
Workshop Outcomes
• A great time!
• Community forum
• Momentum
• Ideas for further collaboration
• Possible EU level action to address under-resourcing
![Page 37: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/37.jpg)
Future Directions
![Page 38: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/38.jpg)
Within the LT Community
• Under-resourced languages are a challenge for science
• The best researchers LOVE a challenge
• The Celtic LT community can position itself as a provider of interesting challenges
• BUT: We still need the wider language community's help to ensure adequate data is available to the R&D community
![Page 39: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/39.jpg)
What Can/Should We Do?
• Concerted Community Action
• Data is key
  • Collections of digital data in a language
  • Appropriate format
  • Appropriate annotation
  • Appropriate licence
  • Appropriately available
• The R&D community will combine to build more sophisticated tools and solve bigger problems…
• This should not be done in isolation by each RML community
• Band together and also look to EU initiatives
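The "data is key" criteria on this slide (format, annotation, licence, availability) could be tracked per data collection with a simple metadata record. The following is a minimal sketch only, not something from the talk: the `CorpusRecord` class, its field names, the `missing_requirements` helper and the example values are all hypothetical. It can only check that each criterion has been stated; whether a format or licence is "appropriate" remains a human judgement.

```python
from dataclasses import dataclass


@dataclass
class CorpusRecord:
    """Hypothetical metadata for one collection of digital language data."""
    name: str
    language: str      # e.g. an ISO 639-1 code such as "ga" for Irish
    data_format: str   # e.g. "plain-text" or "CoNLL-U"; empty if undocumented
    annotation: str    # e.g. "none", "pos", "dependency"; empty if undocumented
    licence: str       # e.g. "CC-BY-4.0"; empty if no licence is stated
    url: str           # where the data can be obtained; empty if not available


def missing_requirements(record: CorpusRecord) -> list[str]:
    """Return which of the four slide criteria the record leaves blank."""
    checks = {
        "format": record.data_format,
        "annotation": record.annotation,
        "licence": record.licence,
        "availability": record.url,
    }
    return [name for name, value in checks.items() if not value.strip()]


# Illustrative record: format and annotation are documented,
# but no licence or download location has been given.
example = CorpusRecord(
    name="example-irish-corpus",
    language="ga",
    data_format="CoNLL-U",
    annotation="pos",
    licence="",
    url="",
)
print(missing_requirements(example))
```

A shared checklist of this kind is one way several RML communities could describe their data consistently before pooling it.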
![Page 40: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/40.jpg)
Celtic LT Community Efforts
• Next CLTW – Proposal for part of LREC 2016
• Semi-formal meet-ups (today)
• Budding Irish LT lobby group CIGILT
• COST (European COoperation in Science and Technology) Action
• Reaching out further to the Humanities
• Needs support from policy makers
• Needs to produce results that generate buy-in from language communities
![Page 41: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/41.jpg)
The Grass Roots
• Small numbers of speakers
• Typically minority (or marginalised) languages
• Everyone has a role to play
• LT Community needs to speak out more
• Show tangible benefits
![Page 42: Celtic language technologies in the digital age](https://reader036.vdocuments.mx/reader036/viewer/2022081519/55bda894bb61eb2e228b47f3/html5/thumbnails/42.jpg)
Diolch! – Thank You!
[email protected]
ie.linkedin.com/in/judgejohn/
http://www.adaptcentre.ie
CLTW
https://groups.google.com/forum/#!forum/celtic-language-technology
META-NET LWPs
http://www.meta-net.eu
http://www.meta-net.eu/whitepapers/e-book/welsh.pdf
http://www.meta-net.eu/whitepapers/e-book/irish.pdf
http://www.meta-net.eu/whitepapers/e-book/basque.pdf
EU initiatives
http://www.cracker-project.eu
http://www.rigasummit2015.eu