celtic language technologies in the digital age

Celtic Language Technologies in the Digital Age John Judge, Adapt Centre, DCU

Upload: techiaith

Post on 02-Aug-2015

TRANSCRIPT

Page 1: Celtic language technologies in the digital age

Celtic Language Technologies in the Digital Age
John Judge, ADAPT Centre, DCU

Page 2: Celtic language technologies in the digital age

www.adaptcentre.ie
Background on Me

• Background: Computational Linguist – research and real world
• Interests in: Natural Language Processing, Text Analytics, Machine Translation, …
• National Centre for Language Technology
• Research Integration Coordinator for the ADAPT Centre of Excellence for Digital Content and Media Innovation
• Focus on EU collaborations
  • META-NET
  • QT LaunchPad
  • LT Web
  • FALCON
  • Mli
  • QT21
  • TraMOOC
  • EXPERT

Page 3: Celtic language technologies in the digital age

ADAPT Centre

• ADAPT Science Foundation Ireland Direct Funding over six years (until 2020)
• Academic/Industry partnership built on top of CNGL
• Five research themes
• Six application areas
• TCD and DCU co-leads; UCD and DIT partners
• Open-ended number of industry partners

Page 4: Celtic language technologies in the digital age

ADAPT Centre

[Figure: application areas and industry partners]
E-Commerce Financial
E-Learning
Life Sciences
ICT Localisation Content & Media Entertainment

Industry Partners

Page 5: Celtic language technologies in the digital age

Global Digital Content: Platform Research

Page 6: Celtic language technologies in the digital age

Ambitious Metrics for Success

13 Spin Out Companies
€5m Commercialisation Awards
1,650 Top Quality Publications
€110m Won in Total Competitive Research
500 Jobs
€9m From Commercial Sources
60 Major EU Initiatives
200 Postgraduate Students
88 Licence Agreements

Page 7: Celtic language technologies in the digital age

Agus Gaeilge…? (And Irish…?)

How much of all of this relates to Irish?

Page 8: Celtic language technologies in the digital age

Language Technology

Page 9: Celtic language technologies in the digital age

Language Technology and Applications

Page 10: Celtic language technologies in the digital age

LT is not…

• Localised Software
• A website in your language
• A static online dictionary

But these are all VERY valuable resources for a language!
…and can form part of a healthy LT ecosystem

Page 11: Celtic language technologies in the digital age

What is LT – Where I’m coming from

• Technology for processing information (speech, text, gestures, …) in a given language
• An enabling technology
• Added intelligence to both content (creation, management, etc.) and HCI
• Set of tools and resources – part of a bigger picture and a larger ecosystem
• Interactive
• Not monolithic resources

Page 12: Celtic language technologies in the digital age

It’s already right under your noses

• These concepts (and some others) are already being used for a wide range of applications
  • Marketing/Brand awareness
  • Customer Sentiment Analysis
  • Political barometers (Obama)
  • Information analysis and extraction (IBM Watson)
  • Offensive content filtering
  • Security applications

Page 13: Celtic language technologies in the digital age

A look at the Irish LT perspective

Page 14: Celtic language technologies in the digital age

LT Landscape in Ireland

• Historically strong in Translation and Localisation industry
• Home to several internationally recognised research centres
  • NCLT
  • DERI
  • CNGL >>> ADAPT
  • INSIGHT
• Government funding for research has been consistent despite worsening economic conditions

Page 15: Celtic language technologies in the digital age

LT for Irish

• Many of the basics are covered
  • Spell checker
  • Grammar checking
  • T9 predictive text, smartphone predictive text (through additional software)
  • Localisation of open source software, and many major applications
• Some of the more advanced stuff
  • Speech synthesiser
  • Part-of-Speech Tagger
  • (Dependency Parser)

Page 16: Celtic language technologies in the digital age

LT for Irish

• But there’s not much else
• Availability of text corpora, speech corpora, parallel texts, wordnets and other LT building blocks is limited or poor
• Some resources exist – small, narrow coverage, restricted availability
• Lack of basic linguistic resources is stifling development of modern language processing technologies for Irish
• Yet our own research centres are producing world-leading LT for other languages

Page 17: Celtic language technologies in the digital age

State of LT Support for Irish

Source: META-NET White Paper Series, “The Irish Language in the Digital Age”

Page 18: Celtic language technologies in the digital age

[Table: level of LT support per language in four areas, on a five-level scale: excellent, good, moderate, fragmentary, weak or no support]

MT
• excellent: (none)
• good: English
• moderate: French, Spanish
• fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
• weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh

Speech
• excellent: (none)
• good: English
• moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
• fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
• weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh

Text Analysis
• excellent: (none)
• good: English
• moderate: Dutch, French, German, Italian, Spanish
• fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
• weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh

Resources
• excellent: (none)
• good: English
• moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
• fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
• weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
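
The “weak or no support” groupings on this slide can be treated as data. The following is a minimal sketch, not something from the talk: the dictionary `WEAK_OR_NONE` and both function names are illustrative, the language lists are transcribed from the slide, and the assignment of the last two lists to Text Analysis and Resources is inferred from the slide layout. It shows in which of the four areas a language sits in the weakest tier, and which languages sit there in all four.

```python
# Languages rated "weak or no support" in each area, transcribed from the slide.
# Names below are illustrative; the area labels for the last two sets are inferred.
WEAK_OR_NONE = {
    "MT": {
        "Basque", "Bulgarian", "Croatian", "Czech", "Danish", "Estonian",
        "Finnish", "Galician", "Greek", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Norwegian", "Portuguese", "Serbian",
        "Slovak", "Slovene", "Swedish", "Welsh",
    },
    "Speech": {
        "Croatian", "Icelandic", "Latvian", "Lithuanian", "Maltese",
        "Romanian", "Welsh",
    },
    "Text Analysis": {
        "Croatian", "Estonian", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Serbian", "Welsh",
    },
    "Resources": {
        "Icelandic", "Irish", "Latvian", "Lithuanian", "Maltese", "Welsh",
    },
}


def weak_areas(language):
    """Return the sorted list of areas where `language` is in the weakest tier."""
    return sorted(area for area, langs in WEAK_OR_NONE.items() if language in langs)


def weakest_everywhere():
    """Return the sorted list of languages in the weakest tier in all four areas."""
    return sorted(set.intersection(*WEAK_OR_NONE.values()))


if __name__ == "__main__":
    print("Irish:", weak_areas("Irish"))
    print("Welsh:", weak_areas("Welsh"))
    print("Weakest in every area:", weakest_everywhere())
```

On these lists, Irish is in the weakest tier for three of the four areas (everything except Speech), while Welsh is in the weakest tier for all four.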

Page 19: Celtic language technologies in the digital age

Europe’s Languages and LT

[Figure: languages grouped along a scale from “good support through Language Technology” to “weak or no support”]
• English
• Dutch, French, German, Italian, Spanish
• Catalan, Czech, Finnish, Hungarian, Polish, Portuguese, Swedish
• Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian, Slovak, Slovene
• Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh

No Surprises Here!

Page 20: Celtic language technologies in the digital age

So What?

• Take a closer look at the least equipped languages
• Only 3 compete with English in their native countries
• Maltese native fluency ~100% (Eurobarometer)
• Irish and Welsh are at risk
• So too are other regional and minority languages (RMLs) which compete with any better-resourced language on a day-to-day basis

[Figure: the “weak or no support” group (Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh) shown alongside Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian, Slovak, Slovene]

Page 21: Celtic language technologies in the digital age

Languages at risk in the pre-digital age

Page 22: Celtic language technologies in the digital age

Languages at risk in the print age

• Invention of the moveable type printing press
• Improved literacy
• Standardisation
• The Reformation
• The Renaissance
• The Enlightenment
• Death of hundreds of European RMLs that never made it into print

Page 23: Celtic language technologies in the digital age

Languages in the Digital Age

• The leap into the digital age has had profound effects
• Need to equip all languages with digital resources to ensure survival
• Otherwise they are doomed to history
• The Celtic Languages need to address under-resourcing

Page 24: Celtic language technologies in the digital age

A High Level Solution - Europe

Page 25: Celtic language technologies in the digital age

European Level Action

• Multilingual Europe Technology Alliance
  • Bring together Language Technology stakeholders
  • Concerted effort to influence EU research programmes for LT
  • Strategic Research Agenda for Multilingual Europe
• Success in H2020 Funding calls – specifically in ICT 17 “Cracking the Language Barrier”
  • “… to facilitate multilingual online communication for the benefit of the digital single market which is still fragmented by language barriers that hamper a wide penetration of cross-border commerce, social communication and exchange of cultural content.”
  • “Special focus is on the 21 EU languages (both as source and target languages) that have ‘fragmentary’ or ‘weak/no’ machine translation support according to the META-NET language white papers.”

Page 26: Celtic language technologies in the digital age

Addressing the Gap – CRACKER Project

• CRACKER (Feb 2015) – follow-up to META-NET. Stated goals:
  • Initiating a programme of ground-breaking actions that will deliver, by 2025, an online EU internal market free of language barriers, delivering automated translation quality, equal to currently best performing language pair/direction, in most relevant use situations and for at least 90% of the EU official languages.
  • Significantly improving the quality, coverage and technical maturity of automatic translation for at least half of the 21 EU languages that currently have "weak or no support" or "fragmentary support" of machine translation solutions, according to the META-NET Language White Papers referenced before.
  • Attracting a community of hundreds of contributors of language resources and language technology tools (from all EU Member States and Associated Countries) to adopt and support a single platform for sharing, maintaining and making use of language resources and tools; establishing widely agreed benchmarks for machine translation quality and stimulating competition between methods and systems.

Page 27: Celtic language technologies in the digital age

EU Actions Recap

• The EU is calling for improved resources for our languages
• The big players (industry and research) are organising to do something about it
• Celtic languages can be part of this if we position ourselves to be there

Page 28: Celtic language technologies in the digital age

EU Actions – Getting on board

• Riga Summit 2015, April 27-29
  • http://www.rigasummit2015.eu
• Venue for META-FORUM
• Multilingual Technologies for the Digital Single Market
• Language Technologies for the Big Data Challenge and Data Economy
• High-Quality Machine Translation
• Towards European Language Technology Platforms
• Strategic Agenda for the Multilingual Digital Single Market

Page 29: Celtic language technologies in the digital age

Summit Agenda

Opening addresses: H.E. Andris Bērziņš, President of the Republic of Latvia
First session: Setting the Strategic Agenda for the Multilingual Digital Single Market
Coffee break
Second session: Breaking the Language Barrier for Cross-Border Public Services
Lunch
Third session: Language Technology: Enabling European Business
Coffee break
Fourth session: Empowering the Multilingual Data Economy
Closing session: EU Innovation Excellence to Address Multilingual Challenges

Page 30: Celtic language technologies in the digital age

National Policy/Funding Agency Round Table

• Round table session to discuss where languages and language technologies currently stand in the different countries and regions, and how to improve the situation
• Goal: Shape a Strategic Research and Innovation Agenda with input (and buy-in) directly from those responsible for our languages at a regional level

Page 31: Celtic language technologies in the digital age

Towards a Celtic Language Technology Community

Page 32: Celtic language technologies in the digital age

Languages in the Digital Age

• Not all doom and gloom!
• Significant opportunity: LT and language promotion/rejuvenation
• Community effort can provide the basic building blocks
• Techniques can do more with less
• Policy makers can be hard to convince
• We have to start somewhere – Celtic Language Technology Community Workshop

Page 33: Celtic language technologies in the digital age

Celtic Language Technology Workshop

“The Celtic Language Technology Workshop (CLTW) series of workshops provides a forum for researchers interested in developing NLP (Natural Language Processing) resources and technologies for Celtic languages.

As Celtic languages are under-resourced, our goal is to encourage collaboration and communication between researchers working on language technologies and resources for Celtic languages.”

Page 34: Celtic language technologies in the digital age

First CLTW at COLING 2014

• Held in association with COLING 2014 (top tier CL/LT conference)
• Full day of research presentations (papers and posters)
• Attended by about 30 people
• Published 12 papers
• Representing work on: Irish, Welsh, Scots Gaelic, Breton (and an invited talk that covered aspects of Manx)
• Including an open forum session to discuss how to move the area forward
• Endorsed by Irish Government, Ofis Publik ar Brezhoneg (among others)

Page 35: Celtic language technologies in the digital age

CLTW Topics of Interest

• Language resources
• Syntax, semantics, grammar, lexicons
• Phonology / morphology, tagging
• Morphological analysis
• Part-of-speech taggers
• Computer-Assisted Language Learning (CALL)
• Translation memory
• Machine translation
• Parsing / chunking
• Ontologies, terminology and knowledge representation
• Speech processing / generation
• Digital humanities
• Corpus development / analysis
• Treebanking
• Evaluation methods
• Ontology-lexica
• Metadata
• Linked data resources
• Linguistic linked data resources
• Semantic annotation
• Information Extraction

Page 36: Celtic language technologies in the digital age

Workshop Outcomes

• A great time!
• Community forum
• Momentum
• Ideas for further collaboration
• Possible EU level action to address under-resourcing

Page 37: Celtic language technologies in the digital age

Future Directions

Page 38: Celtic language technologies in the digital age

Within the LT Community

• Under-resourced languages are a challenge for science
• The best researchers LOVE a challenge
• The Celtic LT community can position itself as a provider of interesting challenges
• BUT: We still need wider language community help to ensure adequate data is available to the R&D community

Page 39: Celtic language technologies in the digital age

What Can/Should We Do?

• Concerted Community Action
• Data is key
  • Collections of digital data in a language
  • Appropriate format
  • Appropriate annotation
  • Appropriate licence
  • Appropriately available
• The R&D community will combine to build more sophisticated tools and solve bigger problems…
• This should not be done in isolation by each RML community
  • Band together and also look to EU initiatives
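
The four “appropriate” criteria for data on this slide (format, annotation, licence, availability) could be recorded per dataset so that gaps are easy to spot. The following is a minimal sketch under stated assumptions: the `DatasetRecord` class, its field names, and `missing_criteria` are all hypothetical illustrations, not part of any tool or schema mentioned in the talk.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical description of one language dataset (illustrative fields only)."""
    name: str
    language: str
    data_format: str = ""   # how the data is stored, e.g. "plain text"
    annotation: str = ""    # what linguistic annotation it carries, if any
    licence: str = ""       # terms under which others may use it
    access_url: str = ""    # where the R&D community can obtain it


def missing_criteria(record):
    """Return the slide's criteria that the record leaves blank, in slide order."""
    checks = {
        "format": record.data_format,
        "annotation": record.annotation,
        "licence": record.licence,
        "availability": record.access_url,
    }
    return [criterion for criterion, value in checks.items() if not value.strip()]


if __name__ == "__main__":
    # An invented example record: format and licence are known, the rest is not.
    record = DatasetRecord(
        name="example corpus",
        language="ga",
        data_format="plain text",
        licence="CC BY 4.0",
    )
    print(missing_criteria(record))
```

A record with every field filled in returns an empty list; the invented example above would still need annotation details and a place where the data can be obtained.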

Page 40: Celtic language technologies in the digital age

Celtic LT Community Efforts

• Next CLTW – Proposal for part of LREC 2016
• Semi-formal meet-ups (today)
• Budding Irish LT lobby group CIGILT
• COST (European COoperation in Science and Technology) Action
• Reaching out further to the Humanities
• Needs support from policy makers
• Needs to produce results that generate buy-in from language communities

Page 41: Celtic language technologies in the digital age

The Grass Roots

• Small numbers of speakers
• Typically minority (or marginalised) languages
• Everyone has a role to play
• LT Community needs to speak out more
• Show tangible benefits