Wednesday, 4 June
MT@EC for European Public Administrations and Online Services
Spyridon Pilos, European Commission
TAUS Machine Translation Showcase 2014 Dublin (Ireland)
The research within the project MosesCore leading to these results has received funding from the European Union 7th Framework Programme, grant agreement no 288487
MT@EC European Commission machine translation
for public administrations and digital services in the European Union
Spyridon Pilos Head of Language applications, IT unit
Directorate-General for Translation (DGT)
Dublin, 4 June 2014
European Commission machine translation
• European Commission and languages
• MT@EC: machine translation for EU users
• What next?
Why do we need machine translation?
• The Commission…
  - DGT has 1 700 translators
  - Over 2 M pages translated in 2013
• But… just to make europa.eu fully multilingual:
  - almost 6.8 M documents to be translated, or 8 500 translators/year!
The result: thousands of non-translated documents (and this does not include user-generated content)
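The figures on this slide can be put side by side with a few lines of arithmetic. This is only a sketch: the slide does not say how the 8 500 translators/year estimate was derived, so the "implied" rate below is simply the quotient of the two quoted numbers, and "over 2 M" and "almost 6.8 M" are treated as round values.

```python
# Figures quoted on the slide (rounded as given there).
translators_2013 = 1_700          # DGT translators
pages_2013 = 2_000_000            # "over 2 M pages translated in 2013"
backlog_documents = 6_800_000     # "almost 6.8 M documents" for a fully multilingual europa.eu
translator_years_needed = 8_500   # "8 500 translators/year"

# Observed output per translator in 2013 (pages, not documents).
pages_per_translator = pages_2013 / translators_2013

# Rate implied by the slide's own estimate (documents per translator-year).
implied_docs_per_translator_year = backlog_documents / translator_years_needed

# How many times the 2013 staff the backlog would require.
staffing_multiple = translator_years_needed / translators_2013

print(f"Pages per translator in 2013: {pages_per_translator:.0f}")
print(f"Implied documents per translator-year: {implied_docs_per_translator_year:.0f}")
print(f"Required staff relative to 2013: {staffing_multiple:.1f}x")
```

On these numbers the backlog alone would need five times the 2013 translation staff, which is the gap MT is meant to help close.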
MT and EC: a long history
Started in the 1970s
• Eurotra (78-92): research, high expectations
• Rule-based ECMT (75-97): costly to develop, not scalable (18 language pairs in 20 years; coverage of post-2004 languages never attempted; system shut down in 2010)
Data-driven systems (statistical MT):
• cheap and quick to develop… if you have good data
• EC needs a solution for all EU languages… and has good data
EC action plan (2009), inter-service task force (2010)
• The goal: MT@EC offering machine translation for all languages to and from English, operational in July 2013
MT for understanding (inbound)
[Diagram: MT between L1 and L2, L3, …, Ln]
Robustness, coverage
Practically unlimited demand; free web-based services cover much of it
Requirements for MT@EC
• Provide MT as a (simple and robust) service
• Optimise quality for understandability (gisting)
• Deal with many domains, document types, formats, …
• Scale to huge volumes
Two Usage Scenarios for MT@EC
MT for dissemination (outbound)
[Diagram: MT between L1 and L2, L3, …, Ln]
Textual quality
Publishable quality can only be authored by humans; translation memories and CAT tools are used by professional translators
Requirements for MT@EC
• Provide MT as a tool within a CAT workflow
• Develop new ways to incorporate feedback
  - explicit feedback on MT quality, implicit feedback via TM
  - improvements requiring language-specific knowledge
  - towards hybrid approaches
• Optimise quality for post-editing
MT@EC: a European Commission product
• Released: 26 June 2013 (version 1.0)
• Languages: all 24 EU official languages; 552 language pairs (61 direct)
• Technology: statistical machine translation using the open source software Moses, co-funded by EU Framework Programmes for research and innovation
• Development by DGT: 2010-2013, co-funded by the ISA* programme (action 2.8)
* Interoperability solutions for public administrations
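The figure of 552 language pairs follows from counting every ordered (source, target) combination of the 24 official languages: 24 × 23 = 552. The sketch below checks that count. The language codes are the standard ISO 639-1 codes, not taken from the slide, and the slide does not say which 61 pairs are direct or how the remaining pairs are served, so the code only derives counts.

```python
from itertools import permutations

# ISO 639-1 codes of the 24 official EU languages (codes are not from the slide).
EU_LANGUAGES = [
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr",
    "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
]

# Every ordered (source, target) pair with source != target.
all_pairs = list(permutations(EU_LANGUAGES, 2))

# Pairs that have English on one side ("to and from English").
english_pairs = [pair for pair in all_pairs if "en" in pair]

DIRECT_PAIRS_ON_SLIDE = 61  # figure quoted on the slide
non_direct_pairs = len(all_pairs) - DIRECT_PAIRS_ON_SLIDE

print(f"Total language pairs: {len(all_pairs)}")
print(f"Pairs involving English: {len(english_pairs)}")
print(f"Pairs not listed as direct: {non_direct_pairs}")
```

Since only 46 pairs involve English, the 61 direct pairs quoted on the slide must include some pairs between two other languages.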
• Delivery:
  - web user interface (human to machine)
  - web services (machine to machine)
• Security: host (EC data centre) + access (ECAS) + transfer (sTesta)
• Special features:
  - Source document format/formatting maintained
  - Specific output formats for translation: tmx and xliff
  - Can translate multiple documents to multiple languages
  - Translation can also be returned by email
  - Indication of quality for language pairs
  - Feedback mechanism
MT@EC description
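For readers unfamiliar with the tmx output format mentioned under special features, the following sketch builds a minimal TMX 1.4 document with one translation unit per segment pair. It illustrates the general shape of the format only; it is not MT@EC's actual output, and the function name `build_tmx` and the header values (tool name, version) are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(source_lang, target_lang, segment_pairs):
    """Return a minimal TMX 1.4 document as a string (illustrative only)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",   # hypothetical tool name
        "creationtoolversion": "0.1",       # hypothetical version
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source_text, target_text in segment_pairs:
        # One translation unit (tu) holds one variant (tuv) per language.
        unit = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, source_text), (target_lang, target_text)):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(build_tmx("en", "fr", [("Public procurement", "Marchés publics")]))
```

A file in this shape can be loaded into a CAT tool as a translation memory, which is why it suits the dissemination scenario where translators post-edit MT output.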
Quality evaluation and improvement…
• “Maturity Check” (April-May 2011)
  - Can baseline MT engines already be used as such?
  - Identify main sources of problems for various languages, cluster them across languages
• Real-life trial (July 2011-June 2013)
  - Make first MT results available to translators
  - Auto-MT for 10..19 “best” language pairs (now: all)
  - On-demand MT for others (now all languages get MT)
• Automatic scores
  - BLEU scores for internal tuning and regression testing
  - Can help to identify domains/document types where MT is most useful, but also point to systematic difficulties
… with the help of DGT translators
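To make the BLEU bullet concrete, here is a minimal sketch of the metric for a single sentence pair: the geometric mean of clipped n-gram precisions (n = 1 to 4) multiplied by a brevity penalty. It assumes whitespace tokenisation, one reference, and no smoothing. It is not DGT's evaluation setup; real tuning and regression testing would normally use corpus-level BLEU from an established tool.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed, single-reference BLEU for one sentence pair."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: penalise hypotheses shorter than the reference.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the committee adopted the proposal on public procurement"
    print(sentence_bleu("the committee adopted the proposal on public procurement", reference))
    print(sentence_bleu("the committee adopted the proposal on procurement", reference))
```

For regression testing, a score like this would be computed on a fixed test set before and after an engine change and the two values compared.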
Maturity check 2011 (EN->X)
[Figure: chart on a 0% to 100% scale showing the share of useful vs. useless sentences per target language: ES, FR, IT, PT, RO, DE, DA, NL, SV, BG, CS, PL, SK, SL, EL, MT, LT, LV, ET, FI, HU. Languages are grouped by family (Romance, Germanic, Slavic, Hellenic, Semitic, Baltic, Finno-Ugric) and annotated by morphology (analytic, inflected, highly inflected languages, composita, strong agglutination).]
DGT's SMT maturity check outcome as a (useful/useless) sentences ratio + morphology
Language differences
From the translator's point of view
+ aid for typing
+ time savings
+ “original” proposed solution
+ guides the terminological research
- gender/numbers and order of words
- can be "fluent", but with mistranslations
- omissions and additions
- risk of error when incorrect terminology suggested
- quality dependent on the quality of the originals
MT@EC is already available to…
… the staff of European institutions and bodies:
• European Commission
• European Parliament
• Council of the European Union
• European Court of Justice
• Court of Auditors
• Economic and Social Committee
• Committee of the Regions
• European Central Bank
• European Investment Bank
• Translation Centre
• … and more
→ DGT took into account the needs of translators and other staff when designing the service
MT@EC is also integrated into EC digital services → operational
Service: Description/URL
• IMI: Internal Market Information System. http://ec.europa.eu/internal_market/imi-net/index_en.html
• SOLVIT: an online problem-solving network concerning misapplication of Internal Market law by public authorities. http://ec.europa.eu/solvit/
→ DGT provides support and advice for better integration on the customer side
Integration into EC digital services → under development (indicative list)
Service: Description/URL
• nLex: a common gateway to national law. http://eur-lex.europa.eu/n-lex/
• TED: TED (Tenders Electronic Daily) is the online version of the 'Supplement to the Official Journal of the European Union', dedicated to European public procurement. http://ted.europa.eu/
• e-Justice: the future electronic one-stop shop in the area of justice. http://e-justice.europa.eu/
• Joinup: an open collaborative platform supporting interoperability in Europe. https://joinup.ec.europa.eu/
Integration into EC digital services → initiated (indicative list)
Service: Description/URL
• ODR: platform to facilitate the out-of-court resolution of consumer disputes (Alternative Dispute Resolution). http://ec.europa.eu/consumers/redress_cons/adr_en.htm
• EURES: the European Job Mobility portal networking the European employment services. https://ec.europa.eu/eures/
• EQF: the portal supporting the implementation of the European Qualifications Framework for lifelong learning. http://ec.europa.eu/eqf/home_en.htm
• ESCO: the multilingual classification of European Skills, Competences, Qualifications and Occupations; identifies and categorises skills and competences, qualifications and occupations in 22 European languages. Supports EURES and other similar portals. https://ec.europa.eu/esco/
MT@EC for public administrations
Free real-life trial in 2014:
• Staff can have direct free access to the standard MT@EC service (upon request)
• Organisations can participate in a "customisation" pilot project, where DGT builds specific engines with their data (based on bilateral cooperation agreements)
→ DGT to understand better their needs and constraints and develop appropriate service delivery models
Customisation pilots
• Pilot A: Connect an information system to the standard MT@EC service.
• Pilot B: DGT builds custom engines (with the organisation's data), available to all through MT@EC.
• Pilot C: DGT builds custom engines (with the organisation's data), available only to that organisation through MT@EC.
• Pilot D: DGT builds custom engines (with the organisation's data) for the organisation to run on its own premises.
• Pilot E: DGT assists the organisation in building its own custom engines, to run on its own premises.
MT@EC: right for the EU
Quality:
• built on data derived from EU translations (Euramis translation memory system: 800 M segments in 24 languages and annual growth rate > 20%)
• designed for EU-relevant collaboration
• team of computational linguists working with translators and linguists in DGT
• work to improve MT for all EU languages
Security
Customer support
MT@EC: what next
• CEF (Connecting Europe Facility)
  - A funding programme for building and deploying infrastructures.
  - Includes deploying mature technologies to build, enable and operate pan-European Digital Services.
  - Includes an Automated Translation (AT) platform as one of its core building blocks for digital services.
  - A key component of the AT platform is MT@EC.
The automated translation platform
• To facilitate cross-border information exchange and enable cross-border access to online content and services provided by the digital service infrastructures of the CEF.
• To offer MT services to EU institutions and public administrations in the Member States.
• To build on the existing Commission Machine Translation service (MT@EC)
• Emphasis is placed on secure, quality, customisable machine translation.
→ Follow this space: http://ec.europa.eu/digital-agenda/en/connecting-europe-facility