
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 644753 (KConnect).

Medical-domain Machine Translation in KConnect

Pavel Pecina

Charles University, Prague

Faculty of Mathematics and Physics

Institute of Formal and Applied Linguistics

Czech Republic

Apr 4th, 2017 – QT21 workshop, Valencia, Spain

Outline

● Context of the project (Khresmoi)

● Project details, goals, and objectives

● Role of MT in the project

● Industry requirements/constraints

● Solutions and tools

● Prototypes/Demos

● What is still needed

Khresmoi

● „Collect and make sense of biomedical information, then make it freely and easily available in several languages.“

● FP7-ICT, No. 257528, Collaborative project

● Total cost: EUR ~10M, 2010/09-2014/08

● Topic: ICT-2009.4.3 - Intelligent Information Management

● Coordinator: Henning Müller, University of Applied Sciences Western Switzerland, Sierre

● Consortium: 12 institutions

● http://www.khresmoi.eu/

Khresmoi objectives

● Effective automated information extraction from (unstructured) biomedical documents

● Linking information extracted from unstructured biomedical texts/images to structured information in knowledge bases

● Support of cross-language search, including multi-lingual queries, and returning machine-translated pertinent excerpts

● Adaptive user interfaces to assist in formulating queries and display search results via ergonomic/interactive visualizations

● Automated analysis and indexing for medical images

Khresmoi results (MT related)

● MT component to allow cross-lingual search and access

● Based on Moses and domain-adaptation techniques

● Deployed as (cloud-based) web-service

● Translation in two „modes“:

– Translation of search queries from user languages to the document languages (query translation)

– Translation of sentences from automatically created summaries of medical documents (summary translation)

● Languages: Czech, German, French ↔ English

KConnect – a follow-up of Khresmoi

● „Development and commercialization of cloud-based services for multilingual Semantic Annotation, Semantic Search and Machine Translation of Electronic Health Records and medical publications.“

● H2020 project, No. 644753, Innovation action

● Total cost: EUR ~4M, 2015/02–2017/07

● Topic: ICT-15-2014 Big data and open data innovation and take-up

● Coordinator: Allan Hanbury, Technical University in Vienna

● Consortium: 10 institutions (5 from Khresmoi)

● http://www.kconnect.eu

Consortium

● Academia:

– Technische Universitaet Wien (Austria) – coordination

– University of Sheffield (United Kingdom)

– King’s College London (United Kingdom)

– Charles University, Prague (Czech Republic)

● Industry:

– Findwise AB (Sweden)

– Precognox Informatikai Kft (Hungary)

– Ontotext AD (Bulgaria)

– Trip Database Ltd (United Kingdom)

– Health on the Net Foundation (Switzerland)

– Jönköpings Län (Sweden)

KConnect objectives

● Productisation of the multilingual medical text processing tools developed in Khresmoi.

● Creating a professional services community of companies trained to build solutions based on the KConnect Services.

● Development of toolkits for straightforward adaptation of the commercialised services to new languages.

● Adapting the services to Electronic Health Records processing, which is particularly challenging due to misspellings, neologisms, organisation-specific acronyms, etc.

● Languages: Hungarian, Polish, Spanish, Swedish ↔ English

MT Application Scenarios

1. Query translation

– Translation of medical/health-related search queries from a user language to the document language(s)

– Queries usually non-grammatical, short sequences of terms

– Lay-people queries vs. expert queries

2. Summary translation

– Sentences taken from automatically created abstracts of medical documents translated back to the user language

– Usually longer, highly informative sentences

Requirements, constraints

● Requirements

– Cloud-based solution, easily accessible as a webservice

– Local installation (hospitals)

– Instant response, scalable

– Low computation resources (local installations)

– Easily (re)trainable

● Constraints

– No (or only very limited) domain-specific in-house training data

Solutions and tools

● Moses (phrase-based, domain-adapted)

● MT Monkey – MT webservice architecture

● Eman Lite – MT training pipeline

● Manually translated dev/test sets for medical domain

● Training data collected and made available for WMT 17

MT Monkey

● Webservice architecture

● Developed at CUNI within Khresmoi

● Actively extended and maintained within KConnect

● Scalable (see Tamchyna et al., 2013 for evaluation)

● Recently Dockerized
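To illustrate what talking to such a translation webservice could look like from the client side, here is a minimal Python sketch. It assumes a JSON-over-HTTP interface in the style of MT Monkey's public API; the field names (`action`, `sourceLang`, `targetLang`, `text`, `nBestSize`, `errorCode`, `errorMessage`, `translation`, `translated`) are written from memory and should be checked against the deployed service, and the endpoint URL is a hypothetical placeholder.

```python
import json
import urllib.request


def build_request(text, source_lang, target_lang, n_best=1):
    """Build a JSON-serialisable translation request (field names are assumptions)."""
    return {
        "action": "translate",
        "sourceLang": source_lang,
        "targetLang": target_lang,
        "text": text,
        "nBestSize": n_best,
    }


def best_translations(response):
    """Return the first-listed hypothesis for each translated segment."""
    if response.get("errorCode", 0) != 0:
        raise RuntimeError(response.get("errorMessage", "translation failed"))
    return [seg["translated"][0]["text"] for seg in response.get("translation", [])]


def translate(url, text, source_lang, target_lang, timeout=10.0):
    """POST the request to the (hypothetical) endpoint and return the best translations."""
    payload = json.dumps(build_request(text, source_lang, target_lang)).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return best_translations(json.loads(resp.read().decode("utf-8")))
```

A query-translation call might then look like `translate("http://localhost:8080/translate", "bolest hlavy", "cs", "en")`, where the host, port, and path are placeholders for a locally installed instance.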

Eman Lite

● Fully automated MT system training

● Command-line and web-based interface

Prototypes/demos

● Trip database search

– https://www.tripdatabase.com

– Search in medical articles (clinical trials, research papers ...)

● Health-on-the-Net Search

– http://everyone.khresmoi.eu/

– Health-focused web-search engine

– Readability and trustability prediction

● Demos

– http://quest.ms.mff.cuni.cz/khresmoi/demo/

– http://quest.ms.mff.cuni.cz/khresmoi/client/

Trip Database Search

HON Search

HON Search (new version)

Prototypes/demos

● Trip database search

– https://www.tripdatabase.com

– Search in medical articles (clinical trials, research papers ...)

● Health-on-the-Net Search

– http://everyone.khresmoi.eu/

– http://jupiter.honservices.org/beta/

– Health-focused web-search engine

– Readability and trustability prediction

● Demos

– http://quest.ms.mff.cuni.cz/khresmoi/demo/

Issues

● Availability of (in-domain) training data

● Training data licences not clear (UMLS, MeSH, SNOMED CT)

● Translation quality for some languages (e.g. Hungarian)

● Lay-people language vs. expert language
