Welcome to the Cloud! Terminology as a Service, CHAT2013

Welcome to the Cloud! Terminology as a Service. Andrejs Vasiļjevs, Tilde. tekom 2013 / Wiesbaden / 07.11.2013.

Upload: taus-enabling-better-translation

Post on 20-May-2015


DESCRIPTION

Presenter: Andrejs Vasiljevs (Tilde). This presentation is part of the TaaS project, funded from the European Union Seventh Framework Programme (FP7/2007-2013), grant agreement no. 296312.

TRANSCRIPT

Page 1: Welcome to the Cloud! Terminology as a Service, CHAT2013

Welcome to the Cloud! Terminology as a Service

Andrejs Vasiļjevs

Tilde

tekom 2013 / Wiesbaden / 07.11.2013.

Page 2

Complexity of terminology work

Identifying terms in the source text

Consulting online databases and local files for translation equivalents

Creating and maintaining terminology glossaries

Sharing term glossaries and involving others in polishing them

Structuring data in industry-standard formats

Integrating term glossaries into CAT and other productivity tools

Keeping terminology up to date, etc.

Page 3

Terminology as a Service

A cloud-based platform for acquiring, cleaning up, sharing, and reusing multilingual terminological data

Page 4

TaaS User Needs Survey Results: Importance of terminology work

Very important: 43.5%

Quite important: 39.9%

Less important: 14.8%

Not important: 1.8%
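The four shares can be checked with a few lines of Python. The sketch assumes the percentages belong to the answer labels in the order the slide lists them; the combined "very or quite important" figure is plain arithmetic on those values, not a number reported by the survey itself.

```python
# Shares (percent) from the "Importance of terminology work" chart,
# assuming the values follow the order of the answer labels on the slide.
importance = {
    "Very important": 43.5,
    "Quite important": 39.9,
    "Less important": 14.8,
    "Not important": 1.8,
}

# The four answers together should account for all respondents.
total = round(sum(importance.values()), 1)

# Respondents who rate terminology work as very or quite important.
at_least_quite = round(importance["Very important"] + importance["Quite important"], 1)

print(f"total: {total}%")
print(f"very or quite important: {at_least_quite}%")
```

The shares add up to 100.0%, and 83.4% of respondents rate terminology work as at least quite important.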

Page 5

TaaS User Needs Survey: willingness to share

Yes, provided that…

Joint contribution to the DB: 24.9%

Access control: 19.2%

Legal aspects: 14.2%

External quality control: 11.4%

Little effort: 7.6%

Anonymity: 6.0%

Other: 16.7%

No, because…

Reasons given: Legal restrictions; Poor quality/Lack of time; Own asset; Risk of misunderstanding (chart shares: 48.6%, 22.0%, 16.5%, 8.3%, 4.6%)

Overall split: 60.5% / 39.5%

Page 6

TaaS Partners

Tilde, Latvia (Coordinator)

TAUS, Netherlands

Kilgray, Hungary

Cologne University of Applied Sciences, Germany

University of Sheffield, UK

Page 7

TaaS Mission

Simplify the process for language workers to prepare, store and share task-specific multilingual term glossaries

Provide professional translators with instant access to term translation equivalents and translation candidates through CAT tools

Enable domain adaptation of statistical machine translation systems through dynamic integration of TaaS-provided terminology data

Page 8

Key services of TaaS

Automatic extraction of monolingual term candidates from user-uploaded documents

Automatic retrieval of translation equivalents from various public and industry terminology databases

Acquisition of translation candidates from multilingual web data

Facilities for users to clean up automatically acquired terminological data

Data sharing and integration facilities through APIs and export tools
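As a rough illustration of what automatic extraction of monolingual term candidates involves, the following Python sketch counts word n-grams in a document and keeps the frequent ones that neither start nor end with a stop word. This is a deliberately naive frequency heuristic chosen for illustration, not the extraction method used in TaaS; the stop-word list, the `max_len` and `min_freq` defaults and the function name `term_candidates` are all assumptions.

```python
import re
from collections import Counter

# Hypothetical, minimal English stop-word list (a real extractor needs a fuller one)
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "for",
              "on", "is", "are", "with", "by"}


def term_candidates(text: str, max_len: int = 3, min_freq: int = 2) -> list[tuple[str, int]]:
    """Return n-grams of 1..max_len words occurring at least min_freq times,
    skipping n-grams that start or end with a stop word."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    # Most frequent first; ties broken alphabetically for a stable order
    return sorted(((t, c) for t, c in counts.items() if c >= min_freq),
                  key=lambda tc: (-tc[1], tc[0]))


doc = ("The brake pad wears. Replace the brake pad when the brake disc is worn. "
       "Check the brake disc.")
print(term_candidates(doc))
```

On the sample sentence this yields "brake" (4 occurrences) followed by "brake disc", "brake pad", "disc" and "pad" (2 each). Practical extractors usually add linguistic filters (for example part-of-speech patterns) and compare against a general-language reference corpus, because raw counts alone cannot tell a domain term from a merely frequent phrase.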

Page 9

Focus areas

Research

Development

Usage

Term extraction

Collection of domain-specific multilingual corpora

Max(FTC)

Usability

Outreach

Sustainability

Quality

Performance

Scalability

Interoperability

Page 10

TaaS Services

Page 11

Target Repositories

TAUS Data: repository of multilingual translation memories

EuroTermBank: databank of federated multilingual terminology

IATE: inter-institutional termbank of the European Union

META-SHARE: distributed pan-European repository of language resources
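Retrieving translation equivalents from several repositories comes down to querying each source and merging the answers. The sketch below uses small in-memory dictionaries as hypothetical stand-ins for the repositories (the real ones are reached through their own services, which are not modelled here), and it ranks candidates by how many sources propose them; that ranking rule, the sample entries and the `lookup` function are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for external repositories; keys are
# (source language, target language, lower-cased source term).
REPOSITORIES = {
    "EuroTermBank": {("en", "de", "brake pad"): ["Bremsbelag"]},
    "IATE": {("en", "de", "brake pad"): ["Bremsbelag", "Bremsklotz"],
             ("en", "de", "brake disc"): ["Bremsscheibe"]},
    "TAUS Data": {("en", "de", "brake disc"): ["Bremsscheibe"]},
}


def lookup(term, src, tgt, repositories):
    """Collect translation candidates for `term` from every repository and
    record which repositories proposed each candidate."""
    found = defaultdict(list)
    for name, entries in repositories.items():
        for candidate in entries.get((src, tgt, term.lower()), []):
            found[candidate].append(name)
    # Candidates proposed by more repositories first, then alphabetically
    return sorted(found.items(), key=lambda kv: (-len(kv[1]), kv[0]))


print(lookup("Brake pad", "en", "de", REPOSITORIES))
```

Keeping the list of contributing repositories next to each candidate lets a user judge during clean-up how well attested a translation is.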

Page 12

Integration

Support for industry-standard formats

Integration into CAT and productivity tools

API to integrate TaaS services into various software applications
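TBX (TermBase eXchange, ISO 30042) is the usual industry-standard interchange format for termbases. The sketch below writes a minimal document using the `martif` / `termEntry` / `langSet` / `tig` / `term` element structure of TBX 2008. It is simplified and has not been validated against the TBX-Basic schema; the input shape (one dict per concept, mapping a language code to a list of terms) and the function name `to_tbx` are assumptions.

```python
import xml.etree.ElementTree as ET

# Attribute name for xml:lang in ElementTree's {namespace}local notation
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tbx(concepts: list[dict[str, list[str]]]) -> str:
    """Serialize concepts (each a mapping language code -> list of terms)
    as a minimal TBX document in the TBX 2008 martif structure."""
    martif = ET.Element("martif", {"type": "TBX-Basic", XML_LANG: "en"})
    header = ET.SubElement(martif, "martifHeader")
    source_desc = ET.SubElement(ET.SubElement(header, "fileDesc"), "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Exported term collection"
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for i, concept in enumerate(concepts, start=1):
        # One termEntry per concept, one langSet per language
        entry = ET.SubElement(body, "termEntry", {"id": f"c{i}"})
        for lang, terms in concept.items():
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            for term in terms:
                tig = ET.SubElement(lang_set, "tig")
                ET.SubElement(tig, "term").text = term
    return ET.tostring(martif, encoding="unicode")


print(to_tbx([{"en": ["brake disc"], "de": ["Bremsscheibe"]}]))
```

Grouping all language variants of one concept under a single `termEntry` is what lets CAT tools that import the file offer the target term when they meet the source term.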

Page 13

Term identification and annotation

Page 14

HTML Term Annotation

Term entries for terms identified in EuroTermBank are stored in TBX format in a <script> element placed in the HTML5 document.
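This kind of annotation can be sketched with the ITS 2.0 local terminology attributes for HTML5, `its-term` and `its-term-info-ref`. In the Python sketch below, each term occurrence (matched exactly and case-sensitively) is wrapped in a `<span>` whose `its-term-info-ref` points at a `<script>` element holding that term's TBX entry. The one-script-per-entry layout, the `application/x-tbx` script type, the generated `term1`, `term2`… ids and the `annotate_html` function are assumptions for illustration, not the exact markup TaaS produces.

```python
import html
import re


def annotate_html(text: str, tbx_entries: dict[str, str]) -> str:
    """Return an HTML5 page in which each occurrence of a known term is marked
    with ITS 2.0 terminology attributes.

    `tbx_entries` maps a term to a TBX <termEntry> fragment; that input
    format is an assumption for this sketch.
    """
    ids = {term: f"term{i}" for i, term in enumerate(tbx_entries, start=1)}
    parts, last = [], 0
    if tbx_entries:
        # Longest terms first so that "brake disc" is preferred over "brake"
        pattern = re.compile("|".join(
            re.escape(t) for t in sorted(tbx_entries, key=len, reverse=True)))
        for m in pattern.finditer(text):
            parts.append(html.escape(text[last:m.start()]))
            term = m.group(0)
            parts.append(f'<span its-term="yes" its-term-info-ref="#{ids[term]}">'
                         f"{html.escape(term)}</span>")
            last = m.end()
    parts.append(html.escape(text[last:]))
    body = "".join(parts)
    # One <script> per entry; assumes the TBX fragments never contain "</script>"
    scripts = "".join(
        f'<script type="application/x-tbx" id="{ids[t]}">{entry}</script>'
        for t, entry in tbx_entries.items())
    return f"<!DOCTYPE html>\n<html><body><p>{body}</p>{scripts}</body></html>"


page = annotate_html("Check the brake disc & pads.",
                     {"brake disc": '<termEntry id="c1"/>'})
print(page)
```

Because browsers do not render the content of `<script>` elements with a non-JavaScript type, the TBX data travels with the page without affecting its display, while tools that understand ITS 2.0 can follow the reference from each marked term.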

Page 15

XLIFF Term Annotation

Page 16

Identifying and marking terms

New W3C standard: Internationalization Tag Set (ITS) 2.0

[Diagram: TaaS Terminology Services offer a Terminology Annotation Web Service API and a Showcase Web Page. Machine users (CAT tools, MT systems) exchange ITS 2.0 enriched content and ITS 2.0 term-annotated content with the service. Human users (e.g., translators, terminologists) work with plaintext, ITS 2.0 enriched content and term-annotated content, with export / visualisation of the results.]

Page 17

Page 18

TaaS Architecture

Presentation Layer: Web Page UI; Public API

Application Logic Layer: Terminology collection management; User management; Terminology collection search; Terminology collection creation

Data Storage Layer (Shared Term Repository): DB; File Store

High-performance Computing (HPC) Cluster: SGE; HPC frontend; CPU nodes

Term extraction workflows: Full collection creation workflow; Monolingual collection creation; Translation candidate extraction; ...

Modules: Result processing; Collection Importer; Marked Text enrichment; Text tagging with terms; Statistical DB acquisition; Statistical DB feeding; Bilingual Term Extraction System; Parameter retriever; Translation lookup (ETB & STR, IATE, TAUS API, Statistical DB); Collection merger; Term extraction (TXT extractor, TWSC, Kilgray TermExtractor); Collection creator; Term normalizer

External clients and resources: Web Browsers; CAT tools; MT; External TDBs

Connections shown in the diagram: REST over https; http/https
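The workflows above chain modules such as the TXT extractor, term extraction, the term normalizer and translation lookup. A minimal single-process Python sketch of that chaining follows. The step names echo module boxes on the slide, but what each step does here is a hypothetical stand-in (`TermCollection`, `run_workflow` and the `make_*` factories are invented for illustration), and running the steps as jobs on the HPC cluster is not modelled.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TermCollection:
    """State passed from one workflow step to the next (hypothetical structure)."""
    text: str
    terms: list[str] = field(default_factory=list)
    translations: dict[str, list[str]] = field(default_factory=dict)


Step = Callable[[TermCollection], TermCollection]


def txt_extractor(c: TermCollection) -> TermCollection:
    # Replace markup tags with spaces so later steps see plain text
    c.text = re.sub(r"<[^>]+>", " ", c.text)
    return c


def make_term_extractor(known_terms: list[str]) -> Step:
    # Stand-in for a real extractor: keep the known terms that occur in the text
    def extract(c: TermCollection) -> TermCollection:
        lowered = c.text.lower()
        c.terms = [t for t in known_terms if t.lower() in lowered]
        return c
    return extract


def term_normalizer(c: TermCollection) -> TermCollection:
    # Trim, lower-case and de-duplicate while keeping first-seen order
    c.terms = list(dict.fromkeys(t.strip().lower() for t in c.terms))
    return c


def make_translation_lookup(termbase: dict[str, list[str]]) -> Step:
    # Stand-in for querying term databases: read candidates from a dict
    def lookup(c: TermCollection) -> TermCollection:
        c.translations = {t: termbase.get(t, []) for t in c.terms}
        return c
    return lookup


def run_workflow(text: str, steps: list[Step]) -> TermCollection:
    # Apply the steps in order, each one receiving the previous step's output
    c = TermCollection(text=text)
    for step in steps:
        c = step(c)
    return c


result = run_workflow(
    "<p>Check the <b>brake disc</b>.</p>",
    [txt_extractor,
     make_term_extractor(["brake disc", "brake pad"]),
     term_normalizer,
     make_translation_lookup({"brake disc": ["Bremsscheibe"]})],
)
print(result.terms, result.translations)
```

Giving every step the same input and output type makes it easy to assemble different workflows (for example monolingual collection creation without the lookup step) from one set of modules.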

Page 19

How to instruct SMT to use the right terms?

[Figure: example term pair "koks" / "timber"]

Page 20

Put TaaS in the service of MT
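One common way to make a phrase-based SMT decoder respect given terminology is to mark up its input. The Moses toolkit, for example, accepts XML markup when run with the `-xml-input` option (modes include `exclusive` and `inclusive`), where a tagged source span carries a `translation` attribute. The slides do not say whether TaaS uses this mechanism. The sketch below, built around the hypothetical helper `moses_xml_markup`, wraps known source terms in such markup using a longest-match-first scan over whitespace-separated tokens; the tag name `term` and the sample term pairs are arbitrary choices.

```python
from xml.sax.saxutils import escape, quoteattr


def moses_xml_markup(sentence: str, terms: dict[str, str]) -> str:
    """Wrap source phrases that have a required translation in Moses-style XML
    markup, e.g. <term translation="...">phrase</term>.

    `sentence` is assumed to be tokenized already (tokens separated by spaces);
    `terms` maps source phrases to the target phrase the decoder should use.
    """
    tokens = sentence.split()
    longest = max((len(phrase.split()) for phrase in terms), default=0)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible phrase first so multi-word terms win
        for n in range(min(longest, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in terms:
                out.append(f"<term translation={quoteattr(terms[phrase])}>"
                           f"{escape(phrase)}</term>")
                i += n
                break
        else:
            # No term starts at this token: copy it unchanged (XML-escaped)
            out.append(escape(tokens[i]))
            i += 1
    return " ".join(out)


print(moses_xml_markup("check brake disc now", {"brake disc": "Bremsscheibe", "brake": "Bremse"}))
```

The marked-up sentence would then be passed to a decoder started with, for instance, `-xml-input exclusive`, so that only the supplied translation is used for the tagged span; `inclusive` instead lets it compete with the phrase table entries.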

Page 21

Page 22

do-it-yourself MT factory on the cloud

Page 23

Boost in the quality of machine translation

Narrow-domain automotive MT, English–Latvian

Data:

2 M unique parallel sentences

1.9 M monolingual sentences

0.2 M in-domain monolingual sentences

Quality: 16% improvement from terminology integration

Page 24

Come & try: demo.taas-project.eu

Page 25

Thank you! [email protected]

The research within the project TaaS leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), Grant Agreement no 296312