Improving Translator Productivity with MT: A Patent Translation Case Study

John Tinsley, CEO and Co-founder. PSLT @ MT Summit, Miami, 30th October 2015.


Page 1: Improving Translator Productivity with MT: A Patent Translation Case Study


John Tinsley, CEO and Co-founder

PSLT @ MT Summit. Miami. 30th October 2015

Page 2

We provide Machine Translation solutions with Subject Matter Expertise

An MT solutions and services provider, specialising in customised solutions with subject matter expertise for specific technical sectors, such as patents/IP, life sciences, and financial services.

Page 3
Page 4

How does that work?

[Diagram: Input, Pre-processing, Training Data, Data Engineering, Post-processing, Output]

Page 5

[Diagram: Input → Output. Components shown: Patent input classifier; Chinese pre-ordering rules; Japanese script normalisation; Korean pharma tokenizer; German compounding rules; Spanish med-device entity recognizer; Client TM/terminology (optional); Training Data; three Moses engines and an RBMT engine; Multi-output Combination; Statistical Post-editing]

Domain Adaptation and Data Selection

•  Modified Moore-Lewis (MML) data selection with Vocabulary Saturation Filtering (VSF)

•  Language and translation model interpolation (linear/log-linear)

•  Terminology extraction using IR
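The data-selection idea in the first bullet can be sketched with toy models. This is a minimal illustration, assuming add-one-smoothed unigram language models in place of the real n-gram models a production system would use: each candidate sentence is scored by the Moore-Lewis cross-entropy difference (in-domain minus general-domain, lower is more in-domain), the modified (bilingual) variant sums that score over source and target, and the last helper shows linear LM interpolation. Vocabulary Saturation Filtering is not covered. All function names and example sentences are invented for this sketch.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration; not Iconic's implementation.

def train_unigram(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram LM over a shared vocabulary (stand-in for an n-gram LM)."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def cross_entropy(sentence, lm):
    """Per-word cross-entropy (bits) of a sentence under a unigram LM."""
    words = sentence.split()
    return -sum(math.log2(lm[w]) for w in words) / len(words)

def moore_lewis(sentence, lm_in, lm_general):
    """Cross-entropy difference: lower means more like the in-domain data."""
    return cross_entropy(sentence, lm_in) - cross_entropy(sentence, lm_general)

def modified_moore_lewis(src, tgt, src_lms, tgt_lms):
    """Bilingual variant: sum the cross-entropy differences on both sides."""
    return moore_lewis(src, *src_lms) + moore_lewis(tgt, *tgt_lms)

def interpolate(lm_a, lm_b, lam):
    """Linear interpolation: p(w) = lam * p_a(w) + (1 - lam) * p_b(w)."""
    return {w: lam * lm_a[w] + (1 - lam) * lm_b[w] for w in lm_a}

# Toy data, invented for this sketch.
in_domain = ["the claim comprises a polymer", "the polymer layer of claim 1"]
general = ["the weather is nice today", "we went to the market"]
candidates = ["a polymer layer", "the market is nice"]

vocab = {w for s in in_domain + general + candidates for w in s.split()}
lm_in = train_unigram(in_domain, vocab)
lm_gen = train_unigram(general, vocab)

# Keep the most in-domain-looking candidates first.
ranked = sorted(candidates, key=lambda s: moore_lewis(s, lm_in, lm_gen))
mixed = interpolate(lm_in, lm_gen, lam=0.7)
```

Selection would then keep the top of `ranked` up to some budget; the interpolation weight `lam` would normally be tuned on held-out in-domain text rather than fixed.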

Hybrid is a misnomer

•  Statistical MT
•  Syntax-based methods
•  Grammar rules
•  Example-based templates

•  On-the-fly system combination
•  Hierarchical models
•  Translation Memory Integration
•  Syntactic pre/post-ordering
•  Template-driven translation

Combining linguistics, statistics, and MT expertise

The Ensemble Architecture™
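The shape of the architecture in the diagram (pre-processing, several engines, multi-output combination, post-processing) can be expressed as a composable pipeline. This is a hypothetical sketch, not Iconic's code: the class name, the placeholder engines, and the toy scorer are all invented, and "combination" is reduced here to picking the best-scoring candidate, whereas a real system could combine outputs at a finer grain.

```python
from dataclasses import dataclass
from typing import Callable, List

Step = Callable[[str], str]

@dataclass
class EnsemblePipeline:
    """Hypothetical sketch: pre-process -> several engines -> combine -> post-process."""
    preprocessors: List[Step]            # e.g. script normalisation, pre-ordering rules
    engines: List[Step]                  # e.g. several SMT engines plus an RBMT engine
    scorer: Callable[[str, str], float]  # scores (source, candidate); higher is better
    postprocessors: List[Step]           # e.g. statistical post-editing

    def translate(self, text: str) -> str:
        for step in self.preprocessors:
            text = step(text)
        candidates = [engine(text) for engine in self.engines]
        # Multi-output combination, simplified to selecting the best-scoring candidate.
        best = max(candidates, key=lambda hyp: self.scorer(text, hyp))
        for step in self.postprocessors:
            best = step(best)
        return best

# Placeholder components, invented purely for illustration.
pipeline = EnsemblePipeline(
    preprocessors=[str.strip],
    engines=[lambda s: "the device comprises a layer", lambda s: "device comprise layer"],
    scorer=lambda src, hyp: len(hyp.split()),  # toy scorer: prefer the longer output
    postprocessors=[lambda s: s[:1].upper() + s[1:]],
)
result = pipeline.translate("  该装置包括一层  ")
```

Keeping each stage as a plain function makes it easy to swap language-specific modules (for example, a Korean tokenizer versus German compounding rules) per language pair.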

Page 6

The Challenge of Patents

L is an organic group selected from -CH2-(OCH2CH2)n-, -CO-NR'-, with R'=H or C1-C4 alkyl group; n=0-8; Y=F, CF3 …

maximum stress of 1.2 to 3.5 N/mm<2> and a maximum elongation of 700 to 1,300% at 0[deg.] C.

Long Sentences

Technical constructions

Largest single document: 249,322 words

Longest Sentence: 1,417 words

Page 7

The Challenge of Patents

•  Very long sentences as standard
•  Grammatically incomplete, using nominal and telegraphic style (!)
•  Passive forms are frequent
•  Frequent use of subordinate clauses, participles, implicit constructs
•  Inconsistent and incorrect spelling
•  High use of neologisms
•  Instances of synonymy and polysemy
•  Spurious use of punctuation

Authoring guide for “to be translated” text

Patents break almost all of the rules!

Page 8

IPTranslator: Patent Translation by Iconic Translation Machines

Page 9

MT Application Areas

MT for Information Purposes
•  Development focuses on improving the translation of key information
•  Terminology is important
•  Evaluation driven by “usability”

MT for Post-editing Productivity
•  Development focuses on reducing the edits required
•  Feedback loop is crucial
•  Evaluation through practical translation tasks

Page 10

MT Evaluation – where do we start!?

Lots of different ways to do evaluation
–  automatic scores
   •  BLEU, METEOR, GTM, TER
–  fluency, adequacy, comparative ranking
–  task-based evaluation
   •  error analysis, post-edit productivity

Different metrics, different intelligence
–  what does each type of metric tell us?
–  which ones are usable at which stage of evaluation?

e.g. can we really use automatic scores to assess productivity?

e.g. does the productivity delta really tell us how good the output is?
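One way to probe the first question, whether automatic scores can stand in for productivity, is to correlate segment-level scores with measured post-editing effort. A minimal sketch, assuming per-segment TER and seconds-per-word values have already been collected; the numbers below are invented for illustration, and the Pearson coefficient is computed by hand so the snippet needs only the standard library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical per-segment measurements, not data from the case study.
ter = [0.05, 0.20, 0.35, 0.50, 0.70]
seconds_per_word = [0.8, 1.5, 2.1, 2.9, 3.6]

r = pearson(ter, seconds_per_word)
```

A strong positive `r` on real data would support using TER as an early proxy for effort; it would still not answer the second question, about output quality.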

Page 11

Client Case Study – RWS

-  UK-headquartered public company
-  Founded 1958
-  9th largest LSP (CSA 2013 report)
-  Leader in specialist IP translations

Problem: A large Chinese to English patent translation project, with challenging content and language.

Question: What efficiencies, if any, can machine translation add to the workflow of RWS translators?

How we applied different types of MT evaluation at different stages of the process, at various go/no-go decision points, to help RWS assess whether MT is viable for this project.

Page 12

Step 1: Baseline and Customisation

Can we improve our baseline engines through customisation?

[Chart: BLEU and TER scores (scale 0 to 0.8) for Iconic Baseline vs. Iconic Customised]

-  Huge improvement

-  Intuitively, the scores look good, but on their own they don’t really say anything about usefulness for post-editing

-  Let’s dig deeper

What next?

How good is the output relative to the task, i.e. post-editing?
-  fluency/adequacy is not going to tell us
-  let’s start with segment-level TER

Page 13

Translation Edit Rate: correlates well with practical evaluations

If we look deeper, what can we learn?

INTELLIGENCE

• Proportion of full matches (i.e. big savings)

• Proportion of close matches (i.e. faster than fuzzy matches)

• Proportion of poor matches

ACTIONABLE INFORMATION

• Type of sentence with high/low matches

• Weaknesses and gaps

• Segments to compare and analyse in translation memory
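The “intelligence” above (proportions of full, close, and poor matches) can be derived from segment-level edit rates. A minimal sketch, assuming the post-edited segment serves as the reference (as in HTER) and using a simplified TER with no block shifts, so it is a word-level edit distance divided by reference length. The 0.3 cut-off for “close” matches and all function names are hypothetical choices for illustration.

```python
from collections import Counter

def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,               # drop a hypothesis word
                           cur[j - 1] + 1,            # insert a reference word
                           prev[j - 1] + (h != r)))   # match or substitute
        prev = cur
    return prev[-1]

def simple_ter(mt_output, post_edited):
    """Edits per reference word; unlike full TER, block shifts are not modelled."""
    ref = post_edited.split()
    return word_edit_distance(mt_output.split(), ref) / max(len(ref), 1)

def match_band(ter, close_threshold=0.3):
    """Hypothetical banding; the threshold is an assumption, not from the case study."""
    if ter == 0:
        return "full"
    if ter <= close_threshold:
        return "close"
    return "poor"

def match_profile(pairs, close_threshold=0.3):
    """Proportion of (mt_output, post_edited) pairs falling in each band."""
    bands = Counter(match_band(simple_ter(mt, pe), close_threshold) for mt, pe in pairs)
    return {band: count / len(pairs) for band, count in bands.items()}

# Invented example segments.
pairs = [
    ("the polymer layer", "the polymer layer"),
    ("the device comprise a layer", "the device comprises a layer"),
    ("layer device the", "the device comprises a polymer layer"),
]
profile = match_profile(pairs)
```

Listing the segments in each band is then the starting point for the actionable items: which sentence types score well or badly, and which ones to compare against the translation memory.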

Page 14

Step 2: Segment-level automatic analysis

[Chart: distribution of segment-level TER scores; y-axis: TER score, x-axis: segment length]

This represents a 24% potential productivity gain.
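The chart plots TER against segment length; a tabular summary of the same kind of data can be produced by grouping segments into length bands. A minimal sketch, assuming (length, TER) pairs are already available per segment; the band edges and the example values are invented, and the slide does not say how the 24% figure was derived, so this does not attempt to reproduce it.

```python
from collections import defaultdict
from statistics import mean

def ter_by_length(segments, band_edges=(10, 20, 40, 80)):
    """Mean TER per source-length band for (length_in_words, ter) pairs.
    The band edges are arbitrary, hypothetical choices."""
    bands = defaultdict(list)
    for length, ter in segments:
        label = next((f"<={edge}" for edge in band_edges if length <= edge),
                     f">{band_edges[-1]}")
        bands[label].append(ter)
    return {label: mean(ters) for label, ters in bands.items()}

# Invented example values, not data from the case study.
summary = ter_by_length([(8, 0.1), (9, 0.3), (35, 0.5), (120, 0.8)])
```

Bands with high mean TER (often the very long sentences typical of patents) point to where MT will save least time.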

Page 15

Step 3: Productivity testing

Given prior MT experience and an existing MT integration, productivity testing can be run in the production environment. In this case, we used the TAUS Dynamic Quality Framework (DQF).

Productivity Test

Page 16

Productivity Test

Page 17


Beware the variables!

•  Translators: different experience, speed, perceptions of MT
   –  24 translators: senior, staff, and interns

•  Test sets: may be unrepresentative or particularly difficult
   –  2 test sets, comprising 5 documents, and cross-fold validation

•  Environment and task: inexperience and unfamiliarity
   –  Training materials, videos, and “dummy” segments
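The slide does not describe how the cross-fold was arranged. One common way to stop test-set difficulty from skewing the result, assumed here purely for illustration, is a counterbalanced design: within each translator profile, alternate translators between two groups, and have each group post-edit MT on one test set and translate the other from scratch. The function and the translator names are hypothetical.

```python
from collections import defaultdict

def counterbalance(translators, test_sets=("test set 1", "test set 2")):
    """Hypothetical counterbalanced assignment.
    translators: (name, profile) pairs. Within each profile, alternate who
    post-edits which test set, so every set is seen in both conditions."""
    by_profile = defaultdict(list)
    for name, profile in translators:
        by_profile[profile].append(name)
    plan = {}
    for names in by_profile.values():
        for i, name in enumerate(names):
            mt_set, scratch_set = test_sets if i % 2 == 0 else test_sets[::-1]
            plan[name] = {"post-edit": mt_set, "from scratch": scratch_set}
    return plan

plan = counterbalance([("A", "senior"), ("B", "senior"), ("C", "intern"), ("D", "intern")])
```

Balancing within each profile keeps senior, staff, and intern translators spread across both conditions, so profile and test-set effects are not confounded.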


Page 18

Findings and Learnings

Overall average: 25% productivity gain
What it tells us: correlates with TER

By Translator Profile: Experienced: 22%; Staff: 23%; Interns: 30%
What it tells us: roll out with junior staff for a more immediate impact on the bottom line?

By Test Set: Test set 1.1: 25%; Test set 1.2: 35%; Test set 2.1: 6%; Test set 2.2: 35%
What it tells us: don’t be overly concerned by outliers. Use the data to facilitate source content profiling?
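Percentages like these are usually relative throughput gains: words per hour when post-editing MT, divided by words per hour when translating from scratch, minus one. A minimal sketch under that assumption; the record layout, field names, and numbers are invented for illustration and are not the case-study data (TAUS DQF reports its own figures).

```python
from collections import defaultdict

def words_per_hour(records):
    """Pooled throughput: total words divided by total hours."""
    words = sum(r["words"] for r in records)
    hours = sum(r["seconds"] for r in records) / 3600
    return words / hours

def productivity_gain(records, group_by):
    """Relative gain of post-editing MT over translating from scratch, per group.
    Each record needs the group_by field, a 'condition' of 'mt' or 'scratch',
    'words', and 'seconds' (hypothetical layout)."""
    groups = defaultdict(lambda: {"mt": [], "scratch": []})
    for r in records:
        groups[r[group_by]][r["condition"]].append(r)
    return {g: words_per_hour(c["mt"]) / words_per_hour(c["scratch"]) - 1
            for g, c in groups.items()}

# Invented example records.
records = [
    {"profile": "senior", "condition": "scratch", "words": 500, "seconds": 3600},
    {"profile": "senior", "condition": "mt", "words": 600, "seconds": 3600},
    {"profile": "intern", "condition": "scratch", "words": 400, "seconds": 3600},
    {"profile": "intern", "condition": "mt", "words": 500, "seconds": 3600},
]
gains = productivity_gain(records, group_by="profile")
```

Pooling words and time within a group weights longer tasks more heavily than averaging per-translator gains would; either choice is defensible, but it should be stated when comparing groups.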

Page 19

Look out for anomalies
–  segments with long timings (i.e. a below-average words/minute rate)
–  sentences that don’t change much from MT to post-edit
–  segments with unusually short timings
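The three checks above can be automated as simple flags over the test logs. A minimal sketch, assuming each segment record carries a word count, elapsed seconds, the MT output, and the post-edited text; the field names and the 3× and 0.95 thresholds are hypothetical. Timing is compared against the median seconds per word, and “little change” is measured with the standard library’s `difflib.SequenceMatcher` similarity ratio over words.

```python
from difflib import SequenceMatcher
from statistics import median

def flag_anomalies(segments, slow_factor=3.0, fast_factor=3.0, unchanged_ratio=0.95):
    """Return (segment id, reasons) for segments that look anomalous.
    Thresholds are illustrative assumptions, not values from the case study."""
    rates = [s["seconds"] / s["words"] for s in segments]  # seconds per word
    typical = median(rates)
    flagged = []
    for seg, rate in zip(segments, rates):
        reasons = []
        if rate > slow_factor * typical:
            reasons.append("long timing")
        if rate < typical / fast_factor:
            reasons.append("short timing")
        similarity = SequenceMatcher(None, seg["mt"].split(), seg["post_edit"].split()).ratio()
        if similarity >= unchanged_ratio:
            reasons.append("little change from MT")
        if reasons:
            flagged.append((seg["id"], reasons))
    return flagged

# Invented example segments.
segments = [
    {"id": 1, "words": 10, "seconds": 30, "mt": "a b c", "post_edit": "a x c"},
    {"id": 2, "words": 10, "seconds": 30, "mt": "a b", "post_edit": "c d"},
    {"id": 3, "words": 10, "seconds": 300, "mt": "x y", "post_edit": "z"},
    {"id": 4, "words": 10, "seconds": 2, "mt": "p q r", "post_edit": "p q r"},
]
flags = flag_anomalies(segments)
```

Flagged segments are candidates for manual review (for example, a translator leaving a segment open during a break, or accepting MT without reading it) rather than for automatic exclusion.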

In this case, the next step is a production roll-out to validate these findings in the actual translator workflow over an extended period.

Warnings, Tips, and Next Steps

Now would be the right time to run a fluency/adequacy evaluation if you need to verify that post-editing produces output of at least similar quality.

Page 20

“The biggest room in the world is the room for improvement”

Page 21

Thank You!

@IconicTrans