What is Quality? A Machine Translation Perspective

No Hardware. No Software. No Hassle MT.

Machine Translation & Quality


What we aim to cover

The MT & Quality Relationship
  What is quality?
Possible ways of measuring it
  Automated evaluation methods
Who needs to measure quality
  Localisation stakeholders
Conclusion


The Quality & MT Relationship


Attributes of Quality

Language Attributes
  Adequacy
    Accuracy of generated texts
    Based on word recall & precision
  Fluency
    Comprehensibility of texts
    Readability, understandability
    Based on phrase reuse and assembly
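As a rough illustration of what "based on word recall & precision" can mean for adequacy, the sketch below compares a generated sentence with a single reference. The function name `word_precision_recall`, the whitespace tokenisation and the lowercasing are assumptions made for this example, not part of any particular metric's official definition.

```python
from collections import Counter


def word_precision_recall(candidate: str, reference: str) -> tuple[float, float, float]:
    """Return (precision, recall, F1) over whitespace-separated, lowercased words."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a word counts at most as often as it occurs in both texts.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative sentences only: the candidate drops one "the" from the reference.
p, r, f = word_precision_recall("the cat sat on mat", "the cat sat on the mat")
```

Here every candidate word is found in the reference (precision 1.0), while one reference word is missing from the candidate (recall 5/6).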

Task-oriented Attributes
  Productivity
    Post-editing speed
  Acceptability
    Fit-for-purpose measurement
    Usable translations within the context of the end user


Automated Evaluations

Many different techniques available
  All compute similarity of generated texts to reference texts
  The smaller the difference => the better the quality!

[Diagram. Column headings: Language, Task. Metrics shown: F-Measure, TER, NIST, GTM, BLEU, METEOR. Attributes shown: Fluency, Adequacy, Usability, Productivity, Acceptability.]
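The idea that these metrics measure the difference between generated text and a reference can be sketched with a word-level edit distance. The example below is a simplified, TER-like score: it counts insertions, deletions and substitutions per reference word and leaves out the block shifts that full TER also allows. The function names are hypothetical and chosen for this illustration.

```python
def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Minimum number of word insertions, deletions and substitutions."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis: str, reference: str) -> float:
    """Edits per reference word; 0.0 means identical to the reference."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    return word_edit_distance(hyp, ref) / len(ref)


# Illustrative sentences only: one word must be inserted to reach the reference.
score = simplified_ter("the cat sat on mat", "the cat sat on the mat")
```

A lower score means fewer edits are needed to turn the output into the reference, which is the sense in which a smaller difference indicates better quality.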


Who needs to measure Quality?
The Localisation Stakeholder Dilemma

Developers of MT Engines
  Automated metrics (BLEU, METEOR, F-Measure, TER) are ideal and practical
  No individual measurement has absolute meaning, but each points the quality curve in the right direction within a domain
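For readers unfamiliar with the metrics named above, here is a minimal sentence-level sketch in the spirit of BLEU: clipped n-gram precision combined with a brevity penalty. It is a simplification under stated assumptions (one reference, n-grams up to 2, no smoothing, whitespace tokenisation); standard BLEU is computed over a corpus, typically with n-grams up to 4. The names `ngrams` and `simple_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis: str, reference: str, max_n: int = 2) -> float:
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each n-gram count by how often it occurs in the reference.
        matched = sum((hyp_counts & ref_counts).values())
        total = sum(hyp_counts.values())
        if matched == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matched / total))
    # Penalise hypotheses that are shorter than the reference.
    brevity_penalty = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Illustrative sentences only.
bleu = simple_bleu("the cat sat on mat", "the cat sat on the mat")
```

The resulting number is only meaningful relative to other scores computed the same way on the same kind of text, which matches the point that no individual measurement has absolute meaning.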


Who needs to measure Quality?
The Localisation Stakeholder Dilemma

Production Teams (PMs, LEs and QEs)
  Need segment-level measurements of quality and post-editing (PE) effort
  Determine tiered segment post-edit rate
  Distribution of post-editing tasks based on segment quality

Localisation Managers
  Need productivity measurements to predict budget and schedule
  Aka Project Segment Reports
  MT measurements need to ‘fit’ business planning and charge models

Translators
  Unfortunately, don’t get a fair deal
  No segment information, just top-level project information
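The production-team need for tiered post-editing based on segment quality could look something like the sketch below. Everything specific in it is an assumption made for illustration: the 0 to 1 quality scale, the tier boundaries, the tier names and the segment identifiers are hypothetical and would have to come from a real project's own data and charge model.

```python
from collections import defaultdict

# Hypothetical tier boundaries on a 0..1 segment quality score (higher is better).
TIERS = [(0.8, "light post-edit"), (0.5, "full post-edit"), (0.0, "retranslate")]


def tier_for(score: float) -> str:
    """Return the first tier whose lower bound the score reaches."""
    for threshold, name in TIERS:
        if score >= threshold:
            return name
    return TIERS[-1][1]  # scores below every bound fall into the last tier


def distribute(segments: dict[str, float]) -> dict[str, list[str]]:
    """Group segment ids by post-editing tier."""
    buckets = defaultdict(list)
    for seg_id, score in segments.items():
        buckets[tier_for(score)].append(seg_id)
    return dict(buckets)


# Made-up segment scores, purely for illustration.
plan = distribute({"seg-1": 0.92, "seg-2": 0.61, "seg-3": 0.20})
```

A per-tier grouping like this is one way segment-level measurements could feed both task distribution and a tiered post-edit rate.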


The Quality & MT Relationship

[Diagram. Metrics shown: NIST, GTM, BLEU, F-Measure, TER, METEOR. Stakeholder labels: MT Developers, Production.]


Conclusions

There are many automated MT quality measurements
  Mostly suitable for MT developers
  Not optimal for production teams
  Of no use to translators
  All rely on reference texts to compute measurements

What’s needed?
  Segment level measurements
  Drive project schedule and charge model
  High correlation to human effort
  Do not rely on reference texts to compute measurements
