How does your MT system measure up? tekom/tcworld 2014

Tony O’Dowd, Founder & Chief Architect. C7.1: How does your MT system measure up?

Upload: kantanmt

Post on 03-Jul-2015


DESCRIPTION

KantanMT Founder and Chief Architect Tony O’Dowd presented at the annual tekom Trade Fair for Technical Communication on the 12th November as part of the GALA track. The tekom trade fair is organized by tcworld and is the biggest technical communication event worldwide. The presentation, entitled ‘How Does Your Machine Translation System Measure Up?’, outlines how to measure the performance of your MT engines and the efficiency of your translation processes. It is aimed at professionals in the localization industry.

Key Discussion Points:

• Measuring performance of Statistical MT
• Recent advances in MT and data visualization techniques
• Tracking MT efficiency in the translation process

Please contact Louise Irwin ([email protected]) for more information.

TRANSCRIPT

Tony O’Dowd, Founder & Chief Architect

C7.1: How does your MT system measure up?

Measuring MT Quality

What do we aim to cover today?

What is KantanMT.com?

State of the Nation

Current MT Quality Measurements

Comparative Quality Measurement

Future Directions

Predictive Quality Measurements

Conclusions

Q&A

45 minutes

What is KantanMT.com?

Cloud-based SMT/Hybrid

Highly scalable

Inexpensive to operate

Quick to access, learn and deploy

Our Vision

To put Machine Translation customization, improvement and deployment into your hands

Your Benefits

Faster Project Turnarounds

Increased Productivity

Lower Costs

Increased Production Capacity

Active KantanMT Engines: 6,341

Training Words Uploaded: 46,051,110,634

Member Words Translated: 538,291,925

Fully Operational: 16 months

KantanMT Community

The Quality & MT Relationship

Let’s agree a model for defining quality!

Taking into consideration quality of MT outputs and level of quality defined by your clients.

Quality Target (defined by client)

No Quality (baseline)

Attributes of Quality

Language attributes: Fluency, Adequacy

Task-oriented attributes: Productivity, Acceptability

Attributes of Quality – Model

Adequacy: meaning of the generated text, as expressed in source and target

Fluency: comprehensibility and readability. Factors include grammar errors, word selection and syntax

Productivity: post-editing speed

Acceptability: fit-for-purpose measurement. Usable translations within the context of the end user/client

Attributes of Quality – Model

Language attributes (Translation Style): Fluency, Adequacy

Task-oriented attributes (Business Model): Productivity, Acceptability

FuzzyMatch

Types of MT Quality Measurement

Comparative Measurements

Uses a reference translation to calculate:

Word recall & precision

Text Similarities

Word Order correlations

Linguistic similarities

Approach

Comparing MT output to a known reference translation

Comparative Measurements

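F-Measure: Recall & Precision Metric

Flaw: no penalty for reordering

[Slide example: a reference translation shown alongside an MT output]

$\text{Precision} = \dfrac{\text{correct}}{\text{MT-length}} = 66\%$

$\text{Recall} = \dfrac{\text{correct}}{\text{Reference-length}} = 80\%$

$\text{F-Measure} = \dfrac{\text{Precision} \times \text{Recall}}{(\text{Precision} + \text{Recall}) / 2} = 73\%$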
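The precision, recall and F-Measure calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not KantanMT's implementation: the function name, the whitespace tokenization and both example sentences are assumptions made here. The sentences were chosen so that 4 of 6 MT words and 4 of 5 reference words match, which gives roughly the slide's figures.

```python
from collections import Counter


def precision_recall_f(reference, mt_output):
    """Word-overlap precision, recall and F-Measure (illustrative sketch)."""
    ref_tokens = reference.lower().split()
    mt_tokens = mt_output.lower().split()
    # Clipped overlap: a word counts as correct at most as often as it
    # appears in the reference.
    overlap = Counter(ref_tokens) & Counter(mt_tokens)
    correct = sum(overlap.values())
    precision = correct / len(mt_tokens) if mt_tokens else 0.0
    recall = correct / len(ref_tokens) if ref_tokens else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Same formula as the slide: P * R / ((P + R) / 2)
    f_measure = precision * recall / ((precision + recall) / 2)
    return precision, recall, f_measure


# Hypothetical sentences, invented for this example.
reference = "the printer is now ready"
mt_ordered = "the printer is ready for use"
mt_shuffled = "ready is printer the for use"

print(precision_recall_f(reference, mt_ordered))
print(precision_recall_f(reference, mt_shuffled))
```

Both calls return the same scores even though the second MT output is scrambled, which is the flaw the slide points out: word order is not penalized.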

Comparative Measurements

TER (Translation Error Rate)

Minimum number of edits needed to transform the MT output so that it matches the reference

Levenshtein distance measure

General indicator of post-editing effort

[Slide example: a reference translation shown alongside an MT output]

$\text{TER} = \dfrac{\text{substitutions} + \text{insertions} + \text{deletions}}{\text{Reference-length}}$
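As a rough sketch of the formula above, the following Python computes a word-level Levenshtein distance and divides it by the reference length. The function names and the example sentences are assumptions made for illustration. The published TER metric also counts shifts of word blocks as edits; this sketch follows the slide's formula and leaves them out.

```python
def word_edit_distance(ref_tokens, hyp_tokens):
    """Minimum substitutions + insertions + deletions between two token lists."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp_tokens) + 1))
    for i, ref_word in enumerate(ref_tokens, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_tokens, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def ter(reference, mt_output):
    """Edits divided by reference length, as on the slide (no shift edits)."""
    ref_tokens = reference.split()
    if not ref_tokens:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_tokens, mt_output.split()) / len(ref_tokens)


# Hypothetical sentences, invented for this example.
print(ter("the printer is now ready", "the printer is ready for use"))  # 3 edits / 5 words
```

A lower score means fewer edits, so 0.0 is an exact match with the reference.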

Comparative Measurements

BLEU Score

Put simply – measures how many words overlap, giving higher scores to sequential words

High correlation between BLEU and human judgement of translation quality

Reference Translation

MT Output
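To make the idea of rewarding sequential words concrete, here is a simplified sentence-level BLEU sketch in Python. It is an assumption-laden illustration, not the scoring used by any particular tool: it uses a single reference, no smoothing, and n-grams only up to length 2 (standard BLEU uses up to 4 and is normally computed over a whole test set). The function names and example sentences are invented here.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference, mt_output, max_n=2):
    """Simplified single-reference BLEU without smoothing (illustrative only)."""
    ref_tokens = reference.split()
    hyp_tokens = mt_output.split()
    if not hyp_tokens:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp_tokens, n)
        ref_counts = ngram_counts(ref_tokens, n)
        total = sum(hyp_counts.values())
        # Clipped matches: an n-gram is credited at most as often as it
        # occurs in the reference.
        matched = sum((hyp_counts & ref_counts).values())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: outputs shorter than the reference are penalized.
    if len(hyp_tokens) > len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref_tokens) / len(hyp_tokens))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical sentences, invented for this example.
reference = "the printer is now ready"
print(simple_bleu(reference, "the printer is ready for use"))
print(simple_bleu(reference, "ready is printer the for use"))
```

The two MT outputs contain the same words, but only the first keeps some of them in the reference order, so it scores higher. The scrambled output shares no bigram with the reference and scores 0.0 in this unsmoothed sketch.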

F-Measure Score

Recall & Precision calculation

Closely linked to the relevancy of word systems

Comparative Measurements

Kantan BuildAnalytics™

BLEU Score

Improvement upon F-Measure

Takes word-order into consideration

Linked to a sense of translation ‘fluency’

Comparative Measurements

Kantan BuildAnalytics™

Comparative Measurements

TER Score

A method to help predict the post-editing effort

TER is quick to use and correlates highly with actual post-editing effort

Kantan BuildAnalytics™

Comparative Measurements

Attributes of Quality – Model

Language attributes (Translation Style): Fluency, Adequacy

Task-oriented attributes (Business Model): Productivity, Acceptability

Comparative metrics: F-Measure, TER, NIST, GTM, BLEU, METEOR

The Quality & MT Relationship

NIST, GTM, BLEU, F-Measure, TER, METEOR

Conclusions: Comparative Measurements

Useful for:

Engine development

Baseline measurements

Determination of ‘possible’ engine quality and relevancy

Reference set of comparative translations required

Does not work on unseen translations

Of limited use in determining:

PE effort

Resources

Costs

Kantan BuildAnalytics™

Types of MT Quality Measurement

Predictive Measurements

No reference texts required

Used to predict project:

Scope

Cost

Resources

Billables / Chargeables – Profit

Like FuzzyMatch for MT!

Predictive Measurements

Quality Estimation Score

Predicts quality of translations from MT engine

Correlates closely to post-editing effort

Creates potential for tiered pricing model
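One way to picture a tiered pricing model driven by a quality estimation score is sketched below in Python. Everything specific in it is hypothetical: the 0 to 100 score range, the band thresholds, the tier labels, the rate multipliers and the function names are assumptions made for illustration and are not taken from the presentation or from KantanAnalytics™.

```python
# Hypothetical score bands: (minimum score, tier label, rate multiplier).
# The values are illustrative only and mirror how fuzzy-match bands are
# often priced for translation memory.
TIERS = [
    (85, "light post-edit", 0.40),
    (60, "full post-edit", 0.70),
    (0, "translate from scratch", 1.00),
]


def tier_for(score):
    """Return (label, multiplier) for a quality estimation score (assumed 0-100)."""
    for min_score, label, multiplier in TIERS:
        if score >= min_score:
            return label, multiplier
    raise ValueError("score must not be negative")


def estimate_cost(segments, base_rate_per_word):
    """Sum the cost of (word_count, score) segments using the tier multipliers."""
    total = 0.0
    for word_count, score in segments:
        _, multiplier = tier_for(score)
        total += word_count * base_rate_per_word * multiplier
    return total


# Hypothetical project: three segments with predicted scores.
project = [(10, 90), (20, 70), (5, 30)]
print(tier_for(90)[0])
print(estimate_cost(project, base_rate_per_word=0.10))
```

Because the score is predicted before any post-editing happens, a breakdown like this could be produced at quotation time, which is the parallel with fuzzy-match pricing that the slides draw.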

Predictive Measurements

KantanAnalytics™ - a predictive quality estimation technology

The Power of 2! Combined TM & MT measurements

Predictive, not comparative

Benefits:

Tiered Pricing Model

Prioritise PE activity

Schedule Resources

Cost

Seamlessly integrated into all CAT tools

Predictive Measurements

Attributes of Quality – Model

Language attributes (Translation Style): Fluency, Adequacy

Task-oriented attributes (Business Model): Productivity, Acceptability

Comparative metrics: F-Measure, TER, NIST, GTM, BLEU, METEOR

Predictive metric: KantanAnalytics™ - MT Quality Estimation, aka FuzzyMatch

Predictive Measurements

MT Developers: NIST, GTM, BLEU, F-Measure, TER, METEOR

Production: MT Quality Estimation

Conclusions

Automated scores are only useful for MT developers

No practical use to consumers of MT services

Predictive Quality Estimation is a must-have for MT vendors

It creates the potential for tiered pricing, predictive quality and reliable MT outputs

The most progressive MT systems provide both measurement types!

MT systems that don’t are dinosaurs!

Tony O’Dowd

[email protected]
