TRANSCRIPT
10/26/2008
1
Large-scale deployment of statistical machine translation
Example: Microsoft
Microsoft Research – Machine Translation
Agenda
• Microsoft MT engine basics
• Architecture and design for scale
• Translator in Practice
• Microsoft internal use: Human Translation and Raw Publishing
Time Line
1991
• Microsoft Research is founded, with NLP as one of its first research areas
• NLP team is active in rule-based parsing and grammar checking
1996
• Grammar Checker in Word ‘97
1999
• Work on Machine Translation begins
2003
• V1: First public visibility with the Microsoft Knowledge Base
• Example-based system: V1 of Microsoft Translator
2005
• V2: Switch to treelet systems for all from-English language pairs
• The treelet system constitutes V2 of Microsoft Translator
2007
• First consumer availability at http://translator.live.com
• Mixed Systran and Microsoft Translator V2 deployment
2008
• Phrasal systems added for all to-English language pairs
• http://translator.live.com powered exclusively by Microsoft‘s own systems
Microsoft’s Statistical MT Engine
[Architecture diagram] Input passes through HTML handling and sentence breaking, then takes one of two paths:
• Languages with a source parser: source-language parser → syntactic tree based decoder
• Other source languages: source-language word breaker → surface string based decoder
Both paths end with rule-based post-processing and case restoration.
Models consulted by the decoders: syntactic reordering model, contextual translation model, syntactic word insertion and deletion model, target language model, and distance- and word-based reordering.
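The two decoding paths above can be sketched as a simple dispatch. Every function body here is a trivial stand-in and all names are illustrative; the deck does not expose an API.

```python
# Minimal sketch of the two-path decoder dispatch described above.
# All function bodies are stand-ins; only the routing mirrors the slide.

LANGUAGES_WITH_PARSER = {"en"}  # assumption: the from-English systems have a parser

def parse(sentence, source):            # source-language parser (stub)
    return sentence.split()

def treelet_decode(tree, target):       # syntactic tree based decoder (stub)
    return " ".join(tree)

def word_break(sentence, source):       # source word breaker (stub)
    return sentence.split()

def phrasal_decode(tokens, target):     # surface string based decoder (stub)
    return " ".join(tokens)

def post_process(text, target):         # rule-based post-processing (stub)
    return text

def restore_case(text):                 # case restoration (stub)
    return text[:1].upper() + text[1:] if text else text

def translate(sentence, source, target):
    """Route a sentence to the tree-based or the surface decoder."""
    if source in LANGUAGES_WITH_PARSER:
        out = treelet_decode(parse(sentence, source), target)
    else:
        out = phrasal_decode(word_break(sentence, source), target)
    return restore_case(post_process(out, target))

print(translate("hello world", "en", "de"))  # takes the tree path
print(translate("hola mundo", "es", "en"))   # takes the surface path
```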
Training Architecture
[Pipeline diagram] Parallel data is word-broken on both source and target sides, and the source side is parsed. Word alignment then feeds the extraction steps:
• Treelet + syntactic structure extraction → treelet table extraction and syntactic models training, producing the syntactic reordering model, the contextual translation models, and the syntactic word insertion and deletion model
• Phrase table extraction and surface reordering training, producing the distance- and word-based reordering model
Target-language monolingual data feeds language model training, producing the target language model. A case restoration model is trained separately, and discriminative training of the model weights produces the final weight set.
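The "Discrim. Train model weights" box feeds the standard log-linear combination used at decode time: each hypothesis is scored as a weighted sum of its model scores. The weights and feature values below are invented for illustration; the actual features correspond to the model boxes above.

```python
# Sketch of log-linear scoring with discriminatively trained weights.
# Weights and feature values are invented; real systems have many more features.

weights = {                      # hypothetical output of discriminative training
    "translation_model": 1.0,
    "language_model": 0.6,
    "reordering": 0.3,
    "word_penalty": -0.2,
}

def candidate_score(features, weights):
    """Score of one hypothesis: weighted sum of its (log) feature values."""
    return sum(weights[name] * value for name, value in features.items())

# Two hypothetical candidates with log-probability feature values.
candidates = {
    "hypothesis A": {"translation_model": -2.1, "language_model": -3.0,
                     "reordering": -0.5, "word_penalty": 5.0},
    "hypothesis B": {"translation_model": -1.8, "language_model": -4.2,
                     "reordering": -0.9, "word_penalty": 6.0},
}

best = max(candidates, key=lambda c: candidate_score(candidates[c], weights))
print(best)
```

Discriminative training then amounts to tuning `weights` so that the highest-scoring hypothesis is also the best translation on held-out data.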
Runtime Architecture
[Deployment diagram] Internet traffic is distributed across front door machines (#1 … #n), each running the user interface and sentence breaking. The front doors fan sentences out to translator machines (#1 … #n), which in turn consult model servers (#1 … #n). Watchdog processes (monitor, reset, restart) supervise the translators and the model servers.
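The watchdog's monitor/reset/restart role can be sketched as a toy loop; the health probe and the restart are stand-ins for real RPC and process control.

```python
# Toy sketch of the watchdog role in the runtime architecture above:
# probe each supervised server, restart it when it stops responding.

class ModelServer:
    def __init__(self):
        self.healthy = True
        self.restarts = 0

    def ping(self):                 # health probe (stand-in for a real RPC)
        return self.healthy

    def restart(self):              # restart (stand-in for process control)
        self.healthy = True
        self.restarts += 1

def watchdog_pass(servers):
    """One monitoring pass: restart any server that fails its health check."""
    for server in servers:
        if not server.ping():
            server.restart()

servers = [ModelServer(), ModelServer()]
servers[1].healthy = False          # simulate a hung model server
watchdog_pass(servers)
print(servers[1].restarts)          # the failed server was restarted once
```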
Front Door
• Microsoft Internet Information Server
• Landing Page
– HTTP interface for Bilingual Viewer
– Fetches web page; sentence & HTML breaking; creates marked-up version
– Sends page to client, asynchronously fills translation requests
• Distributor
– SOAP API
– Distributes sentences to multiple leaves
– In-memory cache of sentence translations
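The distributor's behavior (spread sentences across translator leaves, keep an in-memory cache of sentence translations) can be sketched as follows. Round-robin scheduling is an assumption, and the leaf call is a stand-in for the real SOAP request.

```python
from itertools import cycle

# Sketch of the distributor described above: round-robin sentences over
# translator leaves, with an in-memory cache of sentence translations.

class Distributor:
    def __init__(self, leaves):
        self.leaves = cycle(leaves)          # round-robin is an assumption
        self.cache = {}                      # (source, target, sentence) -> result

    def translate(self, sentence, source, target):
        key = (source, target, sentence)
        if key not in self.cache:            # only hit a leaf on a cache miss
            leaf = next(self.leaves)
            self.cache[key] = leaf(sentence, source, target)
        return self.cache[key]

calls = []
def leaf(sentence, source, target):          # stand-in translator leaf
    calls.append(sentence)
    return sentence.upper()

d = Distributor([leaf, leaf])
d.translate("hello", "en", "de")
d.translate("hello", "en", "de")             # served from the cache
print(len(calls))                            # the leaf was called only once
```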
Automatic evaluation: BLEU
• A fully automated MT evaluation metric
– Modified n-gram precision, comparing a test sentence to reference sentences
• Automatic and cheap: runs daily and for every check-in
• Standard in the MT community
– Immediate, simple to administer
– Shown to correlate with human judgments
• Warning: does not compare between engines or between languages.
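A minimal sentence-level BLEU sketch (single reference, n-grams up to 4, brevity penalty) makes "modified n-gram precision" concrete; production implementations differ in smoothing and in corpus-level aggregation.

```python
import math
from collections import Counter

# Minimal BLEU sketch: clipped ("modified") n-gram precision up to 4-grams,
# geometric mean, brevity penalty. One reference; no smoothing.

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_counts, r_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        log_prec += math.log(max(clipped, 1e-9) / total) / max_n
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))  # brevity penalty
    return bp * math.exp(log_prec)

ref = "the cat is on the mat"
print(round(bleu("the cat is on the mat", ref), 2))  # 1.0 for an exact match
print(bleu("the the the the the the", ref) < 0.1)    # clipping punishes repetition
```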
Human evaluations
• 3 to 5 independent human evaluators are asked to rank translation quality for 250 sentences on a scale of 1 to 4
– Comparing to a human-translated sentence
– No source-language knowledge required
• 4 = Ideal: grammatically correct, all information included
• 3 = Acceptable: not perfect, but definitely comprehensible, and with accurate transfer of all important information
• 2 = Possibly Acceptable: may be interpretable given context/time, some information transferred accurately
• 1 = Unacceptable: absolutely not comprehensible and/or little or no information transferred accurately
• Each sentence is evaluated by all raters, and scores are averaged
• Relative evaluations
– Track progress against ourselves and a competitor
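The averaging step of the rating scheme above reduces to a simple mean of means; the scores below are invented.

```python
# Sketch of the human-evaluation scoring: each sentence is rated 1-4 by
# every rater, per-sentence scores are averaged, then averaged again for
# the system score. Ratings are invented.

ratings = {                       # sentence id -> one score per rater
    "s1": [4, 4, 3],
    "s2": [2, 3, 2],
    "s3": [1, 2, 2],
}

sentence_scores = {sid: sum(r) / len(r) for sid, r in ratings.items()}
system_score = sum(sentence_scores.values()) / len(sentence_scores)
print(round(system_score, 2))
```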
Language pairs on translator.live.com
en_es 18%
en_de 15%
en_pt 13%
en_zh-chs 7%
en_fr 6%
es_en 6%
pt_en 4%
en_it 4%
en_ar 3%
en_ja 3%
en_zh-cht 3%
en_ko 3%
de_en 2%
other
The notable fact in this distribution is the relative popularity of the English>German language pair among consumers, in contrast to its lack of popularity among the technical audience.
Products
• Bilingual Viewer
– Used by Live Search results page
• Translator landing page
• Toolbar Translator Button
• Translator Add-in for 3rd-party pages
• Internet Explorer 8 accelerator
• Community-built Firefox version
• Translator Bot ([email protected])
• Office Research Pane
• SOAP API for product team use
• Microsoft Localization
– CSS KB, MSDN, TechNet, Products
Two ways to apply MT in a product
• Post-editing
– Increase human translators’ productivity
– In practice: 0% to 25% productivity increase; varies by content, style and language
• Raw publishing
– Publish the output of the MT system directly to the end user
– Best with a bilingual UI
– Good results with IT Pro and developer audiences
Increasing the extent of localization
MT with post-editing
[Workflow] Source → apply the translation memory (TM) on segments with a >85% match; apply MT on the rest → human editing → target.
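The >85% routing rule above can be sketched like this; `SequenceMatcher` is merely a stand-in for a real TM fuzzy-match score, and the one-entry TM is invented.

```python
from difflib import SequenceMatcher

# Sketch of TM/MT routing: reuse a translation-memory entry when the fuzzy
# match exceeds 85%, otherwise fall back to MT. The TM and MT are stand-ins.

tm = {"click the ok button": "klicken sie auf ok"}   # invented TM entry

def fuzzy_match(segment):
    """Best TM target and its match score for a source segment."""
    best, score = None, 0.0
    for src, tgt in tm.items():
        s = SequenceMatcher(None, segment, src).ratio()
        if s > score:
            best, score = tgt, s
    return best, score

def translate_segment(segment, mt):
    target, score = fuzzy_match(segment)
    if score > 0.85:                 # the >85% threshold from the slide
        return target, "TM"
    return mt(segment), "MT"

mt = lambda s: f"<mt:{s}>"           # stand-in MT engine
print(translate_segment("click the ok button", mt))   # exact match -> TM
print(translate_segment("restart the computer", mt))  # no match -> MT
```

The raw-publishing variant later in the deck is the same routing with the threshold raised to a 100% match and no human-editing step.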
Product 1: Post-editing results (without specific post-editor training)
Productivity gain/loss by language:
French +8%
Italian -3.8%
German -12%
Spanish -9.2%
Chinese S. +11%
Chinese T. +1.8%
Japanese -23%
Product 2: Post-editing results (a couple of weeks later, with training)
Productivity gain by language:
French +14.5%
Brazilian +20.0%
Swedish +8%
Danish +28.6%
Czech +6.1%
Dutch +14.7%
Product 3: Post-editing results (with training)
Productivity gain/loss by language:
French +11.89%
Italian +33.35%
German +29.12%
Spanish +10.98%
Chinese S. +22.01%
Chinese T. +6.56%
Japanese -4.05%
Brazilian -1.90%
Post-Editing: Lessons Learned
• Training of the translator is required
– Understand the peculiarities of the engine used
– Always read the source sentence first
– Understand when to discard the MT output: “two seconds is too much”
• Acknowledge different suitability for different styles and terminology
• Customize terminology per individual project (use of dictionary)
• Productivity gains of 5% to 25% are achievable, but investment is required
Raw MT Publishing
[Workflow] Source → apply the translation memory (TM) on segments with a 100% match; apply MT on the rest → target.
History of MT in Customer Support
• Since 2003, CSS has been actively using machine translation for Knowledge Base articles
– Spanish was the first language deployed
– Japanese went live one year later
• Current languages
– 10 languages deployed: Spanish, German, French, Italian, Japanese, Portuguese, Brazilian Portuguese, Chinese Simplified, Chinese Traditional, Arabic
– 3 languages in testing: Korean, Turkish and Russian
Microsoft Knowledge Base
Articles human-translated, or originally authored in the language (count, % of English):
English: 235,425 (100%)
Japanese: 70,684 (27%)
French: 35,310 (14%)
German: 30,459 (12%)
Spanish: 16,980 (7%)
Italian: 14,401 (6%)
Chinese (Simplified): 12,873 (5%)
Chinese (Traditional): 10,372 (4%)
Portuguese (Brazil): 10,205 (4%)
Portuguese (Iberian): 7,129 (3%)
Arabic: 2,152 (1%)
MT & HT distribution across languages
Traffic to the knowledge base is fairly unevenly distributed. By targeting human translation at the high-page-view articles, 80% of the Japanese total page views are for human-translated articles. Even in Arabic, 54% of the page views end up on human-quality articles.
Customer Feedback: KB Inline Survey
Knowledge Base: average resolve rate of human-translated vs. machine-translated articles (English baseline: 25.5%)
Arabic: HT 31.8%, MT 25.4%
Chinese (Simplified): HT 35.3%, MT 32.6%
Chinese (Traditional): HT 35.4%, MT 29.0%
French: HT 22.5%, MT 20.9%
German: HT 25.0%, MT 18.7%
Italian: HT 33.3%, MT 26.5%
Japanese: HT 27.8%, MT 17.8%
Portuguese: HT 27.6%, MT 23.3%
Portuguese (Brazil): HT 28.7%, MT 28.7%
Spanish: HT 29.2%, MT 24.1%
Global English
• Support started rewriting the source language to account for MT in 2003 (6 months after Spanish MT was deployed)
• Retrained the writers to write with a global audience and MT in mind
• Top five rules to make source-language content suitable for MT:
1. Use standard English writing style
2. Use correct punctuation, especially: missing punctuation causing incorrect sentence breaks; hyphens; commas
3. Eliminate long sentences
4. Use capitalization correctly
5. Use correct spelling
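Several of the rules above lend themselves to simple automated checks; the thresholds and patterns below are illustrative only, not the team's actual tooling.

```python
import re

# Toy checks for three of the Global English rules (end punctuation,
# sentence length, capitalization). Thresholds are invented.

MAX_WORDS = 25                     # assumed cutoff for "long sentences"

def check_sentence(sentence):
    """Return a list of Global English issues found in one sentence."""
    issues = []
    if not re.search(r"[.!?]$", sentence.strip()):
        issues.append("missing end punctuation (can break sentence breaking)")
    if len(sentence.split()) > MAX_WORDS:
        issues.append("long sentence: consider splitting")
    if sentence and not sentence[0].isupper():
        issues.append("sentence should start with a capital letter")
    return issues

print(check_sentence("restart the computer"))   # flags punctuation and casing
print(check_sentence("Restart the computer."))  # no issues
```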
Impact of Global English
Resolve rate of articles authored to standard guidelines (old article → Global English):
DE - German: 19% → 19%
ES - Spanish: 23% → 22%
FR - French: 20% → 24%
IT - Italian: 24% → 34%
JA - Japanese: 18% → 20%
PT - Portuguese: 25% → 27%
PT-BR - Portuguese Brazil: 22% → 36%
ZH-CN - Chinese Simplified: 26% → 31%
ZH-TW - Chinese Traditional: 27% → 40%
MT Languages Combined: 22% → 25%
Translation Wiki
Benefits of using MT
• Larger language set: localize into more languages without increasing budget
• Localize more: increase the extent of localization without a proportional budget increase
• Higher productivity: 5% to 25% productivity increase; >25% in software localization
• Faster availability: remove delay in translation; especially desired by the technical audience
Conclusions
• Automation enriches the customer’s experience
• Long-term investments in tools and processes are required, and patience in seeing results is needed
• Reaching customers through multiple forums and media is important
• Metrics are more useful than opinions
• Using customer feedback and the community provides better solutions
Thank you
References
• Menezes, Arul, Kristina Toutanova and Chris Quirk. “Microsoft Research Treelet Translation System: NAACL 2006 Europarl Evaluation.” Workshop on Machine Translation, NAACL 2006.
• Quirk, Chris and Arul Menezes. “Dependency Treelet Translation: The Convergence of Statistical and Example-Based Machine Translation?” Machine Translation, March 2006, pp. 43–65.