TRANSCRIPT
10/26/2008
1
Large-scale deployment of statistical machine translation
Example: Microsoft
Microsoft Research – Machine Translation
Agenda
• Microsoft MT engine basics
• Architecture and design for scale
• Translator in Practice
• Microsoft internal use: Human Translation and Raw Publishing
Time Line
1991
• Microsoft Research is founded, with NLP as one of its first research areas
• NLP team is active in rule-based parsing and grammar checking
1996
• Grammar Checker in Word ‘97
1999
• Work on Machine Translation begins
2003
• V1: First public visibility with the Microsoft Knowledge Base
• Example-based system: V1 of Microsoft Translator
2005
• V2: Switch to treelet systems for all from-English language pairs
• The treelet system constitutes V2 of Microsoft Translator
2007
• First consumer availability at http://translator.live.com
• Mixed Systran and Microsoft Translator V2 deployment
2008
• Phrasal systems added for all to-English language pairs
• http://translator.live.com powered exclusively by Microsoft‘s own systems
Microsoft’s Statistical MT Engine
[Architecture diagram] Input passes through HTML handling and sentence breaking, then takes one of two paths:
• Languages with a source parser: source-language parser → syntactic tree based decoder
• Other source languages: source-language word breaker → surface string based decoder
Both paths end with rule-based post-processing and case restoration.
Models consulted by the decoders: syntactic reordering model, contextual translation model, syntactic word insertion and deletion model, target language model, and distance- and word-based reordering.
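The two decoding paths above can be sketched as a simple dispatch. Every function body here is a trivial stand-in and all names are illustrative; the deck does not expose an API.

```python
# Minimal sketch of the two-path decoder dispatch described above.
# All function bodies are stand-ins; only the routing mirrors the slide.

LANGUAGES_WITH_PARSER = {"en"}  # assumption: the from-English systems have a parser

def parse(sentence, source):            # source-language parser (stub)
    return sentence.split()

def treelet_decode(tree, target):       # syntactic tree based decoder (stub)
    return " ".join(tree)

def word_break(sentence, source):       # source word breaker (stub)
    return sentence.split()

def phrasal_decode(tokens, target):     # surface string based decoder (stub)
    return " ".join(tokens)

def post_process(text, target):         # rule-based post-processing (stub)
    return text

def restore_case(text):                 # case restoration (stub)
    return text[:1].upper() + text[1:] if text else text

def translate(sentence, source, target):
    """Route a sentence to the tree-based or the surface decoder."""
    if source in LANGUAGES_WITH_PARSER:
        out = treelet_decode(parse(sentence, source), target)
    else:
        out = phrasal_decode(word_break(sentence, source), target)
    return restore_case(post_process(out, target))

print(translate("hello world", "en", "de"))  # takes the tree path
print(translate("hola mundo", "es", "en"))   # takes the surface path
```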
Training Architecture
[Pipeline diagram] Parallel data is word-broken on both source and target sides, and the source side is parsed. Word alignment then feeds the extraction steps:
• Treelet + syntactic structure extraction → treelet table extraction and syntactic models training, producing the syntactic reordering model, the contextual translation models, and the syntactic word insertion and deletion model
• Phrase table extraction and surface reordering training, producing the distance- and word-based reordering model
Target-language monolingual data feeds language model training, producing the target language model. A case restoration model is trained separately, and discriminative training of the model weights produces the final weight set.
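The "Discrim. Train model weights" box feeds the standard log-linear combination used at decode time: each hypothesis is scored as a weighted sum of its model scores. The weights and feature values below are invented for illustration; the actual features correspond to the model boxes above.

```python
# Sketch of log-linear scoring with discriminatively trained weights.
# Weights and feature values are invented; real systems have many more features.

weights = {                      # hypothetical output of discriminative training
    "translation_model": 1.0,
    "language_model": 0.6,
    "reordering": 0.3,
    "word_penalty": -0.2,
}

def candidate_score(features, weights):
    """Score of one hypothesis: weighted sum of its (log) feature values."""
    return sum(weights[name] * value for name, value in features.items())

# Two hypothetical candidates with log-probability feature values.
candidates = {
    "hypothesis A": {"translation_model": -2.1, "language_model": -3.0,
                     "reordering": -0.5, "word_penalty": 5.0},
    "hypothesis B": {"translation_model": -1.8, "language_model": -4.2,
                     "reordering": -0.9, "word_penalty": 6.0},
}

best = max(candidates, key=lambda c: candidate_score(candidates[c], weights))
print(best)
```

Discriminative training then amounts to tuning `weights` so that the highest-scoring hypothesis is also the best translation on held-out data.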
Runtime Architecture
[Deployment diagram] Internet traffic is distributed across front door machines (#1 … #n), each running the user interface and sentence breaking. The front doors fan sentences out to translator machines (#1 … #n), which in turn consult model servers (#1 … #n). Watchdog processes (monitor, reset, restart) supervise the translators and the model servers.
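The watchdog's monitor/reset/restart role can be sketched as a toy loop; the health probe and the restart are stand-ins for real RPC and process control.

```python
# Toy sketch of the watchdog role in the runtime architecture above:
# probe each supervised server, restart it when it stops responding.

class ModelServer:
    def __init__(self):
        self.healthy = True
        self.restarts = 0

    def ping(self):                 # health probe (stand-in for a real RPC)
        return self.healthy

    def restart(self):              # restart (stand-in for process control)
        self.healthy = True
        self.restarts += 1

def watchdog_pass(servers):
    """One monitoring pass: restart any server that fails its health check."""
    for server in servers:
        if not server.ping():
            server.restart()

servers = [ModelServer(), ModelServer()]
servers[1].healthy = False          # simulate a hung model server
watchdog_pass(servers)
print(servers[1].restarts)          # the failed server was restarted once
```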
Front Door
• Microsoft Internet Information Server
• Landing Page
– HTTP interface for Bilingual Viewer
– Fetches web page; sentence & HTML breaking; creates marked-up version
– Sends page to client, asynchronously fills translation requests
• Distributor
– SOAP API
– Distributes sentences to multiple leaves
– In-memory cache of sentence translations
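The distributor's behavior (spread sentences across translator leaves, keep an in-memory cache of sentence translations) can be sketched as follows. Round-robin scheduling is an assumption, and the leaf call is a stand-in for the real SOAP request.

```python
from itertools import cycle

# Sketch of the distributor described above: round-robin sentences over
# translator leaves, with an in-memory cache of sentence translations.

class Distributor:
    def __init__(self, leaves):
        self.leaves = cycle(leaves)          # round-robin is an assumption
        self.cache = {}                      # (source, target, sentence) -> result

    def translate(self, sentence, source, target):
        key = (source, target, sentence)
        if key not in self.cache:            # only hit a leaf on a cache miss
            leaf = next(self.leaves)
            self.cache[key] = leaf(sentence, source, target)
        return self.cache[key]

calls = []
def leaf(sentence, source, target):          # stand-in translator leaf
    calls.append(sentence)
    return sentence.upper()

d = Distributor([leaf, leaf])
d.translate("hello", "en", "de")
d.translate("hello", "en", "de")             # served from the cache
print(len(calls))                            # the leaf was called only once
```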
Automatic evaluation: BLEU
• A fully automated MT evaluation metric
– Modified n-gram precision, comparing a test sentence to reference sentences
• Automatic and cheap: runs daily and for every check-in
• Standard in the MT community
– Immediate, simple to administer
– Shown to correlate with human judgments
• Warning: does not compare between engines or between languages.
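A minimal sentence-level BLEU sketch (single reference, n-grams up to 4, brevity penalty) makes "modified n-gram precision" concrete; production implementations differ in smoothing and in corpus-level aggregation.

```python
import math
from collections import Counter

# Minimal BLEU sketch: clipped ("modified") n-gram precision up to 4-grams,
# geometric mean, brevity penalty. One reference; no smoothing.

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_counts, r_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        log_prec += math.log(max(clipped, 1e-9) / total) / max_n
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))  # brevity penalty
    return bp * math.exp(log_prec)

ref = "the cat is on the mat"
print(round(bleu("the cat is on the mat", ref), 2))  # 1.0 for an exact match
print(bleu("the the the the the the", ref) < 0.1)    # clipping punishes repetition
```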
Human evaluations
• 3 to 5 independent human evaluators are asked to rank translation quality for 250 sentences on a scale of 1 to 4
– Comparing to a human-translated sentence
– No source-language knowledge required
• 4 = Ideal: grammatically correct, all information included
• 3 = Acceptable: not perfect, but definitely comprehensible, and with accurate transfer of all important information
• 2 = Possibly Acceptable: may be interpretable given context/time, some information transferred accurately
• 1 = Unacceptable: absolutely not comprehensible and/or little or no information transferred accurately
• Each sentence is evaluated by all raters, and scores are averaged
• Relative evaluations
– Track progress against ourselves and a competitor
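The averaging step of the rating scheme above reduces to a simple mean of means; the scores below are invented.

```python
# Sketch of the human-evaluation scoring: each sentence is rated 1-4 by
# every rater, per-sentence scores are averaged, then averaged again for
# the system score. Ratings are invented.

ratings = {                       # sentence id -> one score per rater
    "s1": [4, 4, 3],
    "s2": [2, 3, 2],
    "s3": [1, 2, 2],
}

sentence_scores = {sid: sum(r) / len(r) for sid, r in ratings.items()}
system_score = sum(sentence_scores.values()) / len(sentence_scores)
print(round(system_score, 2))
```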
Language pairs on translator.live.com
en_es 18%
en_de 15%
en_pt 13%
en_zh-chs 7%
en_fr 6%
es_en 6%
pt_en 4%
en_it 4%
en_ar 3%
en_ja 3%
en_zh-cht 3%
en_ko 3%
de_en 2%
other
The notable fact in this distribution is the relative popularity of the English>German language pair among consumers, in contrast to its lack of popularity among the technical audience.
Products
• Bilingual Viewer
– Used by Live Search results page
• Translator landing page
• Toolbar Translator Button
• Translator Add-in for 3rd-party pages
• Internet Explorer 8 accelerator
• Community-built Firefox version
• Translator Bot ([email protected])
• Office Research Pane
• SOAP API for product team use
• Microsoft Localization
– CSS KB, MSDN, TechNet, Products
Two ways to apply MT in a product
• Post-editing
– Increase human translators’ productivity
– In practice: 0% to 25% productivity increase; varies by content, style and language
• Raw publishing
– Publish the output of the MT system directly to the end user
– Best with a bilingual UI
– Good results with IT Pro and developer audiences
Increasing the extent of localization
MT with post-editing
[Workflow] Source → apply the translation memory (TM) on segments with a >85% match; apply MT on the rest → human editing → target.
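The >85% routing rule above can be sketched like this; `SequenceMatcher` is merely a stand-in for a real TM fuzzy-match score, and the one-entry TM is invented.

```python
from difflib import SequenceMatcher

# Sketch of TM/MT routing: reuse a translation-memory entry when the fuzzy
# match exceeds 85%, otherwise fall back to MT. The TM and MT are stand-ins.

tm = {"click the ok button": "klicken sie auf ok"}   # invented TM entry

def fuzzy_match(segment):
    """Best TM target and its match score for a source segment."""
    best, score = None, 0.0
    for src, tgt in tm.items():
        s = SequenceMatcher(None, segment, src).ratio()
        if s > score:
            best, score = tgt, s
    return best, score

def translate_segment(segment, mt):
    target, score = fuzzy_match(segment)
    if score > 0.85:                 # the >85% threshold from the slide
        return target, "TM"
    return mt(segment), "MT"

mt = lambda s: f"<mt:{s}>"           # stand-in MT engine
print(translate_segment("click the ok button", mt))   # exact match -> TM
print(translate_segment("restart the computer", mt))  # no match -> MT
```

The raw-publishing variant later in the deck is the same routing with the threshold raised to a 100% match and no human-editing step.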
Product 1: Post-editing results (without specific post-editor training)
Productivity gain/loss by language:
French +8%
Italian -3.8%
German -12%
Spanish -9.2%
Chinese S. +11%
Chinese T. +1.8%
Japanese -23%
Product 2: Post-editing results (a couple of weeks later, with training)
Productivity gain by language:
French +14.5%
Brazilian +20.0%
Swedish +8%
Danish +28.6%
Czech +6.1%
Dutch +14.7%
Product 3: Post-editing results (with training)
Productivity gain/loss by language:
French +11.89%
Italian +33.35%
German +29.12%
Spanish +10.98%
Chinese S. +22.01%
Chinese T. +6.56%
Japanese -4.05%
Brazilian -1.90%
Post-Editing: Lessons Learned
• Training of the translator is required
– Understand the peculiarities of the engine used
– Always read the source sentence first
– Understand when to discard the MT output: “two seconds is too much”
• Acknowledge different suitability for different styles and terminology
• Customize terminology per individual project (use of dictionary)
• Productivity gains of 5% to 25% are achievable, but investment is required
Raw MT Publishing
[Workflow] Source → apply the translation memory (TM) on segments with a 100% match; apply MT on the rest → target.
History of MT in Customer Support
• Since 2003, CSS has been actively using machine translation for Knowledge Base articles
– Spanish was the first language deployed
– Japanese went live one year later
• Current languages
– 10 languages deployed: Spanish, German, French, Italian, Japanese, Portuguese, Brazilian Portuguese, Chinese Simplified, Chinese Traditional, Arabic
– 3 languages in testing: Korean, Turkish and Russian
Microsoft Knowledge Base
Articles human-translated, or originally authored in the language (count, % of English):
English: 235,425 (100%)
Japanese: 70,684 (27%)
French: 35,310 (14%)
German: 30,459 (12%)
Spanish: 16,980 (7%)
Italian: 14,401 (6%)
Chinese (Simplified): 12,873 (5%)
Chinese (Traditional): 10,372 (4%)
Portuguese (Brazil): 10,205 (4%)
Portuguese (Iberian): 7,129 (3%)
Arabic: 2,152 (1%)
MT & HT distribution across languages
Traffic to the knowledge base is fairly unevenly distributed. By targeting human translation at the high-page-view articles, 80% of the Japanese total page views are for human-translated articles. Even in Arabic, 54% of the page views end up on human-quality articles.
Customer Feedback: KB Inline Survey
Knowledge Base: average resolve rate of human-translated vs. machine-translated articles (English baseline: 25.5%)
Arabic: HT 31.8%, MT 25.4%
Chinese (Simplified): HT 35.3%, MT 32.6%
Chinese (Traditional): HT 35.4%, MT 29.0%
French: HT 22.5%, MT 20.9%
German: HT 25.0%, MT 18.7%
Italian: HT 33.3%, MT 26.5%
Japanese: HT 27.8%, MT 17.8%
Portuguese: HT 27.6%, MT 23.3%
Portuguese (Brazil): HT 28.7%, MT 28.7%
Spanish: HT 29.2%, MT 24.1%
Global English
• Support started rewriting the source language to account for MT in 2003 (6 months after Spanish MT was deployed)
• Retrained the writers to write with a global audience and MT in mind
• Top five rules to make source-language content suitable for MT:
1. Use standard English writing style
2. Use correct punctuation, especially: missing punctuation causing incorrect sentence breaks; hyphens; commas
3. Eliminate long sentences
4. Use capitalization correctly
5. Use correct spelling
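Several of the rules above lend themselves to simple automated checks; the thresholds and patterns below are illustrative only, not the team's actual tooling.

```python
import re

# Toy checks for three of the Global English rules (end punctuation,
# sentence length, capitalization). Thresholds are invented.

MAX_WORDS = 25                     # assumed cutoff for "long sentences"

def check_sentence(sentence):
    """Return a list of Global English issues found in one sentence."""
    issues = []
    if not re.search(r"[.!?]$", sentence.strip()):
        issues.append("missing end punctuation (can break sentence breaking)")
    if len(sentence.split()) > MAX_WORDS:
        issues.append("long sentence: consider splitting")
    if sentence and not sentence[0].isupper():
        issues.append("sentence should start with a capital letter")
    return issues

print(check_sentence("restart the computer"))   # flags punctuation and casing
print(check_sentence("Restart the computer."))  # no issues
```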
Impact of Global English
Resolve rate of articles authored to standard guidelines (old article → Global English):
DE - German: 19% → 19%
ES - Spanish: 23% → 22%
FR - French: 20% → 24%
IT - Italian: 24% → 34%
JA - Japanese: 18% → 20%
PT - Portuguese: 25% → 27%
PT-BR - Portuguese Brazil: 22% → 36%
ZH-CN - Chinese Simplified: 26% → 31%
ZH-TW - Chinese Traditional: 27% → 40%
MT Languages Combined: 22% → 25%
Translation Wiki
Benefits of using MT
• Larger language set: localize into more languages without increasing budget
• Localize more: increase the extent of localization without a proportional budget increase
• Higher productivity: 5% to 25% productivity increase; >25% in software localization
• Faster availability: remove delay in translation; especially desired by the technical audience
Conclusions
• Automation enriches the customer’s experience
• Long-term investments in tools and processes are required, and patience in seeing results is needed
• Reaching customers through multiple forums and media is important
• Metrics are more useful than opinions
• Using customer feedback and the community provides better solutions
Thank you
References
• Menezes, Arul, Kristina Toutanova and Chris Quirk. “Microsoft Research Treelet Translation System: NAACL 2006 Europarl Evaluation.” Workshop on Machine Translation, NAACL 2006.
• Quirk, Chris and Arul Menezes. “Dependency Treelet Translation: The Convergence of Statistical and Example-Based Machine Translation?” Machine Translation, March 2006, pp. 43–65.