2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?
DESCRIPTION
10 decisions to make before starting to use Machine Translation (MT), including details on how to improve MT engines.
TRANSCRIPT
![Page 1: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/1.jpg)
Your Trained Moses SMT System doesn't work.
What can you do?
Diego Bartolome, CEO, tauyou <language technology>
[email protected]
@diegobartolome
![Page 2: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/2.jpg)
Where are you now?
![Page 4: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/4.jpg)
Why Machine Translation?
Strategic decision
Increase sales
Shorten delivery times
Reduce costs
Differentiation
Forced decision
Clients ask for it!
![Page 6: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/6.jpg)
Welcome to the jungle
![Page 7: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/7.jpg)
Decision 1: Internal – external
Core competence
Resources
ROI
Time to market
![Page 9: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/9.jpg)
MT Costs
Internal development
Free tools
DIY solutions
Traditional pricing model
tauyou managed solution
![Page 10: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/10.jpg)
Decision 2: MT Type (I)
Rule-based MT
Statistical MT
Hybrid MT
![Page 11: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/11.jpg)
Decision 2: MT Type (II)
Do we really care?
![Page 12: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/12.jpg)
Decision 3: Languages (I)
Source: translate.autodesk.com
![Page 13: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/13.jpg)
Decision 3: Languages (II)
Source: Philipp Koehn
![Page 14: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/14.jpg)
Decision 4: Domains
Who is willing to pay?
Where does your revenue come from?
What are your key skills?
What domains achieve good quality?
![Page 15: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/15.jpg)
Decision 5: Workflow
Use MT as a secondary TM
Bilingual pre-translated translation files
CAT tool integration
Differentiated workflow
![Page 16: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/16.jpg)
Decision 6: Feedback
Qualitative
Use updated TMs in new trainings
Immediate (incremental) retraining
Rule-based automatic post-editing
Selective pre- and/or post-processing
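Rule-based automatic post-editing can be as simple as an ordered list of search-and-replace patterns applied to the raw MT output. The sketch below is only an illustration: the three rules and the `post_edit` name are assumptions, not rules taken from the talk, and real rule sets are normally built from recurring corrections made by post-editors.

```python
import re

# Hypothetical example rules: (compiled pattern, replacement), applied in order.
RULES = [
    (re.compile(r"\s+([,.;:!?])"), r"\1"),                 # no space before punctuation
    (re.compile(r"\s{2,}"), " "),                          # collapse repeated whitespace
    (re.compile(r"\b(\w+) \1\b", re.IGNORECASE), r"\1"),   # drop an accidentally doubled word
]

def post_edit(text: str) -> str:
    """Apply every rule in order to one segment of MT output."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()

print(post_edit("The the  file was saved ."))
```

Rules like these are language-dependent (French, for instance, keeps a space before some punctuation marks), so each rule should be validated per target language before it is enabled.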
![Page 17: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/17.jpg)
Decision 7: Post-editors
What are the skills needed?
Post-editing guidelines
How do we pay them?
![Page 18: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/18.jpg)
Decision 8: Metrics
SMT metrics: BLEU, NIST
Feedback from translators
Translation time vs. Post-editing time
Word Error Rate (WER) or Edit Distance
Cost reduction
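Word Error Rate can be computed as the word-level edit distance between the raw MT output and its post-edited version, divided by the length of the post-edited version. The following is a minimal sketch under that assumption; the function names are illustrative and tokenization is a plain whitespace split.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate of the hypothesis (MT output) against the reference (post-edited text)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))
```

In this example one word is missing from a six-word reference, so the WER is 1/6. A low WER on post-edited output suggests that post-editors changed little, which is often closer to the business question than BLEU.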
![Page 19: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/19.jpg)
Decision 9: Business Model
![Page 20: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/20.jpg)
![Page 21: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/21.jpg)
Decision 10: Start!
![Page 22: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/22.jpg)
Let's play with Moses
![Page 23: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/23.jpg)
Let's play with Moses
Best resource to start
www.statmt.org/moses
TAUS tutorial
www.translationautomation.com
tauyou slides
www.speakerdeck.com/tauyoucom
![Page 24: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/24.jpg)
Everything is clear!
Gather TMs and other linguistic assets
Select domains
Train systems
BLEU score is great
… but …
Translation quality is awful
![Page 25: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/25.jpg)
Why?
Not enough data
Too much data
Unclean TMs
Misalignments
Difficult language pairs
Selection of wrong parameters
Suboptimal techniques
![Page 26: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/26.jpg)
![Page 27: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/27.jpg)
Some steps
Maximum exploitation of existing assets
Source content optimization
Data selection and cleaning
Improvement of the models
Linguistic processing
Continuous improvement
![Page 28: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/28.jpg)
Linguistic assets
Translation memory sharing
Clients, Partners, EU, UN, TAUS
Relevant on-line data retrieval
Advanced TM techniques
Sub-segment matching
Parts of Speech replacement
![Page 29: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/29.jpg)
Source optimization (I)
Spell check
Grammar check
Style check
Terminology check
Client checklist
newdoc
proposeddoc + html
report
![Page 30: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/30.jpg)
Summarization
% to reduce
Use translation memories
Project
Client
All
newdoc
proposeddoc + html
report
![Page 31: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/31.jpg)
Data selection + cleaning
Clean translation memories
Length, punctuation, terminology, …
Inconsistencies, repetitions, ...
Segment splitting
Optimize weight of most frequent n-grams
Validate their translations
Add out-of-domain data
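A first cleaning pass over a translation memory can drop empty segments, overlong segments, pairs with an implausible length ratio, and exact repetitions. The sketch below assumes the TM has already been exported as (source, target) string pairs; `clean_tm` and its thresholds are illustrative values, not figures from the talk.

```python
def clean_tm(pairs, max_ratio=3.0, max_len=100):
    """Return the TM pairs that pass basic length and duplicate checks."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue                      # empty side: probably a misalignment
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_len or n_tgt > max_len:
            continue                      # overlong segment: candidate for splitting
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue                      # suspicious length ratio
        if (src, tgt) in seen:
            continue                      # exact repetition
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept

tm = [
    ("Save the file", "Guardar el archivo"),
    ("Save the file", "Guardar el archivo"),
    ("", "Archivo"),
    ("OK", "Pulse el botón para aceptar ahora"),
]
print(clean_tm(tm))
```

Only the first pair survives here: the second is a repetition, the third has an empty source, and the fourth exceeds the length ratio. Punctuation, terminology, and inconsistency checks would be further filters in the same loop.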
![Page 32: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/32.jpg)
Models optimization
Filter the translation tables
Remove the garbage + tune weights
Optimize language models
Adapt them to the translation purpose
Tune parameters correctly
Tune set, test set, optimization parameters
Improve recasing
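Filtering the translation table can be illustrated with a small script that keeps, for each source phrase, only the most probable target phrases. The sketch assumes the standard Moses text phrase-table layout (`source ||| target ||| scores ||| ...`) with the direct phrase translation probability as the third score; the function name and thresholds are hypothetical, and a production table would be streamed rather than held in memory.

```python
from collections import defaultdict

def filter_phrase_table(lines, min_prob=0.01, top_n=20):
    """Keep at most top_n entries per source phrase whose direct probability is >= min_prob."""
    by_source = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 3:
            continue                          # malformed line
        scores = [float(s) for s in fields[2].split()]
        if len(scores) < 3:
            continue                          # unexpected score layout
        direct = scores[2]                    # assumed: p(target | source)
        if direct >= min_prob:
            by_source[fields[0]].append((direct, line.rstrip("\n")))
    kept = []
    for entries in by_source.values():
        entries.sort(key=lambda e: e[0], reverse=True)
        kept.extend(entry for _, entry in entries[:top_n])
    return kept

table = [
    "house ||| casa ||| 0.8 0.7 0.9 0.6 ||| 0-0 ||| 10 9 9",
    "house ||| hogar ||| 0.1 0.1 0.05 0.1 ||| 0-0 ||| 10 1 1",
    "house ||| , ||| 0.001 0.001 0.001 0.001 ||| 0-0 ||| 10 1 1",
]
for entry in filter_phrase_table(table, top_n=2):
    print(entry)
```

The entry translating "house" as a comma falls below the probability threshold and is removed. Moses also ships its own filtering and pruning scripts, which are the better choice for full-size models.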
![Page 33: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/33.jpg)
Linguistic processing
In the source and/or target language
Grammar checking
Entities detection
Proper nouns, alphanumeric words, ...
Compound words splitting
Sentence reordering
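Entity handling for alphanumeric words is often done by masking them before translation and restoring them afterwards, so the engine never tries to translate a part number or port name. The sketch below assumes whitespace tokenization and assumes the engine passes the placeholder tokens through unchanged; the placeholder format and function names are hypothetical.

```python
def mask_entities(sentence: str):
    """Replace tokens that mix letters and digits with numbered placeholders."""
    masked, entities = [], []
    for tok in sentence.split():
        core = tok.strip(".,;:!?()")          # ignore surrounding punctuation
        has_digit = any(c.isdigit() for c in core)
        has_alpha = any(c.isalpha() for c in core)
        if has_digit and has_alpha:
            placeholder = f"__ENT{len(entities)}__"
            entities.append(core)
            masked.append(tok.replace(core, placeholder, 1))
        else:
            masked.append(tok)
    return " ".join(masked), entities

def unmask_entities(translation: str, entities):
    """Put the original entities back into the translated text."""
    for i, ent in enumerate(entities):
        translation = translation.replace(f"__ENT{i}__", ent)
    return translation

masked, ents = mask_entities("Install driver XJ-900b on port COM3.")
print(masked)
print(unmask_entities("Instale el controlador __ENT0__ en el puerto __ENT1__.", ents))
```

Proper nouns need a different detector (a gazetteer or a named-entity tagger), but the mask-and-restore pattern stays the same.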
![Page 34: 2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?](https://reader033.vdocuments.mx/reader033/viewer/2022060109/555928dcd8b42a543d8b457b/html5/thumbnails/34.jpg)
Life is about the people you meet and the things you create with them.
So go out and start creating.
Part of the Holstee Manifesto
Diego Bartolome
CEO, tauyou <language technology>
[email protected]
@diegobartolome