Hybrid Solutions for Translation: Going hybrid
Qun Liu (DCU), Dr. Manuel Herranz (Pangeanic)
12 November 2013, Birmingham, UK
PART A
Qun Liu (DCU)
Winter School 2013, Birmingham
Outline
Why Hybrid MT?
An overview of Hybrid MT
Typical Hybrid MT Approaches
Conclusion
MT Approaches
RBMT: Rule-based Machine Translation
EBMT: Example-based Machine Translation
TM: Translation Memory
SMT: Statistical Machine Translation
RBMT: Vauquois’ Triangle
Figure: Vauquois' triangle. Analysis climbs from the Source Language and Generation descends to the Target Language; the transfer levels, from base to apex, are Direct, Syntactic Transfer, Semantic Transfer and Interlingua.
RBMT: Rules for Components
Analysis
Morphological Analysis: Source Morphological Rules
Syntactic Analysis (Parsing): Source Grammar
Semantic Analysis: Source Semantic Rules
Transfer
Lexical Transfer: Bilingual Lexicon
Syntactic Transfer: Syntactic Mapping Rules
Semantic Transfer: Semantic Mapping Rules
Generation
Semantic Generation: Target Semantic Rules
Syntactic Generation: Target Grammar
Morphological Generation: Target Morphological Rules
RBMT: an Example
RBMT
RBMT uses human-encoded linguistic rules for translation
Developing an RBMT system is very expensive: it needs a great deal of human labour and takes a long time (years)
RBMT systems can reach good translation quality after years of development in a given domain.
Well-developed RBMT systems tend to capture large-scale sentence structure better than SMT systems, but handle short expressions worse.
EBMT
An EBMT system translates sentences by analogy with existing translation examples
EBMT does not need deep analysis of the source text and may generate high-quality translations when similar examples are found
EBMT quality improves as more examples become available.
A weakness of EBMT is the coverage of the examples, especially for long sentences.
TM
A Translation Memory directly outputs the stored target sentence when a very similar source sentence is found in the memory; otherwise it outputs nothing.
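The lookup behaviour described here can be sketched in a few lines. This is only an illustration: the function name `tm_lookup`, the 0.8 threshold, the two memory entries, and the use of `difflib` similarity as a stand-in for a real fuzzy-match score are all assumptions, not part of the talk.

```python
from difflib import SequenceMatcher

def tm_lookup(source, memory, threshold=0.8):
    """Return the stored target of the most similar source, or None."""
    best_score, best_target = 0.0, None
    for tm_source, tm_target in memory:
        # Word-level similarity in [0, 1]; a stand-in for a real fuzzy-match score.
        score = SequenceMatcher(None, source.split(), tm_source.split()).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    # Below the threshold the memory stays silent, as on the slide.
    return best_target if best_score >= threshold else None

# Hypothetical two-entry memory (English to French).
memory = [
    ("click the save button", "cliquez sur le bouton enregistrer"),
    ("close the window", "fermez la fenêtre"),
]

print(tm_lookup("click the save button now", memory))
print(tm_lookup("open the file menu", memory))
```

The first query is close enough to a stored source to return its target; the second falls below the threshold, so nothing is returned.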
SMT
SMT builds statistical models to predict the probability of a target sentence being the translation of a given source sentence.
Translating a given source sentence then amounts to searching for the target sentence with the highest translation probability.
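In symbols, writing F for the source sentence and E for a candidate target sentence (the P(E|F) notation that appears later in this talk), this search can be written as:

$$\hat{E} = \arg\max_{E} P(E \mid F)$$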
A large number of translation pairs (a parallel corpus) is needed to estimate the model parameters.
To predict a translation, sentence pairs are broken into smaller translation equivalents at the word level, the phrase level, or the syntax-rule level.
Word-based SMT
Source | Target | Probability
Bushi (布什) | Bush | 0.7
Bushi (布什) | President | 0.2
Bushi (布什) | US | 0.1
yu (与) | and | 0.6
yu (与) | with | 0.4
juxing (举行) | hold | 0.7
juxing (举行) | had | 0.3
le (了) | hold | 0.01
... | ... | ...
Phrase-based SMT
Source | Target | Probability
Bushi (布什) | Bush | 0.5
Bushi (布什) | president Bush | 0.3
Bushi (布什) | the US president | 0.2
Bushi yu (布什与) | Bush and | 0.8
Bushi yu (布什与) | the president and | 0.2
yu Shalong (与沙龙) | and Sharon | 0.6
yu Shalong (与沙龙) | with Sharon | 0.4
juxing le huitan (举行了会谈) | hold a meeting | 0.7
juxing le huitan (举行了会谈) | had a meeting | 0.3
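To make the use of such a phrase table concrete, here is a minimal sketch of monotone decoding: the source is segmented into known phrases and the segmentation with the highest product of phrase probabilities wins. It is a simplification under stated assumptions: there is no language model and no reordering, the function name `decode_monotone` is invented for illustration, the pinyin is normalised to huitan and the name to Sharon, and the single-word entry for Shalong is hypothetical (it is not on the slide).

```python
PHRASE_TABLE = {
    ("Bushi",): [("Bush", 0.5), ("president Bush", 0.3), ("the US president", 0.2)],
    ("Bushi", "yu"): [("Bush and", 0.8), ("the president and", 0.2)],
    ("yu", "Shalong"): [("and Sharon", 0.6), ("with Sharon", 0.4)],
    ("juxing", "le", "huitan"): [("hold a meeting", 0.7), ("had a meeting", 0.3)],
    ("Shalong",): [("Sharon", 1.0)],  # hypothetical entry, not on the slide
}

def decode_monotone(source):
    """Best left-to-right segmentation by product of phrase probabilities."""
    n = len(source)
    # best[i] holds (probability, target phrases) for the prefix source[:i].
    best = [None] * (n + 1)
    best[0] = (1.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] is None:
                continue
            options = PHRASE_TABLE.get(tuple(source[start:end]))
            if not options:
                continue
            # Take the most probable target for this source phrase.
            target, p = max(options, key=lambda o: o[1])
            cand = (best[start][0] * p, best[start][1] + [target])
            if best[end] is None or cand[0] > best[end][0]:
                best[end] = cand
    return best[n]  # None if the source cannot be covered

result = decode_monotone("Bushi yu Shalong juxing le huitan".split())
print(result)
```

Two segmentations cover the sentence here, and the dynamic programme keeps the more probable one. A real phrase-based decoder would also score the output with a language model and allow phrases to be reordered.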
Hierarchical Phrase-based SMT
Source | Target | Probability
juxing le huitan (举行了会谈) | hold a meeting | 0.6
juxing le huitan (举行了会谈) | had a meeting | 0.3
X huitan (X会谈) | X a meeting | 0.8
X huitan (X会谈) | X a talk | 0.2
juxing le X (举行了X) | hold a X | 0.5
juxing le X (举行了X) | had a X | 0.5
Bushi yu Shalong (布什与沙龙) | Bush and Sharon | 0.8
Bushi X (布什X) | Bush X | 0.7
X yu Y (X与Y) | X and Y | 0.9
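The difference from plain phrases is that hierarchical rules contain gaps (X, Y) that are filled by translating sub-spans recursively. The following sketch shows that mechanism only; it is not a real decoder. The function names `match` and `translate` are invented for illustration, the three single-word lexical rules are hypothetical, and the pinyin is normalised to huitan.

```python
from functools import lru_cache

# (source pattern, target pattern, probability); X and Y are gaps.
RULES = [
    ("X yu Y", "X and Y", 0.9),        # from the slide
    ("juxing le X", "hold a X", 0.5),  # from the slide
    ("juxing le X", "had a X", 0.5),   # from the slide
    ("Bushi", "Bush", 1.0),            # hypothetical lexical rule
    ("Shalong", "Sharon", 1.0),        # hypothetical lexical rule
    ("huitan", "meeting", 1.0),        # hypothetical lexical rule
]
GAPS = ("X", "Y")

def match(pattern, tokens):
    """Yield dicts binding each gap in pattern to a non-empty token span."""
    if not pattern:
        if not tokens:
            yield {}
        return
    head, rest = pattern[0], pattern[1:]
    if head in GAPS:
        for cut in range(1, len(tokens) + 1):
            for bound in match(rest, tokens[cut:]):
                yield {head: tokens[:cut], **bound}
    elif tokens and tokens[0] == head:
        yield from match(rest, tokens[1:])

@lru_cache(maxsize=None)
def translate(tokens):
    """Return (probability, target tokens) of the best derivation found."""
    best = (0.0, ())
    for src, tgt, p in RULES:
        for bindings in match(tuple(src.split()), tokens):
            prob, subs = p, {}
            for gap, span in bindings.items():
                sub_prob, sub_target = translate(span)  # fill the gap recursively
                prob *= sub_prob
                subs[gap] = sub_target
            if prob > best[0]:
                out = []
                for word in tgt.split():
                    out.extend(subs.get(word, (word,)))
                best = (prob, tuple(out))
    return best

print(translate(("Bushi", "yu", "Shalong")))
print(translate(("juxing", "le", "huitan")))
```

Spans that no rule covers come back with probability 0.0. A full system would add glue rules to join adjacent spans and would score derivations with more than rule probabilities.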
Syntax-based SMT
Source | Target | Probability
VPB(VS(juxing) AS(le) NPB(huitan)) (举行了会谈) | hold a meeting | 0.6
VPB(VS(juxing) AS(le) NPB(huitan)) (举行了会谈) | have a meeting | 0.3
VPB(VS(juxing) AS(le) NPB(huitan)) (举行了会谈) | have a talk | 0.1
VPB(VS(juxing) AS(le) x1:NPB) (举行了x1) | hold a x1 | 0.5
VPB(VS(juxing) AS(le) x1:NPB) (举行了x1) | have a x1 | 0.5
VP(PP(P(yu) x1:NPB) x2:VPB) (与 x1 x2) | x2 with x1 | 0.9
IP(x1:NPB VP(x2:PP x3:VPB)) | x1 x3 x2 | 0.7
SMT
SMT is cheap
SMT systems can be developed in a short time
SMT needs a large parallel corpus
SMT gets good quality translations if we have plenty of in-domain data
SMT quality drops dramatically for out-of-domain data
SMT output is fluent within short phrases but handles large-scale sentence structure poorly (especially for distant language pairs)
Why Hybrid MT?
Each MT approach has its pros and cons.
We want to take advantage of different MT approaches
We do not want to waste our investments on existing MT systems
An overview of Hybrid MT
Selective MT: loose coupling
Pipelined MT: medium coupling
Mixture MT: close coupling
Selective MT
Given translations generated by different approaches, Selective MT tries to select the best one, or to select the best parts of different translations and combine them into a new one.
Figure: the Source is translated by several systems (MT1, MT2, MT3), each producing a Target; a Select step produces the final Target.
Typical Selective MT:
System Recommendation
System Combination
  Sentence-level combination
  Word-level combination
Pipelined MT
Pipelined MT adopts one approach as the main approach and uses another approach for monolingual pre-processing or post-processing.
Figure: Pre-Processing, then Main Approach, then Post-Processing.
Typical Pipelined MT:
Statistical Post-Editing for RBMT
Rule-based Pre-reordering for SMT
Mixture MT
Mixture MT adopts one approach as the main approach but utilizes one or more different approaches in some components.
Typical Mixture MT:
Statistical Parsing in RBMT
Rule-based Named Entity Translation in SMT
Human-Encoded Rules in SMT
SMT Decoding with TM Phrases
Typical Hybrid MT Approaches
Selective MT
  System Recommendation
  System Combination
Pipelined MT
Mixture MT
System Recommendation
Yifan He, Yanjun Ma, Josef van Genabith and Andy Way. Bridging SMT and TM with System Recommendation. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), pages 622–630, Uppsala, Sweden, 11-16 July 2010.
Intuition: when the translation memory is large enough, an SMT system trained on it can be comparable to the TM output in translation quality. This raises the problem of selecting between the two.
System recommendation recommends SMT outputs to a TM user when it predicts that the SMT outputs are more suitable for post-editing than the hits provided by the TM.
Figure: the components shown are the Parallel Corpus, TM, SMT and System Recommendation.
An SVM binary classifier is adopted
The classifier is trained on human-annotated data
A confidence score is given for the recommendation
SMT System Features: features used in the SMT system
TM Feature: Fuzzy Match Cost
System Independent Features:
  Source-Side Language Model Score and Perplexity
  Target-Side Language Model Perplexity
  The Pseudo-Source Fuzzy Match Score
  The IBM Model 1 Score
Evaluation Metrics:
Where A is the set of recommended MT outputs, and B is the set of MT outputs that have lower TER than TM hits.
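The formulas themselves did not survive on this slide. With A and B as defined, the usual definitions of precision and recall would read as follows; this is a reconstruction from the definitions of the two sets, not a copy of the slide:

$$\mathrm{precision} = \frac{|A \cap B|}{|A|}, \qquad \mathrm{recall} = \frac{|A \cap B|}{|B|}$$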
System Combination
Rosti, A. V. I., Ayan, N. F., Xiang, B., Matsoukas, S., Schwartz, R. M., & Dorr, B. J. (2007, April). Combining Outputs from Multiple Machine Translation Systems. In HLT-NAACL (pp. 228-235).
Rosti, A. V. I., Matsoukas, S., & Schwartz, R. (2007, June). Improved word-level system combination for machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (p. 312).
He, X., Yang, M., Gao, J., Nguyen, P., & Moore, R. (2008). Indirect-HMM-based hypothesis alignment for combining outputs from machine translation systems. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 98-107). Association for Computational Linguistics.
Feng, Y., Liu, Y., Mi, H., Liu, Q., & Lü, Y. (2009). Lattice-based system combination for statistical machine translation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 (pp. 1105-1113). Association for Computational Linguistics.
Sentence-Level System Combination
Kumar, S., & Byrne, W. J. (2004, May). Minimum Bayes-Risk Decoding for Statistical Machine Translation. In HLT-NAACL (pp. 169-176).
Suppose we have several MT systems
For a given source text F, each MT system outputs an n-best list of target texts
If possible, each MT system assigns every target text E a probability P(E|F); otherwise the n-best target texts are treated as equally probable
Minimum Bayes-Risk (MBR):
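The formula is missing from this slide. The general form of the MBR decision rule, written with the P(E|F) notation used above and a loss function L that is an assumption of this reconstruction (for example, one minus a sentence-level similarity score), is:

$$\hat{E} = \arg\min_{E'} \sum_{E} L(E, E')\, P(E \mid F)$$

Here both E and E' range over the pooled n-best target texts, so the selected translation is the one with the lowest expected loss against all the others.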
Word-Level System Combination
Select a translation candidate as the skeleton (backbone) using Minimum Bayes Risk
Construct a confusion network by aligning all the words in other translation candidates to the words in the skeleton
Select the best path from the confusion network and generate a new translation
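These three steps can be sketched as follows. The sketch makes several assumptions that are not from the talk: candidates are equally probable, word-level edit distance is the loss, `difflib` provides the alignment, and voting is a simple majority per skeleton position. The three candidate sentences are invented so that each contains one different error.

```python
from collections import Counter
from difflib import SequenceMatcher

def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]

def select_skeleton(hyps):
    """MBR with uniform probabilities: lowest total distance to all candidates."""
    return min(hyps, key=lambda h: sum(edit_distance(h, o) for o in hyps))

def combine(hyps):
    skeleton = select_skeleton(hyps)
    # One vote counter per skeleton position; the skeleton votes for itself.
    slots = [Counter([w]) for w in skeleton]
    for hyp in hyps:
        if hyp is skeleton:
            continue
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, skeleton, hyp).get_opcodes():
            if tag == "delete":
                for i in range(i1, i2):
                    slots[i][""] += 1  # empty word: this candidate skips the slot
            elif tag != "insert" and i2 - i1 == j2 - j1:
                for i, j in zip(range(i1, i2), range(j1, j2)):
                    slots[i][hyp[j]] += 1
            # Insertions and uneven replacements are ignored in this sketch.
    best = [slot.most_common(1)[0][0] for slot in slots]
    return [w for w in best if w]

# Hypothetical candidates from three systems.
hyps = [
    "please show me in the map".split(),
    "please display me on the map".split(),
    "please show me on a map".split(),
]
print(" ".join(combine(hyps)))
```

No single candidate is fully correct here, yet voting over the aligned positions recovers a sentence that none of the systems produced. Real systems use stronger alignment and score confusion-network paths with a language model and system weights.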
Translation Candidate
Figure: the translation candidates, one of which is marked as the Skeleton.
Word Alignment against the Skeleton
Figure: the other candidates word-aligned against the Skeleton.
Confusion Network
Final output: Please show me on the map.
Word-Level System Combination
System combination has proved very effective
In the NIST Open MT Evaluation Chinese-English task, MSR-NRC-SRI ranked first by using system combination
In later NIST evaluations, separate tracks were defined for participants using and not using system combination.
Statistical Post-Editing for RBMT
Dugast, L., Senellart, J., & Koehn, P. (2007, June). Statistical post-editing on SYSTRAN's rule-based translation system. In Proceedings of the Second Workshop on Statistical Machine Translation (pp. 220-223). Association for Computational Linguistics.
Simard, M., Ueffing, N., Isabelle, P., & Kuhn, R. (2007). Rule-based Translation With Statistical Phrase-based Post-editing. Second Workshop on Statistical Machine Translation. Prague, Czech Republic. June 23, 2007. pp. 203–206.
Statistical Post-Editing
When we have:
  A very good RBMT system
  A large parallel corpus that can be used for SMT training
Both RBMT and SMT have advantages and disadvantages
Can we benefit from both methods?
Figure: Source Text, then RBMT, then RBMT Result, then SPE, then SPE Result.
A Statistical Post-Editing (SPE) system is a monolingual SMT system that takes the output of an RBMT system as input and generates an improved target output.
Statistical Post Edit: Training
Figure: the Source side of the parallel corpus is translated by the RBMT system into an RBMT Target; SPE Training on the pairs of RBMT Target and reference Target produces the SPE system.
RBMT usually generates a better word order while SMT can make better lexical selection.
RBMT+SPE outperforms the original RBMT and SMT systems.
Rule-based Pre-reordering for SMT
Elia Yuste, Manuel Herranz, Alexandra Helle and Hirokazu Suzuki, Go Hybrid: Pangeanic's and Toshiba's First Steps Towards ENJP MT Hybridization, AAMT Journal, No.50, December 2011 (Part B for this tutorial)
Xia, F., & McCord, M. (2004, August). Improving a statistical MT system with automatically learned rewrite patterns. In Proceedings of the 20th international conference on Computational Linguistics (p. 508). Association for Computational Linguistics.
A phrase-based SMT (PBSMT) system makes good lexical choices but, lacking linguistic knowledge, handles long-distance reordering poorly
Rule-based word reordering is applied on the source side so that the word order of the source text becomes much closer to the word order of the target side.
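As a rough illustration of source-side reordering, the sketch below applies rewrite rules to a source parse tree before translation. Everything specific in it is an assumption: the tree encoding, the function names `reorder` and `leaves`, and the two rules, which are hand-written here to mimic an English to Japanese setting rather than learned from data.

```python
# A tree node is (label, word) for a leaf or (label, [children]) otherwise.
# Each rule maps (parent label, child labels) to a new order of the children.
REWRITE_RULES = {
    ("VP", ("VB", "NP")): (1, 0),  # hypothetical: verb-object becomes object-verb
    ("PP", ("IN", "NP")): (1, 0),  # hypothetical: preposition becomes postposition
}

def reorder(node):
    """Return a copy of the tree with every matching rewrite rule applied."""
    label, body = node
    if isinstance(body, str):
        return node  # leaf: nothing to reorder
    children = [reorder(child) for child in body]
    order = REWRITE_RULES.get((label, tuple(child[0] for child in children)))
    if order:
        children = [children[i] for i in order]
    return (label, children)

def leaves(node):
    """Read the words of the tree from left to right."""
    label, body = node
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in leaves(child)]

tree = ("S", [
    ("NP", [("PRP", "I")]),
    ("VP", [("VB", "read"),
            ("NP", [("DT", "the"), ("NN", "book")])]),
])
print(" ".join(leaves(reorder(tree))))
```

The reordered source ("I the book read") follows verb-final order, so the phrase-based system that translates it needs far less long-distance reordering.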
Figure: Source Text, then Pre-Reordering, then Reordered Source Text, then PBSMT, then Target Text.
PBSMT: Training
Figure: the Source side of the parallel corpus passes through Pre-reordering to give a Reordered Source; PBSMT Training on the pairs of Reordered Source and Target produces the PBSMT system.
Pre-reordering: Training
The rules for pre-reordering can be acquired automatically from the parallel corpus, using automatic word alignment and parse trees on both sides.
Parse the source sentence
Parse the target sentence
Align the words and the phrases on both sides
Extract the rewrite rules
Parse Trees and Alignments
Rule Extraction
Rule Organization and Filtering
Applying Rewrite Rules
Statistical Parsing in RBMT
Statistical parsing outperforms rule-based parsing when a large-scale treebank is available.
It is therefore reasonable to use a statistical algorithm in the parsing component of an RBMT system.
Rule-based Named Entity Translation in SMT
Ney, H. (2013). Statistical MT Systems Revisited: How much Hybridity do they have? Proceedings of the Second Workshop on Hybrid Approaches to Translation, page 7, Sofia, Bulgaria, August 8, 2013.
Numerical Expression Translation
3501749
English: 3,501,749 (3 million 501 thousand and 749)
Chinese: 350,1749 (350 wan 1749)
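The point of the example is that English groups digits in thousands while Chinese groups them in ten-thousands (wan), which is easy to handle with a rule and hard to learn reliably from data. A minimal sketch of such rules, with invented function names and restricted to positive integers below one billion (English) and below one hundred million (Chinese), might look like this:

```python
def group_digits(number, size, sep=","):
    """Split the digits of a positive integer into groups of `size` from the right."""
    digits = str(number)
    groups = []
    while digits:
        groups.insert(0, digits[-size:])
        digits = digits[:-size]
    return sep.join(groups)

def english_reading(number):
    """Spell out the thousands structure, e.g. '3 million 501 thousand and 749'."""
    millions, rest = divmod(number, 1_000_000)
    thousands, units = divmod(rest, 1_000)
    parts = []
    if millions:
        parts.append(f"{millions} million")
    if thousands:
        parts.append(f"{thousands} thousand")
    if units:
        parts.append(f"and {units}" if parts else str(units))
    return " ".join(parts)

def wan_reading(number):
    """Spell out the ten-thousands structure, e.g. '350 wan 1749'."""
    high, low = divmod(number, 10_000)
    return f"{high} wan {low}" if high else str(low)

n = 3501749
print(group_digits(n, 3), "|", english_reading(n))
print(group_digits(n, 4), "|", wan_reading(n))
```

In a hybrid system, output of this kind would replace whatever the statistical decoder proposes for the numeric span.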
Human-Acquired Rules in SMT
Li, X., Lü, Y., Meng, Y., Liu, Q., & Yu, H. Feedback Selecting of Manually Acquired Rules Using Automatic Evaluation. Proceedings of the 4th Workshop on Patent Translation, pages 52-59, MT Summit XIII, Xiamen, China, September 2011
The manually acquired rules are used in the decoding process together with the hierarchical phrases of an SMT system
SMT Decoding with TM Phrases
Philipp Koehn and Jean Senellart. 2010. Convergence of translation memory and statistical machine translation. In AMTA Workshop on MT Research and the Translation Industry, pages 21–31.
Wang, K., Zong, C., & Su, K. Y. Integrating Translation Memory into Phrase-Based Machine Translation during Decoding. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 11–21, Sofia, Bulgaria, August 4-9 2013
Yanjun Ma, Yifan He, Andy Way and Josef van Genabith. 2011. Consistent translation using discriminative learning: a translation memory-inspired approach. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 1239–1248, Portland, Oregon.
Yifan He, Yanjun Ma, Andy Way and Josef van Genabith. 2011. Rich linguistic features for translation memory-inspired consistent translation. In Proceedings of the Thirteenth Machine Translation Summit, pages 456–463.
Extract TM phrases from similar sentences in the translation memory and use them during decoding at runtime.
Conclusion
Different MT approaches have advantages and disadvantages, which are usually complementary.
Hybrid MT can benefit from different MT approaches
Three categories of Hybrid MT were introduced: Selective, Pipelined and Mixture.
In practice, almost all real MT systems are hybrid systems.
Thank you! Q&A