8. qun liu (dcu) hybrid solutions for translation

Hybrid Solutions for Translation: Going hybrid

Qun Liu (DCU), Dr. Manuel Herranz (Pangeanic)

12 November 2013, Birmingham, UK

PART A

Qun Liu (DCU)

Winter School 2013, Birmingham

Outline

Why Hybrid MT?

An overview of Hybrid MT

Typical Hybrid MT Approaches

Conclusion


MT Approaches

RBMT: Rule-based Machine Translation

EBMT: Example-based Machine Translation

TM: Translation Memory

SMT: Statistical Machine Translation


RBMT: Vauquois’ Triangle

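[Figure: Vauquois' triangle, from Source Language (Analysis, left side) to Target Language (Generation, right side). Transfer levels from base to apex: Direct, Syntactic Transfer, Semantic Transfer, Interlingua.]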


RBMT: Rules for Components

Analysis

Morphological Analysis: Source Morphological Rules

Syntactic Analysis (Parsing): Source Grammar

Semantic Analysis: Source Semantic Rules

Transfer

Lexical Transfer: Bilingual Lexicon

Syntactic Transfer: Syntactic Mapping Rules

Semantic Transfer: Semantic Mapping Rules

Generation

Semantic Generation: Target Semantic Rules

Syntactic Generation: Target Grammar

Morphological Generation: Target Morphological Rules


RBMT: an Example


RBMT

RBMT makes use of human-encoded linguistic rules for translation.

Developing an RBMT system is very expensive because it needs a great deal of human labour and takes a long time (years).


RBMT

RBMT systems can reach good translation quality in a given domain after years of development.

Well-developed RBMT systems tend to capture large-scale sentence structure better than SMT systems, but perform worse on short expressions.


EBMT

An EBMT system translates sentences by analogy with existing translation examples.

EBMT does not need deep analysis of the source text and may generate high-quality translations when similar examples are found.


EBMT


EBMT

EBMT quality improves as more examples become available.

A weakness of EBMT is the coverage of the examples, especially for long sentences.


TM

Translation Memory directly outputs the existing target sentence when a very similar source sentence is found in the memory; otherwise it outputs nothing.


SMT

SMT builds statistical models to predict the probability of a target sentence being the translation of a given source sentence.

Translating a given source sentence then amounts to searching for the target sentence with the highest translation probability.
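In symbols, writing F for the source sentence and E for a candidate target sentence (the same P(E|F) notation that appears later in these slides), the search can be sketched as:

```latex
\hat{E} = \arg\max_{E} P(E \mid F)
```

How P(E|F) is modelled and how the search is carried out differ between the word-based, phrase-based and syntax-based variants; the formula only states the objective.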


SMT

A large number of translation pairs (parallel corpus) is needed to estimate the model parameters.

To predict the translation, sentence pairs are broken into smaller translation equivalents at the word level, the phrase level or the syntax-rule level.


Word-based SMT


Word-based SMT

Source           Target      Probability
Bushi （布什）   Bush        0.7
                 President   0.2
                 US          0.1
yu （与）        and         0.6
                 with        0.4
juxing （举行）  hold        0.7
                 had         0.3
le （了）        hold        0.01
...              ...
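As a rough illustration of how such a lexical table is used, the sketch below picks the most probable target word for each source word and multiplies the probabilities. The table entries are copied from the slide; the function name and the pass-through behaviour for unknown words are assumptions made for the example. Real word-based models also include alignment and language-model components, which this sketch leaves out.

```python
# Lexical translation table copied from the slide: source word -> {target: probability}.
LEXICON = {
    "Bushi": {"Bush": 0.7, "President": 0.2, "US": 0.1},
    "yu": {"and": 0.6, "with": 0.4},
    "juxing": {"hold": 0.7, "had": 0.3},
}


def translate_word_by_word(source_words):
    """Pick the most probable target word per source word, keeping source order."""
    output = []
    score = 1.0
    for word in source_words:
        options = LEXICON.get(word)
        if options is None:
            output.append(word)  # assumption: unknown words are passed through unchanged
            continue
        target, prob = max(options.items(), key=lambda item: item[1])
        output.append(target)
        score *= prob
    return output, score


words, score = translate_word_by_word(["Bushi", "yu", "juxing"])
print(words, round(score, 3))
```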


Phrase-based SMT


Phrase-based SMT

Source                              Target               Probability
Bushi （布什）                      Bush                 0.5
                                    president Bush       0.3
                                    the US president     0.2
Bushi yu （布什与）                 Bush and             0.8
                                    the president and    0.2
yu Shalong （与沙龙）               and Shalong          0.6
                                    with Shalong         0.4
juxing le huitan （举行了会谈）     hold a meeting       0.7
                                    had a meeting        0.3
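To show how a phrase table of this kind can drive translation, here is a minimal sketch of monotone phrase-based decoding: it searches over all ways of segmenting the source into table phrases and keeps the segmentation with the highest probability product. The table entries come from the slide (with 会谈 romanised as huitan); the function name, the `max_len` limit and the `unknown_prob` penalty for words missing from the table are assumptions. Reordering and the language model, both part of real phrase-based decoders, are left out.

```python
# Phrase table copied from the slide: source phrase -> {target phrase: probability}.
PHRASE_TABLE = {
    ("Bushi",): {"Bush": 0.5, "president Bush": 0.3, "the US president": 0.2},
    ("Bushi", "yu"): {"Bush and": 0.8, "the president and": 0.2},
    ("yu", "Shalong"): {"and Shalong": 0.6, "with Shalong": 0.4},
    ("juxing", "le", "huitan"): {"hold a meeting": 0.7, "had a meeting": 0.3},
}


def best_monotone_translation(words, table, max_len=3, unknown_prob=1e-3):
    """Dynamic programming over source prefixes, without reordering."""
    n = len(words)
    # best[i] holds (score, target phrases) for the best cover of words[:i].
    best = [None] * (n + 1)
    best[0] = (1.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] is None:
                continue
            options = table.get(tuple(words[start:end]))
            if options is not None:
                target, prob = max(options.items(), key=lambda kv: kv[1])
            elif end - start == 1:
                # assumption: a single unknown word is copied with a small penalty
                target, prob = words[start], unknown_prob
            else:
                continue
            score = best[start][0] * prob
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [target])
    score, phrases = best[n]
    return " ".join(phrases), score


source = ["Bushi", "yu", "Shalong", "juxing", "le", "huitan"]
print(best_monotone_translation(source, PHRASE_TABLE))
```

Here the segmentation Bushi | yu Shalong | juxing le huitan wins over Bushi yu | Shalong | juxing le huitan, because the second one has to pay the unknown-word penalty for Shalong.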


Hierarchical Phrase-based SMT


Hierarchical Phrase-based SMT

Source                                Target             Probability
juxing le huitan （举行了会谈）       hold a meeting     0.6
                                      had a meeting      0.3
X huitan （X会谈）                    X a meeting        0.8
                                      X a talk           0.2
juxing le X （举行了X）               hold a X           0.5
                                      had a X            0.5
Bushi yu Shalong （布什与沙龙）       Bush and Sharon    0.8
Bushi X （布什X）                     Bush X             0.7
X yu Y （X与Y）                       X and Y            0.9
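The gapped rules in this table can be read as templates: the X on the source side matches any sub-span, which is translated recursively and substituted for X on the target side. The sketch below illustrates this with one flat phrase pair and two single-gap rules taken from the slide (会谈 romanised as huitan). The `translate` function, the restriction to one gap per rule and the absence of probabilities are simplifications assumed for the example, not a description of a full hierarchical decoder.

```python
# Flat phrase pair and single-gap rules taken from the slide
# (highest-probability target only); "X" marks the gap.
PHRASES = {("juxing", "le", "huitan"): ["hold", "a", "meeting"]}
RULES = [
    (["Bushi", "X"], ["Bush", "X"]),
    (["juxing", "le", "X"], ["hold", "a", "X"]),
]


def translate(words):
    """Translate a list of source words, or return None if no derivation exists."""
    if tuple(words) in PHRASES:
        return list(PHRASES[tuple(words)])
    for source_side, target_side in RULES:
        gap = source_side.index("X")
        prefix, suffix = source_side[:gap], source_side[gap + 1:]
        gap_len = len(words) - len(prefix) - len(suffix)
        if gap_len < 1:
            continue
        gap_end = len(prefix) + gap_len
        if words[:len(prefix)] != prefix or words[gap_end:] != suffix:
            continue
        inner = translate(words[len(prefix):gap_end])  # fill the gap recursively
        if inner is None:
            continue
        output = []
        for token in target_side:
            output.extend(inner if token == "X" else [token])
        return output
    return None


print(translate(["Bushi", "juxing", "le", "huitan"]))
```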


Syntax-based SMT


Syntax-based SMT

Source                                              Target            Probability
VPB(VS(juxing) AS(le) NPB(huitan)) （举行了会谈）   hold a meeting    0.6
                                                    have a meeting    0.3
                                                    have a talk       0.1
VPB(VS(juxing) AS(le) x1:NPB) （举行了x1）          hold a x1         0.5
                                                    have a x1         0.5
VP(PP(P(yu) x1:NPB) x2:VPB) （与 x1 x2）            x2 with x1        0.9
IP(x1:NPB VP(x2:PP x3:VPB))                         x1 x3 x2          0.7


SMT

SMT is cheap

SMT systems can be developed in a short time

SMT needs a large parallel corpus


SMT

SMT gets good quality translations if we have plenty of in-domain data

SMT quality drops dramatically for out-of-domain data

SMT output is fluent within short phrases but handles large-scale sentence structure poorly (especially for distant language pairs)


Why Hybrid MT?

Each MT approach has its pros and cons.

We want to take advantage of different MT approaches

We do not want to waste our investments on existing MT systems





An overview of Hybrid MT

Selective MT: loose coupling

Pipelined MT: medium coupling

Mixture MT: close coupling


Selective MT

Given translations generated by different approaches, Selective MT tries to select the best one, or to select the best parts of different translations and combine them into a new one.


Selective MT

[Diagram: Source → MT1, MT2, MT3 → Select → Target]


Selective MT

Typical Selective MT:

System Recommendation

System Combination: sentence-level combination, word-level combination


Pipelined MT

Pipelined MT adopts one approach as the main approach and uses another approach for monolingual pre-processing or post-processing.


Pipelined MT

[Diagram: Pre-Processing → Main Approach → Post-Processing]


Pipelined MT

Typical Pipelined MT:

Statistical Post-Editing for RBMT

Rule-based Pre-reordering for SMT


Mixture MT

Mixture MT adopts one approach as the main approach but utilizes one or more different approaches in some components.


Mixture MT


Mixture MT

Typical Mixture MT:

Statistical Parsing in RBMT

Rule-based Named Entity Translation in SMT

Human-Encoded Rules in SMT

SMT Decoding with TM Phrases





Typical Hybrid MT Approaches

Selective MT: System Recommendation, System Combination

Pipelined MT

Mixture MT


System Recommendation

Yifan He, Yanjun Ma, Josef van Genabith and Andy Way, Bridging SMT and TM with System Recommendation, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), pages 622–630, Uppsala, Sweden, 11-16 July 2010.



Intuition: when the translation memory is large enough, the trained SMT system can be comparable with the TM in translation quality, so we have to choose between the two outputs.

System recommendation recommends the SMT output to a TM user when it predicts that the SMT output is more suitable for post-editing than the hit provided by the TM.



[Diagram: Parallel Corpus → TM and SMT → System Recommendation]



An SVM binary classifier is adopted.

The classifier is trained on human-annotated data.

A confidence score is given for the recommendation.



SMT System Features: features used in the SMT system

TM Feature: Fuzzy Match Cost

System-Independent Features:

Source-Side Language Model Score and Perplexity

Target-Side Language Model Perplexity

The Pseudo-Source Fuzzy Match Score

The IBM Model 1 Score
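To make the TM feature more concrete, the sketch below computes a fuzzy match cost as the word-level edit distance between the input sentence and its closest TM source sentence, divided by the length of the longer of the two. The normalisation and the function names are assumptions made for illustration; the cited paper may define the cost slightly differently. The example sentences are invented.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


def fuzzy_match_cost(query, tm_sources):
    """Return the closest TM source sentence and its normalised edit cost."""
    q = query.split()
    best = min(tm_sources, key=lambda s: edit_distance(q, s.split()))
    b = best.split()
    # assumption: normalise by the longer sentence; 'or 1' guards empty input
    return best, edit_distance(q, b) / (max(len(q), len(b)) or 1)


tm = ["please show me on the map", "where is the station"]
print(fuzzy_match_cost("please show me the map", tm))
```

A cost of 0 means an exact TM match; the closer the cost gets to 1, the less useful the TM hit is likely to be for post-editing.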



Evaluation Metrics:

Where A is the set of recommended MT outputs, and B is the set of MT outputs that have lower TER than TM hits.
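The formulas themselves did not survive in this transcript. Given the definitions of A and B, they are presumably the standard precision and recall over these two sets; the block below is a reconstruction under that assumption, not the slide's original text.

```latex
\text{precision} = \frac{|A \cap B|}{|A|}, \qquad
\text{recall} = \frac{|A \cap B|}{|B|}
```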





System Combination

Rosti, A. V. I., Ayan, N. F., Xiang, B., Matsoukas, S., Schwartz, R. M., & Dorr, B. J. (2007, April). Combining Outputs from Multiple Machine Translation Systems. In HLT-NAACL (pp. 228-235).


System Combination

Rosti, A. V. I., Matsoukas, S., & Schwartz, R. (2007, June). Improved word-level system combination for machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (p. 312).


System Combination

He, X., Yang, M., Gao, J., Nguyen, P., & Moore, R. 2008. Indirect-HMM-based hypothesis alignment for combining outputs from machine translation systems. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 98-107). Association for Computational Linguistics.


System Combination

Feng, Y., Liu, Y., Mi, H., Liu, Q., & Lü, Y. 2009. Lattice-based system combination for statistical machine translation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 (pp. 1105-1113). Association for Computational Linguistics.


Sentence-Level System Combination

Kumar, S., & Byrne, W. J. (2004, May). Minimum Bayes-Risk Decoding for Statistical Machine Translation. In HLT-NAACL (pp. 169-176).



Suppose we have several MT systems.

For a given source text F, each MT system outputs an n-best list of target texts.

If possible, the MT system gives each target text E a probability P(E|F); otherwise we may treat the n-best target texts as equally probable.



Minimum Bayes-Risk (MBR):
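The formula on this slide is missing from the transcript. In the usual formulation, MBR decoding picks the candidate E' that minimises the expected loss, that is, the sum over all candidates E of P(E|F) · L(E, E'). The sketch below applies this to a pooled n-best list. The loss function (one minus a bag-of-words Dice overlap) is an assumption chosen to keep the example short; the cited work uses translation metrics such as BLEU. The candidate sentences are invented.

```python
from collections import Counter


def loss(hypothesis, reference):
    """Assumed loss: 1 - Dice overlap of the two bags of words."""
    h, r = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum((h & r).values())
    total = sum(h.values()) + sum(r.values())
    return 1.0 - (2.0 * overlap / total if total else 1.0)


def mbr_select(nbest):
    """nbest: list of (translation, probability) pooled from all systems."""
    z = sum(p for _, p in nbest)  # normalise in case probabilities do not sum to 1

    def risk(candidate):
        # Expected loss of choosing `candidate`, under the pooled distribution.
        return sum(p / z * loss(candidate, other) for other, p in nbest)

    return min((c for c, _ in nbest), key=risk)


# Hypothetical candidates, treated as equally probable.
nbest = [
    ("please show me on the map", 1.0),
    ("please show me the map", 1.0),
    ("show it on a map", 1.0),
]
print(mbr_select(nbest))
```

The selected candidate is the one that agrees most with the rest of the pool, which is why MBR works as a consensus-style selection across systems.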


Word-Level System Combination

Select a translation candidate as the skeleton (backbone) by Minimum Bayes Risk

Construct a confusion network by aligning all the words in other translation candidates to the words in the skeleton

Select the best path from the confusion network and generate a new translation
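The three steps above can be sketched in a few lines, under strong simplifications: the skeleton is assumed to be already chosen, candidates are aligned to it by plain word-level edit distance, words that a candidate inserts relative to the skeleton are ignored, and the best path is found by a simple majority vote per skeleton position (ties go to the skeleton word). Published systems use better alignments (for example TER-based or HMM-based) and weighted votes. All sentences and function names are invented for the example.

```python
from collections import Counter


def align_to_skeleton(skeleton, candidate):
    """For each skeleton position, return the aligned candidate word ("" if none)."""
    n, m = len(skeleton), len(candidate)
    # dist[i][j] = edit distance between skeleton[:i] and candidate[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1][j - 1] + (skeleton[i - 1] != candidate[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    aligned = [""] * n
    i, j = n, m
    while i > 0 and j > 0:  # trace one optimal alignment back from the end
        if dist[i][j] == dist[i - 1][j - 1] + (skeleton[i - 1] != candidate[j - 1]):
            aligned[i - 1] = candidate[j - 1]
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j] + 1:
            i -= 1  # skeleton word has no counterpart (empty arc)
        else:
            j -= 1  # extra candidate word: ignored in this simplified sketch
    return aligned


def combine(skeleton, others):
    """Build one vote counter per skeleton slot and read off the majority path."""
    skel = skeleton.split()
    slots = [Counter({word: 1}) for word in skel]
    for candidate in others:
        for slot, word in zip(slots, align_to_skeleton(skel, candidate.split())):
            slot[word] += 1
    best = [slot.most_common(1)[0][0] for slot in slots]
    return " ".join(word for word in best if word)


print(combine("please show me in the map",
              ["please show me on the map", "show me on the map"]))
```

In this toy run the skeleton's "in" is outvoted by "on" from the other two candidates, while "please" survives because only one candidate omits it.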


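Translation Candidates

[Figure: the translation candidates, one of which is chosen as the skeleton]


Word Alignment against the Skeleton

[Figure: the other candidates aligned word by word against the skeleton]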


Confusion Network

Final output: Please show me on the map.


Word-Level System Combination

System combination has proved to be very effective.

In the NIST Open MT Evaluation Chinese-English task, MSR-NRC-SRI ranked first by using system combination technologies.

In later NIST evaluations, separate tracks were defined for participants using and not using system combination technologies.



Selective MT

Pipelined MT: Statistical Post-Editing for RBMT, Rule-based Pre-reordering for SMT

Mixture MT


Statistical Post-Editing for RBMT

Dugast, L., Senellart, J., & Koehn, P. (2007, June). Statistical post-editing on SYSTRAN's rule-based translation system. In Proceedings of the Second Workshop on Statistical Machine Translation (pp. 220-223). Association for Computational Linguistics.


Statistical Post-Editing for RBMT

Simard, M., Ueffing, N., Isabelle, P., & Kuhn, R. (2007). Rule-based Translation With Statistical Phrase-based Post-editing. Second Workshop on Statistical Machine Translation. Prague, Czech Republic. June 23, 2007. pp. 203–206.


Statistical Post-Editing

When we have:

A very good RBMT system

A large parallel corpus that can be used for SMT training

Both RBMT and SMT have advantages and disadvantages.

Can we benefit from both methods?


Statistical Post-Editing

Source Text → RBMT → RBMT Result → SPE → SPE Result

A Statistical Post-Editing (SPE) system is a monolingual SMT system that takes the output of an RBMT system as input and generates an improved target output.


Statistical Post Edit: Training

Source → RBMT → RBMT Target

(RBMT Target, Target) → SPE Training → SPE
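The data flow for SPE can be sketched as two small functions: one builds the monolingual training pairs by running the RBMT system over the source side of the parallel corpus, and the other chains RBMT and SPE at translation time. The `toy_rbmt` and `toy_spe` functions are hypothetical stand-ins that only make the example runnable; in practice the first is an existing RBMT engine and the second is an SMT system trained on the pairs.

```python
def build_spe_training_pairs(parallel_corpus, rbmt_translate):
    """Turn (source, target) pairs into (RBMT output, target) pairs for SPE training."""
    return [(rbmt_translate(source), target) for source, target in parallel_corpus]


def translate_with_spe(source, rbmt_translate, spe_translate):
    """Run-time pipeline: RBMT first, then statistical post-editing of its output."""
    return spe_translate(rbmt_translate(source))


# Hypothetical stand-ins for the two systems.
def toy_rbmt(source):
    return {"juxing le huitan": "hold a talk"}.get(source, source)


def toy_spe(rbmt_output):
    # Stands in for an SMT model that learned to prefer "meeting" over "talk".
    return rbmt_output.replace("hold a talk", "hold a meeting")


pairs = build_spe_training_pairs([("juxing le huitan", "hold a meeting")], toy_rbmt)
print(pairs)
print(translate_with_spe("juxing le huitan", toy_rbmt, toy_spe))
```

Both sides of each training pair are in the target language, which is why the SPE system is described as monolingual.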


Statistical Post Edit: Training

RBMT usually generates a better word order while SMT can make better lexical selection.

RBMT+SPE outperforms the original RBMT and SMT systems.





Rule-based Pre-reordering for SMT

Elia Yuste, Manuel Herranz, Alexandra Helle and Hirokazu Suzuki, Go Hybrid: Pangeanic's and Toshiba's First Steps Towards ENJP MT Hybridization, AAMT Journal, No.50, December 2011 (Part B for this tutorial)



Xia, F., & McCord, M. (2004, August). Improving a statistical MT system with automatically learned rewrite patterns. In Proceedings of the 20th international conference on Computational Linguistics (p. 508). Association for Computational Linguistics.



A phrase-based SMT (PBSMT) system makes good lexical choices but, without linguistic knowledge, is not good at long-distance reordering.

Rule-based word reordering is applied on the source side to make the word order of the source text much closer to the word order of the target language.



Source Text → Pre-Reordering → Reordered Source Text → PBSMT → Target Text
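As a toy illustration of the pre-reordering step, the sketch below applies one hand-written rule that moves a verb behind its object, turning a subject-verb-object source into the subject-object-verb order of a language such as Japanese. The role labels ("S", "V", "O"), the chunked input format and the rule itself are assumptions made for the example; real systems apply many rules over full parse trees.

```python
def preorder_svo_to_sov(chunks):
    """chunks: list of (text, role) pairs; swap every verb with the object that follows it."""
    out = list(chunks)
    i = 0
    while i + 1 < len(out):
        if out[i][1] == "V" and out[i + 1][1] == "O":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


sentence = [("the president", "S"), ("held", "V"), ("a meeting", "O")]
reordered = preorder_svo_to_sov(sentence)
print(" ".join(text for text, _ in reordered))
```

The reordered text is then passed to the PBSMT system, which only has to handle local reordering.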


PBSMT: Training

Source → Pre-reordering → Reordered Source

(Reordered Source, Target) → PBSMT Training → PBSMT


Pre-reordering: Training

The rules for pre-reordering can be acquired automatically from the parallel corpus, using automatic word alignment and parse trees on both sides.


Pre-reordering: Training

Parse the source sentence

Parse the target sentence

Align the words and phrases on both sides

Extract the rewrite rules


Parsing Trees and Alignments


Rule Extraction


Rule Organization and Filtering


Applying Rewrite Rules



Selective MT

Pipelined MT

Mixture MT: Statistical Parsing in RBMT, Rule-based Named Entity Translation in SMT, Human-Acquired Rules in SMT, SMT Decoding with TM Phrases


Statistical Parsing in RBMT

Statistical parsing outperforms rule-based parsing if we have a large-scale treebank.

It is reasonable to use a statistical algorithm in the parsing component of an RBMT system.


Rule-based Named Entity Translation in SMT

Ney, H. (2013). Statistical MT Systems Revisited: How much Hybridity do they have? Proceedings of the Second Workshop on Hybrid Approaches to Translation, page 7, Sofia, Bulgaria, August 8, 2013.


Numerical Expression Translation

3501749

English: 3,501,749 (3 million 501 thousand and 749)

Chinese: 350,1749 (350 wan 1749)
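This contrast is easy to handle with a rule and hard to learn from data, because English groups digits in threes while Chinese groups them in fours (wan = 10^4). A minimal sketch of such a rule follows; the function names are invented, and the wan reading only covers non-negative integers from 10^4 up to (but not including) 10^8.

```python
def group_digits(n, size, sep=","):
    """Group the digits of a non-negative integer from the right in blocks of `size`."""
    digits = str(n)
    groups = []
    while digits:
        groups.insert(0, digits[-size:])
        digits = digits[:-size]
    return sep.join(groups)


def chinese_wan_reading(n):
    """Split a number at 10^4, e.g. 3501749 -> '350 wan 1749'."""
    if not 10**4 <= n < 10**8:
        raise ValueError("this sketch only covers 10^4 <= n < 10^8")
    high, low = divmod(n, 10**4)
    return f"{high} wan {low}" if low else f"{high} wan"


print(group_digits(3501749, 3))      # English-style grouping
print(group_digits(3501749, 4))      # Chinese-style grouping
print(chinese_wan_reading(3501749))
```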


Human-Acquired Rules in SMT

Li, X., Lü, Y., Meng, Y., Liu, Q., & Yu, H. Feedback Selecting of Manually Acquired Rules Using Automatic Evaluation. Proceedings of the 4th Workshop on Patent Translation, pages 52-59, MT Summit XIII, Xiamen, China, September 2011


Human-Acquired Rules in SMT

These rules are used in the decoding process together with the hierarchical phrases in an SMT system.


SMT Decoding with TM Phrases

Philipp Koehn and Jean Senellart. 2010. Convergence of translation memory and statistical machine translation. In AMTA Workshop on MT Research and the Translation Industry, pages 21–31.

Wang, K., Zong, C., & Su, K. Y. Integrating Translation Memory into Phrase-Based Machine Translation during Decoding. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 11–21, Sofia, Bulgaria, August 4-9 2013



Yanjun Ma, Yifan He, Andy Way and Josef van Genabith. 2011. Consistent translation using discriminative learning: a translation memory-inspired approach. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 1239–1248, Portland, Oregon.

Yifan He, Yanjun Ma, Andy Way and Josef van Genabith. 2011. Rich linguistic features for translation memory-inspired consistent translation. In Proceedings of the Thirteenth Machine Translation Summit, pages 456–463.



Extract TM phrases from similar sentences in the translation memory and use them in the decoding process at runtime.





Conclusion

Different MT approaches have advantages and disadvantages, which are usually complementary.

Hybrid MT can benefit from different MT approaches.

Three categories of Hybrid MT were introduced: Selective, Pipelined and Mixture.

In practice, almost all real MT systems are hybrid systems.


Thank you!

Q&A
