Translation Models: Taking Translation Direction into Account
Gennadi Lembersky, Noam Ordan, Shuly Wintner. ISCOL, 2011
Statistical Machine Translation (SMT)
• Given a foreign sentence f:
 ▫ "Maria no dio una bofetada a la bruja verde"
• Find the most likely English translation e:
 ▫ "Maria did not slap the green witch"
• The most likely English translation e is given by arg max P(e|f)
 ▫ P(e|f) estimates the conditional probability of any e given f
• How to estimate P(e|f)? Noisy channel:
 ▫ Decompose P(e|f) into P(f|e) · P(e) / P(f)
 ▫ Estimate P(f|e) from a parallel corpus (translation model)
 ▫ Estimate P(e) from a monolingual corpus (language model)
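The noisy-channel decomposition above can be sketched as a reranking loop. This is a minimal illustration with made-up toy probabilities, not an actual decoder; a real system gets P(f|e) from the translation model and P(e) from the language model:

```python
import math

def noisy_channel_best(candidates, tm_prob, lm_prob):
    """Return the candidate e maximizing P(f|e) * P(e).

    P(f) is constant over all candidates, so it can be ignored.
    Scores are summed in log space for numerical stability."""
    return max(candidates,
               key=lambda e: math.log(tm_prob[e]) + math.log(lm_prob[e]))

candidates = ["Maria did not slap the green witch",
              "Maria no gave a slap to the witch green"]
tm_prob = {candidates[0]: 0.02, candidates[1]: 0.05}    # toy P(f|e) values
lm_prob = {candidates[0]: 0.01, candidates[1]: 0.0001}  # toy P(e) values

best = noisy_channel_best(candidates, tm_prob, lm_prob)
```

Note how the language model overrides the translation model here: the fluent candidate wins even though its P(f|e) is lower.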
Translation Model
• How to model P(f|e)?
 ▫ Learn the parameters of P(f|e) from a parallel corpus
 ▫ Estimate translation model parameters at the phrase level: explicit modeling of word context captures local reorderings and local dependencies
• IBM Models define how words in a source sentence can be aligned to words in a parallel target sentence
 ▫ EM is used to estimate the parameters
• Aligned words are extended to phrases
• Result: a phrase table
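The "aligned words are extended to phrases" step can be sketched as follows. This is a simplified version of the standard consistency check, assuming the word alignment is given as a set of (source index, target index) pairs; it is not the exact Moses implementation:

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=3):
    """Extract phrase pairs consistent with the word alignment.

    A (source span, target span) pair is consistent if no word inside
    the target span is aligned to a source word outside the span."""
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions aligned to words in the source span
            tps = [t for (s, t) in alignment if i1 <= s <= i2]
            if not tps:
                continue
            j1, j2 = min(tps), max(tps)
            # consistency check: no crossing alignment links
            if any(j1 <= t <= j2 and not (i1 <= s <= i2)
                   for (s, t) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

src = ["la", "bruja", "verde"]
tgt = ["the", "green", "witch"]
alignment = {(0, 0), (1, 2), (2, 1)}  # la-the, bruja-witch, verde-green
pairs = extract_phrase_pairs(src, tgt, alignment)
```

On this toy example the reordered pair ("bruja verde", "green witch") is extracted, while inconsistent spans such as ("la bruja", "the green") are rejected.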
Log-Linear Models
• Log-linear models:

  ê = arg max_e P(e|f) = arg max_e Σ_i λ_i h_i(f, e)

 ▫ where the h_i are the feature functions and the λ_i are the model parameters
 ▫ typical feature functions: phrase translation probabilities, lexical translation probabilities, language model probability, reordering model
• Model parameters are estimated (tuned) with discriminative training, e.g. the MERT algorithm (Och, 2003)
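The log-linear score can be sketched as a weighted sum of feature functions. The feature values and weights below are made up for illustration, not tuned parameters:

```python
import math

def loglinear_score(features, weights):
    """Score a translation hypothesis: sum_i lambda_i * h_i(f, e)."""
    return sum(weights[name] * value for name, value in features.items())

# Toy feature values (log-probabilities) for two hypotheses.
weights = {"tm": 0.3, "lex": 0.2, "lm": 0.5}
h1 = {"tm": math.log(0.05), "lex": math.log(0.04), "lm": math.log(0.0001)}
h2 = {"tm": math.log(0.02), "lex": math.log(0.03), "lm": math.log(0.01)}

best = max([h1, h2], key=lambda h: loglinear_score(h, weights))
```

Tuning (e.g. MERT) searches for the weight vector λ that maximizes translation quality on a development set; here the weights are fixed by hand.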
Evaluation
• Human evaluation is not practical – too slow and costly
• Automatic evaluation is based on a human reference translation
 ▫ The output of an MT system is compared to the human translation of the same set of sentences
 ▫ The metrics basically calculate the distance between the MT output and the reference translation
• Dozens of metrics have been developed
 ▫ BLEU is the most popular one
 ▫ METEOR and TER are close behind
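The distance computation can be illustrated with a BLEU-style modified (clipped) n-gram precision. This is only one ingredient of the real metric, which combines precisions up to 4-grams with a brevity penalty:

```python
from collections import Counter

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference (clipping)."""
    cand = Counter(tuple(candidate[i:i + n])
                   for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n])
                  for i in range(len(reference) - n + 1))
    clipped = sum(min(c, ref[g]) for g, c in cand.items())
    return clipped / sum(cand.values())

# Classic clipping example: repeating a reference word does not help.
cand = "the the the cat".split()
ref = "the cat sat".split()
p1 = modified_precision(cand, ref, 1)
```

Without clipping, the degenerate candidate above would score 4/4 on unigrams; clipping brings it down to 2/4.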
Original vs. Translated Texts
Given this simplified model:
Two points are made with regard to the “intermediate component” (TM and LM):
1. TM is blind to direction (but see Kurokawa et al., 2009)
2. LMs are based on originally written texts.
[Diagram: source text → translation model (TM) → target text; the target side is scored by the language model (LM)]
Original vs. Translated Texts
Translated texts are ontologically different from non-translated texts; they generally exhibit:
1. Simplification of the message, the grammar, or both (Al-Shabab, 1996; Laviosa, 1998);
2. Explicitation, the tendency to spell out implicit utterances that occur in the source text (Blum-Kulka, 1986).
Original vs. Translated Texts
• Translated texts can be distinguished from non-translated texts with high accuracy (87% and more)
 ▫ for Italian (Baroni & Bernardini, 2006)
 ▫ for Spanish (Ilisei et al., 2010)
 ▫ for English (Koppel & Ordan, 2011)
How Does Translation Direction Affect MT?
• Language models
 ▫ Our work (accepted to EMNLP) shows that LMs trained on translated texts are better for MT systems than LMs trained on original texts.
• Translation models
 ▫ Kurokawa et al. (2009) showed that when translating French into English, it is better to use a parallel corpus that was translated from French into English, and vice versa.
 ▫ This work supports that claim and extends it (in review for WMT).
Our Setup
• Canadian Hansard corpus: a parallel French–English corpus
 ▫ 80% original English (EO)
 ▫ 20% original French (FO)
 ▫ The 'source' language is marked
• Two scenarios:
 ▫ Balanced: 750K FO sentences and 750K EO sentences
 ▫ Biased: 750K FO sentences and 3M EO sentences
• MOSES PB-SMT toolkit
• Tuning & evaluation:
 ▫ 1,000 FO sentences for tuning and 5,000 FO sentences for evaluation
Baseline Experiments
• We translate French-to-English
• EO – train the phrase table on the EO portion of the parallel corpus
• FO – train the phrase table on the FO portion of the parallel corpus
• FO+EO – train the phrase table on the whole parallel corpus
Baseline Results

Set       System  BLEU   Size       Time
Balanced  EO      28.44  1,391,365  1.04
          FO      31.92  1,308,726  0.98
          FO+EO   31.72  2,429,807  1.09
Biased    EO      29.53  4,236,189  1.22
          FO      31.92  1,308,726  0.98
          FO+EO   32.85  5,101,973  1.15
System A: Two Phrase Tables
• EO – train a phrase table on the EO portion of the parallel corpus
• FO – train a phrase table on the FO portion of the parallel corpus
• System A – let MOSES use both phrase tables
 ▫ Log-linear model training gives each phrase table different weights
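The two-table idea can be sketched as follows. This is a toy re-implementation of the concept, not the Moses multiple-table mechanism, and the weights are made up:

```python
import math

def best_translation(src_phrase, fo_table, eo_table, w_fo, w_eo):
    """Pick the translation maximizing a weighted log score over both tables.

    Each table maps a source phrase to {translation: probability};
    each table contributes with its own tuned weight."""
    scored = []
    for table, w in ((fo_table, w_fo), (eo_table, w_eo)):
        for tgt, p in table.get(src_phrase, {}).items():
            scored.append((w * math.log(p), tgt))
    return max(scored)[1] if scored else None

fo = {"bruja verde": {"green witch": 0.6}}
eo = {"bruja verde": {"green witch": 0.3, "green sorceress": 0.1}}
choice = best_translation("bruja verde", fo, eo, w_fo=1.0, w_eo=0.5)
```

In the real system the per-table weights are not set by hand: they fall out of log-linear tuning, which is exactly what gives each phrase table "different weights".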
System A Results

Set       System   BLEU   Size       Time
Balanced  SystemA  33.21  2,700,091  1.89
Biased    SystemA  33.54  5,544,915  2.39

• In the balanced scenario we gained 1.29 BLEU
• In the biased scenario we gained 0.69 BLEU
• The cost is decoding time and memory
Looking Inside…
• Complete table – the phrase table obtained after training
• Filtered table – a phrase table that contains only phrases that appear in the evaluation set
A Few Observations… / 1
• Balanced set / complete tables
 ▫ The FO table has many more unique French phrases (15.8M vs. 13M)
 ▫ The EO table has more translation options per source phrase (1.42 vs. 1.33)
 ▫ The source phrases in the intersection are shorter (3.76 vs. 5.07–5.16), but they have more translations (3.08–3.21 vs. 1.09–1.10)
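Statistics like these can be computed directly from two phrase tables. A sketch, assuming each table is represented as a dict mapping source phrases to lists of translations:

```python
def table_stats(fo_table, eo_table):
    """Compare two phrase tables: unique source phrases, average number
    of translations per source phrase, and the size of the intersection
    of their source-phrase sets."""
    fo_src, eo_src = set(fo_table), set(eo_table)
    avg = lambda t: sum(len(v) for v in t.values()) / len(t)
    return {
        "fo_unique_sources": len(fo_src),
        "eo_unique_sources": len(eo_src),
        "fo_avg_translations": avg(fo_table),
        "eo_avg_translations": avg(eo_table),
        "intersection": len(fo_src & eo_src),
    }

# Tiny toy tables, just to exercise the function.
fo = {"bruja": ["witch"], "bruja verde": ["green witch"]}
eo = {"bruja": ["witch", "hag"], "verde": ["green"]}
stats = table_stats(fo, eo)
```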
A Few Observations… / 2
• Balanced set / filtered tables
 ▫ The intersection comprises 96.1% of the translation phrase pairs in the FO table and 98.3% of those in the EO table.
A Few Observations… / 3
• Biased set – we added 2,250,000 English-original sentences. What happens?
 ▫ In the 'complete' EO table – everything grows
• In the filtered tables:
 ▫ the number of phrase pairs increases by a factor of 3
 ▫ the number of unique source phrases increases by a third, but the coverage of French phrases does not improve by much
 ▫ the average number of translations increases by a factor of 2.3 (from 13.2 to 30.3)
  ▫ Long tail – the probability mass is split among a larger number of translations; good translations get lower probability than in the FO table
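The long-tail effect follows directly from relative-frequency estimation: p(e|f) = count(f, e) / count(f), so adding translation options for the same source phrase drains probability from the good ones. A toy calculation with made-up counts:

```python
from collections import Counter

def translation_probs(counts):
    """Relative-frequency estimates p(e|f) for one source phrase."""
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()}

# FO-style table: few options, the good translation keeps most of the mass.
fo = translation_probs(Counter({"green witch": 8, "green sorceress": 2}))

# EO-style table: the same good translation plus a long tail of rare options.
eo = translation_probs(Counter({"green witch": 8, "green sorceress": 2,
                                "witch green": 1, "verdant witch": 1,
                                "the green hag": 1, "emerald witch": 1}))
```

Here the best translation drops from p = 0.8 to p ≈ 0.57 merely because more alternatives share the denominator.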
How Does MOSES Select Phrases?
• Balanced set
 ▫ 96.5% come from the FO table
 ▫ 99.3% of the phrase pairs selected from the intersection originated in the FO table
• Biased set
 ▫ 94.5% come from the FO table
 ▫ 98.2% of the phrase pairs selected from the intersection originated in the FO table
The Tuning Effect / 1
• A question: is the FO phrase table better than the EO phrase table, or does it only become better during tuning?
• Let's test System A with the initial (pre-tuning) configuration and with the configuration generated by tuning.
The Tuning Effect / 2
• Balanced set / before tuning
 ▫ only 58% come from the FO table
 ▫ 57.7% of the phrase pairs selected from the intersection originated in the FO table
• Balanced set / after tuning
 ▫ 95.4% come from the FO table
 ▫ 99.3% of the phrase pairs selected from the intersection originated in the FO table
The Tuning Effect / 3
• The decoder prefers the FO table in the initial configuration (58%).
• The preference becomes much stronger after tuning (95.4%).
• Interestingly, the decoder doesn't just replace EO phrases with FO phrases; it searches for longer phrases:
 ▫ The average length of a phrase selected from the EO table increases by about 1.5 words.
New Experiment: System B
• Based on these results, we can throw away the intersection subset of the EO phrase table
 ▫ We expect a small loss in quality, but a significant improvement in translation speed.
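The System B construction can be sketched as a filter over the EO table. A sketch assuming phrase tables are dicts keyed by source phrase, not the actual implementation:

```python
def build_system_b(fo_table, eo_table):
    """Keep the full FO table, but drop every EO entry whose source
    phrase also appears in the FO table (i.e. the intersection)."""
    eo_only = {src: trans for src, trans in eo_table.items()
               if src not in fo_table}
    return fo_table, eo_only

fo = {"bruja": ["witch"], "bruja verde": ["green witch"]}
eo = {"bruja": ["witch", "hag"], "verde": ["green"]}
fo_kept, eo_kept = build_system_b(fo, eo)
```

The EO table shrinks to the phrases the FO table cannot cover at all, which is why decoding gets faster while quality barely drops.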
System B Results

Set       System   BLEU   Size       Time
Balanced  EO       28.44  1,391,365  1.04
          FO       31.92  1,308,726  0.98
          FO+EO    31.72  2,429,807  1.09
          SystemA  33.21  2,700,091  1.89
          SystemB  33.19  1,327,955  0.94
Biased    EO       29.53  4,236,189  1.22
          FO       31.92  1,308,726  0.98
          FO+EO    32.85  5,101,973  1.15
          SystemA  33.54  5,544,915  2.39
          SystemB  33.34  1,382,017  0.95
What About a Classified Corpus?
• Annotation of the source language is rarely available in parallel corpora.
 ▫ Will our System A and System B still outperform the FO+EO and FO MT systems?
• We use an SVM for classification; our features are punctuation marks and n-grams of part-of-speech tags.
• We train the classifier on an English–French subset of the Europarl corpus.
• Accuracy is about 73.5%
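The feature extraction for this classifier can be sketched as follows. This only builds the feature vector (punctuation-mark counts plus POS-tag bigram counts) that would be fed to an SVM; the tag set and helper are illustrative, not the talk's exact feature set:

```python
from collections import Counter

PUNCT = set(".,;:!?()\"'-")

def features(tokens, pos_tags, n=2):
    """Feature vector for translationese classification:
    punctuation-mark counts plus POS-tag n-gram counts (here bigrams).
    The resulting sparse dict would be fed to a linear SVM."""
    feats = Counter(tok for tok in tokens if tok in PUNCT)
    feats.update(tuple(pos_tags[i:i + n])
                 for i in range(len(pos_tags) - n + 1))
    return feats

toks = ["Maria", "did", "not", "slap", "the", "green", "witch", "."]
tags = ["NNP", "VBD", "RB", "VB", "DT", "JJ", "NN", "."]
f = features(toks, tags)
```

Both feature families are content-independent, which is what lets the classifier generalize across topics when detecting translated text.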
Classified System Results

Set       System                BLEU
Balanced  EO+FO                 31.72
          FO (annotated)        31.92
          FO (classified)       32.04
          SystemA (classified)  32.91
          SystemB (classified)  32.57
          SystemA (annotated)   33.21
          SystemB (annotated)   33.19
Thank You!