Machine Translation: Word Alignment Problem

Marcello Federico, FBK, Trento - Italy

2016
Outline
- Word alignments
- Word alignment models
- Alignment search
- Alignment estimation
- EM algorithm
- Model 2
- Fertility alignment models
- HMM alignment models
This part contains advanced material (marked with *) suited to students interested in the mathematical details of the presented models.
Example of Parallel Corpus
Darum liegt die Verantwortung für das Erreichen des Effizienzzieles und der damit einhergehenden CO2-Reduzierung bei der Gemeinschaft, die nämlich dann tätig wird, wenn das Ziel besser durch gemeinschaftliche Maßnahmen erreicht werden kann. Und genaugenommen steht hier die Glaubwürdigkeit der EU auf dem Spiel.
That is why the responsibility for achieving the efficiency target and at the same time reducing CO2 lies with the Community, which in fact takes action when an objective can be achieved more effectively by Community measures. Strictly speaking, it is the credibility of the EU that is at stake here.
Notice the different positions of the corresponding verb groups.

MT has to take into account word re-ordering!
Word Alignments

- Let us consider possible alignments a between words in f and e.
- Typically, alignments are restricted to maps between positions of f and of e.
- Some source words might not be aligned (= virtually aligned with NULL).
- These and even more general alignments are machine learnable.
- Notice also that alignments induce word re-ordering.

[Figure: word alignment between the Italian sentence "dalla serata di domani soffierà un freddo vento orientale" (positions 1-9) and the English sentence "since tomorrow evening an eastern chilly wind will blow" (positions 1-9, plus NULL at position 0).]
Word Alignment: Matrix Representation

[Figure: alignment matrix with the English words on the rows (NULL 0, since 1, tomorrow 2, evening 3, an 4, eastern 5, chilly 6, wind 7, will 8, blow 9) and the Italian words on the columns (dalla 1, serata 2, di 3, domani 4, soffierà 5, un 6, freddo 7, vento 8, orientale 9); each alignment link is a point of the matrix.]
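To make the matrix representation concrete, here is a minimal Python sketch that turns an alignment vector into the 0/1 matrix above. The sentence pair is taken from the slide; the exact links in the vector a are illustrative assumptions, since the original figure did not survive extraction.

```python
# Minimal sketch: an alignment vector a maps each source position j
# (1..m) to a target position a[j] in 0..l, where 0 is the NULL word.
f = "dalla serata di domani soffierà un freddo vento orientale".split()
e = "since tomorrow evening an eastern chilly wind will blow".split()

# Illustrative links (assumed, since the original figure is lost):
# a[j-1] = i means f_j is aligned to e_i.
a = [1, 3, 0, 2, 9, 4, 6, 7, 5]

l, m = len(e), len(f)
# Build the (l+1) x m alignment matrix: rows = target positions
# (row 0 = NULL), columns = source positions.
matrix = [[0] * m for _ in range(l + 1)]
for j, i in enumerate(a):
    matrix[i][j] = 1

# Print rows top-down, like the slide: blow 9 ... since 1, NULL 0
for i in range(l, -1, -1):
    word = e[i - 1] if i > 0 else "NULL"
    print(f"{word:>10} {i}  " + " ".join(map(str, matrix[i])))
```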
Word Alignment: Direct Alignment
A : {1, ..., m} → {1, ..., l}
[Figure: direct alignment matrix between the Italian sentence "il programma è stato messo in pratica" (source positions 1-7, columns) and the English sentence "and the program has been implemented" (target words on the rows: and 1, the 2, program 3, has 4, been 5, implemented 6).]
We allow only one link (point) in each column. Some columns may be empty.
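In code, a direct alignment is just one integer per source position, with 0 marking an empty column. A minimal sketch with the slide's sentence pair; the individual links are illustrative assumptions, as the figure only survives as a placeholder:

```python
# Direct alignment: each source position j gets at most one target
# position; 0 marks an unaligned source word (an empty column).
f = "il programma è stato messo in pratica".split()   # m = 7
e = "and the program has been implemented".split()    # l = 6

# Illustrative links (assumed; the original figure is lost):
a = [2, 3, 4, 5, 6, 0, 6]   # il→the, programma→program, è→has, ...

for j, i in enumerate(a, start=1):
    target = e[i - 1] if i > 0 else "(unaligned)"
    print(f"f[{j}] {f[j - 1]:<10} -> {target}")
```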
Word Alignment: Inverted Alignment
A : {1, ..., l} → {1, ..., m}
[Figure: inverted alignment matrix between the Italian sentence "il territorio degli autoctoni" (source positions 1-4, columns) and the English sentence "the territory of the aboriginal people" (target words on the rows: the 1, territory 2, of 3, the 4, aboriginal 5, people 6).]
You can get a direct alignment by swapping the source and target sentences.
Alignment Variable
Modelling the alignment as an arbitrary relation between source and target language is very general but computationally unfeasible: 2^(l·m) possible alignments!
A generally applied restriction is to let each source word be assigned to exactly one target word (see Example 2). Hence, alignment is a map from source to target positions:
A : {1, ..., m} → {0, ..., l}

- Alignment variable: a = a1, ..., am consists of associations j → i = aj, from source position j to target position i = aj.
- We may include null word alignments, that is aj = 0, to account for source words not aligned to any target word. Hence, only (l + 1)^m possible alignments (see the sketch below).
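For toy sentence lengths the whole alignment space can be enumerated explicitly; a minimal sketch confirming the (l + 1)^m count:

```python
from itertools import product

l, m = 2, 3   # toy sizes: 2 target words, 3 source words

# Every alignment variable a = a_1 ... a_m with a_j in {0, ..., l}
# (0 is the NULL word); there are (l + 1)**m of them.
alignments = list(product(range(l + 1), repeat=m))
print(len(alignments), (l + 1) ** m)   # prints: 27 27
```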
Word Alignment Model
In SMT we will model the translation probability Pr(f | e) by summing the probabilities of all possible (l + 1)^m hidden alignments a between the source and the target strings:
Pr(f | e) = Σ_a Pr(f, a | e)    (1)
Hence we will consider statistical word alignment models:
Pr(f, a | e) = p_θ(f, a | e)

defined by a specific set of parameters θ.
The art of statistical modelling consists in designing statistical models which capture the relevant properties of the considered phenomenon, in our case the relationship between a source language string and a target language string.
There are five models of increasing complexity (= number of parameters).
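Equation (1) can be computed by brute force for short sentences. A minimal sketch, assuming a Model-1-style parameterization of Pr(f, a | e) (introduced on the following slides): a length term eps, a uniform alignment term, and a word translation table; the table values and eps are made up.

```python
from itertools import product

def translation_prob(f, e, t, eps=1.0):
    """Brute-force Eq. (1): Pr(f | e) = sum over all alignments a of
    Pr(f, a | e), here parameterized Model-1 style as
    eps / (l+1)^m * prod_j t[(f_j, e_{a_j})]."""
    l, m = len(e), len(f)
    e_null = ["NULL"] + e                       # position 0 is the null word
    total = 0.0
    for a in product(range(l + 1), repeat=m):   # all (l+1)^m alignments
        p = eps / (l + 1) ** m
        for j, i in enumerate(a):
            p *= t.get((f[j], e_null[i]), 0.0)
        total += p
    return total

# Made-up toy translation table t[(source word, target word)]
t = {("la", "the"): 0.9, ("casa", "house"): 0.8,
     ("la", "NULL"): 0.05, ("casa", "NULL"): 0.05}
print(translation_prob(["la", "casa"], ["the", "house"], t))
```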
Word Alignment Models
In order to find automatic methods to learn word alignments from data, we use mathematical models that explain how translations are generated.
The way models explain translations may appear very naïve, if not silly! Indeed, they are very simplistic ...
However, simple explanations often do work better than complex ones!
We need to be a little bit formal here, just to give names to the ingredients we will use in our recipes to learn word alignments:
- English sentence e is a sequence of l words
- French sentence f is a sequence of m words
- Word alignment a is a map from m positions to l + 1 positions
We will have to relax a bit our conception of sentence: it is just a sequence of words, which might or might not make sense at all ...
Word Alignment Models
There are five models, of increasing complexity, that explain how a translation and an alignment can be generated from an English sentence.
e → [ Alignment Model: Pr(a, f | e) ] → a, f
Complexity refers to the number of parameters that define the model!
We start from the simplest model, called Model 1!
Model 1
e → [ Alignment Model: Pr(a, f | e) ] → a, f
Model 1 generates the translation and the alignment as follows:
1. guess the length m of f on the basis of the length l of e
2. for each position j in f repeat the following two steps:
(a) randomly pick a corresponding position i in e
(b) generate word j of f by picking a translation of word i in e
Step 1 is executed by using a translation length predictor.
Step 2(a) is performed by throwing a die with l + 1 faces.¹
Step 2(b) is carried out by using a word translation table (see the sketch below).
¹ We want to include the null word.
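The three ingredients above are enough to run the generative story end to end. A minimal sketch, with made-up stand-ins for the length predictor and the word translation table:

```python
import random

# Made-up stand-ins for the three resources named above:
length_predictor = {2: [0.1, 0.6, 0.3]}   # Pr(m | l=2) for m = 1, 2, 3
ttable = {"NULL": ["di"], "the": ["la", "il"], "house": ["casa"]}

def model1_generate(e):
    l = len(e)
    # Step 1: guess the length m of f from the length l of e
    m = random.choices([1, 2, 3], weights=length_predictor[l])[0]
    e_null = ["NULL"] + e
    f, a = [], []
    for _ in range(m):
        # Step 2(a): throw an (l+1)-faced die to pick a position i in e
        i = random.randrange(l + 1)
        # Step 2(b): pick a translation of word i from the ttable
        f.append(random.choice(ttable[e_null[i]]))
        a.append(i)
    return f, a

print(model1_generate(["the", "house"]))
```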
On Probability Factorization
Chain Rule: the probability of a sequence of events e = e1, e2, e3, ..., el can be factorized as:

Pr(e1, e2, e3, ..., el) = Pr(e1) Pr(e2 | e1) Pr(e3 | e1, e2) ... Pr(el | e1, ..., el−1)
- The joint probability is factorized over single event probabilities.
- Factors however introduce dependencies of increasing complexity: the last factor has the same complexity as the complete joint probability!
- There are two basic approximations for sequential models which eliminate dependencies in the conditional part of the chain factors.
- Notice that for non-sequential events, we might change the order of factors, e.g.:

Pr(f, a | e) = Pr(a, f | e) = Pr(a | e) Pr(f | e, a)
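To see the chain rule at work, here is a minimal sketch that scores a sentence with full-history conditional estimates Pr(e_k | e_1 ... e_{k-1}) computed from a made-up three-sentence corpus; note how each factor conditions on an ever longer prefix:

```python
from collections import Counter

# Made-up three-sentence corpus
corpus = [("the", "house"), ("the", "car"), ("a", "house")]

# Count every prefix so we can estimate Pr(e_k | e_1 ... e_{k-1})
prefix_counts = Counter()
for sent in corpus:
    for k in range(len(sent) + 1):
        prefix_counts[sent[:k]] += 1

def chain_rule_prob(sent):
    """Pr(e_1 ... e_l) = prod_k Pr(e_k | e_1 ... e_{k-1}), each factor
    estimated as count(prefix + word) / count(prefix)."""
    p = 1.0
    for k in range(len(sent)):
        p *= prefix_counts[sent[:k + 1]] / prefix_counts[sent[:k]]
    return p

print(chain_rule_prob(("the", "house")))   # (2/3) * (1/2) = 1/3
```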
Basic Sequential Models
Bag-o