
Machine Translation: Word Alignment Problem

Marcello Federico
FBK, Trento - Italy

2016

M. Federico MT 2016

Outline

Word alignments
Word alignment models
Alignment search
Alignment estimation
EM algorithm
Model 2
Fertility alignment models
HMM alignment models

This part contains advanced material (marked with *) suited to students interested in the mathematical details of the presented models.


Example of Parallel Corpus

Darum liegt die Verantwortung für das Erreichen des Effizienzzieles und der damit einhergehenden CO2-Reduzierung bei der Gemeinschaft, die nämlich dann tätig wird, wenn das Ziel besser durch gemeinschaftliche Maßnahmen erreicht werden kann. Und genaugenommen steht hier die Glaubwürdigkeit der EU auf dem Spiel.

That is why the responsibility for achieving the efficiency target and at the same time reducing CO2 lies with the Community, which in fact takes action when an objective can be achieved more effectively by Community measures. Strictly speaking, it is the credibility of the EU that is at stake here.

Notice different positions of corresponding verb groups.

MT has to take into account word re-ordering!



Word Alignments

Let us consider possible alignments a between words in f and e.
Typically, alignments are restricted to maps between positions of f and of e.
Some source words might not be aligned (= virtually aligned with NULL).
These and even more general alignments are machine learnable.
Notice also that alignments induce word re-ordering.

[Figure: word alignment between the Italian sentence "dalla serata di domani soffierà un freddo vento orientale" (positions 1-9, plus NULL at position 0) and the English sentence "since tomorrow evening an eastern chilly wind will blow".]



Word Alignment: Matrix Representation

[Figure: alignment matrix. Rows list the English words with their positions (blow 9, will 8, wind 7, chilly 6, eastern 5, an 4, evening 3, tomorrow 2, since 1, NULL 0); columns list the Italian words at positions 1-9 (dalla, serata, di, domani, soffierà, un, freddo, vento, orientale). Each alignment link is marked as a point in the matrix.]


Word Alignment: Direct Alignment

A : {1, ..., m} → {1, ..., l}

[Figure: direct alignment matrix. Rows list the English words with their positions (implemented 6, been 5, has 4, program 3, the 2, and 1); columns list the Italian words at positions 1-7 (il, programma, è, stato, messo, in, pratica).]

We allow only one link (point) in each column. Some columns may be empty.
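The column constraint above can be sketched in code. This is an illustrative check, not from the slides; the helper name and the example links are hypothetical, chosen for the sentence pair "il programma è stato messo in pratica" / "and the program has been implemented".

```python
# A direct alignment A : {1, ..., m} -> {1, ..., l} places at most one
# link (point) in each column, i.e. per source position; columns may be empty.
def is_direct_alignment(links, m, l):
    """links: set of (source_pos, target_pos) pairs, positions 1-based."""
    sources = [j for (j, i) in links]
    # at most one link per source position (one point per column)
    if len(sources) != len(set(sources)):
        return False
    # every link must stay within the sentence bounds
    return all(1 <= j <= m and 1 <= i <= l for (j, i) in links)

# Hypothetical alignment for the example pair (m = 7 Italian words,
# l = 6 English words); several source words may share one target word.
links = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 6), (7, 6)}
```

Swapping the roles of the two sentences in this check gives the inverted case discussed next.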


Word Alignment: Inverted Alignment

A : {1, ..., l} → {1, ..., m}

[Figure: inverted alignment matrix. Rows list the English words with their positions (people 6, aboriginal 5, the 4, of 3, territory 2, the 1); columns list the Italian words at positions 1-4 (il, territorio, degli, autoctoni).]

You can get a direct alignment by swapping the source and target sentences.


Alignment Variable

Modelling the alignment as an arbitrary relation between the source and the target language is very general but computationally unfeasible: 2^(l·m) possible alignments!

A generally applied restriction is to let each source word be assigned to exactly one target word (see Example 2). Hence, an alignment is a map from source to target positions:

A : {1, ..., m} → {0, ..., l}

The alignment variable a = a1, ..., am consists of associations j → i = aj, from source position j to target position i = aj.

We may include null-word alignments, i.e. aj = 0, to account for source words not aligned to any target word. Hence, there are only (l + 1)^m possible alignments.
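As a sketch of this definition (the function name is mine, not from the slides), an alignment a can be stored as a tuple of target positions, one per source word, with 0 standing for NULL; enumerating all of them confirms the (l + 1)^m count.

```python
from itertools import product

def all_alignments(m, l):
    """Enumerate every alignment a = (a_1, ..., a_m) with a_j in {0, ..., l};
    a_j = 0 means source word j is aligned to the NULL word."""
    return list(product(range(l + 1), repeat=m))

m, l = 3, 2                                # tiny toy sentence lengths
alignments = all_alignments(m, l)
assert len(alignments) == (l + 1) ** m     # (l + 1)^m possible alignments
```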


Word Alignment Model

In SMT we will model the translation probability Pr(f | e) by summing the probabilities of all possible (l + 1)^m hidden alignments a between the source and the target strings:

Pr(f | e) = Σ_a Pr(f, a | e)    (1)
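This sum can be computed by brute force for toy sentences. The sketch below assumes a Model-1-style factorization, p(f, a | e) = Π_j t(f_j | e_{a_j}) times a uniform alignment term; the translation table t and the example words are made up for illustration.

```python
from itertools import product

# Hypothetical word translation table t(f_word | e_word); not from the slides.
t = {
    ("casa", "house"): 0.8, ("casa", "NULL"): 0.1, ("casa", "blue"): 0.1,
    ("blu", "blue"): 0.7, ("blu", "house"): 0.2, ("blu", "NULL"): 0.1,
}

def p_f_given_e(f, e):
    """Pr(f | e): sum Pr(f, a | e) over all (l+1)^m alignments a."""
    e = ["NULL"] + e                            # position 0 is the null word
    l, m = len(e) - 1, len(f)
    total = 0.0
    for a in product(range(l + 1), repeat=m):   # enumerate every alignment
        p = 1.0 / (l + 1) ** m                  # uniform alignment probability
        for j, i in enumerate(a):
            p *= t[(f[j], e[i])]                # word translation probability
        total += p
    return total

prob = p_f_given_e(["casa", "blu"], ["blue", "house"])
```

Enumerating alignments is exponential in m; this is only to make equation (1) concrete, since for Model 1 the same sum factorizes into a cheap product of per-word sums.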

Hence we will consider statistical word alignment models:

Pr(f, a | e) = p_θ(f, a | e)

defined by specific sets of parameters θ.

The art of statistical modelling consists in designing statistical models which capture the relevant properties of the considered phenomenon, in our case the relationship between a source language string and a target language string.

There are five models of increasing complexity (= number of parameters).


Word Alignment Models

In order to find automatic methods to learn word alignments from data, we use mathematical models that explain how translations are generated.

The way models explain translations may appear very naïve, if not silly! Indeed, they are very simplistic ...

However, simple explanations often do work better than complex ones!

We need to be a little bit formal here, just to give names to the ingredients we will use in our recipes to learn word alignments:

English sentence e is a sequence of l words
French sentence f is a sequence of m words
Word alignment a is a map from m positions to l + 1 positions

We will have to relax a bit our conception of sentence: it is just a sequence of words, which might or might not make sense at all ...


Word Alignment Models

There are five models, of increasing complexity, that explain how a translation and an alignment can be generated given a sentence e.

[Diagram: e → Alignment Model Pr(a, f | e) → a, f]

Complexity refers to the number of parameters that define the model!

We start from the simplest model, called Model 1!


Model 1

[Diagram: e → Alignment Model Pr(a, f | e) → a, f]

Model 1 generates the translation and the alignment as follows:

1. guess the length m of f on the basis of the length l of e

2. for each position j in f repeat the following two steps:

(a) randomly pick a corresponding position i in e
(b) generate word j of f by picking a translation of word i in e

Step 1 is executed by using a translation length predictor.
Step 2.(a) is performed by throwing a die with l + 1 faces.[1]

Step 2.(b) is carried out by using a word translation table

[1] We want to include the null word.
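The generative story above can be sketched as a sampler. The length predictor and the translation table below are hypothetical stand-ins (Model 1 learns the real table from data, as discussed later):

```python
import random

def sample_translation(e, length_predictor, trans_table, seed=0):
    """Generate (f, a) from e following Model 1's generative story."""
    rng = random.Random(seed)
    e = ["NULL"] + e                       # include the null word at position 0
    m = length_predictor(len(e) - 1)       # step 1: guess m from l
    f, a = [], []
    for _ in range(m):                     # step 2, for each position j of f:
        i = rng.randint(0, len(e) - 1)     # (a) throw a die with l + 1 faces
        word = rng.choice(trans_table[e[i]])   # (b) pick a translation of e_i
        a.append(i)
        f.append(word)
    return f, a

# Hypothetical ingredients:
length_predictor = lambda l: l             # assume m = l for simplicity
trans_table = {"NULL": ["di"], "house": ["casa"], "blue": ["blu"]}
f, a = sample_translation(["blue", "house"], length_predictor, trans_table)
```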


On Probability Factorization

Chain Rule
The probability of a sequence of events e = e1, e2, e3, ..., el can be factorized as:

Pr(e = e1, e2, e3, ..., el) = Pr(e1) Pr(e2 | e1) Pr(e3 | e1, e2) ... Pr(el | e1, ..., el-1)

The joint probability is factorized over single event probabilities.
Factors, however, introduce dependencies of increasing complexity:

the last factor has the same complexity as the complete joint probability!
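The chain rule can be checked numerically on a toy joint distribution; the two binary events and their probabilities below are made up for illustration:

```python
# Toy joint distribution Pr(e1, e2) over two binary events.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def pr_e1(x):
    """Marginal Pr(e1 = x), summing the joint over e2."""
    return sum(p for (a, b), p in joint.items() if a == x)

def pr_e2_given_e1(y, x):
    """Conditional Pr(e2 = y | e1 = x) from the joint and the marginal."""
    return joint[(x, y)] / pr_e1(x)

# Chain rule: Pr(e1, e2) = Pr(e1) * Pr(e2 | e1), for every outcome.
for (x, y), p in joint.items():
    assert abs(p - pr_e1(x) * pr_e2_given_e1(y, x)) < 1e-12
```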

There are two basic approximations for sequential models which eliminate dependencies in the conditional part of the chain factors

Notice that for non-sequential events, we might change the order of factors, e.g.:

Pr(f, a | e) = Pr(a, f | e) = Pr(a | e) Pr(f | e, a)


Basic Sequential Models

Bag-o