Introduction to Machine Translation and Phrase-Based Machine Translation

Aleš Tamchyna
Charles University in Prague
September 8, 2015



Other Approaches to Machine Translation

Our topic: phrase-based MT. (Some) other approaches:

Neural networks. Wednesday 12:00: Neural Network Models in Google Translate (Keith Stevens)
Deep (dependency) syntax. Thursday 9:00: Deep Syntactic MT and TectoMT (Martin Popel)
Constituency syntax. Friday 13:30: Syntax Extraction and Decoding (Hieu Hoang)


Page 8: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based

Where to Find More?Books:

Ondrej Bojar: Cestina a strojovy preklad

Philipp Koehn: Statistical Machine Translation

Online:

http://www.statmt.org/

http://www.statmt.org/moses/

http://mttalks.ufal.cz/

Ales Tamchyna Introduction to MT September 8, 2015 3 / 33


Probability – Quick Refresher

P(A) ∈ [0, 1] ... probability of event A. E.g., what is the chance it will rain today?
P(A ∩ B) or P(A, B) ... joint probability (both A and B occur).
P(A|B) ... probability of event A given that B occurred. Given that I see clouds (B), what is the chance it will rain today (A)?

P(A|B) = P(A ∩ B) / P(B)

Bayes' Theorem (inverse probability):

P(B|A) = P(A|B) P(B) / P(A)
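The two identities above can be checked numerically. A minimal sketch using made-up probabilities for the rain/clouds example (the numbers are purely illustrative, not from the slides):

```python
# Illustrative (invented) probabilities for the rain/clouds example.
p_clouds = 0.4            # P(B): I see clouds
p_rain_and_clouds = 0.1   # P(A ∩ B): it rains and there are clouds
p_rain = 0.12             # P(A): it rains

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
p_rain_given_clouds = p_rain_and_clouds / p_clouds

# Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A)
p_clouds_given_rain = p_rain_given_clouds * p_clouds / p_rain

print(p_rain_given_clouds)   # 0.25
print(p_clouds_given_rain)   # equals P(A ∩ B) / P(A), as expected
```

Note how Bayes' theorem recovers P(B|A) without ever needing the joint probability directly once P(A|B) is known.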


Our Goal: Phrase-Based Machine Translation

[Figure: phrase-based translation of the Czech sentence "Jan včera políbil Marii". Candidate phrase translations include Jan → John / Johny, včera → yesterday, políbil → kissed / gave a kiss to, Marii → Mary, and políbil Marii → gave Mary a kiss; the chosen output is "John gave Mary a kiss yesterday".]

The Essential Ingredient

[Figure only.]

Our Own Parallel Corpus

[Figure: two sentence pairs. Czech "žlutý byl ten papoušek" with English "the parrot was yellow", and Czech "žlutý pes" with English "yellow dog".]

What does "žlutý" mean in English?

We used the data to infer an alignment between the words. Given the alignment, we could find the most probable translation.


Estimating Translation Probability

If we had the alignment:

[Figure: the two sentence pairs with word alignments; "žlutý" is aligned to "yellow" in both sentences.]

P("yellow" | "žlutý") = c(yellow → žlutý) / c(žlutý) = 2/2 = 1
P(∗ | "žlutý") = 0

We will align the English words to Czech words.
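Counting and normalizing as above takes only a few lines of Python. The alignment here is the hand-made one from the slide, so "žlutý" links to "yellow" in both sentences:

```python
from collections import Counter

# Word-aligned toy corpus: (czech_word, english_word) links,
# hand-aligned as on the slide.
links = [
    ("žlutý", "yellow"), ("byl", "was"), ("ten", "the"), ("papoušek", "parrot"),
    ("žlutý", "yellow"), ("pes", "dog"),
]

link_count = Counter(links)                 # c(e → f)
czech_count = Counter(f for f, e in links)  # c(f)

def p(e, f):
    """Maximum-likelihood estimate P(e|f) = c(e → f) / c(f)."""
    return link_count[(f, e)] / czech_count[f]

print(p("yellow", "žlutý"))   # 2/2 = 1.0
print(p("dog", "žlutý"))      # 0.0, i.e. P(*|"žlutý") = 0 for everything else
```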


Estimation of IBM Model 1

[Figure: the two sentence pairs "žlutý byl ten papoušek" / "the parrot was yellow" and "žlutý pes" / "yellow dog" with all possible alignment links between their words; each link is annotated with a weight, which is updated on the following slides.]

Our approach: distribute our one alignment link among all words.

How to weight these "partial" links? Use the translation probability P(e|f).

Initially, we don't know anything ⇒ start with uniform estimates.

Let's sum up the evidence that "yellow" aligns to "žlutý":
c(yellow → žlutý) = 1/4 + 1/2 = 3/4

...and the evidence that "yellow" aligns to other words:
c(yellow → žlutý) = 3/4, c(yellow → byl) = 1/4, c(yellow → ten) = 1/4, c(yellow → papoušek) = 1/4, c(yellow → pes) = 1/2

...and do the same for the other "partial" alignment links:
c(yellow → žlutý) = 3/4, c(yellow → byl) = 1/4, ...
c(was → žlutý) = 1/4, c(was → byl) = 1/4, ...
c(parrot → žlutý) = 1/4, c(parrot → byl) = 1/4, ...
c(the → žlutý) = 1/4, c(the → byl) = 1/4, ...
c(dog → žlutý) = 1/2, c(dog → pes) = 1/2

Normalize to get the conditional probability distributions:

P(yellow|žlutý) = 3/8    P(yellow|byl) = 1/4    P(yellow|pes) = 1/2
P(was|žlutý)    = 1/8    P(was|byl)    = 1/4    P(was|pes)    = 0
P(parrot|žlutý) = 1/8    P(parrot|byl) = 1/4  ···  P(parrot|pes) = 0
P(the|žlutý)    = 1/8    P(the|byl)    = 1/4    P(the|pes)    = 0
P(dog|žlutý)    = 2/8    P(dog|byl)    = 0      P(dog|pes)    = 1/2
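The normalization step can be verified mechanically from the expected counts collected above ("ten" and "papoušek" are omitted here for brevity; by symmetry their counts match those of "byl"):

```python
# Expected counts c(e → f) from the uniform first E-step on the toy corpus.
counts = {
    "žlutý": {"yellow": 3/4, "was": 1/4, "parrot": 1/4, "the": 1/4, "dog": 1/2},
    "byl":   {"yellow": 1/4, "was": 1/4, "parrot": 1/4, "the": 1/4},
    "pes":   {"yellow": 1/2, "dog": 1/2},
}

# Normalize per Czech word: P(e|f) = c(e → f) / Σ_e' c(e' → f)
probs = {
    f: {e: c / sum(ec.values()) for e, c in ec.items()}
    for f, ec in counts.items()
}

print(probs["žlutý"]["yellow"])  # 3/8 = 0.375
print(probs["žlutý"]["dog"])     # 2/8 = 0.25
print(probs["pes"]["dog"])       # 1/2 = 0.5
```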

What next? Iterate.


After one more iteration:

P(yellow|žlutý) = 0.5      P(yellow|byl) = 0.206    P(yellow|pes) = 0.462
P(was|žlutý)    = 0.094    P(was|byl)    = 0.265    P(was|pes)    = 0
P(parrot|žlutý) = 0.094    P(parrot|byl) = 0.265  ···  P(parrot|pes) = 0
P(the|žlutý)    = 0.094    P(the|byl)    = 0.265    P(the|pes)    = 0
P(dog|žlutý)    = 0.219    P(dog|byl)    = 0        P(dog|pes)    = 0.538

The algorithm: expectation maximization (EM)

1. Initialize the model with uniform probabilities.
2. Apply the model to the data (expectation step).
3. Re-estimate the model from the data (maximization step).
4. Go to 2 and repeat until the probabilities stop changing.
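The four steps above can be sketched directly in Python. This is a minimal IBM Model 1 trainer on the toy corpus from the slides, assuming no NULL token and no smoothing:

```python
from collections import defaultdict

# Toy parallel corpus: (foreign = Czech, English) sentence pairs.
corpus = [
    (["žlutý", "byl", "ten", "papoušek"], ["the", "parrot", "was", "yellow"]),
    (["žlutý", "pes"], ["yellow", "dog"]),
]

f_vocab = {f for fs, _ in corpus for f in fs}
e_vocab = {e for _, es in corpus for e in es}

# 1. Initialize t(e|f) uniformly.
t = {f: {e: 1 / len(e_vocab) for e in e_vocab} for f in f_vocab}

for _ in range(50):  # 4. iterate until (near) convergence
    count = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    # 2. Expectation: distribute each English word's alignment link
    #    over the Czech words in its sentence, weighted by t(e|f).
    for fs, es in corpus:
        for e in es:
            norm = sum(t[f][e] for f in fs)
            for f in fs:
                frac = t[f][e] / norm
                count[f][e] += frac
                total[f] += frac
    # 3. Maximization: re-estimate t(e|f) from the expected counts.
    t = {f: {e: c / total[f] for e, c in count[f].items()} for f in count}

print(round(t["žlutý"]["yellow"], 3))  # approaches 1.0 as EM converges
print(round(t["pes"]["dog"], 3))       # likewise approaches 1.0
```

The first E-step of this loop reproduces the 1/4 and 1/2 link weights from the figure, and the first M-step reproduces P(yellow|žlutý) = 3/8.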

Word-Based Models

IBM Models 1-5 (increasing model complexity)
Brown et al. (1993): The Mathematics of Statistical Machine Translation: Parameter Estimation
Originally developed for word-based translation
Higher models account for:
- word position (IBM Model 1 only models the lexical translation probability)
- fertility (the number of English words aligned to a foreign word)
Today: used for word alignment

Page 51: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based

IBM Model 1

We treat the alignment between words as a hidden variable.

Alignment is a function; each English word (position) picks a foreign counterpart, e.g. a(4) = 1 ("yellow" aligns to "zluty" in the first sentence).

IBM Model 1 only models the lexical translation probability, so formally, the probability of sentence e = (e_1, ..., e_m) given f = (f_1, ..., f_n) is:

P(e|f) = \sum_{a_1=0}^{n} \cdots \sum_{a_m=0}^{n} \frac{\epsilon}{(n+1)^m} \prod_{j=1}^{m} t(e_j | f_{a_j}) = \frac{\epsilon}{(n+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{n} t(e_j | f_i)

EM finds the parameters (and hence the alignment) which maximize the (log) likelihood of our data.
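Thanks to the factored form on the right, P(e|f) can be computed without enumerating all alignments. A minimal sketch (the toy t values below are illustrative, and NULL links simply score 0 here):

```python
def ibm1_prob(e_words, f_words, t, eps=1.0):
    """P(e|f) under IBM Model 1 via the factored form:
    eps / (n+1)^m * prod_j sum_i t(e_j | f_i), where f_0 is NULL."""
    fs = ["NULL"] + list(f_words)  # position 0 is the NULL token
    n, m = len(f_words), len(e_words)
    prob = eps / (n + 1) ** m
    for e in e_words:
        prob *= sum(t.get((e, f), 0.0) for f in fs)
    return prob

# Illustrative probabilities (loosely following the toy table above).
t = {("yellow", "zluty"): 0.5, ("dog", "pes"): 0.538, ("dog", "zluty"): 0.219}
p = ibm1_prob(["yellow", "dog"], ["zluty", "pes"], t)
```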


NULL Token

[Figure: "NULL vidim problem" word-aligned to "I see a problem"; the English article "a" has no Czech counterpart.]

Do we align the indefinite article to all Czech nouns?

Align words which are dropped in Czech to NULL.

From IBM Models to Word Alignment

IBM models 1 to 5 are learned in a sequence; the estimated parameters of one model are used to initialize the next model.

At the end of training, we can obtain the most likely alignment of the data.

Alignment is a function ⇒ one English word cannot align to more than one foreign word.

[Figure: "psaci stroj" ↔ "typewriter" and "vysavac" ↔ "vacuum cleaner"; "typewriter" should align to both "psaci" and "stroj", which a function cannot express.]

There is no way that we can get this word alignment with our current models.

Solution: run the alignment in both directions (i.e., train all the models twice, English→Czech and Czech→English) and symmetrize the alignment.

Alignment Symmetrization

A heuristic procedure, several possible strategies.

Start with the intersection of the alignment links.

Gradually add links from the union which are allowed by the chosen criteria.

[Figure: alignment matrices for "psaci stroj" ↔ "typewriter" and "vysavac" ↔ "vacuum cleaner" before and after symmetrization.]
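The two steps above can be sketched as follows. This is a stripped-down variant of the "grow" heuristic (real implementations such as grow-diag-final add further constraints, e.g. only linking words that are still unaligned):

```python
def symmetrize(e2f, f2e):
    """Start from the intersection of the two directional alignments,
    then repeatedly add union links adjacent (horizontally/vertically)
    to an already-accepted link. Links are (e_pos, f_pos) pairs."""
    alignment = set(e2f & f2e)
    union = e2f | f2e
    added = True
    while added:
        added = False
        for (e, f) in sorted(union - alignment):
            neighbors = {(e - 1, f), (e + 1, f), (e, f - 1), (e, f + 1)}
            if neighbors & alignment:
                alignment.add((e, f))
                added = True
    return alignment
```

For "typewriter" vs. "psaci stroj": if English→Czech yields {(0, 0)} and Czech→English yields {(0, 0), (0, 1)}, the intersection keeps (0, 0) and the grow step recovers (0, 1), giving the one-to-many alignment that neither directional model could express alone.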

Progress Check

[Figure: word-aligned sentence pairs "zluty byl ten papousek" ↔ "the parrot was yellow" and "zluty pes" ↔ "yellow dog".]

Progress Check

[Figure: "Jan včera políbil Marii" with candidate word/phrase translations: Jan → John / Johny; včera → yesterday; políbil → kissed / gave a kiss to; políbil Marii → gave Mary a kiss. Output: "John gave Mary a kiss yesterday".]

Let's go from words to phrases.

Phrase Extraction

[Figure: alignment matrix of "žlutý byl ten papoušek" ↔ "the parrot was yellow"; rectangles consistent with the alignment are extracted as phrase pairs.]
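The extraction illustrated above can be sketched in code: a rectangle of the alignment matrix is an admissible phrase pair iff it contains at least one link and no link crosses its border. This sketch omits the usual extension over unaligned boundary words:

```python
def extract_phrases(n_f, n_e, alignment, max_len=7):
    """Extract phrase-pair spans consistent with a word alignment.

    alignment: set of (f_pos, e_pos) links; returns ((f1, f2), (e1, e2)) spans.
    """
    phrases = set()
    for e1 in range(n_e):
        for e2 in range(e1, min(e1 + max_len, n_e)):
            # foreign positions linked into the English span
            fs = [i for (i, j) in alignment if e1 <= j <= e2]
            if not fs:
                continue
            f1, f2 = min(fs), max(fs)
            if f2 - f1 + 1 > max_len:
                continue
            # consistency: no word in [f1, f2] links outside [e1, e2]
            if all(e1 <= j <= e2 for (i, j) in alignment if f1 <= i <= f2):
                phrases.add(((f1, f2), (e1, e2)))
    return phrases

# "zluty byl ten papousek" / "the parrot was yellow":
# zluty-yellow, byl-was, ten-the, papousek-parrot
links = {(0, 3), (1, 2), (2, 0), (3, 1)}
phrases = extract_phrases(4, 4, links)
```

E.g. ((2, 3), (0, 1)), i.e. "ten papousek" / "the parrot", is extracted, while "parrot was" is not: "ten" would link out of that rectangle.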

Building a Phrase Table (Translation Model)

Obtain some parallel data (sentence aligned).

Run word alignment (IBM models) in both directions, symmetrize.

Extract admissible phrase pairs up to a certain length (typically around 7 words).

Count phrase (co-)occurrences to estimate phrase translation probabilities:

P(e|f) = c(e, f) / c(f)

Tiny example:

zluty papousek ||| a yellow parrot ||| 0.1
zluty papousek ||| yellow parakeet ||| 0.1
zluty papousek ||| yellow parrot ||| 0.6
zluty papousek ||| yellowish parrot ||| 0.2
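The relative-frequency estimate above is just counting. A minimal sketch (the phrase-pair instance counts below are invented so that they reproduce the tiny example's 0.6):

```python
from collections import Counter

def estimate_phrase_table(instances):
    """P(e|f) = c(e, f) / c(f) from extracted phrase-pair instances (f, e)."""
    pair_counts = Counter(instances)
    f_counts = Counter(f for f, _ in instances)
    return {(f, e): c / f_counts[f] for (f, e), c in pair_counts.items()}

instances = ([("zluty papousek", "yellow parrot")] * 6
             + [("zluty papousek", "a yellow parrot")]
             + [("zluty papousek", "yellow parakeet")]
             + [("zluty papousek", "yellowish parrot")] * 2)
table = estimate_phrase_table(instances)
# table[("zluty papousek", "yellow parrot")] == 6 / 10 == 0.6
```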

Progress Check

[Figure: "Jan včera políbil Marii" with several candidate phrase segmentations and translations, e.g. "John gave Mary a kiss yesterday".]

How do we decide which of these translations is best?

The Noisy Channel Model

Warren Weaver (1955):

    When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'

English → Transmitter → Noisy Channel → Receiver → Russian

We are looking for the most probable original English sentence (which we received in Russian due to "noise").

\hat{e} = \arg\max_e P(e|f) = \arg\max_e \frac{P(f|e) P(e)}{P(f)} = \arg\max_e \underbrace{P(f|e)}_{\text{Translation model}} \underbrace{P(e)}_{\text{Language model}}

Noisy Channel Model

\hat{e} = \arg\max_e P(f|e) P(e)

P(e) is the language model (LM).

P(f|e) depends on the application:
- Automatic speech recognition: the acoustic model.
- Spelling correction: the spelling error model.
- Machine translation: the translation model (TM).

A useful decomposition:
- TM: How accurately does the translation match the input? (Parallel data needed for training.)
- LM: Is the translation good (fluent) English? (Only requires monolingual data!)

So far, we have only talked about half of the story. (And technically, in the wrong direction, given that we want to translate Czech into English.)

Language Model

Don't miss the lecture on language modelling tomorrow (Kenneth Heafield).

The task: decide which sequences of words are good English. For any English sentence e = (e_1, ..., e_{l_e}), estimate P(e).

We need to decompose the joint probability somehow. The (usual) solution: n-gram language models.

Side note, the chain rule (example for 4 variables):

P(A, B, C, D) = P(D|A, B, C) · P(C|A, B) · P(B|A) · P(A)

Let's formulate the n-gram LM:

P(e) = P(e_1) P(e_2|e_1) P(e_3|e_1, e_2) · · · P(e_{l_e}|e_1, ..., e_{l_e - 1})
     ≈ P(e_1) P(e_2|e_1) · · · P(e_{l_e}|e_{l_e - n + 1}, ..., e_{l_e - 1})

Language Model: Example

A 3-gram language model (each word only depends on the 2 previous words).

P("thank you very much") = P("thank" | "<s>")
                         × P("you" | "<s> thank")
                         × P("very" | "thank you")
                         × P("much" | "you very")

To estimate e.g. P("very" | "thank you"), we go through the data and count:

How many times we saw "thank you" followed by "very".

How many times we saw "thank you" (followed by anything).

P("very" | "thank you") = c("thank you very") / c("thank you")

Smoothing!
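The counting recipe above can be sketched directly. Maximum-likelihood only; the "Smoothing!" warning is exactly what this sketch ignores, and the toy corpus is invented:

```python
from collections import Counter

def train_trigram_lm(sentences):
    """MLE trigram model: P(w3 | w1 w2) = c(w1 w2 w3) / c(w1 w2)."""
    tri, bi = Counter(), Counter()
    for sent in sentences:
        words = ["<s>", "<s>"] + sent.split() + ["</s>"]
        for i in range(len(words) - 2):
            bi[tuple(words[i:i + 2])] += 1
            tri[tuple(words[i:i + 3])] += 1
    def prob(w3, w1, w2):
        # how often "w1 w2" was followed by w3, over how often we saw "w1 w2"
        return tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    return prob

prob = train_trigram_lm(["thank you very much",
                         "thank you very kindly",
                         "thank you all"])
# prob("very", "thank", "you") == c("thank you very") / c("thank you") == 2/3
```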

Log-Linear Model

Begin with the noisy channel model:

\hat{e} = \arg\max_e P(f|e) P(e)
        = \arg\max_e \log(P(f|e) P(e))
        = \arg\max_e \log P(f|e) + \log P(e)

Perhaps the importance of LM vs. TM should be weighted differently?

\hat{e} = \arg\max_e \lambda_{TM} \log P(f|e) + \lambda_{LM} \log P(e)

We could add other features (besides LM and TM), so generally:

\hat{e} = \arg\max_e \sum_i \lambda_i f_i(e, f)
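The generalized argmax above is just a weighted feature sum over candidate translations. A minimal sketch (the weights and feature values are invented for illustration; real systems tune the λs on held-out data):

```python
import math

def loglinear_score(feature_values, weights):
    """score(e, f) = sum_i lambda_i * f_i(e, f)"""
    return sum(weights[name] * value for name, value in feature_values.items())

def best_translation(candidates, weights):
    """argmax over (translation, feature_values) candidates."""
    return max(candidates, key=lambda c: loglinear_score(c[1], weights))

weights = {"tm": 1.0, "lm": 0.7}  # lambda_TM, lambda_LM
candidates = [
    ("John gave Mary a kiss yesterday",
     {"tm": math.log(0.4), "lm": math.log(0.3)}),
    ("Johny yesterday kissed Mary",
     {"tm": math.log(0.5), "lm": math.log(0.05)}),
]
best, _ = best_translation(candidates, weights)
# best == "John gave Mary a kiss yesterday": its slightly worse TM score
# is outweighed by its much better LM score.
```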

Log-Linear Model: Features

We now have the freedom to add new features. In PBMT, we typically use:

Phrase translation probability, both direct and inverse:
- P(e|f)
- P(f|e)

Lexical translation probability (direct and inverse):
- Plex(e|f)
- Plex(f|e)

Language model probability:
- P(e)

Phrase penalty.

Word penalty.

Distortion penalty.

Lexical Weights (Plex)

The problem: many extracted phrases are rare. (Esp. long phrases might only be seen once in the parallel corpus.)

P("modry autobus pristal na Marsu" | "a blue bus lands on Mars") = 1
P("a blue bus lands on Mars" | "modry autobus pristal na Marsu") = 1

Is that a reliable probability estimate?

P("; distortion carried - over" | "; zkresleni") = 1
P("; zkresleni" | "; distortion carried - over") = 1

Data from the "wild" are noisy. Word alignment contains errors. This is a real phrase pair from our best English-Czech system. Both P(e|f) and P(f|e) say that this is a perfect translation.

Lexical Weights (Plex)

Decompose the phrase pair into word pairs. Look at the word-leveltranslation probabilities.

Several possible definitions, e.g.:

Plex(e|f, a) =le∏

j=1

1

|i |(i , j) ∈ a|∑∀(i ,j)∈a

w(ej , fi )

psacı

stroj

a

typewriter0.2

0.3

0.1

Plex(”a typewriter”|”psacı stroj”) =

[1

1· 0.1

]·[

1

2· (0.3 + 0.2)

]= 0.025

Ales Tamchyna Introduction to MT September 8, 2015 26 / 33


Word Penalty

Not all languages use the same number of words on average.

vidím problém ||| I can see a problem

We want to control how many words are generated.

Word penalty simply adds 1 for each produced word in the translation.

Depending on the λ for word penalty, we will generate either shorter or longer outputs.

ê = argmax_e ∑_i λ_i f_i(e, f)

Ales Tamchyna Introduction to MT September 8, 2015 27 / 33
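A minimal sketch of how the word penalty enters the log-linear decision ê = argmax_e ∑_i λ_i f_i(e, f). The hypotheses, translation-model scores, and λ values below are all invented for illustration; the point is only that the sign and magnitude of the word-penalty weight shift the preferred output length.

```python
import math

def word_penalty(e):
    # one feature unit per produced target word
    return len(e.split())

def score(e, log_p_tm, weights):
    # weighted sum of two features: translation model score and word penalty
    return weights["tm"] * log_p_tm + weights["wp"] * word_penalty(e)

hyps = {
    "I see problem": math.log(0.2),        # word-by-word, weaker TM score
    "I can see a problem": math.log(0.4),  # longer, better TM score
}

def best(weights):
    return max(hyps, key=lambda e: score(e, hyps[e], weights))

# A mildly negative word-penalty weight tolerates the longer output...
assert best({"tm": 1.0, "wp": -0.2}) == "I can see a problem"
# ...while a strongly negative one pushes toward the shorter output.
assert best({"tm": 1.0, "wp": -0.6}) == "I see problem"
```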


Phrase Penalty

Add 1 for each produced phrase in the translation.

Varying the λ for phrase penalty can lead to more literal (word-by-word) translations (made from a lot of short phrases) or to more idiomatic outputs (using fewer, longer phrases, if available).

Ales Tamchyna Introduction to MT September 8, 2015 28 / 33
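The trade-off can be sketched as follows. The two derivations and the λ value are illustrative: both produce the same output string, but they use a different number of phrases, so a negative phrase-penalty weight favors the derivation with fewer, longer phrases.

```python
# Two hypothetical derivations of the same output string:
word_by_word = [("vidím", "I see"), ("problém", "a problem")]   # 2 phrases
idiomatic = [("vidím problém", "I see a problem")]              # 1 phrase

def phrase_penalty(derivation):
    # the feature simply counts phrases used in the derivation
    return len(derivation)

# With a negative λ, fewer phrases score higher on this feature:
# λ · 1 > λ · 2 for any λ < 0.
lam = -0.5
assert lam * phrase_penalty(idiomatic) > lam * phrase_penalty(word_by_word)
```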


Distortion Penalty

The simplest way to capture phrase reordering.

Can be sufficient for some language pairs (our English→Czech systems use it).

Several possible definitions, e.g.:

- Distance between the end of the previous phrase (on the source side) and the beginning of the current phrase.

Ales Tamchyna Introduction to MT September 8, 2015 29 / 33
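The distance-based definition above can be sketched as follows. The source spans are illustrative, and a real decoder computes this cost incrementally during search rather than over a finished derivation.

```python
def distortion(source_spans):
    """Total distortion cost for a derivation.
    source_spans: (start, end) source positions, end exclusive,
    in the order the phrases are translated."""
    cost, prev_end = 0, 0
    for start, end in source_spans:
        # distance between the end of the previous source phrase
        # and the beginning of the current one
        cost += abs(start - prev_end)
        prev_end = end
    return cost

# Monotone translation incurs zero cost: phrases cover 0-1, then 2-3.
assert distortion([(0, 2), (2, 4)]) == 0
# Reordering is penalized: translate positions 2-3 first, then 0-1.
assert distortion([(2, 4), (0, 2)]) == 2 + 4  # |2 - 0| + |0 - 4|
```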


Model Weights

How to get λs for our feature functions?

Usual approach: set them so that we translate some held-out data as well as possible.

“Tuning”

See the lecture tomorrow: Discriminative Training (Milos Stanojevic)

Ales Tamchyna Introduction to MT September 8, 2015 30 / 33


Progress Check

Jan včera políbil Marii

Translation options:
Jan → John / Johny
včera → yesterday
políbil → kissed / gave a kiss to
Marii → Mary
políbil Marii → gave Mary a kiss

Best translation: John gave Mary a kiss yesterday

Search for the best translation.

Ales Tamchyna Introduction to MT September 8, 2015 31 / 33


Translation Process: Generate Translation Options

Jan včera políbil Marii

Translation options:
Jan → John / Johny
včera → yesterday
políbil → kissed / gave a kiss to
Marii → Mary
políbil Marii → gave Mary a kiss

Ales Tamchyna Introduction to MT September 8, 2015 32 / 33


Translation Process: Beam Search

[Figure: the decoder expands partial hypotheses left to right, starting from "John" / "Johny" and extending them with options such as "yesterday", "kissed", "gave a kiss to", "Mary", and "gave Mary a kiss", keeping only the best-scoring hypotheses at each step.]

Ales Tamchyna Introduction to MT September 8, 2015 33 / 33
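The search sketched in the figure can be written compactly as stack-based beam search: hypotheses covering n source words live in stack n, and each is extended with translation options that do not overlap its coverage. Everything below — the translation options, their scores, and the distortion weight — is invented for illustration, and the language model is omitted entirely. Without an LM this toy decoder prefers the monotone order "John yesterday gave Mary a kiss"; in the full system, the LM is what could make "John gave Mary a kiss yesterday" win instead.

```python
from heapq import nlargest

source = ["Jan", "včera", "políbil", "Marii"]
# (start, end) source span (end exclusive) -> [(target phrase, toy log score)]
options = {
    (0, 1): [("John", -0.1), ("Johny", -0.9)],
    (1, 2): [("yesterday", -0.2)],
    (2, 3): [("kissed", -0.4), ("gave a kiss to", -0.7)],
    (3, 4): [("Mary", -0.1)],
    (2, 4): [("gave Mary a kiss", -0.4)],
}
BEAM, LAMBDA_DIST = 3, -0.1

# stacks[n]: up to BEAM hypotheses covering n source words, each a tuple
# (score, covered source positions, end of last source phrase, output words)
stacks = {0: [(0.0, frozenset(), 0, [])]}
for n in range(1, len(source) + 1):
    cands = []
    for m, hyps in stacks.items():
        for score, covered, prev_end, out in hyps:
            for (s, e), opts in options.items():
                # skip options that overlap coverage or don't reach stack n
                if m + (e - s) != n or any(i in covered for i in range(s, e)):
                    continue
                dist = abs(s - prev_end) * LAMBDA_DIST  # distortion penalty
                for tgt, tm in opts:
                    cands.append((score + tm + dist,
                                  covered | set(range(s, e)), e,
                                  out + tgt.split()))
    stacks[n] = nlargest(BEAM, cands, key=lambda h: h[0])  # prune to the beam

best = max(stacks[len(source)], key=lambda h: h[0])
print(" ".join(best[3]))  # "John yesterday gave Mary a kiss"
```

Real decoders additionally track language-model state, recombine equivalent hypotheses, and use future-cost estimates so that stacks compare hypotheses fairly.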