
Page 1: Machine Translation

Machine Translation

Om Damani (Ack: Material taken from Jurafsky & Martin 2nd Ed., Brown et al. 1993)

Page 2: Machine Translation

2

The spirit is willing but the flesh is weak

English-Russian Translation System

Дух охотно готов но плоть слаба

Russian-English Translation System

The vodka is good, but the meat is rotten

State of the Art

Babelfish: Spirit is willingly ready but flesh it is weak

Google: The spirit is willing but the flesh is week

Page 3: Machine Translation

3

The spirit is willing but the flesh is weak

Google English-Hindi Translation System

आत्मा पर शरीर दुर्बल है

Google Hindi-English Translation System

Spirit on the flesh is weak

State of the Art (English-Hindi) – March 19, 2009

Page 4: Machine Translation

4

Is state of the art so bad

Google English-Hindi Translation System

कला की हालत इतनी खराब है

Google Hindi-English Translation System

The state of the art is so bad

Is State of the Art (English-Hindi) so bad

Page 5: Machine Translation

5

State of the english hindi translation is not so bad

Google English-Hindi Translation System

राज्य की अंग्रेज़ी हिन्दी अनुवाद की इतनी बुरी नहीं है

Google Hindi-English Translation System

State of the English translation of English is not so bad

State of the english-hindi translation is not so bad

OK. Maybe it is __ bad.

Page 6: Machine Translation

6

State of the English Hindi translation is not so bad

Google English-Hindi Translation System

राज्य में अंग्रेजी से हिंदी अनुवाद की इतनी बुरी नहीं है

Google Hindi-English Translation System

English to Hindi translation in the state is not so bad

State of the English-Hindi translation is not so bad

OK. Maybe it is __ __ bad.

राज्य की अंग्रेज़ी हिन्दी अनुवाद की इतनी बुरी नहीं है

Page 7: Machine Translation

7

Your Approach to Machine Translation

Page 8: Machine Translation

8

Translation Approaches

Page 9: Machine Translation

9

Direct Transfer – What Novices do

Page 10: Machine Translation

10

Direct Transfer: Limitations

Source: कई बंगाली कवियों ने इस भूमि के गीत गाए हैं
Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain

Morph: कई बंगाली कवि-PL,OBL ने इस भूमि के गीत {गाए है}-PrPer,Pl
Kai Bangali kavi-PL,OBL ne is bhoomi ke geet {gaaye hai}-PrPer,Pl

Lexical Transfer: Many Bengali poet-PL,OBL this land of songs {sing has}-PrPer,Pl

Local Reordering: Many Bengali poet-PL,OBL of this land songs {has sing}-PrPer,Pl

Final: Many Bengali poets of this land songs have sung

(Reference: Many Bengali poets have sung songs of this land)

Page 11: Machine Translation

11

Syntax Transfer (Analysis-Transfer-Generation)

Here phrases NP, VP etc. can be arbitrarily large

Page 12: Machine Translation

12

Syntax Transfer Limitations

He went to Patna -> Vah Patna gaya

He went to Patil -> Vah Patil ke pas gaya

Translation of 'went' depends on the semantics of the object of 'went'.

Fatima eats salad with a spoon – what happens if you change 'spoon'?

Semantic properties need to be included in transfer rules – Semantic Transfer

Page 13: Machine Translation

13

Interlingua Based Transfer

[Semantic graph for the sentence below: concept nodes you, this, farmer, contact, region, khatav, manchar, taluka; relations agt, obj, pur, plc, nam, or; figure lost in transcription]

For this, you contact the farmers of Manchar region or of Khatav taluka.

In theory: N analysis and N generation modules instead of N² transfer modules.

In practice: an amazingly complex system to tackle N² language pairs.

Page 14: Machine Translation

14

Difficulties in Translation – Language Divergence (Concepts from Dorr 1993, Text/Figures from Dave, Parikh and Bhattacharyya 2002)

Constituent Order · Prepositional Stranding · Null Subject · Conflational Divergence · Categorial Divergence

Page 15: Machine Translation

15

Lost in Translation: We are talking mostly about syntax, not semantics or pragmatics

You: Could you give me a glass of water?
Robot: Yes.
… wait … wait … nothing happens … wait …… Aha, I see …
You: Will you give me a glass of water?
… wait … wait … wait …

Image from http://inicia.es/de/rogeribars/blog/lost_in_translation.gif

Page 16: Machine Translation

16

CheckPoint: State of the Art · Different Approaches · Translation Difficulty · Need for a novel approach

Page 17: Machine Translation

17

Statistical Machine Translation: Most ridiculous idea ever

Consider all possible partitions of a sentence. For a given partition, consider all possible translations of each part. Consider all possible combinations of all possible translations. Consider all possible permutations of each combination.

And somehow select the best partition/translation/permutation.

कई बंगाली कवियों ने इस भूमि के गीत गाए हैं
Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain

Candidate translations (different partitions and phrase choices):
- Many Bengali Poets this land of have sung poem
- Several Bengali to this place 's sing songs
- Many poets from Bangal in this space song sung
- Poets from Bangladesh farm have sung songs
- To this space have sung songs of many poets from Bangal

Page 18: Machine Translation

18

How many combinations are we talking about?

Number of choices for an N-word sentence? N = 20??

Number of possible chess games
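A back-of-the-envelope count in Python of the partition/translate/permute scheme from the previous slide. The figure of T = 5 candidate translations per phrase is an invented illustrative value; the deck gives no number.

```python
from math import comb, factorial

def candidates(N, T=5):
    """Candidate translations of an N-word sentence: choose a partition
    into k contiguous parts (k-1 of the N-1 gaps), pick one of T
    translations per part, then permute the k parts."""
    return sum(comb(N - 1, k - 1) * T**k * factorial(k) for k in range(1, N + 1))

print(f"{candidates(20):.3e}")  # on the order of 10^32 already at N = 20
```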

Page 19: Machine Translation

19

How do we get the Phrase Table?

Collect a large amount of bilingual parallel text. For each sentence pair, consider all possible partitions of both sentences. For a given partition pair, consider all possible mappings between parts (phrases) on the two sides. Somehow assign a probability to each phrase pair.

इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए

For this you contact the farmers of Manchar region

Page 20: Machine Translation

20

Data Sparsity Problems in Creating Phrase Table

Sunil is eating mango -> Sunil aam khata hai
Noori is eating banana -> Noori kela khati hai
Sunil is eating banana -> ?? We need examples of everyone eating everything!!

We want to figure out that eating can be either 'khata hai' or 'khati hai',

and let the Language Model select between 'Sunil kela khata hai' and 'Sunil kela khati hai'.

Select well-formed sentences among all candidates using the LM.
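A minimal sketch of the LM's job here, assuming a toy bigram model; the tiny corpus and the smoothing constants are invented for illustration.

```python
from collections import Counter

# Stand-in for the large monolingual corpus an LM is really trained on.
corpus = [
    "sunil aam khata hai",
    "noori kela khati hai",
    "sunil kela khata hai",
    "mohan aam khata hai",
]

bigrams, unigrams = Counter(), Counter()
for sent in corpus:
    words = ["<s>"] + sent.split() + ["</s>"]
    unigrams.update(words[:-1])
    bigrams.update(zip(words[:-1], words[1:]))

def score(sentence, alpha=0.1, vocab=20):
    """Product of add-alpha smoothed bigram probabilities P(w_i | w_{i-1})."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(words[:-1], words[1:]):
        p *= (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab)
    return p

# The LM prefers the candidate whose word sequences it has seen more often:
for cand in ["sunil kela khata hai", "sunil kela khati hai"]:
    print(cand, score(cand))
```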

Page 21: Machine Translation

21

Formulating the Problem

- A language model to compute P(E)
- A translation model to compute P(F|E)
- A decoder, which is given F and produces the most probable E
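These three components fit together through Bayes' rule, the noisy-channel formulation of Brown et al. (1993):

$$\hat{E} = \arg\max_E P(E \mid F) = \arg\max_E \frac{P(F \mid E)\, P(E)}{P(F)} = \arg\max_E P(F \mid E)\, P(E)$$

P(F) is fixed for a given input sentence, so the decoder can drop it.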

Page 22: Machine Translation

22

P(F|E) vs. P(E|F)

P(F|E) is the translation probability – we need to look at the generation process by which the <F,E> pair is obtained.

Parts of F correspond to parts of E. With suitable independence assumptions, P(F|E) measures whether all parts of E are covered by F.

E can be quite ill-formed.

It is OK if P(F|E) for an ill-formed E is greater than P(F|E) for a well-formed E; multiplication by P(E) should hopefully take care of it.

We do not have that luxury in estimating P(E|F) directly – we would need to ensure that well-formed E score higher.

Summary: For computing P(F|E), we may make several independence assumptions that are not valid. P(E) compensates for that.

P(बारिश हो रही है | It is raining) = .02
P(बरसात आ रही है | It is raining) = .03
P(बारिश हो रही है | rain is happening) = .420

Compare: we would need to estimate P(It is raining | बारिश हो रही है) vs. P(rain is happening | बारिश हो रही है).

Page 23: Machine Translation

23

CheckPoint: From a parallel corpus, generate a probabilistic phrase table. Given a sentence, generate various candidate translations using the phrase table. Evaluate the candidates using the Translation and Language Models.

Page 24: Machine Translation

24

What is the meaning of Probability of Translation? What is the meaning of P(F|E)?

- By Magic: you simply know P(F|E) for every (E,F) pair – counting in a parallel corpus
- Or, each word in E generates one word of F, independent of every other word in E or F
- Or, we need a 'random process' to generate F from E:
  - A semantic graph G is generated from E, and F is generated from G. We are no better off: we now have to estimate P(G|E) and P(F|G) for various G and then combine them – how? We may have a deterministic procedure to convert E to G, in which case we still need to estimate P(F|G).
  - A parse tree TE is generated from E; TE is transformed to TF; finally TF is converted into F. Can you write the mathematical expression?

Page 25: Machine Translation

25

The Generation Process

- Partition: Think of all possible partitions of the source language sentence
- Lexicalization: For a given partition, translate each phrase into the foreign language
- Spurious insertion: Add foreign words that are not attributable to any source phrase
- Reordering: Permute the set of all foreign words, words possibly moving across phrase boundaries

Try writing the probability expression for the generation process.

We need the notion of alignment.

Page 26: Machine Translation

26

Generation Example: Alignment

Page 27: Machine Translation

27

Simplify Generation: Only 1->Many Alignments allowed

Page 28: Machine Translation

28

Alignment: A function from target position to source position.

The alignment sequence is: 2,3,4,5,6,6,6
Alignment function A: A(1) = 2, A(2) = 3, …
A different alignment function will give the sequence 1,2,1,2,3,4,3,4 for A(1), A(2), …

To allow spurious insertion, allow alignment with word 0 (NULL).
No. of possible alignments: (I+1)^J
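A quick sanity check of the (I+1)^J count, enumerating every alignment function for a toy sentence pair (the sentences are invented):

```python
from itertools import product

# Every function from target positions 1..J to source positions 0..I,
# where position 0 is the NULL word.
E = ["NULL", "the", "house"]   # I = 2 real words plus NULL
F = ["la", "casa"]             # J = 2
I, J = len(E) - 1, len(F)

alignments = list(product(range(I + 1), repeat=J))
print(len(alignments), (I + 1) ** J)   # 9 9
for a in alignments[:3]:
    print([(F[j], E[a[j]]) for j in range(J)])
```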

Page 29: Machine Translation

29

IBM Model 1: Generative Process

Page 30: Machine Translation

30

IBM Model 1: Basic Formulation

$$P(F \mid E) = \sum_{J'} P(J' \mid E)\, P(F \mid J', E) = P(J \mid E)\, P(F \mid J, E)$$

$$P(F \mid J, E) = \sum_A P(F, A \mid J, E) = \sum_A P(A \mid J, E)\, P(F \mid A, J, E)$$

Putting it together:

$$P(F \mid E) = P(J \mid E) \sum_A P(A \mid J, E)\, P(F \mid J, E, A)$$

Page 31: Machine Translation

31

IBM Model 1: Details

No assumptions so far – the above formula is exact.

Model 1 assumptions:

- Choosing length: P(J|E) = P(J|E,I) = P(J|I) = ε
- Choosing alignment: all alignments are equiprobable:

$$P(a_1^J \mid J, e_1^I) = \prod_{j=1}^{J} P(a_j \mid a_1^{j-1}, f_1^{j-1}, J, e_1^I) = \frac{1}{(I+1)^J}$$

- Translation probability: each foreign word depends only on the English word it is aligned to:

$$P(f_1^J \mid a_1^J, J, e_1^I) = \prod_{j=1}^{J} P(f_j \mid a_1^j, f_1^{j-1}, J, e_1^I) = \prod_{j=1}^{J} t(f_j \mid e_{a_j})$$

Putting it together:

$$P(F \mid E) = \sum_A P(J \mid E)\, P(A \mid J, E)\, P(F \mid J, E, A) = \frac{\epsilon}{(I+1)^J} \sum_A \prod_{j=1}^{J} t(f_j \mid e_{a_j})$$

Page 32: Machine Translation

32

HMM Alignment: All alignments are not equally likely.

Can you guess what properties an alignment has?

Alignments tend to be locality preserving – neighboring words tend to get aligned together.

We would like P(a_j) to depend on a_{j-1}.

Page 33: Machine Translation

33

HMM Alignment: Details

P(F,A|J,E) was decomposed as P(A|J,E)*P(F|A,J,E) in Model 1. Now we will decompose it differently (J is implicit, not mentioned in the conditional expressions).

Alignment assumption (Markov): the alignment probability of the j-th word, P(a_j), depends only on the alignment of the previous word, a_{j-1}.

Translation assumption: the probability of the foreign word f_j depends only on the aligned English word e_{a_j}.

$$P(F, A \mid E) = \prod_{j=1}^{J} P(f_j, a_j \mid f_1^{j-1}, a_1^{j-1}, e_1^I) = \prod_{j=1}^{J} P(a_j \mid f_1^{j-1}, a_1^{j-1}, e_1^I)\, P(f_j \mid f_1^{j-1}, a_1^{j}, e_1^I)$$

With the two assumptions:

$$P(a_j \mid f_1^{j-1}, a_1^{j-1}, e_1^I) = P(a_j \mid a_{j-1}, I) \qquad P(f_j \mid f_1^{j-1}, a_1^{j}, e_1^I) = P(f_j \mid e_{a_j})$$

$$P(F \mid E) = \sum_A P(F, A \mid E) = \sum_A P(J \mid I) \prod_{j=1}^{J} P(a_j \mid a_{j-1}, I)\, P(f_j \mid e_{a_j})$$

Page 34: Machine Translation

34

Computing the Alignment Probability

P(a_j | a_{j-1}, I) is written as P(i | i', I).

Assume the probability does not depend on absolute word positions but on the jump-width (i - i') between words: P(4 | 6, 17) = P(5 | 7, 17).

Note: the jump-width counts are collected over sentences of all lengths, but the denominator sum is performed over only those jump-widths relevant to (i, i'); for i' = 6 and I = 17, widths -5 to 11 are relevant.
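Spelled out, this is the relative-position parameterization of the HMM alignment model (Vogel et al. 1996), which the slide's numbers match:

$$P(i \mid i', I) = \frac{c(i - i')}{\sum_{i''=1}^{I} c(i'' - i')}$$

where c(d) is the corpus-wide count of jump-width d. For i' = 6 and I = 17, the denominator sums c(-5) through c(11).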

Page 35: Machine Translation

35

HMM Model - Example

$$P(F \mid E) = \sum_A P(F, A \mid E) = \sum_A P(J \mid I) \prod_{j=1}^{J} P(a_j \mid a_{j-1}, I)\, P(f_j \mid e_{a_j})$$

P(F,A|E) = P(J=10 | I=9) * P(2 | start, 9) * P(इसके | this) * P(-1 | 2, 9) * P(लिए | this) * P(2 | 1, 9) * … * P(0 | 4, 9) * P(कीजिए | contact)

Page 36: Machine Translation

36

Enhancing the HMM model

- Add NULL words in the English to which foreign words can align
- Condition the alignment on the word class of the previous English word:

$$P(a_j \mid a_{j-1}, I, C(e_{a_{j-1}}))$$

Other suggestions?? What is the problem in making more realistic assumptions? How do we estimate the parameters of the model?

Page 37: Machine Translation

37

Checkpoint: The generative process is important for computing probability expressions · Model 1 and the HMM model · What about phrase probabilities?

Page 38: Machine Translation

38

Training Alignment Models

Given a parallel corpus, for each (F,E) learn the best alignment A and the component probabilities:
- t(f|e) for Model 1
- lexicon probability P(f|e) and alignment probability P(a_j | a_{j-1}, I) for the HMM model

How will you compute these probabilities if all you have is a parallel corpus?

Page 39: Machine Translation

39

Intuition: Interdependence of Probabilities

If you knew which words are probable translations of each other, then you could guess which alignment is probable and which one is improbable.

If you were given alignments with probabilities, then you could compute translation probabilities.

Looks like a chicken-and-egg problem. Can you write equations expressing one in terms of the other?

Page 40: Machine Translation

40

Computing Alignment Probabilities

Alignment probability in terms of translation probability:

$$P(A, F \mid J, E) = P(A \mid J, E)\, P(F \mid A, J, E) = \frac{1}{(I+1)^J} \prod_{j=1}^{J} t(f_j \mid e_{a_j})$$

Compute P(A) in terms of P(A,F). Note: the prior probabilities of all alignments are equal; we are interested in the posterior probabilities:

$$P(A \mid F, E) = \frac{P(A, F \mid E)}{\sum_{A'} P(A', F \mid E)}$$

Can you specify translation probability in terms of alignment probability?

Page 41: Machine Translation

41

Computing Translation Probabilities

P(संपर्क | contact) = 2/6

What if the alignments had probabilities, say .5, .3, and .9?

= (.5*1 + .3*1 + .9*0) / (.5*3 + .3*2 + .9*1) = .8/3

Note: it is not the average of per-alignment ratios, .5*(1/3) + .3*(1/2) + .9*0 ??
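The arithmetic, spelled out. The per-alignment link counts (1/1/0 and 3/2/1) are read off the expression on the slide itself; the alignment figure is lost.

```python
# Three alignments with posterior weights .5, .3, .9. The first two link
# संपर्क to 'contact' once each, the third not at all.
weights = [0.5, 0.3, 0.9]
links_to_contact = [1, 1, 0]          # C(संपर्क, contact | A)
total_links_from_contact = [3, 2, 1]  # all links out of 'contact' in each A

num = sum(w * c for w, c in zip(weights, links_to_contact))
den = sum(w * c for w, c in zip(weights, total_links_from_contact))
print(num, den, num / den)  # 0.8 3.0 0.266...
```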

Page 42: Machine Translation

42

Computing Translation Probabilities – Maximum Likelihood Estimate

$$\text{count}(f \mid e) = \sum_{(F,E)} \sum_A P(A \mid F, E)\; C(f, e \mid A, F, E)$$

$$t(f \mid e) = \frac{\text{count}(f \mid e)}{\sum_{f'} \text{count}(f' \mid e)}$$

Here C(f, e | A, F, E) is the number of times f is aligned to e in alignment A of the pair (F,E).

Page 43: Machine Translation

43

Expectation Maximization (EM) Algorithm

Used when we want a maximum likelihood estimate of the parameters of a model that depends on hidden variables. In the present case, the parameters are the translation probabilities, and the hidden variables are the alignments.

Init: Start with an arbitrary estimate of the parameters.
E-step: Compute the expected values of the hidden variables.
M-step: Recompute the parameters that maximize the likelihood of the data, given the expected values of the hidden variables from the E-step.

Page 44: Machine Translation

44

Working out alignments for a simplified Model 1: ignore the NULL words, and assume that every English word aligns with some foreign word (just to reduce the number of alignments for the illustration).

Page 45: Machine Translation

45

Example of EM

Green house -> Casa verde
The house -> La casa

Init: Assume that any word can generate any word with equal probability:

P(la|house) = 1/3
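A runnable EM sketch for this example. It uses the standard Model 1 E-step, where each foreign word independently picks an English word, so the posteriors factor per word; the variable names are mine.

```python
from collections import defaultdict

pairs = [("green house", "casa verde"), ("the house", "la casa")]
pairs = [(e.split(), f.split()) for e, f in pairs]

f_vocab = {f for _, fs in pairs for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))  # init: t(la|house) = 1/3, etc.

for _ in range(10):
    count = defaultdict(float)  # expected count of (f, e) links (E-step)
    total = defaultdict(float)  # expected count of e being linked
    for es, fs in pairs:
        for f in fs:
            z = sum(t[(f, e)] for e in es)  # normalizer for this sentence
            for e in es:
                p = t[(f, e)] / z           # posterior that f links to e
                count[(f, e)] += p
                total[e] += p
    for f, e in count:                      # M-step: relative frequency
        t[(f, e)] = count[(f, e)] / total[e]

# After a few iterations t(casa|house), t(verde|green), t(la|the) -> 1.
for f, e in sorted(count):
    print(f"t({f}|{e}) = {t[(f, e)]:.2f}")
```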

Page 46: Machine Translation

46

E-Step: compute the posterior probability of each alignment:

$$P(A \mid F, E) = \frac{P(A, F \mid E)}{\sum_{A'} P(A', F \mid E)}, \qquad P(A, F \mid J, E) = \frac{1}{(I+1)^J} \prod_{j=1}^{J} t(f_j \mid e_{a_j})$$

Page 47: Machine Translation

47

M-Step: re-estimate the translation probabilities from the expected counts:

$$\text{count}(f \mid e) = \sum_A P(A)\; C(f, e \mid A, F, E), \qquad t(f \mid e) = \frac{\text{count}(f \mid e)}{\sum_{f'} \text{count}(f' \mid e)}$$

Page 48: Machine Translation

48

E-Step again: recompute P(A | F, E) from the updated t(f|e) using the same formulas; the four alignment probabilities become 1/3, 2/3, 2/3, 1/3.

Repeat till convergence.

Page 49: Machine Translation

49

Computing Translation Probabilities in Model 1

The EM algorithm is fine, but as stated it requires exponential computation: for each alignment we recompute the alignment probability, and the translation probability is computed from all the alignment probabilities. We need an efficient algorithm.


Page 53: Machine Translation

53

Checkpoint: Use of the EM algorithm for estimating phrase probabilities under IBM Model 1 · An example · And an efficient algorithm

Page 54: Machine Translation

54

Generating Bi-directional Alignments

Existing models only generate uni-directional alignments. Combine two uni-directional alignments to get many-to-many bi-directional alignments.

Page 55: Machine Translation

55

Eng-Hindi Alignment

छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है

[Alignment matrix: English words Goa, is, a, premier, beach, vacation, destination linked to the Hindi words above; matrix lost in transcription]

Page 56: Machine Translation

56

Hindi-Eng Alignment

छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है

[Reverse-direction alignment matrix for the same sentence pair; matrix lost in transcription]

Page 57: Machine Translation

57

Combining Alignments

छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है

[Combined alignment matrix, with '+' and '|' marks distinguishing the two sources of points; matrix lost in transcription]

P=2/3=.67, R=2/7=.3 · P=4/5=.8, R=4/7=.6
P=5/6=.83, R=5/7=.7 · P=6/9=.67, R=6/7=.85

Page 58: Machine Translation

58

A Different Heuristic from the Moses Site

GROW-DIAG-FINAL(e2f, f2e):
  neighboring = ((-1,0),(0,-1),(1,0),(0,1),(-1,-1),(-1,1),(1,-1),(1,1))
  alignment = intersect(e2f, f2e)
  GROW-DIAG(); FINAL(e2f); FINAL(f2e)

GROW-DIAG():
  iterate until no new points added
    for english word e = 0 ... en
      for foreign word f = 0 ... fn
        if ( e aligned with f )
          for each neighboring point ( e-new, f-new ):
            if ( ( e-new, f-new ) in union( e2f, f2e ) and
                 ( e-new not aligned and f-new not aligned ) )
              add alignment point ( e-new, f-new )

FINAL(a):
  for english word e-new = 0 ... en
    for foreign word f-new = 0 ... fn
      if ( ( e-new, f-new ) in alignment a and
           ( e-new not aligned or f-new not aligned ) )
        add alignment point ( e-new, f-new )

Proposed changes: after growing the diagonal, align the shorter sentence first, and use alignments only from the corresponding directional alignment.
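A Python rendering of the heuristic above: a sketch that follows the deck's pseudocode, including its 'and' condition in the grow step, not the exact Moses implementation.

```python
def grow_diag_final(e_len, f_len, e2f, f2e):
    """Symmetrize two directed alignments given as sets of (e, f) index pairs."""
    neighboring = [(-1, 0), (0, -1), (1, 0), (0, 1),
                   (-1, -1), (-1, 1), (1, -1), (1, 1)]
    union = e2f | f2e
    alignment = e2f & f2e

    def aligned_e(): return {e for e, _ in alignment}
    def aligned_f(): return {f for _, f in alignment}

    # GROW-DIAG: add union points neighboring current points while both
    # of their words are still unaligned (the deck's 'and' variant).
    added = True
    while added:
        added = False
        for (e, f) in sorted(alignment):
            for de, df in neighboring:
                en, fn = e + de, f + df
                if ((en, fn) in union
                        and en not in aligned_e() and fn not in aligned_f()):
                    alignment.add((en, fn))
                    added = True

    # FINAL: sweep each directed alignment, adding points whose English
    # *or* foreign word is still unaligned.
    for a in (e2f, f2e):
        for en in range(e_len):
            for fn in range(f_len):
                if ((en, fn) in a
                        and (en not in aligned_e() or fn not in aligned_f())):
                    alignment.add((en, fn))
    return alignment
```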

Page 59: Machine Translation

59

Generating Phrase Alignments

छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है

[Symmetrized word-alignment matrix over the same sentence pair; matrix lost in transcription]

Example phrase pairs read off the matrix:
a premier beach vacation destination ↔ एक प्रमुख समुद्र-तटीय गंतव्य है
premier beach vacation ↔ प्रमुख समुद्र-तटीय

Page 60: Machine Translation

60

Phrase Alignment Probabilities

We have been dealing with just one sentence pair. In fact, we have been dealing with just one alignment, the most probable alignment. Such an alignment can easily have mistakes and generate garbage phrases. Compute phrase alignment probabilities over the entire corpus:

$$\phi(\bar{f} \mid \bar{e}) = \frac{\text{count}(\bar{f}, \bar{e})}{\sum_{\bar{f}'} \text{count}(\bar{f}', \bar{e})}$$
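The count-and-normalize step as a few lines of Python; the extracted phrase-pair list is an invented toy example.

```python
from collections import Counter

# Phrase pairs extracted from the word-aligned corpus, as (e-phrase, f-phrase).
extracted = [
    ("premier beach vacation", "प्रमुख समुद्र-तटीय"),
    ("premier beach vacation", "प्रमुख समुद्र-तटीय"),
    ("premier beach vacation", "प्रमुख"),
]
pair_count = Counter(extracted)
e_count = Counter(e for e, _ in extracted)

def phi(f, e):
    """phi(f|e) = count(f, e) / sum over f' of count(f', e)."""
    return pair_count[(e, f)] / e_count[e]

print(phi("प्रमुख समुद्र-तटीय", "premier beach vacation"))  # 2/3
```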

Page 61: Machine Translation

61

IBM Model 3

The Model 1 story seems bizarre: who would first choose the sentence length, then align, and then generate?

A more likely case: generate a translation for each word, and then reorder.

Model 1 generative story

Page 62: Machine Translation

62

Model 3 Generative Story

Page 63: Machine Translation

63

Model 3 Formula (ignore NULL for a moment)

Choosing fertility:

$$\prod_{i=1}^{I} n(\varphi_i \mid e_i)$$

Generating words:

$$\prod_{i=1}^{I} \varphi_i! \;\prod_{j=1}^{J} t(f_j \mid e_{a_j})$$

Aligning words (distortion):

$$\prod_{j:\, a_j \neq 0} d(j \mid a_j, I, J)$$

Page 64: Machine Translation

64

Generating Spurious Words

Instead of using n(2|NULL) or n(1|NULL): with probability p1, generate a spurious word every time a valid word is generated. This ensures that longer sentences generate more spurious words.
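With this scheme, the number of spurious words φ0 is binomially distributed in the number m' = Σi φi of real generated words; this is how Brown et al. (1993) model NULL fertility:

$$P(\varphi_0 \mid m') = \binom{m'}{\varphi_0}\, p_1^{\varphi_0}\, (1 - p_1)^{m' - \varphi_0}$$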

Page 65: Machine Translation

65

Diagrams converted into pictures in next slides

Page 66: Machine Translation

66

इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए

For this you contact the farmers of Manchar region

Page 67: Machine Translation

67

इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए

For this you contact the farmers of Manchar region

Page 68: Machine Translation

68

इसके लिए किसानों से मिलिये

For this you contact the farmers

Page 69: Machine Translation

69

इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए

For this you contact the farmers of Manchar region

Page 70: Machine Translation

70

OchNey03 Heuristic: Intuition

Take the intersection. Extend it by adding alignments from the union if both words of the union alignment are not already aligned in the final alignment. Then add an alignment only if:
- it already has an adjacent alignment in the final alignment, and
- adding it will not cause any final alignment to have both horizontal and vertical neighbors as final alignments.

Page 71: Machine Translation

71

SMT Example

कई बंगाली कवियों ने इस भूमि के गीत गाए हैं
Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain

Candidate translations (different partitions and phrase choices):
- Many Bengali Poets this land of have sung poem
- Several Bengali to this place 's sing songs
- Many poets from Bangal in this space song sung
- Poets from Bangladesh farm have sung songs
- To this space have sung songs of many poets from Bangal

Page 72: Machine Translation

72

Translation Model – Notations

F: f1, f2, …, fJ ; E: e1, e2, …, eI

P(F|E) is not the same as P(f1…fJ | e1…eI). What is P(फातिमा चावल खाती है | Fatima eats rice)?

P(F|E) = P(J, f1…fJ | I, e1…eI). We explicitly mention I and J only when needed. We will work with the above formulation instead of the alternative P(F|E) = P(w1(F)=f1, …, wJ(F)=fJ, w_{J+1}(F)=$ | …).

फातिमा चावल खाती है – Fatima eats rice
फातिमा चम्मच से चावल खाती है – Fatima eats rice with spoon