Exact Decoding of Phrase-based Translation Models through Lagrangian Relaxation #emnlpreading

Exact Decoding of Phrase-based Translation Models through Lagrangian Relaxation
Yin-Wen Chang (MIT), Michael Collins (Columbia University)
EMNLP 2011 reading

Upload: yoh-okuno

Post on 11-Jun-2015


TRANSCRIPT

Page 1: Exact Decoding of Phrase-based Translation Models through Lagrangian Relaxation #emnlpreading

Exact Decoding of Phrase-based Translation Models through Lagrangian Relaxation

Yin-Wen Chang (MIT), Michael Collins (Columbia University)

EMNLP 2011 reading

Page 2

About the presenter

• Name: Yoh Okuno

• Software Engineer at a Web company

• Interests: NLP, Machine Learning, Data Mining

• Skills: C/C++, Python, Hadoop, etc.

• Weblog: http://d.hatena.ne.jp/nokuno/

Page 3

Decoding in Phrase-based SMT

• Decoding in SMT is NP-hard

– Approximate search: beam search

– Exact search: ILP (Integer Linear Programming)

• This paper proposes Lagrangian relaxation combined with efficient dynamic programming

Page 4

Phrase-based SMT Model

• Reordering makes the problem complicated

• Use 3-gram language model

f(y) = h(e(y)) + Σ_{k=1}^{L} g(p_k) + Σ_{k=1}^{L−1} η δ(t(p_k), s(p_{k+1}))

The three terms are the language model (LM), translation, and distortion scores.

• output: y = <p_1 p_2 ... p_L>

• phrase: p_k = (s, t, e)

• distortion: δ(t, s) = |t + 1 − s|

• η: negative constant; x: input sentence
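The scoring function above can be sketched directly from its three terms. This is a toy sketch, not the paper's implementation: the language-model scorer `h` and phrase scorer `g` are illustrative stand-ins passed in as functions.

```python
def distortion(t, s):
    # delta(t, s) = |t + 1 - s|: the jump between the end of one
    # phrase and the start of the next in the input sentence
    return abs(t + 1 - s)

def score(y, h, g, eta=-1.0):
    # y: derivation as a list of phrases p_k = (s, t, e), where
    #    s..t is the input span and e is the output string
    # h: trigram LM score of the full output e(y) (stand-in)
    # g: per-phrase translation score (stand-in)
    # f(y) = h(e(y)) + sum_k g(p_k) + sum_k eta * delta(...)
    output = " ".join(e for (_, _, e) in y)
    total = h(output)
    total += sum(g(p) for p in y)
    for pk, pk1 in zip(y, y[1:]):
        total += eta * distortion(pk[1], pk1[0])
    return total
```

With monotone phrases (no reordering) the distortion term is zero, so the score reduces to the LM plus the phrase scores.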

Page 5

Decoding  with  constraints

• Our purpose: solve

argmax_{y ∈ Y} f(y)

• Define y(i) = number of times input word x_i is translated in y

1. Each word in the input is translated exactly once: y(i) = 1 for all i

2. Distortion limit: δ(t(p_k), s(p_{k+1})) < d
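The two constraints above are easy to check for a candidate derivation. A minimal sketch, using the same illustrative phrase tuples (s, t, e) as before and the slide's strict distortion bound:

```python
def counts(y, n_words):
    # y(i): number of times input word i is translated in derivation y
    c = [0] * n_words
    for (s, t, e) in y:
        for i in range(s, t + 1):
            c[i - 1] += 1
    return c

def is_valid(y, n_words, d):
    # Constraint 1: each input word translated exactly once
    if any(c != 1 for c in counts(y, n_words)):
        return False
    # Constraint 2: distortion limit between consecutive phrases
    return all(abs(t + 1 - s2) < d
               for (_, t, _), (s2, _, _) in zip(y, y[1:]))
```

Relaxing constraint 1 is exactly what the Lagrangian-relaxation approach on the following slides does.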

Page 6

Exact  dynamic  programming

• Use states: (w1, w2, b, r)

• w1, w2: trigram context words

• b: bit string recording which input words have been translated

• r: end position of the previous phrase
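The exact-DP state can be sketched with the bit string b as an integer bitmask. This is a toy sketch of one transition (applying a phrase to a state); the phrase tuple format is the same illustrative (s, t, e) used above:

```python
def apply_phrase(state, phrase):
    # state = (w1, w2, b, r): trigram context words, coverage
    # bitmask, and end position of the previous phrase
    w1, w2, b, r = state
    s, t, e = phrase                 # phrase covers input words s..t
    words = e.split()
    # bits for input positions s..t (1-indexed)
    mask = ((1 << (t - s + 1)) - 1) << (s - 1)
    if b & mask:                     # some word already translated
        return None
    # new trigram context = last two output words so far
    ctx = ([w1, w2] + words)[-2:]
    return (ctx[0], ctx[1], b | mask, t)
```

The bitmask is what makes this DP intractable: b ranges over all 2^N subsets of input positions.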

Page 7

Exact dynamic programming

• Yet it is intractable: the number of possible bit strings b is exponential in the sentence length

Page 8

Decoding based on Lagrangian Relaxation

• Consider a broader set Y′ ⊇ Y and solve

argmax_{y ∈ Y′} f(y)

• Y′ uses a looser constraint:

Σ_{i=1}^{N} y(i) = N

• That is, N words are translated in total (some may be translated more than once, others not at all)

Page 9

Efficient Dynamic Programming

• Use states: (w1, w2, n, r) or (w1, w2, n, l, m, r)

• n: number of translated words

• (l, m): range of previously translated words

• Transition corresponds to one phrase translation p_k = (s, t, e)

Page 10

Applying Lagrangian Relaxation

• Solve the relaxed problem with the original constraints reintroduced:

argmax_{y ∈ Y′} f(y) such that ∀i, y(i) = 1

• Apply the Lagrangian method:

L(u, y) = f(y) + Σ_i u(i)(y(i) − 1)

• Dual objective and dual problem:

min_u L(u) = min_u max_{y ∈ Y′} L(u, y)

Page 11

Decoding  by  subgradient  method

Page 12

Intuitive interpretation

• The Lagrange multiplier u(i) penalizes or rewards input word i so that it is translated exactly once

• Update: u_t(i) = u_{t−1}(i) − α_t (y_t(i) − 1)

– Decrease u(i) if y(i) > 1

– Increase u(i) if y(i) = 0

– Do nothing if y(i) = 1
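This update rule drops into a standard subgradient loop. A minimal sketch, assuming a relaxed decoder `solve_relaxed(u)` (a hypothetical stand-in for the efficient DP) that returns the counts y(i) of the best derivation in Y′ under multipliers u:

```python
def subgradient_decode(solve_relaxed, n_words, max_iter=120, rate=1.0):
    # u(i): Lagrange multiplier penalizing/rewarding word i
    u = [0.0] * n_words
    for t in range(1, max_iter + 1):
        y = solve_relaxed(u)          # counts y(i) under current u
        if all(c == 1 for c in y):    # all constraints satisfied:
            return y, u               # the relaxed solution is exact
        alpha = rate / t              # decreasing step size (one choice)
        # u_t(i) = u_{t-1}(i) - alpha_t * (y_t(i) - 1)
        u = [ui - alpha * (yi - 1) for ui, yi in zip(u, y)]
    return None, u                    # no certificate within max_iter
```

When the loop exits early, the relaxed argmax already satisfies y(i) = 1 for all i, so it is provably the exact solution to the original problem.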

Page 13

Input: dadurch können die qualität und die regelmäßige postzustellung auch weiterhin sichergestellt werden .

Intermediate solutions across iterations (constraints still violated): the quality and also the / and the quality and also the regular / will continue to be continue to be continue to ... / in that way, and can thus quality / in that way, the qualität and ... / can the regular distribution should also ensure distribution ... / the regular and regular / and regular the quality and the ... / in that way, the quality of the quality of the distribution ...

Output: in that way, the quality and the regular distribution should continue to be guaranteed.

Page 14

Experimental  summary

• Language pair: German to English translation

• Corpus: Europarl data (1,824 sentences)

• The proposed method finds exact solutions for 99% of sentences

• Average run time is 120 seconds

• Moses makes search errors on 4 to 18% of sentences

Page 15

Table 1: iterations and convergence

• 97% of the examples converge within 120 iterations

Page 16

Table  4:  ILP/LP  are  too  slow

Page 17

Table  5:  Moses  search  errors

Page 18

Table 7: BLEU doesn't improve

Page 19

Conclusion

• Described an exact decoding algorithm for SMT using Lagrangian relaxation

• The proposed method finds exact solutions for 99% of samples, within 120 seconds on average

• Future work: apply Lagrangian relaxation to training algorithms for SMT

Page 20

Any Questions?

Page 21

Transition  for  DP

• Define a transition as one phrase translation p_k = (s, t, e):

(w1, w2, n, l, m, r) → (w′1, w′2, n′, l′, m′, r′)

(w′1, w′2) = (e_{M−1}, e_M) if M > 1, (w2, e_1) if M = 1

n′ = n + t − s + 1
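The transition can be sketched as a function on the six-tuple state. Only the context and count updates are given on this slide; the (l′, m′, r′) bookkeeping below is an assumption for illustration (the new phrase's span (s, t) becomes the new range, with r′ = t), not the paper's exact rule.

```python
def transition(state, phrase):
    # state = (w1, w2, n, l, m, r); phrase p_k = (s, t, e)
    w1, w2, n, l, m, r = state
    s, t, e = phrase
    words = e.split()
    M = len(words)
    # new trigram context: the last two output words
    if M > 1:
        w1p, w2p = words[-2], words[-1]   # (e_{M-1}, e_M)
    else:
        w1p, w2p = w2, words[0]           # (w2, e_1)
    n_new = n + t - s + 1                 # n' = n + t - s + 1
    # ASSUMPTION: (l', m') = (s, t) and r' = t (illustrative only)
    return (w1p, w2p, n_new, s, t, t)
```

Because the state tracks only a count n rather than a full coverage bitmask, the number of states is polynomial, which is what makes the relaxed problem tractable.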