![Page 1: A Neural Reordering Model for Phrase-based Translationnlp.csai.tsinghua.edu.cn/~lpeng/talks/coling-2014.pdfReordering is Hard • An NP-complete problem (Knight, 1999; Zaslavskiy et](https://reader034.vdocuments.mx/reader034/viewer/2022051922/600f4182399e09148c1e1781/html5/thumbnails/1.jpg)
A Neural Reordering Model for Phrase-based Translation
Peng Li Tsinghua University
joint work with Yang Liu, Maosong Sun, Tatsuya Izuha, Dakun Zhang

Phrase-based Translation
(Koehn et al., 2003; Och and Ney, 2004)
布什 与 沙龙 举行 了 会谈
Bush held a talk with Sharon
Steps: segmentation, reordering, translation

Reordering is Hard
Words: Chinese, President, Xi Jinping, and, his, US, counterpart, Barack, Obama, open, two, days, of, talks, in, California, on, a, number, of, high-stakes, issues
Q: Can you figure out a sentence using these words?

Reordering is Hard
Chinese President Xi Jinping and his US counterpart Barack Obama open two days of talks in California on a number of high-stakes issues
Q: Can you figure out a sentence using these words?

Reordering is Hard
• An NP-complete problem (Knight, 1999; Zaslavskiy et al., 2009)
• Reordering modeling has attracted intensive attention, e.g.
  • Distance-based model (Koehn et al., 2003)
  • Word-based lexicalized model (Koehn et al., 2007)
  • Phrase-based lexicalized model (Tillmann, 2004)
  • Hierarchical phrase-based lexicalized model (Galley and Manning, 2008)

Distance-based Model
(Koehn et al., 2003)
布什 与 沙龙 举行 了 会谈
Bush held a talk with Sharon
Distortion distances: d=0 for "Bush", d=2 for "held a talk" (jump of +2), d=5 for "with Sharon" (jump of -5)
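The distances on this slide can be reproduced with a short sketch. It assumes the usual definition d = |start_i - end_{i-1} - 1| over 1-based source positions; the function name and the span encoding are illustrative and not taken from the talk.

```python
def distortion_distances(spans):
    """Return the distortion distance of each phrase, in translation order.

    `spans` holds (start, end) source positions (1-based, inclusive) of the
    phrases in the order they are translated. Assumed definition:
    d = |start_i - end_{i-1} - 1|, with end_0 = 0.
    """
    distances = []
    prev_end = 0
    for start, end in spans:
        distances.append(abs(start - prev_end - 1))
        prev_end = end
    return distances


# Source positions: 布什(1) 与(2) 沙龙(3) 举行(4) 了(5) 会谈(6)
# Translation order: Bush (1-1), held a talk (4-6), with Sharon (2-3)
spans = [(1, 1), (4, 6), (2, 3)]
print(distortion_distances(spans))  # [0, 2, 5]
```

The model penalizes only the size of each jump, so it cannot tell which phrases tend to move.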

Lexicalized Models
(Koehn et al., 2007; Tillmann, 2004; Galley and Manning, 2008)
布什 与 沙龙 举行 了 会谈
Bush held a talk with Sharon
Orientations: M (monotone) for "Bush", D (discontinuous) for "held a talk", S (swap) for "with Sharon"
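A minimal sketch of how the M/S/D labels could be derived from source spans, assuming the phrase-based definition (monotone if the current source phrase immediately follows the previous one, swap if it immediately precedes it, discontinuous otherwise). The function name and span encoding are hypothetical.

```python
def orientation(prev_span, cur_span):
    """Classify the orientation of the current phrase relative to the previous one.

    Spans are (start, end) source positions, 1-based and inclusive.
    """
    prev_start, prev_end = prev_span
    cur_start, cur_end = cur_span
    if cur_start == prev_end + 1:
        return "M"  # monotone: directly to the right of the previous phrase
    if cur_end == prev_start - 1:
        return "S"  # swap: directly to the left of the previous phrase
    return "D"      # discontinuous: not adjacent


# Translation order: Bush (1-1), held a talk (4-6), with Sharon (2-3)
spans = [(1, 1), (4, 6), (2, 3)]
prev = (0, 0)  # sentence start
labels = []
for span in spans:
    labels.append(orientation(prev, span))
    prev = span
print(labels)  # ['M', 'D', 'S']
```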

Challenge #1: Sparsity

| Source Phrase | Target Phrase | M | S | D |
|---|---|---|---|---|
| 布什 | Bush | 0.7 | 0.2 | 0.1 |
| 举行 了 会谈 | held a talk | 0.1 | 0.1 | 0.8 |
| 与 沙龙 | with Sharon | 0.7 | 0.1 | 0.2 |
| 举行 了 | held a | 0.6 | 0.1 | 0.3 |
| 会谈 | talk | 0.4 | 0.3 | 0.3 |

Challenge #1: Sparsity
• Probability distributions are estimated by MLE
$$P(D \mid \text{举行 了 会谈}, \text{held a talk}) = \frac{\#(D, \text{举行 了 会谈}, \text{held a talk})}{\#(\bullet, \text{举行 了 会谈}, \text{held a talk})}$$
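The relative-frequency estimate can be sketched as follows. The observation counts are hypothetical, chosen only to illustrate the formula and the sparsity issue; the function name is illustrative.

```python
from collections import Counter


def mle_orientation_probs(observations):
    """Estimate P(orientation | source phrase, target phrase) by relative frequency.

    `observations` is an iterable of (source_phrase, target_phrase, orientation).
    """
    joint = Counter()
    total = Counter()
    for src, tgt, o in observations:
        joint[(src, tgt, o)] += 1
        total[(src, tgt)] += 1
    return {key: count / total[key[:2]] for key, count in joint.items()}


# Hypothetical counts: 8 x D, 1 x M, 1 x S for one frequent phrase pair,
# and a single observation for a rare phrase pair.
obs = (
    [("举行 了 会谈", "held a talk", "D")] * 8
    + [("举行 了 会谈", "held a talk", "M")]
    + [("举行 了 会谈", "held a talk", "S")]
    + [("会谈", "talk", "M")]
)
probs = mle_orientation_probs(obs)
print(probs[("举行 了 会谈", "held a talk", "D")])  # 0.8
print(probs[("会谈", "talk", "M")])  # 1.0
```

The rare pair receives probability 1.0 from a single observation, which is the kind of unreliable estimate that sparsity produces.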

Challenge #2: Ambiguity

Challenge #3: Context Insensitivity
沙龙 与 布什 举行 了 会谈
Bush held a talk with Sharon
Orientation: S
How to resolve the three challenges?

Including More Contexts
与 沙龙 举行 了 会谈
Phrase pairs: (与 沙龙, with Sharon) and (举行 了 会谈, held a talk); orientation: S
• Sparsity: ?
• Ambiguity: ✔
• Context Insensitivity: ✔

Sparsity
• Including more contexts leads to more severe sparsity
$$P(D \mid \text{举行 了 会谈}, \text{held a talk}, \text{与 沙龙}, \text{with Sharon}) = \frac{\#(D, \text{举行 了 会谈}, \text{held a talk}, \text{与 沙龙}, \text{with Sharon})}{\#(\bullet, \text{举行 了 会谈}, \text{held a talk}, \text{与 沙龙}, \text{with Sharon})}$$
Reordering as Classification

Neural Reordering Model
• A neural classifier for predicting reordering orientations
• Conditioned on both the current and previous phrase pairs
  • Improves context sensitivity
  • Reduces reordering ambiguity
• A single classifier for all phrase pairs
  • Uses vector space representations
  • Alleviates the data sparsity problem

Recursive Autoencoder (RAE)
(Pollack, 1990; Socher et al., 2011)
Words "held a talk" with vectors $x_1$, $x_2$, $x_3$
• Compose $x_1$ and $x_2$ into $y_1$:
$$y_1 = f^{(1)}(W^{(1)}[x_1; x_2] + b^{(1)})$$
• Reconstruct the inputs from $y_1$:
$$[x_1'; x_2'] = f^{(2)}(W^{(2)} y_1 + b^{(2)})$$
• Reconstruction errors: $\|x_1 - x_1'\|^2$ and $\|x_2 - x_2'\|^2$

Recursive Autoencoder (RAE)
(Pollack, 1990; Socher et al., 2011)
• Compose $y_1$ and $x_3$ into $y_2$:
$$y_2 = f^{(1)}(W^{(1)}[y_1; x_3] + b^{(1)})$$
• Reconstruct the inputs from $y_2$:
$$[y_1'; x_3'] = f^{(2)}(W^{(2)} y_2 + b^{(2)})$$
• Reconstruction errors: $\|y_1 - y_1'\|^2$ and $\|x_3 - x_3'\|^2$
• $y_2$ is the vector for the whole phrase "held a talk"
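The composition and reconstruction steps can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: tanh is assumed for both $f^{(1)}$ and $f^{(2)}$, the tree is built left to right as on the slides, the weights are random, and all function names are hypothetical.

```python
import math
import random


def affine(W, x, b):
    """Compute W x + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def tanh_vec(x):
    return [math.tanh(v) for v in x]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def encode_phrase(word_vectors, W1, b1, W2, b2):
    """Left-to-right RAE: return (phrase vector, summed reconstruction error)."""
    parent = word_vectors[0]
    total_error = 0.0
    for child in word_vectors[1:]:
        inputs = parent + child                       # [c1; c2]
        parent = tanh_vec(affine(W1, inputs, b1))     # y = f1(W1 [c1; c2] + b1)
        recon = tanh_vec(affine(W2, parent, b2))      # [c1'; c2'] = f2(W2 y + b2)
        total_error += 0.5 * sum((a - r) ** 2 for a, r in zip(inputs, recon))
    return parent, total_error


rng = random.Random(0)
dim = 4
W1, b1 = rand_matrix(dim, 2 * dim, rng), [0.0] * dim
W2, b2 = rand_matrix(2 * dim, dim, rng), [0.0] * (2 * dim)
# Random stand-ins for the word vectors of "held", "a", "talk".
held, a, talk = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
vec, err = encode_phrase([held, a, talk], W1, b1, W2, b2)
print(len(vec), err >= 0.0)
```

Because the same matrices are reused at every node, a phrase of any length maps to a vector of fixed size.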

Neural Classifier
Figure: the current phrase pair and the previous phrase pair are encoded by RAEs, and a softmax layer predicts the orientations
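A minimal sketch of the softmax layer, assuming the four phrase vectors (source and target side of the current and previous phrase pairs) are concatenated into one feature vector. The weight shapes, the label order and the function names are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def predict_orientation(phrase_vectors, W, b, labels=("M", "S", "D")):
    """Return a distribution over orientations.

    `phrase_vectors` holds the RAE vectors of the current and previous phrase
    pairs; W has one row of weights per orientation label.
    """
    features = [v for vec in phrase_vectors for v in vec]
    scores = [sum(w * x for w, x in zip(row, features)) + bi
              for row, bi in zip(W, b)]
    return dict(zip(labels, softmax(scores)))


# Toy input: four 2-dimensional phrase vectors and all-zero weights,
# with a bias that favors D.
vectors = [[0.1, -0.2], [0.3, 0.0], [-0.5, 0.4], [0.2, 0.2]]
W = [[0.0] * 8 for _ in range(3)]
b = [0.0, 0.0, math.log(2.0)]
dist = predict_orientation(vectors, W, b)
print(dist)
```

With zero weights the output depends only on the bias, so D receives about twice the probability of M or S in this toy setup.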

Training
• Reordering error on predicting orientations
• Reconstruction error on recovering training examples
• Combined by linear interpolation

Reconstruction Error
• Reconstruction error
$$E_{rec}([c_1; c_2]; \theta) = \frac{1}{2} \|[c_1; c_2] - [c_1'; c_2']\|^2$$
• Source side average reconstruction error
$$E_{rec,s}(S; \theta) = \frac{1}{N_s} \sum_i \sum_{p \in T_R^{\theta}(t_{i,s})} E_{rec}([p.c_1, p.c_2]; \theta)$$
• Total reconstruction error
$$E_{rec}(S; \theta) = E_{rec,s}(S; \theta) + E_{rec,t}(S; \theta)$$

Reordering Error
• Average cross-entropy error
$$E_{reo}(S; \theta) = \frac{1}{|S|} \sum_i \left( -\sum_o d_{t_i}(o) \cdot \log P_{\theta}(o \mid t_i) \right)$$
• Joint training objective
$$J = \alpha E_{rec}(S; \theta) + (1 - \alpha) E_{reo}(S; \theta) + R(\theta)$$
$$R(\theta) = \frac{\lambda_L}{2} \|\theta_L - \theta_{L_0}\|^2 + \frac{\lambda_{rec}}{2} \|\theta_{rec}\|^2 + \frac{\lambda_{reo}}{2} \|\theta_{reo}\|^2$$
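The joint objective can be evaluated with a short sketch. It assumes the reconstruction error has already been computed, that parameters are flat lists of floats, and that gold and predicted orientation distributions are dictionaries; all names are illustrative.

```python
import math


def cross_entropy(gold, predicted):
    """-sum_o d(o) * log P(o), skipping orientations with zero gold mass."""
    return -sum(gold[o] * math.log(predicted[o]) for o in gold if gold[o] > 0)


def squared_norm(values):
    return sum(v * v for v in values)


def joint_objective(e_rec, gold_dists, pred_dists, alpha,
                    theta_L, theta_L0, theta_rec, theta_reo,
                    lambda_L, lambda_rec, lambda_reo):
    """J = alpha * E_rec + (1 - alpha) * E_reo + R(theta)."""
    e_reo = sum(cross_entropy(g, p)
                for g, p in zip(gold_dists, pred_dists)) / len(gold_dists)
    reg = (lambda_L / 2 * squared_norm([a - b for a, b in zip(theta_L, theta_L0)])
           + lambda_rec / 2 * squared_norm(theta_rec)
           + lambda_reo / 2 * squared_norm(theta_reo))
    return alpha * e_rec + (1 - alpha) * e_reo + reg


# Toy values: one training example whose gold orientation is M.
gold = [{"M": 1.0, "S": 0.0, "D": 0.0}]
pred = [{"M": 0.5, "S": 0.25, "D": 0.25}]
J = joint_objective(0.2, gold, pred, alpha=0.5,
                    theta_L=[0.0], theta_L0=[0.0],
                    theta_rec=[0.0], theta_reo=[0.0],
                    lambda_L=1e-3, lambda_rec=1e-3, lambda_reo=1e-3)
print(J)
```

With all parameters at zero the regularizer vanishes, so the toy value is 0.5 * 0.2 + 0.5 * log 2.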

Optimization
• Hyper-parameter optimization
  • $\alpha, \lambda_L, \lambda_{rec}, \lambda_{reo}$
  • Optimized by random search (Bergstra and Bengio, 2012)
• Training objective optimization: L-BFGS
  • Using backpropagation through structures to compute the gradients (Goller and Kuchler, 1996)
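Random search over the four hyper-parameters can be sketched as below. The sampling ranges, the number of trials and the toy objective are assumptions made for illustration; in practice the objective would train the model and return a development-set error.

```python
import random


def random_search(objective, n_trials, seed=0):
    """Sample hyper-parameter configurations at random and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = {
            "alpha": rng.uniform(0.0, 1.0),
            # Log-uniform sampling for the regularization weights (assumed range).
            "lambda_L": 10 ** rng.uniform(-6, -1),
            "lambda_rec": 10 ** rng.uniform(-6, -1),
            "lambda_reo": 10 ** rng.uniform(-6, -1),
        }
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_dev_error(config):
    # Hypothetical stand-in for "train the model, measure dev-set error".
    return (config["alpha"] - 0.3) ** 2 + config["lambda_L"]


best, score = random_search(toy_dev_error, n_trials=50)
print(best, score)
```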

Experiments
• Chinese-English translation
  • Training: 1.2M sentence pairs
  • LM: 4-gram, 397.6M words
  • Dev. set: NIST 06
  • Test set: NIST 02-05, 08
  • Case-insensitive BLEU
• Baselines
  • Distance-based model
  • Lexicalized model: model type (word-based, phrase-based, hier. phrase-based) × orientations (M/S/D, left/right)

M/S/D Orientations
• Care about relative position and adjacency
Figure: two examples over the phrase pairs (布什, Bush), (与 沙龙, with Sharon) and (举行 了 会谈, held a talk), one labeled D and one labeled S

Left/Right Orientations
• Only care about relative position
Figure: two examples over the phrase pairs (布什, Bush), (与 沙龙, with Sharon) and (举行 了 会谈, held a talk), one labeled right and one labeled left
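Since left/right orientations ignore adjacency, they reduce to a position comparison. A minimal sketch, assuming "right" means the current source phrase starts after the previous one ends; the function name and span encoding are hypothetical.

```python
def left_right(prev_span, cur_span):
    """Return 'right' if the current source phrase lies to the right of the
    previous one, else 'left'. Spans are (start, end), 1-based inclusive."""
    return "right" if cur_span[0] > prev_span[1] else "left"


# Source positions: 布什(1) 与(2) 沙龙(3) 举行(4) 了(5) 会谈(6)
print(left_right((1, 1), (4, 6)))  # right  (D under M/S/D)
print(left_right((4, 6), (2, 3)))  # left   (S under M/S/D)
```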

Translation
Figure: bar chart of BLEU (axis range 24 to 33) on MT06 (dev), MT02, MT03, MT04, MT05 and MT08 for three systems: baseline, neural M/S/D, neural left/right

Non-Separability
• The unaligned Chinese word “de” (的) makes a big difference in determining M/S/D orientations
金门 有 六 万 的 常住 人口
Kinmen has 60000 resident population
jinmen you liu wan de changzhu renkou
Orientation: M or D, depending on whether the unaligned “de” is attached to the preceding phrase

Non-Separability
Figure: two nearly identical examples placed among the M, S and D classes
• (六 万 的, 60000) followed by (常住 人口, resident population)
• (六 万, 60000) followed by (常住 人口, resident population)

Non-Separability
• Left/right orientations are not so sensitive to unaligned words
金门 有 六 万 的 常住 人口
Kinmen has 60000 resident population
jinmen you liu wan de changzhu renkou
Orientation: right

Non-Separability
Figure: the same two examples placed among the left and right classes
• (六 万 的, 60000) followed by (常住 人口, resident population)
• (六 万, 60000) followed by (常住 人口, resident population)

Non-Separability
Figure: side-by-side comparison of the left/right classes and the M/S/D classes

Distortion Limit
Figure: BLEU (axis range 22 to 25) against distortion limit (2, 4, 6, 8) for the neural and lexicalized models

Word Vectors
Figure: bar chart of BLEU (axis range 20 to 34) on MT06 (dev), MT02, MT03, MT04, MT05 and MT08 comparing task-oriented vectors with word2vec vectors

Vector Space Representations
Figure: phrases plotted in the learned vector space, including: "but is willing to"; "economy is required to"; "range of services to"; "said his visit is to"; "is making use of"; "june 18, 2001"; "late 2011"; "as detention center"; "group all together"; "take care of old"; "by june 1"; "and complete by end 1998"; "or other economic"; "and for other"; "within the agencies"

Conclusion
• We propose a neural reordering model for phrase-based translation
• It improves the context sensitivity, reduces ambiguity and alleviates the data sparsity problem
• Future work
  • Train MT system and neural classifier jointly
  • Develop more efficient models to leverage larger contexts
  • Extend our work to syntax-based and n-gram based models

Thanks!