#03-2019-1012 nmt...

62
10.11.19 09:43 Neural Machine Translation 2 CMPT 413/825, Fall 2019 Jetic Gū School of Computing Science 12 Nov. 2019

Upload: others

Post on 23-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

10.11.19 09:43

Neural Machine Translation2 CMPT 413/825, Fall 2019

Jetic GūSchool of Computing Science

12 Nov. 2019

Page 2: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Overview• Focus: Neural Machine Translation

• Architecture: Encoder-Decoder Neural Network

• Main Story:

• Extension to Seq2Seq: Copy Mechanism

• Extension to Seq2Seq: Ensemble

• Extension to Seq2Seq: BeamSearch

• [Extra] Beyond Seq2Seq: Attention is all you need

• [Extra] Beyond NMT

2

Page 3: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

DecoderEncoder

Sequence-to-Sequence

Review

Pr(E |F) = ΠtPr(e′ = et |F, e<t)CLM

?

RNN

<EOS>

what

RNN

did

did

RNN

prof.

prof.

RNN

sarkar

sarkar

RNN

eat

eat

RNN

?

<SOS>

RNN

what

RNNRNN RNN RNN RNN RNN

?

RNN

was

RNN

hat

RNN

prof.

RNN

sarkar

RNN

gegessen

RNN

P0 Review

Page 4: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Attention

Review

!

RNN RNN RNN

<SOS> prof likes

prof likes cheese

RNN

cheese

<EOS>! ! !

RNN

Encoder

RNNRNN RNN RNN RNN RNN

?

RNN

was

RNN

hat

RNN

prof.

RNN

sarkar

RNN

gegessen

RNN

Decoder

P0 Review

Page 5: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Attention

Review

!

Frau

in

Gelb

mit

einem

Kinderwagen

Mann

mit

blauem

Hemd

und

ein

Sour

ce (G

erm

an)

wom

an in

shirt

with 1a

stro

ller

1and 2a

man in 3a

blue

shirt

Prediction (English)

gold

P0 Review

Page 6: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Attention• Why does Attention work?

• What is in the context vector?

• What is in ?

• It’s alignment1 !

• learns to refer to useful information in src2

• similar to human attention: we pay attention to whatever is needed

α

Review

P0 XXXXXXX !

contextt = ∑i

αt,ihenci

scoret,i = f(henci , hdec

t )αt = softmax(scoreti)

1. CL2015008 [Bahdanau et al.] Neural Machine Translation by Jointly Learning to Align and Translate 2. CL2017342 [Ghader et Monz] What does Attention in Neural Machine Translation Pay Attention to?

P0 Review

Page 7: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Common Problems of NMT

• Out-of-Vocabulary (OOV) Problem; Rare word problem

• Frequent word are translated correctly, rare words are not

• In any corpus, word frequencies are exponentially unbalanced

Review

P0 XXXXXXX

P0 Review

Page 8: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

OOV

Review

P0 XXXXXXX

1. https://en.wikipedia.org/wiki/Zipf%27s_law

P0 Review

Zipf’s Law

Occ

urre

nce

Cou

nt

1

10

100

1000

10000

100000

1000000

Frequency Rank1 25000 50000

• rare word are exponentially less frequent than frequent words

• e.g. in HW4, 45%|65% (src|tgt) of the unique words occur once

Page 9: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Common Problems of NMT

• Out-of-Vocabulary (OOV) Problem; Rare word problem

• Frequent word are translated correctly, rare words are not

• In any corpus, word frequencies are exponentially unbalanced

• Under translation

• Crucial information are left untranslated; premature generation

Review

P0 XXXXXXX

P0 Review

<EOS>

Page 10: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

• Out-of-Vocabulary (OOV) Problem; Rare word problem

• Frequent word are translated correctly, rare words are not

• In any corpus, word frequencies are exponentially unbalanced

• Under translation

• Crucial information are left untranslated; premature generation

Under-trans<EOS>

Review

P0 XXXXXXX

P0 Review

!

RNN

<EOS>

nobody

RNN

expects

expects

RNN

the

the

RNN

spanish

spanish

RNN

inquisition

inquisition

RNN

!

<SOS>

RNN

nobody

<EOS>

Page 11: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

• Out-of-Vocabulary (OOV) Problem; Rare word problem

• Frequent word are translated correctly, rare words are not

• In any corpus, word frequencies are exponentially unbalanced

• Under translation

• Crucial information are left untranslated; premature generation

Under-trans<EOS>

Review

P0 XXXXXXX

P0 Review

!

RNN

<EOS>

nobody

RNN

expects

expects

RNN

the

the

RNN

spanish

spanish

RNN

inquisition

inquisition

RNN

!

<SOS>

RNN

nobody

'inquisition': 0.49 <EOS>: 0.51

<EOS>

Page 12: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

• Out-of-Vocabulary (OOV) Problem; Rare word problem

• Frequent word are translated correctly, rare words are not

• In any corpus, word frequencies are exponentially unbalanced

• Under translation

• Crucial information are left untranslated; premature generation<EOS>

Under-trans<EOS>

Review

P0 XXXXXXX

P0 Review

!

RNN

<EOS>

nobody

RNN

expects

expects

RNN

the

the

RNN

spanish

spanish

RNN

inquisition

inquisition

RNN

!

<SOS>

RNN

nobody <EOS>

'inquisition': 0.49 <EOS>: 0.51

Page 13: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Common Problems of NMT

Review

P0 XXXXXXX

P0 Review

• Out-of-Vocabulary (OOV) Problem; Rare word problem

• Frequent word are translated correctly, rare words are not

• In any corpus, word frequencies are exponentially unbalanced

• Under translation

• Crucial information are left untranslated; premature generation<EOS>

• Out-of-Vocabulary (OOV) Problem; Rare word problem

• Under translation

Page 14: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Common Problems of NMT

Review

P0 XXXXXXX

P0 Review

• Out-of-Vocabulary (OOV) Problem; Rare word problem

• Copy Mechanisms

• Char-level Encoder (oops for logogram, e.g. Chinese)

• Under translation

• Ensemble

• Beam search

• Coverage models

• Out-of-Vocabulary (OOV) Problem; Rare word problem

• Under translation

Page 15: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Copy Mechanism

Concep

t

P0 XXXXXXX

P1 Copy

!

RNN RNN RNN

<SOS> prof loves

prof loves cheese

RNN

cheese

<EOS>

! ! !

RNN

1. CL2015015 [Luong et al.] Addressing the Rare Word Problem in Neural Machine Translation

Page 16: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Copy Mechanism

Concep

t

P0 XXXXXXX

P1 Copy

!

RNN RNN RNN

<SOS> prof loves

prof loves cheese

RNN

cheese

<EOS>

! ! !

RNN

<UNK>

prof

liebt

kebab

<UNK>

1. CL2015015 [Luong et al.] Addressing the Rare Word Problem in Neural Machine Translation

Page 17: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Copy Mechanism

Concep

t

P0 XXXXXXX

P1 Copy

!

RNN RNN RNN

<SOS> prof loves

prof loves cheese

RNN

cheese

<EOS>

! ! !

RNN

<UNK>

prof

liebt

kebab

<UNK>

αt,1 = 0.1

αt,2 = 0.2

αt,3 = 0.7

1. CL2015015 [Luong et al.] Addressing the Rare Word Problem in Neural Machine Translation

Page 18: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Copy Mechanism

Concep

t

P0 XXXXXXX

P1 Copy

!

RNN RNN RNN

<SOS> prof loves

prof loves cheese

RNN

cheese

<EOS>

! ! !

RNN

<UNK>

prof

liebt

kebab

<UNK>

αt,1 = 0.1

αt,2 = 0.2

αt,3 = 0.7

kebab

1. CL2015015 [Luong et al.] Addressing the Rare Word Problem in Neural Machine Translation

Page 19: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Copy Mechanism

Concep

t

P0 XXXXXXX

P1 Copy

!

RNN RNN RNN

<SOS> prof loves

prof loves cheese

! !

RNN

prof

liebt

kebab

<UNK>

αt,1 = 0.1

αt,2 = 0.2

αt,3 = 0.7

kebab

• Sees <UNK> at step

• looks at attention weight

• replace <UNK> with the source word

t

αt

fargmaxiαt,i

1. CL2015015 [Luong et al.] Addressing the Rare Word Problem in Neural Machine Translation

Page 20: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

*Pointer-Generator Copy Mechanism

Concep

t

P0 XXXXXXX

P1 Copy

1. CL2016032 [Gulcehre et al.] Pointing the Unknown Words

!

RNN RNN RNN

<SOS> prof loves

prof loves cheese

RNN

cheese

<EOS>

! ! !

RNN

<UNK>

prof

liebt

kebab

<UNK>

αt,1 = 0.1

αt,2 = 0.2

αt,3 = 0.7

kebab

Page 21: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

*Pointer-Generator Copy Mechanism

Concep

t

P0 XXXXXXX

P1 Copy

1. CL2016032 [Gulcehre et al.] Pointing the Unknown Words

RNN

<UNK>

!

pg

!

RNN RNN RNN

<SOS> prof loves

prof loves cheese

RNN

cheese

<EOS>

! ! !

RNN

<UNK>

prof

liebt

kebab

<UNK>

αt,1 = 0.1

αt,2 = 0.2

αt,3 = 0.7

kebab

• binary classifier switch • use decoder output • use copy mechanism

Page 22: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

*Pointer-Generator Copy Mechanism

Concep

t

P0 XXXXXXX

P1 Copy

1. CL2016032 [Gulcehre et al.] Pointing the Unknown Words

RNN

<UNK>

!

pg

• Sees <UNK> at step , or

• looks at attention weight

• replace <UNK> with the source word

t pg([hdect ; contextt]) ≤ 0.5

αt

fargmaxiαt,i

Page 23: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

*Pointer-Based Dictionary Fusion

Concep

t

P0 XXXXXXX

P1 Copy

1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual Lexicons into Neural Machine Translation

RNN

<UNK>

!

pg

• Sees <UNK> at step , or

• looks at attention weight

• replace <UNK> with translation of the source word

t pg([hdect ; contextt]) ≤ 0.5

αt

dict( fargmaxiαt,i)

Page 24: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Ensemble

• Similar to voting mechanism, but with probabilities

• multiple models of different parameters (usually from different checkpoints of the same training instance)

• use the output with the highest probability across all models

Concep

t

P0 XXXXXXX

P2 Ensemble

Page 25: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Ensemble

Demo

P0 XXXXXXX

P2 Ensemble

!

RNN RNN RNN

<SOS> prof loves

prof loves cheese

RNN

cheese

<EOS>

! ! !

RNN

Page 26: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Ensemble

Demo

P0 XXXXXXX

P2 Ensemble

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNNRNN RNN RNN

prof loves cheese

RNN

<EOS>

RNNRNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

Page 27: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Ensemble

Demo

P0 XXXXXXX

P2 Ensemble

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNNRNN RNN RNN

prof loves cheese

RNN

<EOS>

RNNRNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

seq2seq_E049.pt

Page 28: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Ensemble

Demo

P0 XXXXXXX

P2 Ensemble

RNN cp2 cp2

prof loves cheese

cp2

<EOS>

cp2

RNN cp3 cp3

prof loves pizza

cp3

<EOS>

cp3

RNN cp1 cp1

prof loves kebab

cp1

<EOS>

cp1

seq2seq_E049.pt

seq2seq_E048.pt

seq2seq_E047.pt

Page 29: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Ensemble

Demo

P0 XXXXXXX

P2 Ensemble

RNN cp2 cp2

prof loves cheese

cp2

<EOS>

cp2

RNN cp3 cp3

prof loves pizza

cp3

<EOS>

cp3

RNN cp1 cp1

prof loves kebab

cp1

<EOS>

cp1

seq2seq_E049.pt

seq2seq_E048.pt

seq2seq_E047.pt

'kebab': 0.9 'cheese': 0.05 'pizza': 0.05

……

'kebab': 0.3 'cheese': 0.55 'pizza': 0.05

……

'kebab': 0.3 'cheese': 0.05 'pizza': 0.55

……

Page 30: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Ensemble

Demo

P0 XXXXXXX

P2 Ensemble

RNN cp2 cp2

prof loves cheese

cp2

<EOS>

cp2

RNN cp3 cp3

prof loves pizza

cp3

<EOS>

cp3

RNN cp1 cp1

prof loves kebab

cp1

<EOS>

cp1

seq2seq_E049.pt

seq2seq_E048.pt

seq2seq_E047.pt

'kebab': 0.9 'cheese': 0.05 'pizza': 0.05

……

'kebab': 0.3 'cheese': 0.55 'pizza': 0.05

……

'kebab': 0.3 'cheese': 0.05 'pizza': 0.55

……

Page 31: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

'kebab': 0.3 'cheese': 0.55 'pizza': 0.05

……

'kebab': 0.3 'cheese': 0.05 'pizza': 0.55

……

Ensemble

Demo

P0 XXXXXXX

P2 Ensemble

RNN cp2 cp2

prof loves cheese

cp2

<EOS>

cp2

RNN cp3 cp3

prof loves pizza

cp3

<EOS>

cp3

RNN cp1 cp1

prof loves kebab

cp1

<EOS>

cp1

seq2seq_E049.pt

seq2seq_E048.pt

seq2seq_E047.pt

'kebab': 0.9 'cheese': 0.05 'pizza': 0.05

……

kebab

kebab

Page 32: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Ensemble

Demo

P0 XXXXXXX

P2 Ensemble

RNN cp2 cp2

prof loves cheese

cp2

<EOS>

cp2

RNN cp3 cp3

prof loves pizza

cp3

<EOS>

cp3

RNN cp1 cp1

prof loves kebab

cp1

<EOS>

cp1

seq2seq_E049.pt

seq2seq_E048.pt

seq2seq_E047.pt

kebab

kebab

Page 33: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

BeamSearch

• Consider multiple hypothesis at each step , formulate decoding as a search programme on a tree

t

Concep

t

P0 XXXXXXX

P3 BeamSearch

Page 34: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

BeamSearch

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

Page 35: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

johnterryeric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

Page 36: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

terry

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

P = 0.5

P = 0.4

P = 0.09

P = 0.01

Page 37: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

terry

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

P = 0.5

P = 0.4

P = 0.09

P = 0.01

Page 38: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

P = 0.5

P = 0.4

P = 0.09

Page 39: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof cheese

RNN

<EOS>

RNN

RNN

RNN

Page 40: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

RNN

RNN

likes

wants

loves

meets

kicks

is

has

Page 41: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

RNN

RNN

likes

wants

loves

meets

kicks

is

has

Page 42: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

RNN

RNN

likes

wants

loves

meets

kicks

is

has

Page 43: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

RNN

RNN

likes

loves

Page 44: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

RNN

RNN

likes

loves

RNN

RNN

Page 45: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

RNN

RNN

likes

loves

RNN

RNN

pizza

kebab

Page 46: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

RNN

RNN

likes

loves

RNN

RNN

pizza

kebab

RNN

RNN

<EOS>

<EOS>

Page 47: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

RNN

RNN

likes

loves

RNN

RNN

pizza

kebab

RNN

RNN

<EOS>

<EOS>

Page 48: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN RNN RNN

prof loves cheese

RNN

<EOS>

RNN

RNN

RNN

likes

loves

RNN

RNN

pizza

kebab

RNN

RNN

<EOS>

<EOS>

Output: "prof loves cheese"

Page 49: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

john

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN

prof loves cheese <EOS>

<SOS>

likes

loves

pizza

kebab

<EOS>

<EOS>

Output: "prof loves cheese"

Page 50: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN

prof loves cheese <EOS>

<SOS>

loves

pizza

kebab

<EOS>

<EOS>

Output: "prof loves cheese"

Page 51: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN

prof loves cheese <EOS>

<SOS>

loves

pizza

kebab

<EOS>

<EOS>

Output: "prof loves cheese"

P(E |F)

P(E |F)

P(E |F)

Page 52: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

eric

BeamSearch (Size=3)

Demo

P0 XXXXXXX

P3 BeamSearch

RNN

prof loves cheese <EOS>

<SOS>

loves

pizza

kebab

<EOS>

<EOS>

Output: "prof loves cheese"

P('prof loves cheese <EOS>' |F)

P('eric loves kebab <EOS>' |F)

P('eric loves pizza <EOS>' |F)

Page 53: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

BeamSearch

• Consider multiple hypotheses at each step , formulate decoding as a search programme on a tree

• Consider multiple complete translation hypotheses, but reduces the total search space from to

t

|VE |m

Concep

t

P0 XXXXXXX

P3 BeamSearch

Page 54: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

BeamSearch

• Consider multiple hypotheses at each step , formulate decoding as a search programme on a tree

• Consider multiple complete translation hypotheses, but reduces the total search space from to

t

|VE |m

Concep

t

P0 XXXXXXX

P3 BeamSearch

batchSize × m

Page 55: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Self-Attentive NMT

Review

P0 XXXXXXX

Extra1 Transformer

RNN

Encoder Representations

I like cheese

score calculation

!

Encoder Representations

I like cheese

x x x+ + =

DecoderRNNState

α1 α2 α3;[ ]ht

• Aggregating src information , with as query

henc1:|F|

hdect

Page 56: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Self-Attentive NMT

Review

P0 XXXXXXX

Extra1 Transformer

• Aggregating src information , with as query

• : aggregating information , with as query

• : aggregating information , with as query

• Can we replace the RNN with attention?

henc1:|F| hdec

t

h enc f<i fi

h enc f>i fi

Page 57: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Self-Attentive NMT

• Transformer1

• Encoder-Decoder

• No RNN -> all RNNs in Seq2Seq replaced by identical self-attention blocks

• Multi-headed attention blocks*

• read the paper for more information

Concep

t

P0 XXXXXXX

Extra1 Transformer

1. CL2017244 [Vaswani et al.] Attention Is All You Need

Page 58: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Self-Attentive NMT

• Does everything Seq2Seq does

• BERT: Bidirectional Encoder Representation of Transformer

Concep

t

P0 XXXXXXX

Extra1 Transformer

1. CL2017244 [Vaswani et al.] Attention Is All You Need 2. CL2018460 [Devlin et al.] BERT Pre-training of Deep Bidirectional Transformers for Language Understanding

Page 59: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Beyond NMT

• NMT weaknesses

• Longer src sentences -> SMT does it better, augment attention?

• Growing lexicon/Terminologies -> Pointer-based Dictionary Fusion

• Sensitive domain -> offline computing, human-involved system to ensure accuracy

• Mobile Platform deployment -> model compression

Concep

t

P0 XXXXXXX

Extra2 Beyond NMT

Page 60: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

Beyond NMT

• NMT weaknesses

• Longer src sentences -> SMT does it better, augment attention?

• Growing lexicon/Terminologies -> Pointer-based Dictionary Fusion

• Sensitive domain -> offline computing, human-involved system to ensure accuracy

• Mobile Platform deployment -> model compression

Concep

t

P0 XXXXXXX

Extra2 Beyond NMT

HYBRID

Page 61: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

PURISM FOREVER!

MAKE NMT GREAT AGAIN

ATTENTION IS ALL YOU

NEED

HYBRIDS ARE DUMB

Beyond NMT

Concep

t

P0 XXXXXXX

Extra2 Beyond NMT

HYBRID

Page 62: #03-2019-1012 NMT Lecture2anoopsarkar.github.io/nlp-class/assets/slides/03-2019-1012_NMT_Lecture2.pdf · P0 XXXXXXX P1 Copy 1. CL2019331 [Gū et al.] Pointer-based Fusion of Bilingual

My work here is done, thank you.