Sequence to Sequence Models for Machine Translation
CMSC 723 / LING 723 / INST 725
Marine Carpuat
Slides & figure credits: Graham Neubig


TRANSCRIPT

Page 1: Sequence to Sequence Models for Machine Translation

CMSC 723 / LING 723 / INST 725

Marine Carpuat

Slides & figure credits: Graham Neubig

Page 2: Machine Translation

• Translation system
  • Input: source sentence F
  • Output: target sentence E
  • Can be viewed as a function mapping F to E

• Statistical machine translation systems face 3 problems (formalized below)
  • Modeling: how to define P(.)?
  • Training/learning: how to estimate parameters from parallel corpora?
  • Search: how to solve the argmax efficiently?
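As a concrete reference, here is one standard way to write the three problems out. The parameter vector θ and the maximum-likelihood objective are the usual textbook formulation, not copied from the slides:

```latex
% Modeling: choose the form of P(E | F; theta).
% Search: find the best translation under the model.
\hat{E} = \operatorname*{argmax}_{E} P(E \mid F; \theta)

% Training: estimate theta from a parallel corpus {(F^(i), E^(i))},
% e.g. by maximum likelihood.
\hat{\theta} = \operatorname*{argmax}_{\theta} \sum_{i} \log P\big(E^{(i)} \mid F^{(i)}; \theta\big)
```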

Page 3: Introduction to Neural Machine Translation

• Neural language models review

• Sequence to sequence models for MT
  • Encoder-Decoder
  • Sampling and search (greedy vs. beam search)
  • Practical tricks

• Sequence to sequence models for other NLP tasks

Page 4: A feedforward neural 3-gram model
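The slide's figure does not survive in this transcript. As a stand-in, here is a minimal PyTorch sketch of a feedforward 3-gram language model: the two previous words are embedded, concatenated, passed through one nonlinear hidden layer, and projected to vocabulary logits. The layer sizes and the tanh nonlinearity are illustrative assumptions, not read off the figure.

```python
import torch
import torch.nn as nn

class FeedforwardTrigramLM(nn.Module):
    """Predict w_t from a fixed window of the two previous words (a 3-gram)."""

    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):  # sizes are made up
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.hidden = nn.Linear(2 * emb_dim, hidden_dim)  # two context words
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, w_prev2, w_prev1):
        # Concatenate the two context embeddings, apply a tanh hidden layer,
        # then project to vocabulary logits (softmax happens in the loss).
        x = torch.cat([self.embed(w_prev2), self.embed(w_prev1)], dim=-1)
        return self.out(torch.tanh(self.hidden(x)))
```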

Page 5: A recurrent language model
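Again the figure is missing; here is a minimal PyTorch sketch of the idea, assuming a vanilla (Elman) RNN with made-up layer sizes. Unlike the 3-gram model above, the recurrent hidden state summarizes the entire history rather than a fixed window.

```python
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    """The recurrent hidden state carries the whole history w_1..w_{t-1}."""

    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):  # sizes are made up
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        # tokens: (batch, seq_len) word ids -> logits for every position.
        outputs, hidden = self.rnn(self.embed(tokens), hidden)
        return self.out(outputs), hidden
```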

Page 6: A recurrent language model

Page 7: Examples of RNN variants

• LSTMs
  • Aim to address the vanishing/exploding gradient issue

• Stacked RNNs

• …

Page 8: Training in practice: online

Page 9: Training in practice: batch

Page 10: Training in practice: minibatch

• Compromise between online and batch training (see the sketch below)

• Computational advantages
  • Can leverage vector processing instructions in modern hardware by processing multiple examples simultaneously
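A sketch of all three regimes in one PyTorch-style loop (`make_batch` is a hypothetical helper that turns raw examples into tensors): batch_size=1 recovers online training, batch_size=len(data) recovers full-batch training, and anything in between is a minibatch.

```python
def train_epoch(model, loss_fn, optimizer, data, make_batch, batch_size=32):
    """One epoch of minibatch training.

    batch_size = 1          -> online:    one update per example
    batch_size = len(data)  -> batch:     one update per epoch
    1 < batch_size < len    -> minibatch: the usual compromise
    """
    for i in range(0, len(data), batch_size):
        inputs, targets = make_batch(data[i:i + batch_size])  # hypothetical helper
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)  # whole minibatch in one vectorized pass
        loss.backward()
        optimizer.step()  # one parameter update per minibatch
```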

Page 11: Problem with minibatches: in language modeling, examples don’t have the same length

• 3 tricks (see the sketch after this list)
  • Padding: add the </s> symbol to make all sentences the same length
  • Masking: multiply the loss computed over the padded symbols by zero
  • Plus: sort sentences by length, so that sentences in the same minibatch have similar lengths and little padding is needed
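A sketch of padding and masking in PyTorch, assuming sentences are already lists of word ids and that </s> doubles as the padding symbol as on the slide (the id 2 is made up):

```python
import torch
import torch.nn.functional as F

EOS = 2  # hypothetical id for the </s> symbol, also used as padding

def pad_batch(sentences):
    """Pad id sequences with </s> to a common length; mask marks real tokens."""
    max_len = max(len(s) for s in sentences)
    batch = torch.full((len(sentences), max_len), EOS, dtype=torch.long)
    mask = torch.zeros(len(sentences), max_len)
    for i, s in enumerate(sentences):
        batch[i, :len(s)] = torch.tensor(s)
        mask[i, :len(s)] = 1.0
    return batch, mask

def masked_loss(logits, targets, mask):
    # Per-token cross-entropy, with the loss over padded positions zeroed out.
    per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    return (per_token * mask).sum() / mask.sum()
```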

Page 12: Introduction to Neural Machine Translation

• Neural language models review

• Sequence to sequence models for MT
  • Encoder-Decoder
  • Sampling and search (greedy vs. beam search)
  • Training tricks

• Sequence to sequence models for other NLP tasks

Page 13: Encoder-decoder model

Page 14: Encoder-decoder model
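The figure is not reproduced in the transcript; as a stand-in, here is a minimal PyTorch encoder-decoder sketch. It assumes GRUs and made-up layer sizes (the lecture's figure may use a different recurrent unit): the encoder reads the source sentence F into a vector, and the decoder is a target-side language model conditioned on that vector.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Encoder reads F into a vector; decoder is an LM over E conditioned on it."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src, tgt_in):
        # Encode F; the encoder's final hidden state initializes the decoder.
        _, h = self.encoder(self.src_embed(src))
        dec_out, _ = self.decoder(self.tgt_embed(tgt_in), h)
        return self.out(dec_out)  # logits for every target position
```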

Page 15: Generating Output

• We have a model P(E|F); how can we generate translations from it?

• 2 methods
  • Sampling: generate a random sentence according to the model’s probability distribution
  • Argmax: generate the sentence with the highest probability

Page 16: Ancestral Sampling

• Randomly generate words one by one, each drawn from the model’s next-word distribution

• Stop when the end-of-sentence symbol is generated

• Done! (A sketch of the loop follows.)
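A sketch of the sampling loop, assuming a hypothetical decoder interface (`encode` returns the initial decoder state, `decode_step` returns next-word logits and the updated state; the sos/eos ids are made up):

```python
import torch

def ancestral_sample(model, src, sos=1, eos=2, max_len=100):
    """Draw each next word from P(e_t | e_<t, F) until </s> is generated."""
    words = [sos]
    state = model.encode(src)  # hypothetical helper
    while words[-1] != eos and len(words) < max_len:
        logits, state = model.decode_step(words[-1], state)  # hypothetical helper
        probs = torch.softmax(logits, dim=-1)
        words.append(torch.multinomial(probs, num_samples=1).item())
    return words
```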

Page 17: Greedy search

• One by one, pick the single highest-probability word (sketch below)

• Problems
  • Often generates easy words first
  • Often prefers multiple common words to rare words
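Greedy search is the same loop as ancestral sampling with the random draw replaced by an argmax (same hypothetical `encode`/`decode_step` interface as the sampling sketch above):

```python
def greedy_search(model, src, sos=1, eos=2, max_len=100):
    """At each step, commit to the single highest-probability word."""
    words = [sos]
    state = model.encode(src)  # hypothetical helper
    while words[-1] != eos and len(words) < max_len:
        logits, state = model.decode_step(words[-1], state)  # hypothetical helper
        words.append(logits.argmax(dim=-1).item())  # argmax instead of sampling
    return words
```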

Page 18: Greedy Search Example

Page 19: Beam Search Example with beam size b = 2

We consider the top b hypotheses at each time step, as in the sketch below.
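A sketch of the beam search loop under the same hypothetical `encode`/`decode_step` interface as the earlier decoding sketches; hypotheses are scored by summed log-probabilities, and finished hypotheses stay in the beam until every hypothesis has ended in </s>:

```python
import torch

def beam_search(model, src, b=2, sos=1, eos=2, max_len=100):
    """Keep the b best partial hypotheses per step instead of a single one."""
    beams = [(0.0, [sos], model.encode(src))]  # (log-prob, words, decoder state)
    for _ in range(max_len):
        candidates = []
        for score, words, state in beams:
            if words[-1] == eos:  # finished hypothesis: carry it over unchanged
                candidates.append((score, words, state))
                continue
            logits, new_state = model.decode_step(words[-1], state)  # hypothetical
            logp = torch.log_softmax(logits, dim=-1)
            for w in torch.topk(logp, b).indices.tolist():  # expand b ways
                candidates.append((score + logp[w].item(), words + [w], new_state))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:b]
        if all(words[-1] == eos for _, words, _ in beams):
            break
    return beams[0][1]  # words of the best-scoring hypothesis
```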

Page 20: Introduction to Neural Machine Translation

• Neural language models review

• Sequence to sequence models for MT
  • Encoder-Decoder
  • Sampling and search (greedy vs. beam search)
  • Practical tricks

• Sequence to sequence models for other NLP tasks