Neural Monkey: A Natural Language Processing Toolkit

Jindřich Helcl, Jindřich Libovický, Tom Kocmi, Dušan Variš, Tomáš Musil, Ondřej Cífka, Ondřej Bojar

March 19, 2019, GTC 2019
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics



NLP Toolkit Overview

Why do we need NLP toolkits?
• No need to re-implement everything from scratch.
• Re-use of published (trained) components.
• Difficult design decisions are often already made.
• Published results usually indicate the reliability of the toolkit.


NLP research libraries and toolkits can be categorized as:
• Math libraries (matrix- or tensor-level, usually symbolic): TensorFlow, (Py)Torch, Theano
• Neural network abstraction APIs (handle individual NN layers): Keras
• Higher-level toolkits (work with encoders, decoders, etc.): Neural Monkey, AllenNLP, Sockeye
• Specialized applications: Marian or tensor2tensor for NMT, Kaldi for ASR, etc.


Neural Monkey

• Open-source toolkit for NLP tasks
• Suited for research and education
• Three (overlapping) user groups considered: students, researchers, and newcomers to deep learning


Development

• Implemented in Python 3.6 using TensorFlow
• GPU support through TensorFlow, using CUDA and cuDNN
• Actively developed, with GitHub as the main communication platform

Source code: https://github.com/ufal/neuralmonkey


Used in Research

• Multimodal translation (Charles University, ACL 2017, WMT 2018)
• Bandit learning (Heidelberg University, ACL 2017)
• Graph convolutional encoders (University of Amsterdam, EMNLP 2017)
• Non-autoregressive translation (Charles University, EMNLP 2018)

[Figure: non-autoregressive decoding example over the German sentence "ein Mann schläft auf einem grünen Sofa in einem grünen Raum." ("a man sleeps on a green sofa in a green room.")]


Goals

1. Code readability
2. Modularity along research concepts
3. Up-to-date building blocks
4. Fast prototyping


Usage

• Neural Monkey experiments are defined in INI configuration files.
• Once the config is ready, train with:

  neuralmonkey-train config.ini

• Inference with a trained model uses a second config for the data:

  neuralmonkey-run config.ini data.ini


Abstractions in Neural Monkey

• Compositional design: high-level abstractions derived from low-level ones
• (High-level) abstractions aligned with the literature: encoder, decoder, etc.
• Separation between model definition and usage:
  • "model parts" define the network architecture
  • "graph executors" define what to compute in the TF session
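The split between defining the architecture and deciding what to compute can be illustrated with a toy sketch. All class and method names here are hypothetical stand-ins for the concept, not the actual Neural Monkey API:

```python
# Toy illustration of the "model part" vs. "graph executor" separation.
# Hypothetical names; NOT the real Neural Monkey classes.

class ModelPart:
    """A piece of the network architecture (encoder, decoder, ...)."""
    def __init__(self, name):
        self.name = name

class ToyEncoder(ModelPart):
    def output(self, tokens):
        # Stand-in for building encoder states in the computation graph.
        return [hash(t) % 100 / 100.0 for t in tokens]

class GraphExecutor:
    """Decides *what* to compute with the parts, separately from the parts."""
    def __init__(self, part):
        self.part = part
    def run(self, batch):
        return [self.part.output(sentence) for sentence in batch]

encoder = ToyEncoder("my_encoder")
executor = GraphExecutor(encoder)
states = executor.run([["hello", "world"]])
```

Because the executor holds only a reference to a model part, the same architecture can be reused by different executors (e.g. one for training losses, one for decoding).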


Example Use-Case: Machine Translation

• Bahdanau et al., 2015
• Encoder: bidirectional GRU
• Decoder: GRU with single-layer feed-forward attention

[Figure: RNN encoder-decoder with attention — encoder states h0…h4 over input tokens <s> x1…x4, attention weights α0…α4 combined into a weighted sum, decoder states si−1, si, si+1 producing outputs ~yi, ~yi+1]
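The single-layer feed-forward (additive) attention of Bahdanau et al. scores each encoder state h_i against the previous decoder state s as e_i = vᵀ tanh(W s + U h_i), normalizes the scores with a softmax, and forms the context as the weighted sum of encoder states. A minimal NumPy sketch (random weights, illustrative dimensions; not tied to the Neural Monkey implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
state_size, enc_size, dec_size, T = 4, 6, 5, 3  # illustrative sizes

W = rng.normal(size=(state_size, dec_size))   # projects the decoder state
U = rng.normal(size=(state_size, enc_size))   # projects the encoder states
v = rng.normal(size=state_size)               # scoring vector

h = rng.normal(size=(T, enc_size))            # encoder states h_1 .. h_T
s = rng.normal(size=dec_size)                 # previous decoder state

# e_i = v^T tanh(W s + U h_i)   (additive / feed-forward attention)
e = np.tanh(W @ s + h @ U.T) @ v
alpha = np.exp(e - e.max())
alpha /= alpha.sum()                          # softmax -> attention weights
context = alpha @ h                           # weighted sum of encoder states
```

The weights alpha are exactly the α0…α4 shown in the figure; the context vector is fed into the decoder GRU at each step.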


Simple MT Configuration Example

General training configuration:

[main]
output="output_dir"
batch_size=64
epochs=20
train_dataset=<train_data>
val_dataset=<val_data>
trainer=<my_trainer>
runners=[<my_runner>]
evaluation=[("target", evaluators.BLEU)]
logging_period=500
validation_period=5000

Loading vocabularies:

[en_vocabulary]
class=vocabulary.from_wordlist
path="en_vocab.tsv"

[de_vocabulary]
class=vocabulary.from_wordlist
path="de_vocab.tsv"

Loading training and validation data:

[train_data]
class=dataset.load
series=["source", "target"]
data=["data/train.en", "data/train.de"]

[val_data]
class=dataset.load
series=["source", "target"]
data=["data/val.en", "data/val.de"]

GRU encoder configuration:

[my_encoder]
class=encoders.SentenceEncoder
rnn_size=500
embedding_size=600
data_id="source"
vocabulary=<en_vocabulary>

GRU decoder and attention configuration:

[my_attention]
class=attention.Attention
encoder=<my_encoder>
state_size=500

[my_decoder]
class=decoders.Decoder
encoders=[<my_encoder>]
attentions=[<my_attention>]
max_output_len=20
rnn_size=1000
embedding_size=600
data_id="target"
vocabulary=<de_vocabulary>

Trainer and runner:

[my_trainer]
class=trainers.CrossEntropyTrainer
decoders=[<my_decoder>]
clip_norm=1.0

[my_runner]
class=runners.GreedyRunner
decoder=<my_decoder>
output_series="target"


Configuration of Captioning

Define how to load images:

[imagenet_reader]
class=readers.image_reader.imagenet_reader
prefix="/home/test/data/flickr30k-images"
target_width=229
target_height=229
zero_one_normalization=True

And replace the encoder with an ImageNet network:

[imagenet_resnet]
class=encoders.ImageNet
name="imagenet"
data_id="images"
network_type="resnet_v2_50"
spatial_layer="resnet_v2_50/block4/unit_3/bottleneck_v2/conv3"
slim_models_path="tensorflow-models/research/slim"


Sentence Classification

Keep the encoder and replace the decoder (and update the rest):

[my_classifier]
class=decoders.Classifier
data_id="labels"
encoders=[<my_encoder>]
vocabulary=<label_vocabulary>
layers=[200]

[my_runner]
class=runners.PlainRunner
decoder=<my_classifier>

[my_trainer]
class=trainers.CrossEntropyTrainer
decoders=[<my_classifier>]
clip_norm=1.0

Pre-trained model parts: parameters of model parts can be fixed using the gradient blocking module.
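Conceptually, gradient blocking keeps a pre-trained part's parameters out of the optimizer update so only the new part learns. A toy gradient-descent sketch of the idea (plain Python with hypothetical names, not the Neural Monkey gradient blocking module itself):

```python
# Toy sketch of gradient blocking: parameters marked "frozen" receive no
# update, as when re-using a pre-trained encoder under a new classifier.

params = {"encoder.w": 1.0, "classifier.w": 1.0}
frozen = {"encoder.w"}          # the pre-trained part to keep fixed
lr = 0.1

def grads(p):
    # Dummy gradients of some loss w.r.t. each parameter (here: d/dw w^2).
    return {name: 2.0 * value for name, value in p.items()}

for _ in range(5):
    g = grads(params)
    for name in params:
        if name in frozen:      # gradient blocked: skip the update
            continue
        params[name] -= lr * g[name]
```

After training, the frozen encoder weight is untouched while the classifier weight has moved toward the loss minimum.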


Supported Features

• Recurrent encoder and decoder with attention
• Beam search decoding with model ensembling
• Deep convolutional encoder
• Self-attentive encoder and decoder (a.k.a. Transformer)
• Wrappers for ImageNet networks
• Custom CNNs for image processing
• ConvNets for sequence classification
• Self-attentive embeddings for sentence classification
• Hierarchical attention over multiple source sequences
• Generic sequence labeler
• Connectionist temporal classification


Console Logging during Training


Scalar Values in TensorBoard

Losses, evaluation metrics, parameter norms, histograms of gradients.


Attention in TensorBoard


Summary

Neural Monkey is:
• an actively developed open-source GitHub project,
• suited for researchers, students, and other deep learning enthusiasts,
• a collection of features from across NLP sub-topics,
• simple to use thanks to clear and readable config files,
• highly modular, and therefore relatively easy to debug.

https://github.com/ufal/neuralmonkey