30th annual conference on - inriadeeploria.gforge.inria.fr/intranet/nips16review.pdf ·...
![Page 1: 30th Annual Conference on - Inriadeeploria.gforge.inria.fr/intranet/NIPS16review.pdf · 2017-02-06 · Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot](https://reader033.vdocuments.mx/reader033/viewer/2022042417/5f334488898273094a5bda68/html5/thumbnails/1.jpg)
LE Thien Hoa
30th Annual Conference on
Neural Information Processing Systems
NIPS 2016, Barcelona
Topics
• Deep Reinforcement Learning & Robotics
• Generative Adversarial Networks
• RNN variants
• Meta-learning
• Neuroscience
• Optimization
• Machine Learning
• Natural Language Processing
• …
In this talk
• Nuts and Bolts of Applying Deep Learning
• RNN variants & limitations
• Natural Language Processing
Nuts and Bolts of Applying Deep Learning
Source: Andrew Ng, NIPS 2016
End-to-End Deep Learning
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Effective when trained on
Big Data
End-to-End Deep Learning (2)
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Remove hand-engineered pre-processing steps
to obtain End-to-End learning
Bias – Variance Tradeoff
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Split the Dev set into Train-Dev & Test-Dev
Bias – Variance Tradeoff (2)
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Bias – Variance Tradeoff (3)
Source: Andrew Ng, NIPS 2016
Human error: 1%, Dev error: 10%
• Train error 2%: close to human error (Good), far from Dev error (Overfitting)
• Train error 8%: far from human error (Bias), close to Dev error (Not Overfitting)
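The numbers on this slide can be read as two gaps: train error minus human error, and dev error minus train error. The sketch below is only an illustration of that reading; the function name `diagnose` and the 2-point threshold `tol` are hypothetical choices, not values from the talk.

```python
def diagnose(human_err, train_err, dev_err, tol=2.0):
    """Label the two error gaps, with errors given in percent.

    tol is a hypothetical threshold in percentage points (an assumption).
    """
    bias_gap = train_err - human_err      # gap to human-level error
    variance_gap = dev_err - train_err    # gap between train and dev error
    bias_label = "Bias" if bias_gap > tol else "Good"
    variance_label = "Overfitting" if variance_gap > tol else "Not Overfitting"
    return bias_label, variance_label

# The two cases from the slide (human error 1%, dev error 10%).
print(diagnose(1, 2, 10))   # ('Good', 'Overfitting')
print(diagnose(1, 8, 10))   # ('Bias', 'Not Overfitting')
```

Errors are passed as percentages rather than fractions so that the gap comparison does not depend on floating-point rounding.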
Bias – Variance Tradeoff (4)
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Human Level Performance
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
• Typical human: 5%
• General doctor: 1%
• Specialized doctor: 0.8%
• Group of specialized doctors: 0.5%
Deep Learning models tend to plateau once they have
reached or surpassed human-level accuracy
RNN variants & limitations
RNN & LSTM
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTMs can learn “long-term dependencies”
Core component of many AI applications
Fast Weights RNN
Source: Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu.
Using Fast Weights to Attend to the Recent Past. NIPS 2016
Using Fast Weights
to Attend to the Recent Past
Phased LSTM
Source: Daniel Neil, Michael Pfeiffer, and Shih-Chii Liu.
Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences. NIPS 2016
Accelerating Recurrent Net Training
for Long or Event-based Sequences
Quasi-RNN
Source: James Bradbury, Stephen Merity, Caiming Xiong & Richard Socher
Quasi-Recurrent Neural Networks. Under review at ICLR 2017
Uses Convolution & Pooling to mimic a Recurrent Layer,
which allows parallelism across timesteps
Up to 16x faster & better predictive accuracy
than stacked LSTMs of the same hidden size
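A minimal single-channel sketch of the convolution-plus-pooling idea, assuming a width-2 causal filter and the gated pooling described in the Quasi-RNN paper. The function names and filter values are illustrative, not the authors' code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def causal_conv(xs, w_prev, w_curr):
    """Width-2 causal convolution over a scalar sequence (zero-padded on the left).

    Each output depends only on the inputs, so all timesteps are independent
    and could be computed in parallel.
    """
    padded = [0.0] + list(xs)
    return [w_prev * padded[t] + w_curr * padded[t + 1] for t in range(len(xs))]

def qrnn_layer(xs, wz, wf, wo):
    """One single-channel QRNN-style layer; wz, wf, wo are (w_prev, w_curr) pairs."""
    z = [math.tanh(v) for v in causal_conv(xs, *wz)]   # candidate values
    f = [sigmoid(v) for v in causal_conv(xs, *wf)]     # forget gates
    o = [sigmoid(v) for v in causal_conv(xs, *wo)]     # output gates
    c, hs = 0.0, []
    for zt, ft, ot in zip(z, f, o):
        c = ft * c + (1.0 - ft) * zt   # pooling: the only sequential, elementwise step
        hs.append(ot * c)
    return hs

print(qrnn_layer([1.0, 2.0, 3.0], (0.5, -0.25), (0.1, 0.2), (0.3, 0.4)))
```

The gates are computed from the inputs alone, unlike an LSTM whose gates depend on the previous hidden state; only the cheap pooling loop runs step by step.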
WaveNet
(CNN model)
Source: Aaron van den Oord et al.
WaveNet: A Generative Model for Raw Audio
Deep generative model
of raw audio waveforms
(16000 samples / second or
more, with important structure
at many time-scales)
Sounds more natural than
the best existing Text-to-Speech
systems
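To give a sense of scale for the sampling rate above, the following sketch computes how many past samples one output can see through a stack of dilated causal convolutions. The dilation pattern 1, 2, 4, ..., 512 and kernel size 2 are assumptions taken from the WaveNet paper, not from this slide, and `receptive_field` is a hypothetical helper name.

```python
def receptive_field(dilations, kernel_size=2):
    """Input samples visible to one output of stacked dilated causal convolutions."""
    return 1 + sum(d * (kernel_size - 1) for d in dilations)

dilations = [2 ** i for i in range(10)]   # 1, 2, 4, ..., 512 (assumed stack)
samples = receptive_field(dilations)
milliseconds = samples * 1000 / 16000     # duration at 16000 samples / second
print(samples, milliseconds)              # 1024 64.0
```

Under these assumptions one such stack covers only about 64 ms of audio, which is why covering structure at many time-scales needs several stacks.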
RNN with Stochastic Layers
Source: Marco Fraccaro, Søren Kaae Sønderby, Ulrich Paquet, Ole Winther
Sequential Neural Models with Stochastic Layers. NIPS 2016
Extends the modeling capabilities of
RNNs by combining them with
nonlinear state space models
The inference network tracks the factorization of the
model’s posterior distribution
Learning to Learn
Source: Marcin Andrychowicz, Misha Denil et al.
Learning to learn by gradient descent by gradient descent.
NIPS 2016
An LSTM is trained to act as the optimizer,
replacing hand-designed update rules
Natural Language Processing
Machine Translation
Source: Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi et al.
Google’s Neural Machine Translation System: Bridging the Gap
between Human and Machine Translation
Google replaces its phrase-based MT system with an LSTM-based neural system
Zero-Shot Translation
Source: Melvin Johnson, Mike Schuster, Quoc V. Le et al.
Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Benefit: exploits Transfer Learning
across different languages
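In the cited paper, a single model serves all language pairs, and the target language is selected by an artificial token at the start of the source sentence. The sketch below only illustrates that input format; the helper name `add_target_token` and the example sentences are hypothetical.

```python
def add_target_token(source_sentence, target_lang):
    """Prefix the source with an artificial token naming the target language."""
    return f"<2{target_lang}> {source_sentence}"

# Language pairs seen during training (illustrative sentences).
training_inputs = [
    add_target_token("Bom dia", "en"),        # Portuguese -> English
    add_target_token("Good morning", "es"),   # English -> Spanish
]

# Zero-shot request: Portuguese -> Spanish was never seen as a pair.
zero_shot_input = add_target_token("Bom dia", "es")
print(zero_shot_input)   # <2es> Bom dia
```

Because the token is the only thing that changes, a pair never seen in training can be requested at test time in exactly the same way as a trained pair.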
Multitasking
Source: Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, NIPS 2016 Workshop
Build the Deep Model layer by layer,
following the Hierarchy of Linguistic Structure
Multitasking (2)
Source: Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa
Natural language processing (almost) from scratch. JMLR 2011
Shared Embedding Space
Free to choose the Depth Structure
Multitasking (3)
Source: Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, NIPS 2016 Workshop
Multiplicative Interaction
Source: Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
Gated-Attention Readers for Text Comprehension. Under review at ICLR 2017
Gated-Attention
Multiplicative Operation
Performance of
different gating functions
on the Who-Did-What (WDW) dataset
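A minimal sketch of the multiplicative gate, assuming the formulation in the Gated-Attention Reader paper: each document token attends over the query words, and the attended query vector gates the token elementwise. Function names and the plain-list vectors are illustrative.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_attention(doc_token, query_tokens):
    """Gate one document-token vector by a token-specific summary of the query.

    doc_token: list of floats; query_tokens: list of vectors of the same size.
    """
    # Dot-product score between the document token and each query word.
    scores = [sum(q_k * d_k for q_k, d_k in zip(q, doc_token)) for q in query_tokens]
    alpha = softmax(scores)                        # attention over query words
    q_tilde = [sum(a * q[k] for a, q in zip(alpha, query_tokens))
               for k in range(len(doc_token))]     # attended query vector
    return [d * qt for d, qt in zip(doc_token, q_tilde)]   # multiplicative gate

print(gated_attention([1.0, 1.0], [[2.0, 3.0]]))   # [2.0, 3.0]
```

Replacing the last line of `gated_attention` with an elementwise sum or a concatenation would give the kind of alternative gating functions the slide compares.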
Words or Characters?
Source: Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
Words or Characters? Fine-grained Gating for Reading Comprehension. Under review at ICLR 2017
Extreme case: Rare words
Source: Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher
Pointer Sentinel Mixture Models. NIPS 2016 Workshop
RNNs struggle to predict rare words in the Language Modeling task
Pointer sentinel mixture architecture:
ability to either reproduce a word from the recent context
or produce a word from a standard softmax classifier
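The mixture can be sketched as a weighted sum of the two distributions. In this simplified illustration the gate `g` is passed in as a number, whereas in the paper it is produced by the sentinel inside the pointer attention; all names and values are hypothetical.

```python
def mixture_probability(word, p_vocab, context, pointer_attn, g):
    """p(word) = g * p_vocab(word) + (1 - g) * p_ptr(word).

    p_vocab: dict mapping word -> softmax probability
    context: recent words; pointer_attn: attention weights over them (sum to 1)
    g: gate in [0, 1], given directly here as a simplifying assumption
    """
    # Pointer probability: total attention on every position holding this word.
    p_ptr = sum(a for w, a in zip(context, pointer_attn) if w == word)
    return g * p_vocab.get(word, 0.0) + (1.0 - g) * p_ptr

p_vocab = {"the": 0.5, "cat": 0.5}       # rare word "Zakka" is out of vocabulary
context = ["Zakka", "the"]
pointer_attn = [0.75, 0.25]

print(mixture_probability("Zakka", p_vocab, context, pointer_attn, g=0.5))  # 0.375
print(mixture_probability("the", p_vocab, context, pointer_attn, g=0.5))    # 0.375
```

A word missing from the softmax vocabulary still receives probability mass as long as it appears in the recent context, which is the rare-word case the slide highlights.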
Thank you for your attention