Long Short-Term Memory Network
Hien Van Nguyen
University of Houston
11/6/2017
Why recurrent networks?
• Sequential input, next state depends on previous state
• Generalize to inputs of variable length
• Process the input in smaller chunks, so the model needs fewer parameters
11/7/2017 Machine Learning 2
What is a sequence?
Source: https://uvadlc.github.io/lectures/lecture8.pdf
One-hot vector
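A minimal sketch of one-hot encoding, assuming a hypothetical four-word vocabulary (the names `vocab`, `word_to_index`, and `one_hot` are illustrative and do not come from the slides). Each token becomes a vector with a single 1 at its vocabulary index:

```python
def one_hot(index, vocab_size):
    """Return a list of length vocab_size with a 1 at position index."""
    if not 0 <= index < vocab_size:
        raise ValueError("index out of range")
    vec = [0] * vocab_size
    vec[index] = 1
    return vec

# Hypothetical toy vocabulary, for illustration only.
vocab = ["the", "cat", "sat", "down"]
word_to_index = {word: i for i, word in enumerate(vocab)}

# Encode a short sentence as a sequence of one-hot vectors.
sentence = ["the", "cat", "sat"]
encoded = [one_hot(word_to_index[w], len(vocab)) for w in sentence]
```

A sentence of any length maps to a sequence of such vectors, which is the kind of input a recurrent network consumes one step at a time.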
Recurrent networks
Unroll through time
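Unrolling can be sketched as a loop that applies the same step function, with the same weights, at every timestep. This is a scalar toy version under assumed names (`w_hh`, `w_xh`, `b`, and the tanh activation are illustrative choices, not taken from the slides):

```python
import math

def rnn_step(h_prev, x, w_hh, w_xh, b):
    """One recurrent step: h_t = tanh(w_hh * h_prev + w_xh * x + b)."""
    return math.tanh(w_hh * h_prev + w_xh * x + b)

def rnn_unrolled(xs, h0, w_hh, w_xh, b):
    """Apply the same step (shared weights) to each input in order."""
    h = h0
    states = []
    for x in xs:
        h = rnn_step(h, x, w_hh, w_xh, b)
        states.append(h)
    return states

# Hypothetical inputs and weights, for illustration only.
states = rnn_unrolled([1.0, 0.5, -0.3], h0=0.0, w_hh=0.8, w_xh=1.2, b=0.0)
```

Because the weights are shared across steps, the same function handles sequences of any length without adding parameters.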
Simple recurrent network
• Linear activation
• Gradient:
• T is the number of timesteps considered
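The gradient expression on this slide did not survive extraction. As a hedged illustration only, assume the scalar linear recurrence h_t = w * h_{t-1} + u * x_t (the symbols `w` and `u` are assumptions). Then the derivative of h_T with respect to h_0 is w raised to the number of timesteps:

```python
def linear_rnn_gradient(w, num_steps):
    """d h_T / d h_0 for the scalar recurrence h_t = w * h_{t-1} + u * x_t."""
    return w ** num_steps

# Hypothetical weights, for illustration only.
vanishing = linear_rnn_gradient(0.9, 50)   # |w| < 1: shrinks toward 0
exploding = linear_rnn_gradient(1.1, 50)   # |w| > 1: grows rapidly
```

Even a weight close to 1 produces a very small or very large gradient once T is in the tens of steps.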
Problem of Vanishing/Exploding Gradient
• Review of chain rule
• Apply chain rule:
How a change in V at step k affects the loss at step t
Source: On the difficulty of training recurrent networks, https://arxiv.org/pdf/1211.5063.pdf
Problem of Vanishing/Exploding Gradient
• Recall that:
• Using chain rule:
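The equations on these slides were lost in extraction. As a hedged reconstruction, the chain-rule expansion can be written in the notation of the cited paper (On the difficulty of training recurrent networks), where x_t is the hidden state, W_rec the recurrent weight matrix, σ the activation, u_t the input, E_t the loss at step t, and θ the parameters. The slides' own symbols (such as V) may differ:

```latex
\begin{align}
x_t &= W_{rec}\,\sigma(x_{t-1}) + W_{in}\,u_t + b \\
\frac{\partial E_t}{\partial \theta}
  &= \sum_{k=1}^{t} \frac{\partial E_t}{\partial x_t}\,
     \frac{\partial x_t}{\partial x_k}\,
     \frac{\partial^{+} x_k}{\partial \theta} \\
\frac{\partial x_t}{\partial x_k}
  &= \prod_{i=k+1}^{t} \frac{\partial x_i}{\partial x_{i-1}}
   = \prod_{i=k+1}^{t} W_{rec}^{\top}\,
     \operatorname{diag}\!\big(\sigma'(x_{i-1})\big)
\end{align}
```

The last product has t − k factors. When the norm of each factor is bounded below 1, the contribution from distant steps shrinks exponentially in t − k (vanishing gradient); when the factors have norm above 1, it can grow exponentially (exploding gradient).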
Long Short-Term Memory Networks (LSTM)
• Idea: Don't multiply. Repeated multiplication leads to vanishing gradients.
Instead of multiplying the previous hidden state by a matrix to get the new state,
we add something to the old state to get the new state (in LSTM terminology this state is called the "cell" rather than the "hidden state"; explained next)
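As a rough sketch in symbols (W, U, x_t, and Δc_t are illustrative names, not taken from the slides), the contrast between the two update styles is:

```latex
\begin{align}
\text{vanilla RNN:}\quad h_t &= \tanh(W h_{t-1} + U x_t) \\
\text{additive update:}\quad c_t &= c_{t-1} + \Delta c_t
\end{align}
```

In the second form, if Δc_t is treated as independent of c_{t-1}, then ∂c_t/∂c_{t-1} is the identity, so the gradient along this path is not rescaled by W at every step. The full LSTM additionally scales c_{t-1} by a forget gate.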
Long Short-Term Memory Networks (LSTM)
• Intuition:
  • Not everything is useful to remember
  • Not every input is useful to take
  • Not necessary to output each instance
Long Short-Term Memory Networks (LSTM)
• Comparison of vanilla RNN and LSTM
[Figure: side-by-side diagrams, panels "Vanilla RNN" and "LSTM"]
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTM-Step by Step
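The step-by-step diagrams were lost in extraction. The sketch below follows the standard LSTM equations as presented in the colah.github.io post cited earlier, reduced to scalar input, hidden state, and cell state for readability. The `params` layout and function names are assumptions made for this example:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step for scalar input, hidden state, and cell state.

    params maps each gate name ("f", "i", "g", "o") to a tuple
    (w_x, w_h, b): input weight, recurrent weight, bias.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))      # forget gate: how much of the old cell to keep
    i = sigmoid(pre("i"))      # input gate: how much new content to write
    g = math.tanh(pre("g"))    # candidate cell content
    o = sigmoid(pre("o"))      # output gate: how much of the cell to expose
    c = f * c_prev + i * g     # additive cell update
    h = o * math.tanh(c)       # new hidden state
    return h, c

# Hypothetical parameters: forget gate pushed open, input gate pushed shut.
keep_params = {
    "f": (0.0, 0.0, 10.0),
    "i": (0.0, 0.0, -10.0),
    "g": (1.0, 1.0, 0.0),
    "o": (0.0, 0.0, 0.0),
}
h, c = lstm_step(x=0.3, h_prev=0.0, c_prev=2.0, params=keep_params)
```

With the forget gate near 1 and the input gate near 0, the cell value passes through almost unchanged, which matches the intuition that the network can choose what to remember, what to take in, and what to output.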
LSTM-Gradient Flow
Source: Learning sequence representation, https://d-nb.info/1082034037/34
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
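A simplified view of gradient flow through the cell state, assuming scalar cells and the standard update c_t = f_t * c_{t-1} + i_t * g_t with gate values treated as constants (this ignores the indirect paths through the gates themselves). Under those assumptions, the derivative of c_T with respect to c_0 is the product of the forget-gate values:

```python
def cell_path_gradient(forget_gates):
    """Product of forget-gate values: d c_T / d c_0 along the direct cell path.

    Assumes scalar cells and treats every gate value as a constant.
    """
    grad = 1.0
    for f in forget_gates:
        grad *= f
    return grad

# Hypothetical gate values, for illustration only.
mostly_open = cell_path_gradient([0.99] * 50)   # gates near 1: gradient largely preserved
mostly_closed = cell_path_gradient([0.5] * 50)  # gates at 0.5: gradient decays quickly
```

Unlike a fixed recurrent weight, the forget gate is computed per step from the data, so the network can keep it near 1 when long-range information matters.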
Applications – Machine Translation
Source: https://uvadlc.github.io/lectures/lecture8.pdf
Applications – Machine Translation
Google Pixel Buds
Applications – Image Captioning
Applications – Question Answering
Applications – Visual Question Answering
Source: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf