![Page 1: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/1.jpg)
LSTM: A Search Space Odyssey
Authors: Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber
![Page 2: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/2.jpg)
Outline
• Introduction
• Long Short-Term Memory (LSTM) with peephole connections
• Experiment and discussion
• Conclusion
![Page 3: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/3.jpg)
Definition:
• Recurrent Neural Networks
• Importance and applications
• Gradient problem
  • Vanishing gradient
  • Exploding gradient
• What is the LSTM?
Introduction LSTM with peephole connections Results and discussion Conclusion
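The vanishing and exploding gradient problems named on this slide can be illustrated with a minimal sketch. It assumes a purely linear RNN whose recurrent matrix is a scaled orthogonal matrix, so each backward step through time multiplies the gradient norm by exactly the scale factor. The function name `backprop_norms` and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def backprop_norms(recurrent, steps):
    """Track the norm of a gradient vector pushed back through a linear RNN."""
    n = recurrent.shape[0]
    grad = np.ones(n) / np.sqrt(n)          # unit-norm gradient at the last time step
    norms = []
    for _ in range(steps):
        grad = recurrent.T @ grad           # one step of backpropagation through time
        norms.append(np.linalg.norm(grad))
    return norms

rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # random orthogonal matrix

vanishing = backprop_norms(0.5 * q, steps=30)      # scale < 1: norm shrinks geometrically
exploding = backprop_norms(1.5 * q, steps=30)      # scale > 1: norm grows geometrically

print(f"scale 0.5 after 30 steps: {vanishing[-1]:.3e}")
print(f"scale 1.5 after 30 steps: {exploding[-1]:.3e}")
```

With a scale below 1 the gradient norm decays toward zero, and with a scale above 1 it grows without bound. Nonlinear RNNs behave less cleanly, but the same multiplicative effect is what makes long time lags hard to learn.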
![Page 4: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/4.jpg)
LSTM History:
• LSTM was proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber.
• In 1999, Felix Gers, Jürgen Schmidhuber, and Fred Cummins introduced the forget gate into the LSTM architecture.
• In 2000, Gers and Schmidhuber added peephole connections.
• In 2014, Kyunghyun Cho et al. proposed a simplified variant called the gated recurrent unit (GRU).
![Page 5: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/5.jpg)
Simple RNN
![Page 6: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/6.jpg)
Block diagram
• Three gates:
  • Input gate
  • Forget gate
  • Output gate
• Two blocks:
  • Block input
  • Block output
• One cell state
![Page 7: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/7.jpg)
Block Diagram
Block input:
$W_z$: input weight ($\mathbb{R}^{N \times M}$)
$R_z$: recurrent weight ($\mathbb{R}^{N \times N}$)
$b_z$: bias weight
$x_t$: input vector at time $t$
$y_{t-1}$: output at time $t-1$
[Diagram: block input z, fed by the input and recurrent connections]
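The slide lists the symbols of the block input but not the formula. In the standard peephole LSTM formulation they combine as sketched below, where $g$ is the input activation function (usually tanh). Reading $N$ as the number of LSTM blocks and $M$ as the number of inputs is an assumption, since the slides do not define them.

```latex
z_t = g\left(W_z x_t + R_z y_{t-1} + b_z\right)
```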
![Page 8: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/8.jpg)
Block Diagram
Input gate:
$W_i$: input weight ($\mathbb{R}^{N \times M}$)
$R_i$: recurrent weight ($\mathbb{R}^{N \times N}$)
$b_i$: bias weight ($\mathbb{R}^{N}$)
$p_i$: peephole weight ($\mathbb{R}^{N}$)
$c_{t-1}$: cell state at time $t-1$
$x_t$: input vector at time $t$
$y_{t-1}$: output at time $t-1$
[Diagram: input gate i, fed by the input, the recurrent connection, and the peephole from $c_{t-1}$]
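A sketch of the input gate in the standard peephole LSTM formulation, since the slide gives the symbols but not the formula. Here $\sigma$ denotes the logistic sigmoid and $\odot$ element-wise multiplication. Because the peephole weight $p_i$ is a vector, each cell state only reaches the gate of its own block.

```latex
i_t = \sigma\left(W_i x_t + R_i y_{t-1} + p_i \odot c_{t-1} + b_i\right)
```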
![Page 9: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/9.jpg)
Block Diagram
Forget gate:
$W_f$: input weight ($\mathbb{R}^{N \times M}$)
$R_f$: recurrent weight ($\mathbb{R}^{N \times N}$)
$b_f$: bias weight ($\mathbb{R}^{N}$)
$p_f$: peephole weight ($\mathbb{R}^{N}$)
$c_{t-1}$: cell state at time $t-1$
$x_t$: input vector at time $t$
$y_{t-1}$: output at time $t-1$
[Diagram: forget gate f, fed by the input, the recurrent connection, and the peephole from $c_{t-1}$]
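The forget gate has the same form as the input gate but its own set of weights. As a sketch in the standard peephole LSTM formulation, with $\sigma$ the logistic sigmoid and $\odot$ element-wise multiplication:

```latex
f_t = \sigma\left(W_f x_t + R_f y_{t-1} + p_f \odot c_{t-1} + b_f\right)
```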
![Page 10: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/10.jpg)
Block Diagram
Output gate:
$W_o$: input weight ($\mathbb{R}^{N \times M}$)
$R_o$: recurrent weight ($\mathbb{R}^{N \times N}$)
$b_o$: bias weight ($\mathbb{R}^{N}$)
$p_o$: peephole weight ($\mathbb{R}^{N}$)
$c_t$: cell state at time $t$
$x_t$: input vector at time $t$
$y_{t-1}$: output at time $t-1$
[Diagram: output gate o, fed by the input, the recurrent connection, and the peephole from $c_t$]
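A sketch of the output gate in the standard peephole LSTM formulation. Unlike the input and forget gates, its peephole reads the current cell state $c_t$ (the label shown in the diagram), because the cell state has already been updated by the time the output gate is evaluated. As before, $\sigma$ is the logistic sigmoid and $\odot$ is element-wise multiplication.

```latex
o_t = \sigma\left(W_o x_t + R_o y_{t-1} + p_o \odot c_t + b_o\right)
```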
![Page 11: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/11.jpg)
Block Diagram
Cell state:
$z_t$: output of the block input at time $t$
$i_t$: output of the input gate at time $t$
$c_{t-1}$: cell state at time $t-1$
$f_t$: output of the forget gate at time $t$
[Diagram: cell state $c_t$, combining $z_t$, $i_t$, $f_t$, and $c_{t-1}$]
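The four quantities on this slide combine into the new cell state as sketched below (standard LSTM formulation, with $\odot$ denoting element-wise multiplication). The input gate scales how much of the new block input is written, and the forget gate scales how much of the old state is kept.

```latex
c_t = z_t \odot i_t + c_{t-1} \odot f_t
```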
![Page 12: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/12.jpg)
Block Diagram
Block output:
$o_t$: output of the output gate at time $t$
$c_t$: cell state at time $t$
[Diagram: block output y]
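The block output is the output activation of the cell state scaled by the output gate, $y_t = h(c_t) \odot o_t$. The sketch below strings all the pieces of the block diagram together into one forward step in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the names `lstm_step` and `init_params`, the parameter layout, the initialization scale, and the sizes `N` and `M` are all hypothetical, and $g$ and $h$ default to tanh.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid used by the three gates."""
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, y_prev, c_prev, params, g=np.tanh, h=np.tanh):
    """One forward step of a peephole LSTM layer.

    params maps each of "z", "i", "f", "o" to a dict with keys
    "W" (N x M), "R" (N x N), "b" (N,) and, for the gates only, "p" (N,).
    """
    def pre(name):
        w = params[name]
        return w["W"] @ x_t + w["R"] @ y_prev + w["b"]

    z_t = g(pre("z"))                                    # block input
    i_t = sigmoid(pre("i") + params["i"]["p"] * c_prev)  # input gate, peephole on c_{t-1}
    f_t = sigmoid(pre("f") + params["f"]["p"] * c_prev)  # forget gate, peephole on c_{t-1}
    c_t = z_t * i_t + c_prev * f_t                       # cell state
    o_t = sigmoid(pre("o") + params["o"]["p"] * c_t)     # output gate, peephole on c_t
    y_t = h(c_t) * o_t                                   # block output
    return y_t, c_t

def init_params(n_blocks, n_inputs, rng):
    """Small random weights, zero biases and peepholes (an arbitrary choice)."""
    params = {}
    for name in "zifo":
        params[name] = {
            "W": 0.1 * rng.standard_normal((n_blocks, n_inputs)),
            "R": 0.1 * rng.standard_normal((n_blocks, n_blocks)),
            "b": np.zeros(n_blocks),
        }
        if name != "z":
            params[name]["p"] = np.zeros(n_blocks)
    return params

rng = np.random.default_rng(0)
N, M = 4, 3                                  # hypothetical layer and input sizes
params = init_params(N, M, rng)
y, c = np.zeros(N), np.zeros(N)
for x in rng.standard_normal((5, M)):        # run five time steps
    y, c = lstm_step(x, y, c, params)
print(y.shape, c.shape)
```

Keeping the pre-activation in one helper makes the differences between the four units easy to see: only the gates add a peephole term, and only the output gate uses the updated cell state.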
![Page 13: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/13.jpg)
LSTM Variants
• NIG: No Input Gate: $i_t = 1$
• NFG: No Forget Gate: $f_t = 1$
• NOG: No Output Gate: $o_t = 1$
• NIAF: No Input Activation Function: $g(x) = x$
• NOAF: No Output Activation Function: $h(x) = x$
• CIFG: Coupled Input and Forget Gate: $f_t = 1 - i_t$
• NP: No Peepholes
• FGR: Full Gate Recurrence
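The gate-related variants listed above amount to substituting a fixed value for one gate before the cell state is updated. The sketch below covers only NIG, NFG, NOG, and CIFG; the helper names `gate_values` and `cell_update` and the example numbers are hypothetical and serve only to illustrate the substitutions.

```python
import numpy as np

def gate_values(i_t, f_t, o_t, variant="vanilla"):
    """Apply the gate substitution of the NIG, NFG, NOG or CIFG variant."""
    ones = np.ones_like(i_t)
    if variant == "NIG":
        i_t = ones                 # no input gate
    elif variant == "NFG":
        f_t = ones                 # no forget gate
    elif variant == "NOG":
        o_t = ones                 # no output gate
    elif variant == "CIFG":
        f_t = 1.0 - i_t            # forget gate tied to the input gate
    elif variant != "vanilla":
        raise ValueError(f"variant not covered by this sketch: {variant}")
    return i_t, f_t, o_t

def cell_update(z_t, i_t, f_t, c_prev):
    """Cell state update shared by all of these variants."""
    return z_t * i_t + c_prev * f_t

# Illustrative values only
z_t = np.array([1.0, -1.0])
c_prev = np.array([2.0, 2.0])
i_t = np.array([0.25, 0.5])
f_t = np.array([0.9, 0.9])
o_t = np.array([0.5, 0.5])

i_c, f_c, o_c = gate_values(i_t, f_t, o_t, variant="CIFG")
print(cell_update(z_t, i_c, f_c, c_prev))
```

With CIFG the new cell state is an element-wise interpolation between the old state and the block input, which is why the variant needs one gate fewer.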
![Page 14: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/14.jpg)
Experiment setup
Datasets:
• TIMIT speech corpus
• IAM Online Handwriting Database
• JSB Chorales
![Page 15: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/15.jpg)
Experiment setup
Features:
• TIMIT speech corpus:
  • 12 MFCCs plus energy, together with their first and second derivatives
• IAM Online Handwriting Database:
  • x, y, t, and the time of the pen lifting
• JSB Chorales:
  • each MIDI sequence transposed to C major or C minor, with frames sampled every quarter note
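For the TIMIT features, 12 MFCCs plus energy give 13 static values per frame, and appending first and second derivatives triples that to 39. The sketch below shows only this bookkeeping. It assumes simple finite differences along the time axis, which may differ from the delta computation actually used, and it uses random placeholder data rather than real MFCCs; `add_derivatives` is a hypothetical helper name.

```python
import numpy as np

def add_derivatives(static):
    """Append first and second time derivatives to a (frames, coeffs) array."""
    delta = np.gradient(static, axis=0)      # first derivative along time
    delta2 = np.gradient(delta, axis=0)      # second derivative along time
    return np.concatenate([static, delta, delta2], axis=1)

rng = np.random.default_rng(0)
static = rng.standard_normal((100, 13))      # placeholder for 12 MFCCs + energy
features = add_derivatives(static)
print(features.shape)
```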
![Page 16: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/16.jpg)
Experiment setup
Network Architectures and training:
| Dataset | Type of network | Hidden layers | Output layer | Loss function | Training |
|---|---|---|---|---|---|
| TIMIT | Bidirectional LSTM | Two | Softmax | Cross-entropy error | SGD |
| IAM Online | Bidirectional LSTM | Two | Softmax | CTC loss | SGD |
| JSB Chorales | LSTM | One | Sigmoid | Cross-entropy error | SGD |
![Page 17: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/17.jpg)
Comparison of the Variants
• Test set performance for all 200 trials:
![Page 18: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/18.jpg)
Comparison of the Variants
• Test set performance for the best 10% trials:
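Selecting the best 10% of 200 trials leaves 20 trials per variant. A minimal sketch of that selection follows, assuming that lower values are better (as with an error rate) and using random placeholder numbers, not results from the paper. The helper name `best_fraction` is hypothetical.

```python
import numpy as np

def best_fraction(errors, fraction=0.1):
    """Return the lowest-error trials, sorted in ascending order."""
    errors = np.asarray(errors, dtype=float)
    k = max(1, int(round(fraction * errors.size)))   # number of trials to keep
    return np.sort(errors)[:k]

rng = np.random.default_rng(0)
trial_errors = rng.uniform(0.2, 0.9, size=200)       # placeholder values only
top = best_fraction(trial_errors)
print(len(top))
```

Comparing variants on this subset shows how well each one can do once its hyperparameters are well chosen, rather than how it behaves across the whole random search.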
![Page 19: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/19.jpg)
Impact of Hyperparameters
![Page 20: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/20.jpg)
Interaction of Hyperparameters
![Page 21: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/21.jpg)
Total marginal predicted performance
TIMIT:
![Page 22: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/22.jpg)
Total marginal predicted performance
IAM Online:
![Page 23: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/23.jpg)
Total marginal predicted performance
JSB Chorales:
![Page 24: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/24.jpg)
Conclusion
• The most commonly used LSTM architecture performs reasonably well on various datasets.
• Coupling the input and forget gates (CIFG) or removing peephole connections (NP) simplified the LSTM in these experiments without significantly decreasing performance.
• The forget gate and the output activation function are the most critical components of the LSTM block.
• The learning rate is the most crucial hyperparameter, followed by the network size.
• Hyperparameters are virtually independent, so each can be tuned largely on its own.
![Page 25: LSTM: A Search Space Odyssey](https://reader030.vdocuments.mx/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/25.jpg)
References:
• K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A Search Space Odyssey," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222-2232, Oct. 2017.
• https://www.youtube.com/watch?v=lycKqccytfU
• https://www.youtube.com/watch?v=lWkFhVq9-nc
• https://en.wikipedia.org/wiki/Long_short-term_memory