lstm: a search space odyssey

Post on 04-Oct-2021

LSTM: A Search Space Odyssey

Authors: Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber

Outline

• Introduction

• Long Short-Term Memory (LSTM) with peephole connections

• Experiment and discussion

• Conclusion

Definition:

• Recurrent Neural Networks

• Importance and applications

• Gradient problem

• Vanishing gradient

• Exploding gradient

• What is the LSTM?

Introduction LSTM with peephole connections Results and discussion Conclusion

LSTM History:

• LSTM was proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber.

• In 1999, Felix Gers, Jürgen Schmidhuber, and Fred Cummins introduced the forget gate into the LSTM architecture.

• In 2000, Gers, Schmidhuber, and Cummins added peephole connections.

• In 2014, Kyunghyun Cho et al. put forward a simplified variant called the gated recurrent unit (GRU).

Simple RNN
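
The figure for this slide did not survive extraction. As a point of comparison for the LSTM block that follows, a simple (Elman-style) RNN is commonly written as below. This is the standard textbook form, not necessarily what the slide showed; the symbols $W$, $R$, and $b$ are chosen here to match the weight names used later in the deck.

```latex
y_t = \tanh\left(W x_t + R\, y_{t-1} + b\right)
```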

Block diagram

• Three gates:

  • Input gate

  • Forget gate

  • Output gate

• Two blocks:

  • Block input

  • Block output

• One cell state

Block Diagram

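Block input:

$W_z$: input weights ($\mathbb{R}^{N \times M}$)

$R_z$: recurrent weights ($\mathbb{R}^{N \times N}$)

$b_z$: bias weights ($\mathbb{R}^{N}$)

$x_t$: input vector at time t

$y_{t-1}$: block output at time t-1

[Figure: block input z with its input and recurrent connections]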
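
The equation on this slide was lost in extraction. Following the formulation in Greff et al., and assuming $g$ is the input activation function (usually tanh), $N$ the number of LSTM blocks, and $M$ the number of inputs, the block input can be written as:

```latex
z_t = g\left(W_z x_t + R_z\, y_{t-1} + b_z\right)
```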

Block Diagram

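Input gate:

$W_i$: input weights ($\mathbb{R}^{N \times M}$)

$R_i$: recurrent weights ($\mathbb{R}^{N \times N}$)

$b_i$: bias weights ($\mathbb{R}^{N}$)

$p_i$: peephole weights ($\mathbb{R}^{N}$)

$c_{t-1}$: cell state at time t-1

$x_t$: input vector at time t

$y_{t-1}$: block output at time t-1

[Figure: input gate i, fed by the input, the recurrent connection, and the peephole from $c_{t-1}$]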
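
With the quantities listed for this slide, the input gate from Greff et al. reads as follows, where $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication (both symbols are supplied here, since the slide's own formula is missing):

```latex
i_t = \sigma\left(W_i x_t + R_i\, y_{t-1} + p_i \odot c_{t-1} + b_i\right)
```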

Block Diagram

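Forget gate:

$W_f$: input weights ($\mathbb{R}^{N \times M}$)

$R_f$: recurrent weights ($\mathbb{R}^{N \times N}$)

$b_f$: bias weights ($\mathbb{R}^{N}$)

$p_f$: peephole weights ($\mathbb{R}^{N}$)

$c_{t-1}$: cell state at time t-1

$x_t$: input vector at time t

$y_{t-1}$: block output at time t-1

[Figure: forget gate f, fed by the input, the recurrent connection, and the peephole from $c_{t-1}$]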
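
The forget gate has the same structure as the input gate, with its own weights. A reconstruction from the paper's formulation, with $\sigma$ the logistic sigmoid and $\odot$ element-wise multiplication as assumed notation:

```latex
f_t = \sigma\left(W_f x_t + R_f\, y_{t-1} + p_f \odot c_{t-1} + b_f\right)
```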

Block Diagram

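Output gate:

$W_o$: input weights ($\mathbb{R}^{N \times M}$)

$R_o$: recurrent weights ($\mathbb{R}^{N \times N}$)

$b_o$: bias weights ($\mathbb{R}^{N}$)

$p_o$: peephole weights ($\mathbb{R}^{N}$)

$c_t$: cell state at time t

$x_t$: input vector at time t

$y_{t-1}$: block output at time t-1

[Figure: output gate o, fed by the input, the recurrent connection, and the peephole from $c_t$]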
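
Unlike the input and forget gates, the output gate's peephole looks at the current cell state, which has already been updated by the time this gate is computed. In the paper's formulation (notation $\sigma$ and $\odot$ assumed as logistic sigmoid and element-wise product):

```latex
o_t = \sigma\left(W_o x_t + R_o\, y_{t-1} + p_o \odot c_t + b_o\right)
```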

Block Diagram

Cell state:

$z_t$: output of the block input at time t

$i_t$: output of the input gate at time t

$c_{t-1}$: cell state at time t-1

$f_t$: output of the forget gate at time t

[Figure: cell state update combining $z_t$, $i_t$, $c_{t-1}$, and $f_t$]
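
The four quantities on this slide combine into the cell state update. As given in Greff et al., with $\odot$ as element-wise multiplication, the input gate scales the new candidate and the forget gate scales the previous state:

```latex
c_t = z_t \odot i_t + c_{t-1} \odot f_t
```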

Block Diagram

Block output:

$o_t$: output of the output gate at time t

$c_t$: cell state at time t

[Figure: block output y, which feeds the next layer and the recurrent connections]
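
The block output passes the cell state through the output activation function and then through the output gate. In the paper's formulation, assuming $h$ denotes the output activation function (usually tanh) and $\odot$ element-wise multiplication:

```latex
y_t = h\left(c_t\right) \odot o_t
```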

LSTM Variants

• NIG: No Input Gate: $i_t = 1$

• NFG: No Forget Gate: $f_t = 1$

• NOG: No Output Gate: $o_t = 1$

• NIAF: No Input Activation Function: $g(x) = x$

• NOAF: No Output Activation Function: $h(x) = x$

• CIFG: Coupled Input and Forget Gate: $f_t = 1 - i_t$

• NP: No Peepholes

• FGR: Full gate recurrence
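
To make the variants concrete, here is a minimal NumPy sketch of one forward step of a peephole LSTM block with a switch for each ablation in the list above. It is an illustration under stated assumptions, not the authors' code: the function names, the parameter dictionary layout, the `"V"` label for the unmodified block, and the random initialization are all invented for this example, and FGR is left out because it needs extra gate-to-gate recurrent weights.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(n_in, n_hidden, seed=0):
    """Hypothetical helper: small random weights named after the slides (W, R, p, b)."""
    rng = np.random.default_rng(seed)
    params = {}
    for k in "zifo":  # block input, input gate, forget gate, output gate
        params["W" + k] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        params["R" + k] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        params["b" + k] = np.zeros(n_hidden)
    for k in "ifo":   # only the gates have peephole weights
        params["p" + k] = rng.normal(0.0, 0.1, n_hidden)
    return params

def lstm_step(x, y_prev, c_prev, P, variant="V"):
    """One forward step; `variant` is "V" (unmodified) or one ablation from the list."""
    peep = 0.0 if variant == "NP" else 1.0               # NP: drop peephole terms
    g = (lambda a: a) if variant == "NIAF" else np.tanh  # input activation
    h = (lambda a: a) if variant == "NOAF" else np.tanh  # output activation

    def pre(k):
        # Shared pre-activation: input, recurrent, and bias contributions.
        return P["W" + k] @ x + P["R" + k] @ y_prev + P["b" + k]

    ones = np.ones_like(c_prev)
    z = g(pre("z"))
    i = ones if variant == "NIG" else sigmoid(pre("i") + peep * P["pi"] * c_prev)
    if variant == "NFG":
        f = ones
    elif variant == "CIFG":
        f = 1.0 - i                                      # coupled gates
    else:
        f = sigmoid(pre("f") + peep * P["pf"] * c_prev)
    c = z * i + c_prev * f                               # cell state update
    o = ones if variant == "NOG" else sigmoid(pre("o") + peep * P["po"] * c)
    y = h(c) * o                                         # block output
    return y, c

P = init_params(n_in=3, n_hidden=4)
x, y0, c0 = np.ones(3), np.zeros(4), np.zeros(4)
y, c = lstm_step(x, y0, c0, P)
```

Calling `lstm_step(x, y0, c0, P, "CIFG")` or any other label from the list runs the corresponding ablation on the same parameters, which mirrors how the study changes one component at a time.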

Experiment setup

Datasets:

• TIMIT speech corpus

• IAM Online Handwriting Database

• JSB Chorales

Experiment setup

Features:

• TIMIT speech corpus:

  • 12 MFCCs + energy, together with their first and second derivatives

• IAM Online Handwriting Database:

  • x, y, t and the time of the pen lifting

• JSB Chorales:

  • each MIDI sequence transposed to C major or C minor, with frames sampled every quarter note
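
For the TIMIT features above, 12 MFCCs plus energy give 13 static coefficients per frame, and appending first and second derivatives triples that to 39. The sketch below shows only this stacking arithmetic. It assumes NumPy and uses `np.gradient` as a stand-in for whatever delta computation the authors used, which the slides do not specify; the random array is placeholder data, not real speech features.

```python
import numpy as np

def add_deltas(static):
    """Append first and second time derivatives to per-frame features.

    static: array of shape (frames, coefficients).
    np.gradient is an assumed finite-difference approximation of the deltas.
    """
    delta = np.gradient(static, axis=0)    # first derivative over time
    delta2 = np.gradient(delta, axis=0)    # second derivative over time
    return np.concatenate([static, delta, delta2], axis=1)

# Placeholder for 100 frames of 12 MFCCs + energy.
frames = np.random.default_rng(0).normal(size=(100, 13))
features = add_deltas(frames)
print(features.shape)  # (100, 39)
```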

Experiment setup

Network Architectures and training:

Dataset      | Type of network    | Hidden layers | Output layer | Loss function       | Training
TIMIT        | Bidirectional LSTM | Two           | Softmax      | Cross-entropy error | SGD
IAM Online   | Bidirectional LSTM | Two           | Softmax      | CTC loss            | SGD
JSB Chorales | LSTM               | One           | Sigmoid      | Cross-entropy error | SGD

Comparison of the Variants

• Test set performance for all 200 trials:

Comparison of the Variants

• Test set performance for the best 10% of trials:

Impact of Hyperparameters

Interaction of Hyperparameters

Total marginal predicted performance

TIMIT:

Total marginal predicted performance

IAM Online:

Total marginal predicted performance

JSB Chorales:

Conclusion

• The most commonly used LSTM architecture performs reasonably well on various datasets.

• Coupling the input and forget gates (CIFG) or removing peephole connections (NP) simplified the LSTM in these experiments without significantly decreasing performance.

• The forget gate and the output activation function are the most critical components of the LSTM block.

• The learning rate is the most crucial hyperparameter, followed by the network size.

• The hyperparameters are virtually independent of one another.

References:

• K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A Search Space Odyssey," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222-2232, Oct. 2017.

• https://www.youtube.com/watch?v=lycKqccytfU

• https://www.youtube.com/watch?v=lWkFhVq9-nc

• https://en.wikipedia.org/wiki/Long_short-term_memory
