[DL輪読会] Quasi-Recurrent Neural Networks

TRANSCRIPT

1. QUASI-RECURRENT NEURAL NETWORKS. James Bradbury, Stephen Merity, Caiming Xiong & Richard Socher. Presented 2017-05-12 @ M1
2. Agenda: 1. Information 2. Introduction 3. Proposed Model 4. Experiments & Results 5. Conclusion
3. 1. Information
- Authors: James Bradbury, Stephen Merity, Caiming Xiong & Richard Socher (Salesforce Research)
- Submission date: submitted 5 Nov 2016 (v1), last revised 21 Nov 2016 (this version, v2)
- Venue: ICLR 2017
- Paper: https://arxiv.org/abs/1611.01576
- Summary: proposes a sequence model that combines CNN-style convolutions with a lightweight recurrent pooling, aiming for the parallelism of CNNs with the sequence modeling ability of RNNs
4. 2. Introduction: limitations of RNNs
1. The hidden state h(t) is computed from h(t-1), so computation must proceed timestep by timestep and cannot be parallelized along the sequence (a sketch follows below).
2. The same weights W are applied at every timestep t.
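To make the sequential bottleneck concrete, here is a minimal NumPy sketch of a vanilla RNN (not from the slides; all names and shapes are illustrative): each iteration needs the previous hidden state, so the loop over timesteps cannot run in parallel.

```python
import numpy as np

T, n, m = 8, 4, 5                       # timesteps, input dim, hidden dim
rng = np.random.default_rng(0)
X = rng.standard_normal((T, n))
W = 0.1 * rng.standard_normal((m, n))   # input->hidden weights, shared across t
U = 0.1 * rng.standard_normal((m, m))   # hidden->hidden weights, shared across t

h = np.zeros(m)
for t in range(T):                      # h_t needs h_{t-1}: an unavoidable chain
    h = np.tanh(W @ X[t] + U @ h)
print(h.shape)                          # (5,)
```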
5. 2. Introduction: CNNs for sequences
- Example: "Fully Character-Level Neural Machine Translation without Explicit Segmentation" (Lee et al., 2016) applies CNNs directly to character sequences.
- Convolutions over the time dimension can be computed in parallel across timesteps.
- However, convolutions assume time invariance: the same filter is applied at every position, so the output cannot depend on the large-scale order of the sequence.
6. 2. Introduction: QRNN
- Alternates convolutional layers (parallel across both timesteps and channels) with a pooling layer.
- The pooling layer mixes states across timesteps with LSTM-like gates, but has no trainable recurrent weights, so it is parallel across channels.
7. 2. Introduction: evaluation. QRNN is evaluated on three tasks: 1. document-level sentiment classification, 2. language modeling, 3. character-level machine translation. It matches or exceeds LSTM accuracy while reducing per-epoch training time to roughly 25-50% of the LSTM's.
8. 3. Proposed Model: overview
- A QRNN layer consists of a convolution component and a pooling component.
- Given an input sequence X of T n-dimensional vectors, m convolution filters produce a sequence Z of T m-dimensional vectors.
- Masked convolution: the filters are shifted so that the output at time t depends only on x_{t-k+1}, ..., x_t, never on future timesteps.
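A minimal NumPy sketch of the masked (causal) convolution, assuming zero left-padding of width k-1; `causal_conv` and the shapes are illustrative, not the authors' code.

```python
import numpy as np

def causal_conv(X, W):
    # X: (T, n) inputs; W: (k, n, m) filters -> Z: (T, m).
    # The window producing output t ends at x_t, so no future leakage.
    k = W.shape[0]
    T, n = X.shape
    Xpad = np.vstack([np.zeros((k - 1, n)), X])    # zero left-padding
    return np.stack([
        sum(Xpad[t + j] @ W[j] for j in range(k))  # window x_{t-k+1} .. x_t
        for t in range(T)
    ])

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4))
W = 0.1 * rng.standard_normal((2, 4, 6))
Z = np.tanh(causal_conv(X, W))                     # Z in R^{T x m}
print(Z.shape)                                     # (10, 6)
```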
9. 3. Proposed Model: convolution component. As in an LSTM, three gate sequences are produced, each by a masked convolution (*) over X:
1. Z = tanh(W_z * X)
2. F = σ(W_f * X)
3. O = σ(W_o * X)
For filter width k = 2, this reduces to:
1. z_t = tanh(W¹_z x_{t-1} + W²_z x_t)  -> corresponds to the LSTM input
2. f_t = σ(W¹_f x_{t-1} + W²_f x_t)  -> corresponds to the LSTM forget gate
3. o_t = σ(W¹_o x_{t-1} + W²_o x_t)  -> corresponds to the LSTM output gate
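A sketch of the width-2 case, computing all three gate sequences at once: each timestep depends only on x_{t-1} and x_t, so there is no recurrence yet. The weight names W¹, W² follow the slide; the function names and shapes are assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gates_k2(X, Wz1, Wz2, Wf1, Wf2, Wo1, Wo2):
    # Shift X by one step to obtain x_{t-1} (zeros at t = 0).
    Xprev = np.vstack([np.zeros((1, X.shape[1])), X[:-1]])
    Z = np.tanh(Xprev @ Wz1 + X @ Wz2)      # candidate (LSTM "input")
    F = sigmoid(Xprev @ Wf1 + X @ Wf2)      # forget gate
    O = sigmoid(Xprev @ Wo1 + X @ Wo2)      # output gate
    return Z, F, O

rng = np.random.default_rng(0)
T, n, m = 10, 4, 6
X = rng.standard_normal((T, n))
Ws = [0.1 * rng.standard_normal((n, m)) for _ in range(6)]
Z, F, O = gates_k2(X, *Ws)
print(Z.shape, F.shape, O.shape)            # (10, 6) each
```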
10. 3. Proposed Model: pooling component
- Like the LSTM cell update, but with no trainable recurrent weights: only elementwise operations along time.
- Three pooling variants:
1. f-pooling:   h_t = f_t ⊙ h_{t-1} + (1 - f_t) ⊙ z_t
2. fo-pooling:  c_t = f_t ⊙ c_{t-1} + (1 - f_t) ⊙ z_t,  h_t = o_t ⊙ c_t
3. ifo-pooling: c_t = f_t ⊙ c_{t-1} + i_t ⊙ z_t,  h_t = o_t ⊙ c_t
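A sketch of fo-pooling implementing the equations above (f- and ifo-pooling differ only as noted in the comments). The loop is over time, but it involves only cheap elementwise operations, which is why pooling stays fast despite being sequential.

```python
import numpy as np

def fo_pool(Z, F, O):
    # fo-pooling: c_t = f_t*c_{t-1} + (1-f_t)*z_t ; h_t = o_t*c_t.
    # f-pooling drops O (h_t = f_t*h_{t-1} + (1-f_t)*z_t);
    # ifo-pooling replaces (1-f_t) with a separate input gate i_t.
    T, m = Z.shape
    c = np.zeros(m)
    H = np.empty((T, m))
    for t in range(T):                    # elementwise only: no matrices
        c = F[t] * c + (1.0 - F[t]) * Z[t]
        H[t] = O[t] * c
    return H

rng = np.random.default_rng(0)
T, m = 10, 6
Z = np.tanh(rng.standard_normal((T, m)))
F = 1 / (1 + np.exp(-rng.standard_normal((T, m))))
O = 1 / (1 + np.exp(-rng.standard_normal((T, m))))
print(fo_pool(Z, F, O).shape)             # (10, 6)
```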
11. 3. Proposed Model: extensions
- Regularization: a zoneout variant applied through the forget gate as in LSTM zoneout:
  F = 1 - dropout(1 - σ(W_f * X)),
  i.e., each channel's forget gate is randomly forced to 1, so that channel keeps its previous state for one step.
- Densely-connected layers: for sequence classification, QRNN layers are stacked with skip connections between every pair of layers (each layer's input is concatenated to its output).
- Encoder-decoder models: both the encoder and the decoder are QRNNs.
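A sketch of the zoneout-style forget-gate regularizer; `zoneout_forget` is a hypothetical helper, and dropout rescaling is omitted here (an assumption; plain inverted dropout would rescale by 1/(1-p)).

```python
import numpy as np

def zoneout_forget(F_pre, p, rng):
    # F_pre: sigmoid-activated forget gates, shape (T, m).
    # Zeroing (1 - f) for an element forces f = 1 there, so that
    # channel simply carries its previous state forward.
    mask = (rng.random(F_pre.shape) >= p).astype(F_pre.dtype)
    return 1.0 - mask * (1.0 - F_pre)

rng = np.random.default_rng(0)
F_pre = 1 / (1 + np.exp(-rng.standard_normal((10, 6))))
F = zoneout_forget(F_pre, p=0.1, rng=rng)
print((F == 1.0).mean())   # fraction of gates forced to remember (~p)
```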
12. 4. Experiments & Results. QRNN is evaluated on three tasks: 1. document-level sentiment classification, 2. language modeling, 3. character-level machine translation.
13. 4. Experiments & Results: 1. document-level sentiment classification
Dataset: IMDb
- Input: movie reviews
- Labels: binary, positive (25,000 samples) / negative (25,000 samples)
Hyper-parameters (sketched as code below):
- 4 densely-connected QRNN layers, 256 units each
- Word vector dimensions: 300
- Dropout = 0.3, L2 penalty = 4 * 10^-6
- Minibatch = 24; RMSprop with lr = 0.001, ρ = 0.9, ε = 10^-8
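Purely as illustration, these optimizer settings rendered in PyTorch; the framework choice and the stand-in model are assumptions (the slide does not say what the authors used), and only the numeric hyper-parameters come from the slide.

```python
import torch

model = torch.nn.Linear(300, 2)  # stand-in, NOT the actual 4-layer QRNN
optimizer = torch.optim.RMSprop(
    model.parameters(),
    lr=1e-3,            # learning rate 0.001
    alpha=0.9,          # smoothing constant (the slide's rho = 0.9)
    eps=1e-8,           # epsilon = 10^-8
    weight_decay=4e-6,  # L2 penalty = 4 * 10^-6
)
```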
14. 4. Experiments & Results: 1. document-level sentiment classification
[Figure: visualization of the QRNN's hidden states across the timesteps of a review (around words 120-160), compared against the LSTM]
15. 4. Experiments & Results: 2. language modeling
Dataset: Penn Treebank
- Train: 929,000 words; validation: 73,000 words; test: 82,000 words
- Word-level prediction
- Metric: perplexity (smaller is better): PPL = exp( (1/N) Σ_i log(1 / p(w_i)) )
Hyper-parameters:
- 2 layers, 640 units each
- SGD + momentum, lr = [1 if n_epoch
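A small sketch of the perplexity metric as defined above; `perplexity` is an illustrative helper taking the probability the model assigned to each ground-truth token.

```python
import numpy as np

def perplexity(token_probs):
    # PPL = exp( (1/N) * sum_i log(1 / p(w_i)) ); smaller is better.
    token_probs = np.asarray(token_probs)
    return float(np.exp(-np.mean(np.log(token_probs))))

# Assigning probability 0.1 to every token gives PPL = 10: the model
# is as uncertain as a uniform 10-way choice at each step.
print(perplexity([0.1] * 5))   # 10.0
```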