Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks


Posted on 20-Jan-2017


TRANSCRIPT

Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks
NIPS 2015 paper reading, @Kotaro_Setoyama, 2016.1.20

Kotaro Setoyama (www.pcl.cs.waseda.ac.jp/), PFI. Interests: Text-to-Speech, Speech-to-Text. Twitter: @Kotaro_Setoyama

Agenda

Introduction

MDRNN(Multidimensional Recurrent Neural Network)

Hessian-free for MDRNN

CTC(Connectionist Temporal Classification)

Experiments

Conclusion

Introduction

,, MDRNN CTCstate-of-the-art RNNhidden

5 DNN

Introduction

MDRNN CTCstate-of-the-art RNNhidden

5 DNN,,Hessian-free

Introduction

Hessian-free optimization is a second-order method: it exploits curvature information without ever forming the Hessian explicitly.

Introduction

The MDRNN is trained end to end with CTC. In the slide's example, several different frame-level output paths all collapse (merge repeated symbols, drop blanks '_') to the same label sequence, 'nips'.

Introduction

However, the CTC objective is not convex in the network outputs, so it cannot be combined with Hessian-free optimization as-is.

Introduction

Contribution: train deep (5-layer) MDRNNs with Hessian-free optimization by deriving a suitable convex formulation of the CTC loss for the curvature matrix.

MDRNN

An MDRNN generalizes the RNN from 1-D sequences to multidimensional data. Like a CNN it takes grid-shaped input (e.g., images), but it uses recurrent connections along each dimension instead of convolutions; the ordinary RNN is the 1-D special case.

MDRNN

In the 2-D case, the hidden layer at position (i, j) receives the input-layer activation at (i, j) together with its own activations at (i-1, j) and (i, j-1); the scan starts at (0, 0). Along the two axes x1 and x2, each input position holds one feature vector (e.g., the RGB values of one pixel).

MDRNN

Forward pass, step by step (the slides animate this): the input at (0, 0) drives the hidden layer at (0, 0); then (0, 1) combines its input with the hidden activation at (0, 0); in general, position (i, j) combines its input with the hidden activations at (i-1, j) and (i, j-1), sweeping the whole grid.
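The forward scan can be sketched in a few lines. This is an illustrative NumPy implementation under assumed shapes, not the authors' code; all names (mdrnn_forward, W_in, W_h1, W_h2) are hypothetical.

```python
import numpy as np

def mdrnn_forward(x, W_in, W_h1, W_h2, b):
    """2-D tanh MDRNN forward pass: scan the grid from (0, 0).

    x: (H, W, M) input grid; returns hidden states of shape (H, W, n).
    h[i, j] depends on the input x[i, j] and on h[i-1, j], h[i, j-1].
    """
    H, Wd, _ = x.shape
    n = b.shape[0]
    h = np.zeros((H, Wd, n))
    for i in range(H):
        for j in range(Wd):
            a = W_in @ x[i, j] + b
            if i > 0:
                a = a + W_h1 @ h[i - 1, j]   # recurrent connection from above
            if j > 0:
                a = a + W_h2 @ h[i, j - 1]   # recurrent connection from the left
            h[i, j] = np.tanh(a)
    return h

# Usage: a 4x5 grid of 3-dimensional inputs, 8 hidden units.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 3))
W_in = 0.1 * rng.standard_normal((8, 3))
W_h1 = 0.1 * rng.standard_normal((8, 8))
W_h2 = 0.1 * rng.standard_normal((8, 8))
b = np.zeros(8)
h = mdrnn_forward(x, W_in, W_h1, W_h2, b)
```

The double loop makes the data dependence explicit: each position waits only on its top and left neighbors, so anti-diagonals could be computed in parallel.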

MDRNN

Backward pass: the error signal at (i, j) gathers the contribution from the output layer at (i, j) and the hidden-layer deltas from (i+1, j) and (i, j+1); this reverses the forward sweep, analogous to backpropagation through time in a 1-D RNN.

MDRNN

Forward- and backward-pass update equations: [slide formulas not recoverable from the transcript]

Multidirectional Network

A 1-D RNN sees only one-sided context; a bidirectional RNN fixes this with a second pass from the opposite end. The multidimensional analogue uses 2^d hidden layers for d-dimensional input: in 2-D, four layers each scan from a different corner (each treating its corner as (0, 0)), so at any position (i, j) layer 1 through layer 4 together supply context from all four quadrants of the input.

Multidimensional LSTM

The hidden units can be LSTM cells (cell, input gate, output gate, forget gate). Making LSTM multidimensional replaces its single recurrent connection with one per dimension, with a separate forget gate for each; this is MDLSTM.

[Alex Graves. Supervised Sequence Labelling with Recurrent Neural Networks, volume 385. Springer, 2012.]

Hierarchical Subsampling Networks

Inserting subsampling between recurrent levels shrinks the grid at each level, reducing computation for large inputs.

Learning MDRNN

Setup: a d-dimensional MDRNN with M inputs and K outputs.
- Input: a sequence of M-dimensional vectors over T grid positions.
- Output: K activations (one per class) at every position.
- Training data: N examples; in the learning phase the parameters are fit by minimizing a loss over them.

Example from the slide (d = 2, a handwriting image): [the concrete values of T, M, and K were not recoverable]

Hessian-free for MDRNN

Hessian-free optimization has two ingredients:
1. A curvature matrix G in place of the Hessian, accessed only through matrix-vector products G v.
2. Conjugate gradient (CG): each HF iteration minimizes a local quadratic model by running CG, which needs only those G v products.
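Step 2, the CG solve on G v products, can be sketched as follows. This is a minimal illustration, assuming a function gv(v) that returns the curvature-vector product; all names are hypothetical.

```python
import numpy as np

def cg_solve(gv, grad, lam=0.0, iters=100, tol=1e-12):
    """Approximately solve (G + lam*I) d = -grad by conjugate gradient.

    G is touched only through the matrix-vector product gv(v),
    exactly as in Hessian-free optimization; lam is LM-style damping.
    """
    d = np.zeros_like(grad)
    r = -grad.copy()           # residual of (G + lam*I) d = -grad at d = 0
    p = r.copy()
    rs = float(r @ r)
    for _ in range(iters):
        Ap = gv(p) + lam * p
        alpha = rs / float(p @ Ap)
        d = d + alpha * p
        r = r - alpha * Ap
        rs_new = float(r @ r)
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

# Usage: a small explicit PSD matrix stands in for the GGN.
G = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = np.array([1.0, -1.0])
step = cg_solve(lambda v: G @ v, grad)
```

Note that the loop never materializes G; in a real HF implementation gv would run a forward and backward pass through the network.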

Hessian-free for MDRNN

For step 1, G is chosen to be the Generalized Gauss-Newton matrix (GGN); CG then only ever multiplies the GGN against vectors.

Hessian-free for MDRNN

Step 2, the CG solve, requires G to be positive semidefinite, which the GGN is, provided the loss is convex in the network outputs.


Generalized Gauss-Newton matrix

The GGN comes from applying the chain rule to the Hessian of the loss composed with the network function and keeping only the first (Gauss-Newton) term. [James Martens and Ilya Sutskever. Training deep and recurrent networks with Hessian-free optimization. In Neural Networks: Tricks of the Trade, pages 479–535. Springer, 2012.]

Generalized Gauss-Newton matrix

Here f denotes the network output function; its Jacobian with respect to the parameters is the ingredient the GGN keeps from the chain-rule expansion.

Generalized Gauss-Newton matrix

The GGN is assembled example by example (index i over the N training cases), from each example's Jacobian and the Hessian of the loss at that example's output.

Generalized Gauss-Newton matrix

The term dropped from the chain-rule expansion vanishes when the residuals are 0; away from that point it can make the full Hessian indefinite, whereas the GGN remains positive semidefinite. Levenberg-Marquardt damping (adding λI) keeps the curvature matrix well conditioned.

Generalized Gauss-Newton matrix

This requires the loss L to be convex as a function of the network outputs (as with softmax plus cross-entropy); then the GGN is positive semidefinite, which is exactly what CG needs.
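In standard Hessian-free notation (my transcription, not verbatim from the slides), with network output f(x_i; θ) and loss L:

```latex
G \;=\; \frac{1}{N}\sum_{i=1}^{N} J_i^{\top}\, H_{L,i}\, J_i,
\qquad
J_i = \frac{\partial f(x_i;\theta)}{\partial \theta},
\qquad
H_{L,i} = \left.\frac{\partial^2 L}{\partial f\, \partial f^{\top}}\right|_{f = f(x_i;\theta)} .
```

Each H_{L,i} is positive semidefinite when L is convex in f, hence so is G; in practice the damped matrix G + λI is what CG sees.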



Generalized Gauss-Newton matrix

The GGN-vector product G v is computed in four steps (the standard procedure): 1. a forward pass; 2. a forward directional-derivative pass giving J v; 3. multiplication by the loss Hessian H_L; 4. a backward pass applying Jᵀ.

Generalized Gauss-Newton matrix

For MDRNNs with LSTM cells, the corresponding derivation is given in Appendix A of the paper.

Generalized Gauss-Newton matrix

When the loss is CTC, building the GGN requires the Hessian of the CTC loss with respect to the outputs at each position t.

Generalized Gauss-Newton matrix

GGN-vector products are computed with a backpropagation-style pass, so the GGN itself is never formed explicitly.

Hessian-free

Recap: 1. replace the Hessian with the GGN; 2. solve each quadratic subproblem with CG using only GGN-vector products.

CTC

CTC (Connectionist Temporal Classification) lets the network output a label sequence without pre-segmented alignments: many frame-level paths collapse to the same label (in the slide's figure, several paths all collapse to 'nips').

Example: label alphabet {a, b, c}; extended alphabet {_, a, b, c} with blank '_'. The path (c, _, b, _, _, a, a, b) collapses to the label sequence 'cbab'.

CTC

Computing with CTC:
1. Define the many-to-one map from frame-level paths to label sequences.
2. Compute the label-sequence probability with the forward-backward method.
3. Compute the gradient, again from the forward-backward variables.

For the 'cbab' example, the path length is T = 8.
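The collapsing map of step 1 can be sketched directly (an illustrative helper, not the paper's code; the name ctc_collapse is hypothetical):

```python
def ctc_collapse(path, blank="_"):
    """Map a frame-level path to a label: merge repeats, then drop blanks."""
    out = []
    prev = None
    for c in path:
        if c != blank and c != prev:
            out.append(c)
        prev = c
    return "".join(out)
```

A blank between identical symbols is what makes repeated labels expressible: ctc_collapse("a_ab") keeps both a's, while ctc_collapse("aab") merges them.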

CTC

[Lattice diagram: time steps t = 1…8 across, states s = 1…9 down, over the blank-extended label '_c_b_a_b_'.]

CTC

Step 2: CTC assigns a label sequence the total probability of all frame-level paths that collapse to it, computed from the RNN's per-frame outputs. Enumerating the paths is intractable (exponentially many in T), so the sum is evaluated by dynamic programming: the forward-backward method.

CTC

[The same t × s lattice, now annotated with the forward-backward computation.]

The forward-backward method fills this lattice: forward variables α_t(s) and backward variables β_t(s) are defined over the states s of the blank-extended label sequence.

CTC

The recursions: at t = 1 only the states s = 1 (blank) and s = 2 (first label symbol) have nonzero α. For t > 1, α_t(s) accumulates α_{t-1} from states s, s-1, and, when the skip is legal, s-2, scaled by the network output at time t; iterating from t = 2 to t = T gives the total label probability.
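Written out in the standard notation of Graves et al. (2006), with l' the blank-extended label of length 2|l|+1 and y^t_k the output probability of symbol k at time t:

```latex
\alpha_1(1) = y^{1}_{b}, \qquad \alpha_1(2) = y^{1}_{l'_2}, \qquad \alpha_1(s) = 0 \;\; (s > 2)

\alpha_t(s) = y^{t}_{l'_s}\Bigl( \alpha_{t-1}(s) + \alpha_{t-1}(s-1)
  + \mathbf{1}\bigl[\, l'_s \neq b \ \text{and}\ l'_s \neq l'_{s-2} \,\bigr]\, \alpha_{t-1}(s-2) \Bigr)

p(z \mid x) = \alpha_T(|l'|) + \alpha_T(|l'| - 1)
```

The indicator term forbids skipping the blank between two identical labels, which is exactly the constraint that makes repeated symbols representable.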

CTC

Step 3: differentiating the CTC loss with respect to the network outputs reuses the same forward-backward variables; the gradient is then propagated through the network with BPTT.

[Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning, pages 369–376, 2006.]

CTC

For the MDRNN, the softmax activations a at each position serve as the per-frame symbol probabilities, and the training objective is the negative log-probability that the collapsed output equals the target labeling z.


CTC

To reuse the GGN machinery with CTC, the Hessian of the loss with respect to the outputs at each position t must be positive semidefinite; the raw CTC loss does not satisfy this, so the GGN cannot be formed from it directly.

CTC

The paper therefore reformulates CTC for the MDRNN so that Hessian-free optimization applies: the loss is rewritten (the slide lists three steps) into a form with a well-defined, positive semidefinite Gauss-Newton matrix, and two variants of this GGN construction are compared.

CTC

As with the plain Generalized Gauss-Newton matrix, the curvature is built per training example i. The two variants differ in the order of approximation: (1) approximate the CTC loss, then form its GGN; (2) form the GGN, then adapt it to CTC.

Experiment

MDRNNs with CTC are trained with the proposed Hessian-free method and, as a baseline, with SGD, on two tasks: IFN/ENIT (offline Arabic handwriting, 25,955 images) and TIMIT (phoneme recognition, 3,696 training sentences).

Result

[Results table: error rates for networks denoted A^B (LSTM variants included), trained with HF vs. SGD.]

Result

TIMIT performance is measured by phoneme error rate (PER): the edit distance between the predicted and reference phoneme sequences, normalized by the reference length.
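Edit-distance-based PER can be sketched as follows (an illustrative dynamic program, not the paper's evaluation script; the names edit_distance and per are hypothetical):

```python
def edit_distance(ref, hyp):
    """Classic DP over substitutions, insertions, and deletions."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete everything
    for j in range(n + 1):
        d[0][j] = j                      # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[m][n]

def per(ref_phonemes, hyp_phonemes):
    """Phoneme error rate: edit distance normalized by reference length."""
    return edit_distance(ref_phonemes, hyp_phonemes) / len(ref_phonemes)

# Usage: one deleted phoneme out of five gives PER = 0.2.
rate = per(["sh", "iy", "hh", "ae", "d"], ["sh", "iy", "ae", "d"])
```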

Conclusion

Deep (5-layer) MDRNNs were trained with Hessian-free optimization by developing a formulation of CTC compatible with the Gauss-Newton curvature; the resulting MDRNNs outperform those trained with SGD.