Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)

Day 2 Lecture 6: Recurrent Neural Networks. Xavier Giró-i-Nieto [course site]

Uploaded by xavier-giro, 06-Jan-2017

TRANSCRIPT

Page 2: Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)

2

Acknowledgments

Santi Pascual

Page 3

General idea

ConvNet (or CNN)

Page 4

General idea

ConvNet (or CNN)

Page 5

Multilayer Perceptron

Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”

The output depends ONLY on the current input.

Page 6

Recurrent Neural Network (RNN)

Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”

The hidden layers and the output depend on previous states of the hidden layers.
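This dependence can be written as a single update rule, h_t = tanh(W·x_t + U·h_{t-1} + b). A minimal NumPy sketch (the weight names W, U, b and the sizes are illustrative, not taken from the slides):

```python
# Minimal RNN forward step: the new hidden state depends on the
# current input AND the previous hidden state.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(n_hid, n_in))   # input-to-hidden weights
U = rng.normal(size=(n_hid, n_hid))  # hidden-to-hidden (recurrent) weights
b = np.zeros(n_hid)

def rnn_step(x_t, h_prev):
    return np.tanh(W @ x_t + U @ h_prev + b)

h = np.zeros(n_hid)                       # initial hidden state
for x_t in rng.normal(size=(5, n_in)):    # a sequence of 5 inputs
    h = rnn_step(x_t, h)
print(h.shape)  # (4,)
```

The same `rnn_step` is applied at every timestep, which is what makes the network recurrent rather than a stack of independent layers.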

Page 7

Recurrent Neural Network (RNN)

Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”

The hidden layers and the output depend on previous states of the hidden layers.

Page 8

Recurrent Neural Network (RNN)

[Figure: the RNN graph unrolled over time, shown in front view and side view after a 90° rotation]

Page 9

Recurrent Neural Networks (RNN)

Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”

Each node represents a layer of neurons at a single timestep (t-1, t, t+1).

Page 10

Recurrent Neural Networks (RNN)

Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”

(timesteps t-1, t, t+1)

The input is a SEQUENCE x(t) of any length.

Page 11

Recurrent Neural Networks (RNN)

Common visual sequences:

Still image: spatial scan (zigzag, snake)

The input is a SEQUENCE x(t) of any length.

Page 12

Recurrent Neural Networks (RNN)

Common visual sequences:

Video: temporal sampling

The input is a SEQUENCE x(t) of any length.


Page 13

Recurrent Neural Networks (RNN)

Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”

Must learn the temporally shared weights w2, in addition to w1 and w3.

(timesteps t-1, t, t+1)

Page 14

Bidirectional RNN (BRNN)

Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”

Must learn weights w2, w3, w4 and w5, in addition to w1 and w6.
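A bidirectional RNN runs one recurrence left-to-right and a second one right-to-left, so each timestep sees both past and future context. A NumPy sketch (all weight names are placeholders, not the w1…w6 of the slide):

```python
# Bidirectional RNN sketch: two independent recurrent passes,
# concatenated per timestep.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, T = 3, 4, 6
Wf, Uf = rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid))
Wb, Ub = rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid))
xs = rng.normal(size=(T, n_in))

def scan(W, U, seq):
    h, out = np.zeros(n_hid), []
    for x in seq:
        h = np.tanh(W @ x + U @ h)
        out.append(h)
    return out

h_fwd = scan(Wf, Uf, xs)               # left-to-right pass
h_bwd = scan(Wb, Ub, xs[::-1])[::-1]   # right-to-left pass, realigned in time
# Each timestep's representation combines both directions:
h_t = [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]
print(len(h_t), h_t[0].shape)  # 6 (8,)
```

Because the backward pass needs the whole sequence before it can start, BRNNs apply to offline sequences, not streaming input.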

Page 15

Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”

Bidirectional RNN (BRNN)

Page 16

Slide: Santi Pascual

Formulation: One hidden layer

Delay unit (z⁻¹)

Page 17

Slide: Santi Pascual

Formulation: Single recurrence

One-time recurrence

Page 18

Slide: Santi Pascual

Formulation: Multiple recurrences

Recurrence

One-time-step recurrence

T-time-step recurrences
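Expanding the one-step recurrence T times makes the nesting explicit: the same recurrent matrix U appears once per timestep. A small NumPy check (weight names are illustrative):

```python
# The looped recurrence and its fully unrolled form are identical:
# unrolling T steps nests T applications of the same U.
import numpy as np

rng = np.random.default_rng(2)
n = 3
W = rng.normal(size=(n, n))
U = rng.normal(size=(n, n))
xs = rng.normal(size=(4, n))

# Loop form: h_t = tanh(W x_t + U h_{t-1}), starting from h_0 = 0.
h = np.zeros(n)
for x in xs:
    h = np.tanh(W @ x + U @ h)

# Unrolled form for T = 4: U shows up nested four times.
h_unrolled = np.tanh(W @ xs[3] + U @ np.tanh(W @ xs[2] + U @ np.tanh(
             W @ xs[1] + U @ np.tanh(W @ xs[0]))))
print(np.allclose(h, h_unrolled))  # True
```

This nesting is exactly what the next slides exploit to explain why long-term memory and gradients degrade with T.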

Page 19

Slide: Santi Pascual

RNN problems

Long-term memory vanishes because of the T nested multiplications by U.


Page 20

Slide: Santi Pascual

RNN problems

During training, gradients may explode or vanish because of temporal depth.

Example: back-propagation through time (BPTT) with 3 steps.
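The effect is easy to see numerically: back-propagating through T steps multiplies the gradient by the recurrent matrix (times activation derivatives) once per step. A toy demo with a diagonal matrix, so its spectral radius is explicit:

```python
# Repeated multiplication by the recurrent matrix makes gradients
# vanish (spectral radius < 1) or explode (spectral radius > 1).
import numpy as np

def grad_norm_after(T, scale):
    U = scale * np.eye(4)   # toy recurrent matrix with spectral radius = scale
    g = np.ones(4)          # gradient arriving at the last timestep
    for _ in range(T):
        g = U.T @ g         # one BPTT step (activation derivatives omitted)
    return np.linalg.norm(g)

print(grad_norm_after(50, 0.9))  # shrinks like 0.9**50: vanishing
print(grad_norm_after(50, 1.1))  # grows like 1.1**50: exploding
```

Real recurrent matrices are not diagonal, but the same geometric growth or decay governs their dominant eigendirections.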

Page 21

Long Short-Term Memory (LSTM)

Page 22

Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9, no. 8 (1997): 1735-1780.

Long Short-Term Memory (LSTM)

Page 23

Figure: Christopher Olah, “Understanding LSTM Networks” (2015)

Long Short-Term Memory (LSTM)

Based on a standard RNN whose neurons activate with tanh...

Page 24

Long Short-Term Memory (LSTM)

Ct is the cell state, which flows through the entire chain...

Figure: Christopher Olah, “Understanding LSTM Networks” (2015)

Page 25

Long Short-Term Memory (LSTM)

...and is updated with a sum instead of a product. This avoids memory vanishing and mitigates exploding/vanishing backprop gradients.

Figure: Christopher Olah, “Understanding LSTM Networks” (2015)
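The additive update is C_t = f_t * C_{t-1} + i_t * C̃_t: when the forget gate saturates near 1 and the input gate near 0, the old memory passes through almost unchanged. A numeric sketch (the gate pre-activations are hand-picked for illustration):

```python
# Additive cell-state update: saturated gates let memory flow
# through a step almost untouched.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

C_prev = np.array([1.0, -2.0])          # old cell state (memory)
f_t = sigmoid(np.array([4.0, 4.0]))     # forget gate ~1: keep old memory
i_t = sigmoid(np.array([-4.0, -4.0]))   # input gate ~0: ignore new candidate
C_tilde = np.array([5.0, 5.0])          # candidate values

C_t = f_t * C_prev + i_t * C_tilde      # sum, not a product of states
print(C_t)  # stays close to C_prev
```

Contrast this with the plain RNN, where h_{t-1} is pushed through U and tanh at every step, so no configuration of weights can simply copy the state forward unchanged.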

Page 26

Long Short-Term Memory (LSTM)

Three gates, each governed by a sigmoid unit (with values between 0 and 1), control the flow of information in and out of the cell.

Figure: Christopher Olah, “Understanding LSTM Networks” (2015)

Page 27

Long Short-Term Memory (LSTM)

Forget Gate:

Concatenate

Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes

Page 28

Long Short-Term Memory (LSTM)

Input Gate Layer

New contribution to cell state

Classic neuron

Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes

Page 29

Long Short-Term Memory (LSTM)

Update Cell State (memory):

Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes

Page 30

Long Short-Term Memory (LSTM)

Output Gate Layer

Output to next layer

Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes

Page 31

Long Short-Term Memory (LSTM)

Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes
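Putting the pieces of the previous slides together, a full LSTM step in the style of Olah's walkthrough looks as follows. This is a sketch, not a reference implementation; weight names and sizes are assumptions:

```python
# One full LSTM step: concatenate [h_{t-1}, x_t], apply the forget,
# input, and output gates, update the cell state additively.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
n_in, n_hid = 3, 4
d = n_in + n_hid                                  # gates act on [h_prev, x_t]
Wf, Wi, Wc, Wo = (rng.normal(size=(n_hid, d)) for _ in range(4))
bf, bi, bc, bo = (np.zeros(n_hid) for _ in range(4))

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])             # concatenate
    f = sigmoid(Wf @ z + bf)                      # forget gate
    i = sigmoid(Wi @ z + bi)                      # input gate layer
    C_tilde = np.tanh(Wc @ z + bc)                # new contribution (classic neuron)
    C = f * C_prev + i * C_tilde                  # update cell state (memory)
    o = sigmoid(Wo @ z + bo)                      # output gate layer
    h = o * np.tanh(C)                            # output to next layer
    return h, C

h, C = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h, C = lstm_step(x_t, h, C)
print(h.shape, C.shape)  # (4,) (4,)
```

Note that only h is passed to the next layer; C is the internal memory that flows along the chain.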

Page 32

Gated Recurrent Unit (GRU)

Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014.

Similar performance as LSTM with less computation.
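The GRU gets by with two gates and no separate cell state: an update gate interpolates between the old hidden state and a candidate, and a reset gate modulates how much of the old state feeds that candidate. A sketch under the same illustrative-naming assumptions as before:

```python
# One GRU step: reset and update gates, single hidden state,
# no separate cell state as in the LSTM.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
n_in, n_hid = 3, 4
d = n_in + n_hid
Wz, Wr, Wh = (rng.normal(size=(n_hid, d)) for _ in range(3))

def gru_step(x_t, h_prev):
    z_in = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ z_in)                            # update gate
    r = sigmoid(Wr @ z_in)                            # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))
    return (1 - z) * h_prev + z * h_tilde             # interpolate old vs. new

h = np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h = gru_step(x_t, h)
print(h.shape)  # (4,)
```

With three weight matrices instead of four and no cell state to carry, the GRU does less work per step, which is the "less computation" claim above.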

Page 33

Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes

Gated Recurrent Unit (GRU)

Page 34

Applications: Machine Translation

Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).

Language IN

Language OUT
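In the encoder-decoder scheme of Cho et al., one RNN reads the source language into a fixed-size vector and a second RNN unfolds that vector into the target language. A much-simplified sketch (random weights, greedy decoding, and no input-feeding of the previous token; all names are illustrative):

```python
# Encoder-decoder sketch: compress the input sequence into one
# vector, then decode a target sequence from it.
import numpy as np

rng = np.random.default_rng(5)
n_in, n_hid, n_vocab = 3, 4, 10
We, Ue = rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid))
Ud = rng.normal(size=(n_hid, n_hid))
Wout = rng.normal(size=(n_vocab, n_hid))

# Encoder: read the source sequence ("Language IN") into a summary vector c.
c = np.zeros(n_hid)
for x_t in rng.normal(size=(6, n_in)):    # 6 source "tokens"
    c = np.tanh(We @ x_t + Ue @ c)

# Decoder: unfold from c, emitting one target token per step ("Language OUT").
h, out = c, []
for _ in range(4):                        # 4 target tokens
    h = np.tanh(Ud @ h)
    logits = Wout @ h
    out.append(int(np.argmax(logits)))    # greedy choice of next token
print(len(out))  # 4
```

In a trained system the decoder also consumes its previously emitted token at each step; that feedback loop is omitted here for brevity.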

Page 35

Applications: Image Classification

van den Oord, Aaron, Nal Kalchbrenner, and Koray Kavukcuoglu. "Pixel Recurrent Neural Networks." arXiv preprint arXiv:1601.06759 (2016).

Row LSTM / Diagonal BiLSTM

Classification on MNIST

Page 36

Applications: Segmentation

Francesco Visin, Marco Ciccone, Adriana Romero, Kyle Kastner, Kyunghyun Cho, Yoshua Bengio, Matteo Matteucci, Aaron Courville, “ReSeg: A Recurrent Neural Network-Based Model for Semantic Segmentation”. DeepVision CVPRW 2016.