Recurrent Neural Networks - Stanford University
CS231n Lecture 10: Recurrent Neural Networks
Fei-Fei Li, Andrej Karpathy, Justin Johnson - 8 Feb 2016
(cs231n.stanford.edu/slides/2016/winter1516_lecture10.pdf)


Page 1: Lecture 10: Recurrent Neural Networks

Page 2: Administrative
- Midterm this Wednesday! woohoo!
- A3 will be out ~Wednesday


Pages 4-5: http://mtyka.github.io/deepdream/2016/02/05/bilateral-class-vis.html

Pages 6-10: Recurrent Networks offer a lot of flexibility:
- Vanilla Neural Networks: fixed-size input -> fixed-size output (one-to-one)
- e.g. Image Captioning: image -> sequence of words (one-to-many)
- e.g. Sentiment Classification: sequence of words -> sentiment (many-to-one)
- e.g. Machine Translation: seq of words -> seq of words (many-to-many)
- e.g. Video classification on frame level (many-to-many)

Page 11: Sequential processing of fixed inputs
Multiple Object Recognition with Visual Attention, Ba et al.

Page 12: Sequential processing of fixed outputs
DRAW: A Recurrent Neural Network For Image Generation, Gregor et al.

Page 13: Recurrent Neural Network (diagram): input x feeds into an RNN block whose state loops back into itself.

Page 14: Recurrent Neural Network (diagram): x -> RNN -> y. Usually we want to predict a vector at some time steps.

Page 15: Recurrent Neural Network

We can process a sequence of vectors x by applying a recurrence formula at every time step:

    h_t = f_W(h_{t-1}, x_t)

where h_t is the new state, h_{t-1} is the old state, x_t is the input vector at some time step, and f_W is some function with parameters W.

Page 16: Notice: the same function f_W and the same set of parameters W are used at every time step.

Page 17: (Vanilla) Recurrent Neural Network

The state consists of a single "hidden" vector h:

    h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = W_hy h_t
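To make this concrete, here is a minimal numpy sketch of the vanilla RNN step above (the class and variable names are illustrative choices, and the weight scales are arbitrary small values):

    import numpy as np

    class VanillaRNN:
        def __init__(self, input_size, hidden_size, output_size):
            # small random weights; sizes here are illustrative
            self.W_xh = 0.01 * np.random.randn(hidden_size, input_size)
            self.W_hh = 0.01 * np.random.randn(hidden_size, hidden_size)
            self.W_hy = 0.01 * np.random.randn(output_size, hidden_size)
            self.h = np.zeros((hidden_size, 1))   # initial hidden state

        def step(self, x):
            # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
            self.h = np.tanh(self.W_hh @ self.h + self.W_xh @ x)
            # y_t = W_hy h_t
            return self.W_hy @ self.h

Note that the same step function and the same weights are reused at every time step, which is exactly the weight sharing noted on page 16.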

Page 18: Character-level language model example

Vocabulary: [h, e, l, o]
Example training sequence: "hello"

Pages 19-21: (diagram) Each character of "hello" is fed in as a one-hot vector over the 4-character vocabulary; the hidden layer is updated at each step, and the output layer produces scores for the next character (inputs "h","e","l","l" with targets "e","l","l","o").
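Building on the VanillaRNN sketch above, a hedged sketch of how this input/target pairing can be encoded and run forward (the softmax loss computation is omitted):

    vocab = ['h', 'e', 'l', 'o']
    char_to_ix = {ch: i for i, ch in enumerate(vocab)}

    def one_hot(ch):
        v = np.zeros((len(vocab), 1))
        v[char_to_ix[ch]] = 1.0
        return v

    seq = "hello"
    inputs  = [one_hot(ch) for ch in seq[:-1]]     # h, e, l, l
    targets = [char_to_ix[ch] for ch in seq[1:]]   # e, l, l, o

    rnn = VanillaRNN(input_size=4, hidden_size=8, output_size=4)
    for x, t in zip(inputs, targets):
        scores = rnn.step(x)   # unnormalized scores over the next character
        # a softmax cross-entropy loss against target index t would go here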

Page 22: min-char-rnn.py gist: 112 lines of Python
(https://gist.github.com/karpathy/d4dee566867f8291f086)
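At test time, text is generated by repeatedly sampling a character from the softmax over the output scores and feeding it back in as the next input. A sketch in the same spirit as the gist's sampling code (details there may differ):

    def sample_text(rnn, seed_char, n):
        # generate n characters, starting from seed_char
        x = one_hot(seed_char)
        out = [seed_char]
        for _ in range(n):
            scores = rnn.step(x)
            p = np.exp(scores) / np.sum(np.exp(scores))   # softmax over vocab
            ix = np.random.choice(len(vocab), p=p.ravel())
            out.append(vocab[ix])
            x = one_hot(vocab[ix])                        # feed sample back in
        return ''.join(out)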

Page 34: (diagram) x -> RNN -> y


Page 36: (samples from the model over the course of training) at first: random jumbles of characters; then, as we train more and more, increasingly coherent text.


Page 38: Training on the LaTeX source of an open source textbook on algebraic geometry (the following pages showed samples of generated mathematics).


Page 42: Generated C code


Page 45: Searching for interpretable cells

[Visualizing and Understanding Recurrent Networks, Andrej Karpathy*, Justin Johnson*, Li Fei-Fei]

Pages 46-50: Searching for interpretable cells, some of the cells found:
- quote detection cell
- line length tracking cell
- if statement cell
- quote/comment cell
- code depth cell

Page 51: Image Captioning

- Explain Images with Multimodal Recurrent Neural Networks, Mao et al.
- Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei
- Show and Tell: A Neural Image Caption Generator, Vinyals et al.
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.
- Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick

Page 52: (diagram) A Convolutional Neural Network encodes the image and feeds a Recurrent Neural Network that generates the sentence.

Pages 53-62: Walkthrough on a test image:

1. Run the test image through the CNN and take a feature vector v from near the top of the network.
2. Start the RNN by feeding the special <START> token as x0.
3. The image conditions the recurrence through a new weight matrix Wih:
   before: h = tanh(Wxh * x + Whh * h)
   now:    h = tanh(Wxh * x + Whh * h + Wih * v)
4. Compute h0 and the output distribution y0, then sample! The first word comes out, e.g. "straw".
5. Feed the sampled word back in as the next input x1; compute h1 and y1 and sample again, e.g. "hat".
6. Keep going until the <END> token is sampled => finish.
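A hedged sketch of this generation loop, reusing the VanillaRNN weights from earlier; Wih, the image feature v, and the word_vecs lookup are assumed inputs, and greedy argmax decoding stands in for sampling:

    def caption_image(rnn, Wih, v, word_vecs, vocab, max_len=20):
        # v: CNN feature vector of the test image (assumed given)
        words = []
        x = word_vecs['<START>']
        for _ in range(max_len):
            # 'now' recurrence: the image term Wih @ v enters the hidden state
            # (the slides draw v entering at the first step; some variants
            # inject it at every step, as done here)
            rnn.h = np.tanh(rnn.W_xh @ x + rnn.W_hh @ rnn.h + Wih @ v)
            scores = rnn.W_hy @ rnn.h
            ix = int(np.argmax(scores))        # greedy; or sample from softmax
            if vocab[ix] == '<END>':
                break                          # finish on the <END> token
            words.append(vocab[ix])
            x = word_vecs[vocab[ix]]           # feed the word back in
        return ' '.join(words)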

Page 63: Image Sentence Datasets

Microsoft COCO [Tsung-Yi Lin et al. 2014], mscoco.org
currently: ~120K images, ~5 sentences each
Page 66: Preview of fancier architectures

Show, Attend and Tell, Xu et al., 2015: the RNN attends spatially to different parts of the image while generating each word of the sentence.

Page 67: RNN stacked in depth and unrolled in time: each layer l updates h^l_t = tanh(W^l [h^{l-1}_t; h^l_{t-1}]), so states flow both through time (horizontally) and through depth (vertically).

Page 68: RNN vs. LSTM on the same depth/time grid: the RNN's plain tanh update is replaced by the LSTM's gated update with gates i, f, o, g and a cell state c (spelled out on the following slides).

Page 69: LSTM

Page 70: Long Short Term Memory (LSTM) [Hochreiter et al., 1997]

The inputs are the vector from below (x, size n) and the vector from before (h, size n). A single weight matrix W of size 4n x 2n maps the stacked 2n-vector [x; h] to 4n gate pre-activations, split into four n-vectors:

    i = sigmoid(.)   input gate
    f = sigmoid(.)   forget gate
    o = sigmoid(.)   output gate
    g = tanh(.)      candidate update
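A minimal numpy sketch of one LSTM step under these conventions (the single concatenated W mirrors the slide, the cell-state update from the next slides is folded in, and the sizes are illustrative):

    def lstm_step(W, x, h_prev, c_prev):
        # W: (4n, 2n); x, h_prev, c_prev: (n, 1) column vectors
        n = h_prev.shape[0]
        z = W @ np.vstack([x, h_prev])        # (4n, 1) gate pre-activations
        sigm = lambda a: 1.0 / (1.0 + np.exp(-a))
        i = sigm(z[0*n:1*n])                  # input gate
        f = sigm(z[1*n:2*n])                  # forget gate
        o = sigm(z[2*n:3*n])                  # output gate
        g = np.tanh(z[3*n:4*n])               # candidate update
        c = f * c_prev + i * g                # forget old state, add new candidate
        h = o * np.tanh(c)                    # expose a gated view of the cell
        return h, c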

Pages 71-74: The cell state update: c = f * c + i * g, i.e. the forget gate f multiplicatively gates the previous cell state, and the input gate i gates the candidate g that gets added in. The hidden state is then h = o * tanh(c), which is passed to the higher layer, or used for the prediction.

Page 75: Unrolled over time steps: the cell state c travels from one timestep to the next, touched only by the elementwise forget-gate multiply and the additive update, while x and h enter through the gates at every timestep.

Page 76: Gradient flow: in a plain RNN the state is pushed through the full transformation f at every timestep, whereas in an LSTM (ignoring forget gates) the cell state changes only by addition, giving gradients an uninterrupted additive path backward through time.

Page 77: Recall: "PlainNets" vs. ResNets

ResNet is to PlainNet what LSTM is to RNN, kind of: both add additive shortcut paths that help gradients flow.

Page 78: Understanding gradient flow dynamics

Cute backprop signal video: http://imgur.com/gallery/vaNahKE

Page 79: Understanding gradient flow dynamics

Backpropagating through the recurrence multiplies the gradient by the recurrent weight matrix at every timestep, so: if its largest eigenvalue is > 1, the gradient will explode; if the largest eigenvalue is < 1, the gradient will vanish.

[On the difficulty of training Recurrent Neural Networks, Pascanu et al., 2013]

Page 80: Exploding gradients can be controlled with gradient clipping; vanishing gradients can be controlled with LSTM.

[On the difficulty of training Recurrent Neural Networks, Pascanu et al., 2013]
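Two hedged sketches of gradient clipping: the elementwise form matches the style used in min-char-rnn.py, while the norm-based form follows Pascanu et al. (the threshold values here are illustrative):

    def clip_elementwise(grads, clip=5.0):
        # clamp every gradient entry to [-clip, clip], in place
        for g in grads:
            np.clip(g, -clip, clip, out=g)

    def clip_by_global_norm(grads, max_norm=5.0):
        # rescale all gradients if their joint L2 norm exceeds max_norm
        total = np.sqrt(sum(np.sum(g * g) for g in grads))
        if total > max_norm:
            for g in grads:
                g *= max_norm / total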

Page 81: LSTM variants and friends

[LSTM: A Search Space Odyssey, Greff et al., 2015]
[An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al., 2015]
GRU [Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Cho et al. 2014]
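For reference, a sketch of the GRU update from Cho et al. 2014 (the weight names Wz, Wr, Wh are illustrative, and some formulations swap the roles of z and 1 - z):

    def gru_step(Wz, Wr, Wh, x, h_prev):
        # each W maps the stacked 2n-vector to an n-vector
        sigm = lambda a: 1.0 / (1.0 + np.exp(-a))
        xh = np.vstack([x, h_prev])
        z = sigm(Wz @ xh)                                   # update gate
        r = sigm(Wr @ xh)                                   # reset gate
        h_tilde = np.tanh(Wh @ np.vstack([x, r * h_prev]))  # candidate state
        return z * h_prev + (1 - z) * h_tilde               # blend old and new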

Page 82: Summary

- RNNs allow a lot of flexibility in architecture design
- Vanilla RNNs are simple but don't work very well
- Common to use LSTM or GRU: their additive interactions improve gradient flow
- Backward flow of gradients in RNN can explode or vanish. Exploding is controlled with gradient clipping; vanishing is controlled with additive interactions (LSTM)
- Better/simpler architectures are a hot topic of current research
- Better understanding (both theoretical and empirical) is needed