
Page 1

Recurrent and Recursive Networks

Marco Kuhlmann

Neural Networks with Applications to Vision and Language

Page 2

Introduction

Page 3

Applications of sequence modelling

• Map unsegmented connected handwriting to strings.

• Map sequences of acoustic signals to sequences of phonemes.

• Translate sentences from one language into another one.

• Generate baby names, poems, source code, patent applications.

Page 4

The bag-of-words model

pos: The gorgeously elaborate continuation of “The Lord of the Rings” trilogy is so huge that a column of words cannot adequately describe co-writer/director Peter Jackson’s expanded vision of J.R.R. Tolkien’s Middle-earth.

neg: … is a sour little movie at its core; an exploration of the emptiness that underlay the relentless gaiety of the 1920’s, as if to stop would hasten the economic and global political turmoil that was to come.

Page 5

The bag-of-words model

pos: a adequately cannot co-writer column continuation describe director elaborate expanded gorgeously huge is J.R.R. Jackson Lord Middle-earth of of of of Peter Rings so that The The the Tolkien trilogy vision words

neg: 1920’s a an and as at come core economic emptiness exploration gaiety global hasten if is its little movie of of political relentless sour stop that that the the the the to to turmoil underlay was would
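A bag-of-words feature vector can be computed in a few lines; the sketch below is illustrative only (the six-word vocabulary and the example string are made up, and a real system would use the full training vocabulary). Note how only the word counts survive, such as the four occurrences of ‘of’ above:

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Map a document to a vector of word counts, ignoring word order."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Toy illustration with a hypothetical six-word vocabulary.
vocab = ["gorgeously", "elaborate", "sour", "little", "movie", "the"]
review = "The gorgeously elaborate continuation of The Lord of the Rings"
print(bag_of_words(review, vocab))   # [1, 1, 0, 0, 0, 3]
```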

Page 6

Part-of-speech tagging

jag bad om en kort bit (‘I asked for a short piece’)

PN VB PP DT JJ NN

[Diagram: a tag lattice showing alternative candidate tags for each word (e.g. NN, PL, RG, AB, VB, SN, PN); the row above is the correct tag sequence.]

Page 7

Hidden Markov Models (HMMs)

jag bad om en kort bit

PN VB PP DT JJ NN

Transition probabilities: 𝑃(PN|BOS), 𝑃(VB|PN), 𝑃(PP|VB), 𝑃(DT|PP), 𝑃(JJ|DT), 𝑃(NN|JJ), 𝑃(EOS|NN)

Emission probabilities: 𝑃(jag|PN), 𝑃(bad|VB), 𝑃(om|PP), 𝑃(en|DT), 𝑃(kort|JJ), 𝑃(bit|NN)
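Under the HMM, the probability of the tagged sentence is just the product of these factors. A minimal sketch with made-up probability values (the numbers are illustrative, not estimated from a treebank):

```python
# Hypothetical transition probabilities P(tag | previous tag), including BOS/EOS.
transitions = {("BOS", "PN"): 0.4, ("PN", "VB"): 0.5, ("VB", "PP"): 0.2,
               ("PP", "DT"): 0.6, ("DT", "JJ"): 0.3, ("JJ", "NN"): 0.7,
               ("NN", "EOS"): 0.4}
# Hypothetical emission probabilities P(word | tag).
emissions = {("PN", "jag"): 0.3, ("VB", "bad"): 0.01, ("PP", "om"): 0.2,
             ("DT", "en"): 0.4, ("JJ", "kort"): 0.05, ("NN", "bit"): 0.02}

def joint_probability(words, tags):
    """P(words, tags): one transition and one emission factor per position."""
    p, previous = 1.0, "BOS"
    for word, tag in zip(words, tags):
        p *= transitions[(previous, tag)] * emissions[(tag, word)]
        previous = tag
    return p * transitions[(previous, "EOS")]

print(joint_probability("jag bad om en kort bit".split(),
                        ["PN", "VB", "PP", "DT", "JJ", "NN"]))
```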

Page 8

[Diagram: a two-tag HMM with states VB and PN plus start and end states BOS and EOS. Transition probabilities: 𝑃(VB|BOS), 𝑃(PN|BOS), 𝑃(VB|PN), 𝑃(PN|VB), 𝑃(VB|VB), 𝑃(PN|PN), 𝑃(EOS|VB), 𝑃(EOS|PN). Emission probabilities: 𝑃(𝑤|VB), 𝑃(𝑤|PN).]

Page 9

A weakness of Hidden Markov Models

• The only information that an HMM has access to at any given point in time is its state.

• Suppose that the HMM has 𝑛 states. Then the index of the current state can be written using log₂ 𝑛 bits.

• Thus the current state contains at most log₂ 𝑛 bits of information about the sequence generated so far.
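As a worked instance of this bound (the state count is hypothetical and chosen only for round numbers):

```latex
\[
  n = 1024 \text{ states}
  \;\Rightarrow\;
  \log_2 n = 10 \text{ bits of information about the sequence generated so far.}
\]
```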

Page 10

Strengths of recurrent neural networks

• Distributed hidden state

In recurrent neural networks, several units can be active at once, which allows them to store a lot of information efficiently (contrast this with the single current state of an HMM).

• Non-linear dynamics

Different units can interact with each other in non-linear ways, which makes recurrent neural networks Turing-complete (contrast this with linear dynamical systems).

Attribution: Geoffrey Hinton

Page 11

Recurrent neural networks (RNNs)

• Recurrent neural networks can be visualised as networks with feedback connections, which form directed cycles between units.

• These feedback connections are unfolded over time.

• A crucial property of recurrent neural networks is that they share the same set of parameters across different timesteps.
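These three points can be made concrete in a few lines of NumPy. A minimal sketch of the forward pass of a vanilla RNN (the matrix names 𝑼, 𝑾, 𝑽 anticipate the computation-graph slides below; the sizes and initialisation are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 8, 3

# One set of parameters, reused at every timestep (parameter sharing).
U = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input  -> hidden
W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden (the feedback connection)
V = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden -> output

def forward(xs, h=None):
    """Run the RNN over a sequence of input vectors xs (the unrolling over time)."""
    h = np.zeros(n_hidden) if h is None else h
    outputs = []
    for x in xs:
        h = np.tanh(U @ x + W @ h)   # same U and W at every timestep
        outputs.append(V @ h)        # h is a (lossy) summary of the input seen so far
    return outputs, h

xs = [rng.normal(size=n_in) for _ in range(5)]
outputs, h_final = forward(xs)
print(len(outputs), h_final.shape)   # 5 (8,)
```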

Page 12

RNN, cyclic representation

[Diagram: cyclic view of an RNN. The input 𝒙 feeds into the hidden state 𝒉 via the function 𝒇; 𝒉 feeds back into 𝒇 through a delay of one timestep (the small square box); the output 𝒐 is computed from 𝒉.]

Page 13

RNN, unrolled

[Diagram: the same RNN unrolled over three timesteps. Inputs 𝒙(1), 𝒙(2), 𝒙(3) produce hidden states 𝒉(1), 𝒉(2), 𝒉(3) via the same function 𝒇 at every step, and outputs 𝒐(1), 𝒐(2), 𝒐(3).]

Page 14

General observations

• The parameters of the model are shared across all timesteps.

• The hidden state can be influenced by the entire input seen so far. Contrast this with the Markov assumption of HMMs.

• The hidden state can be a ‘lossy summary’ of the input sequence. Hopefully, this state will encode useful information for the task at hand.

• The model has the same input size regardless of sequence length, because it is specified in terms of transitions from one state to the next.

Page 15

Different types of RNN architectures

[Diagram: three types of RNN architectures: encoder, generator, transducer.]
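The three names are usually taken to differ only in how the unrolled network is wired to inputs and outputs: an encoder reads a whole sequence and keeps a single summary state, a transducer emits one output per input, and a generator feeds each output back in as the next input. The sketch below follows that reading; `step` and `emit` are hypothetical stand-ins for the recurrence and the output layer:

```python
def encode(xs, step, h):
    """Encoder: consume the whole input sequence, return one summary state."""
    for x in xs:
        h = step(x, h)
    return h

def transduce(xs, step, emit, h):
    """Transducer: one output per input timestep."""
    ys = []
    for x in xs:
        h = step(x, h)
        ys.append(emit(h))
    return ys

def generate(x, step, emit, h, n_steps):
    """Generator: feed each output back in as the next input."""
    ys = []
    for _ in range(n_steps):
        h = step(x, h)
        x = emit(h)
        ys.append(x)
    return ys
```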

Page 16

Training recurrent neural networks

Page 17

Computation graph for a standard architecture

[Diagram: computation graph unrolled over three timesteps. Inputs 𝒙(1), 𝒙(2), 𝒙(3) are connected to the hidden states 𝒉(1), 𝒉(2), 𝒉(3) by the weight matrix 𝑼, consecutive hidden states by 𝑾, and hidden states to the outputs 𝒐(1), 𝒐(2), 𝒐(3) by 𝑽; each output is compared with the target 𝒚(𝑡) to give the loss 𝑳(𝑡).]

Page 18

Assumptions

• The hidden states are computed by some nonlinear activation function, such as tanh.

• The outputs at each timestep are log-probabilities representing distributions over a finite set of labels. In the book, the normalisation (softmax) is assumed to happen implicitly when computing the loss.
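The remark about the normalisation happening implicitly just means that the softmax and the negative log-likelihood are computed together. A small sketch of that combined computation, with the usual max-subtraction for numerical stability:

```python
import numpy as np

def cross_entropy_from_logits(o, target):
    """Loss for one timestep, given unnormalised log-probabilities o and a target label."""
    o = o - o.max()                          # does not change the softmax
    log_probs = o - np.log(np.exp(o).sum())  # log-softmax: normalisation happens here
    return -log_probs[target]

o = np.array([2.0, -1.0, 0.5])               # output vector o(t) over three labels
print(cross_entropy_from_logits(o, target=0))
```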

Page 19

Backpropagation through time

• Unrolled recurrent neural networks are just feedforward networks and can therefore be trained using backpropagation; the parameter sharing corresponds to linear equality constraints on the parameters of the unrolled network.

• This way of training recurrent neural networks is called backpropagation through time.

• Given that the unrolled computation graphs can be very deep, the vanishing gradient problem is exacerbated in RNNs.
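Because the unrolled graph is an ordinary feedforward network, the gradients can be written out by hand. A didactic sketch of backpropagation through time for the vanilla RNN sketched earlier, with a softmax and cross-entropy loss at every timestep (same 𝑼, 𝑾, 𝑽 naming; not an efficient implementation):

```python
import numpy as np

def bptt(xs, ys, U, W, V):
    """Forward pass plus backpropagation through time for a vanilla RNN.

    xs: list of input vectors; ys: list of integer target labels.
    Returns the total loss and the gradients of the shared parameters U, W, V.
    """
    n_hidden = W.shape[0]
    hs, probs, loss = [np.zeros(n_hidden)], [], 0.0

    # Forward: the same computation at every timestep (shared parameters).
    for x, y in zip(xs, ys):
        h = np.tanh(U @ x + W @ hs[-1])
        o = V @ h
        p = np.exp(o - o.max()); p /= p.sum()    # softmax
        loss -= np.log(p[y])
        hs.append(h); probs.append(p)

    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    dh_next = np.zeros(n_hidden)                 # gradient arriving from the future

    # Backward: walk the unrolled graph from the last timestep to the first.
    for t in reversed(range(len(xs))):
        do = probs[t].copy(); do[ys[t]] -= 1.0   # d loss / d o(t)
        dV += np.outer(do, hs[t + 1])
        dh = V.T @ do + dh_next                  # from the output and from h(t+1)
        dz = (1.0 - hs[t + 1] ** 2) * dh         # back through tanh
        dU += np.outer(dz, xs[t])
        dW += np.outer(dz, hs[t])
        dh_next = W.T @ dz                       # pass the gradient one step back in time

    return loss, dU, dW, dV
```

The accumulation into the same dU, dW, dV at every timestep is exactly where the parameter sharing shows up; with the 𝑼, 𝑾, 𝑽 from the earlier sketch, the returned gradients can be used directly for a gradient-descent step.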

Page 20

[Diagram: backpropagation refresher for a feedforward network, showing units with pre-activations 𝑧𝑘, 𝑧𝑗 and activations 𝑦𝑘 = 𝑓(𝑧𝑘), 𝑦𝑗 = 𝑓(𝑧𝑗), weights 𝑤𝑗𝑘 and 𝑤𝑖𝑗, the output 𝑦𝑖, the target 𝑡, and the error 𝐸.]

Page 21

[Diagram: the same backpropagation diagram as on the previous page.]

Page 22

Backpropagation through time

[Diagram: the same unrolled computation graph as on Page 17 (inputs 𝒙(𝑡), hidden states 𝒉(𝑡), outputs 𝒐(𝑡), targets 𝒚(𝑡), losses 𝑳(𝑡), shared weights 𝑼, 𝑽, 𝑾), shown here to illustrate how gradients flow backwards through time.]

Page 23

Initial values of the hidden state

• We could manually set the initial state to some sensible fixed starting value.

• We could learn the initial state by starting with a random guess and then updating that guess during backpropagation.

Page 24

Networks with output recurrence

[Diagram: a network with output recurrence. The recurrent connections (weight matrix 𝑾) run from the output 𝒐(𝑡−1) at one timestep to the hidden state 𝒉(𝑡) at the next, rather than from hidden state to hidden state; inputs 𝒙(𝑡) enter through 𝑼, outputs are produced through 𝑽, and each output is compared with the target 𝒚(𝑡) to give the loss 𝑳(𝑡).]

Page 25

The limitations of recurrent neural networks

• In principle, recurrent networks are capable of learning long-distance dependencies.

• In practice, standard gradient-based learning algorithms do not perform very well; Bengio et al. (1994) identified the ‘vanishing gradient’ problem as a key reason.

• Today, there are several methods available for training recurrent neural networks that avoid these problems: LSTMs, optimisation methods that cope with small gradients, careful weight initialisation, …

Page 26

Vanishing and exploding gradients

[Plots: the activation functions sigmoid, tanh, and ReLU (left, values between −1 and 1) and their gradients (right, values between 0 and 1), both over inputs from −6 to 6.]
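The right-hand panel carries the argument: backpropagation multiplies one such derivative factor per layer, and in an unrolled RNN per timestep, so factors well below 1 shrink the gradient exponentially. A small sketch computing the same quantities:

```python
import numpy as np

x = np.linspace(-6, 6, 121)

sigmoid = 1.0 / (1.0 + np.exp(-x))
tanh = np.tanh(x)
relu = np.maximum(0.0, x)

d_sigmoid = sigmoid * (1.0 - sigmoid)    # never larger than 0.25
d_tanh = 1.0 - tanh ** 2                 # never larger than 1.0
d_relu = (x > 0).astype(float)           # either 0 or 1

print(d_sigmoid.max(), d_tanh.max())     # 0.25 1.0
# Fifty steps back through saturating sigmoids: the gradient all but vanishes.
print(0.25 ** 50)                        # about 8e-31
```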

Page 27

Recursive neural networks

[Diagram: a recursive neural network. The inputs 𝒙(1)…𝒙(4) are projected by the weight matrix 𝑽 and then combined pairwise in a tree using the shared weight matrices 𝑼 and 𝑾 until a single representation remains; from it the output 𝒐 is computed and compared with the target 𝒚 to give the loss 𝑳.]
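A recursive network applies one composition function over a tree rather than a chain. A minimal sketch following the wiring suggested by the diagram (inputs projected by 𝑽, children combined through 𝑼 and 𝑾; the balanced tree over four inputs and the tanh nonlinearity are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8
V = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input projection
U = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # left child
W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # right child

def compose(left, right):
    """Combine two child representations into one parent representation."""
    return np.tanh(U @ left + W @ right)

xs = [rng.normal(size=n_in) for _ in range(4)]
leaves = [np.tanh(V @ x) for x in xs]

# Balanced binary tree over the four leaves: ((x1, x2), (x3, x4)).
root = compose(compose(leaves[0], leaves[1]),
               compose(leaves[2], leaves[3]))
print(root.shape)   # (8,)
```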

Page 28

Long Short-Term Memory (LSTM)

Page 29

Long Short-Term Memory

• The Long Short-Term Memory (LSTM) architecture was specifically designed to combat the vanishing gradient problem.

• Metaphor: The dynamic state of the neural network can be considered as a short-term memory.

• The LSTM architecture tries to make this short-term memory last as long as possible by preventing vanishing gradients.

• Central idea: gating mechanism

Page 30

Memory cell and gating mechanism

The crucial innovation in an LSTM is the design of its memory cell.

• Information is written into the cell whenever its ‘write’ gate is on.

• The information stays in the cell as long as its ‘keep’ gate is on.

• Information is read from the cell whenever its ‘read’ gate is on.

Page 31

Information flow in an LSTM

[Diagram (attribution: Geoffrey Hinton): information flow in an LSTM over time. A value (1.7) is written into the memory cell while the ‘write’ gate is on, retained across several timesteps while the ‘keep’ gate is on, and read out when the ‘read’ gate is on.]

Page 32

A look inside an LSTM cell

[Diagram (attribution: Chris Olah): inside an LSTM cell. The input 𝒙(𝑖) and the previous external state 𝒉(𝑖−1) feed three sigmoid gates and a tanh layer; the internal state is updated from 𝒔(𝑖−1) to 𝒔(𝑖) by elementwise multiplication and addition, and the new external state 𝒉(𝑖) and output 𝒚(𝑖) are computed from a tanh of 𝒔(𝑖).]

Page 33

The ‘keep’ gate (‘forget gate’)

[Diagram (attribution: Chris Olah): the keep (forget) gate, a sigmoid layer applied to 𝒙(𝑖) and 𝒉(𝑖−1).]

Page 34

The ‘write’ gate (‘input gate’)

[Diagram (attribution: Chris Olah): the write (input) gate, a sigmoid layer applied to 𝒙(𝑖) and 𝒉(𝑖−1).]

Page 35

Update candidate

[Diagram (attribution: Chris Olah): the update candidate, a tanh layer applied to 𝒙(𝑖) and 𝒉(𝑖−1).]

Page 36

Updating the internal state

[Diagram (attribution: Chris Olah): the internal state update. The previous state 𝒔(𝑖−1) is multiplied elementwise by the keep gate, the gated update candidate is added, and the result is the new internal state 𝒔(𝑖).]

Page 37

The ‘read’ gate (‘output gate’)

[Diagram (attribution: Chris Olah): the read (output) gate, a sigmoid layer applied to 𝒙(𝑖) and 𝒉(𝑖−1).]

Page 38

Updating the external state

[Diagram (attribution: Chris Olah): the external state update. The new internal state is passed through tanh and multiplied elementwise by the read (output) gate, giving the new external state 𝒉(𝑖), which is also emitted as the output 𝒚(𝑖).]
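Putting the preceding slides together, one step of an LSTM cell can be sketched as below. The weight names (Wf, Uf, and so on) are hypothetical, and the equations are the standard formulation that these diagrams depict; the comments use both naming conventions from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, s_prev, params):
    """One LSTM step: (x(i), h(i-1), s(i-1)) -> (h(i), s(i))."""
    Wf, Uf, bf, Wi, Ui, bi, Wc, Uc, bc, Wo, Uo, bo = params

    f = sigmoid(Wf @ x + Uf @ h_prev + bf)   # keep / forget gate
    i = sigmoid(Wi @ x + Ui @ h_prev + bi)   # write / input gate
    c = np.tanh(Wc @ x + Uc @ h_prev + bc)   # update candidate
    s = f * s_prev + i * c                   # internal state update
    o = sigmoid(Wo @ x + Uo @ h_prev + bo)   # read / output gate
    h = o * np.tanh(s)                       # external state, also emitted as y(i)
    return h, s

# Toy usage with randomly initialised parameters.
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8
def gate_params():
    return (rng.normal(scale=0.1, size=(n_hidden, n_in)),
            rng.normal(scale=0.1, size=(n_hidden, n_hidden)),
            np.zeros(n_hidden))
params = sum((gate_params() for _ in range(4)), ())

h = s = np.zeros(n_hidden)
for x in [rng.normal(size=n_in) for _ in range(5)]:
    h, s = lstm_step(x, h, s, params)
print(h.shape, s.shape)   # (8,) (8,)
```

The key design point is the additive update of 𝒔: the gradient flowing back along the internal state is scaled by the keep gate rather than by a repeated matrix multiplication, which is what keeps it from vanishing as quickly.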

Page 39

Peephole connections

[Diagram (attribution: Chris Olah): an LSTM cell with peephole connections, in which the gates additionally receive the internal state (𝒔𝑖−1 or 𝒔𝑖) as input.]

Page 40

Gated Recurrent Unit (GRU)

[Diagram (attribution: Chris Olah): a GRU cell. The input 𝒙(𝑖) and previous state 𝒉(𝑖−1) feed two sigmoid gates and a tanh layer; the new state 𝒉(𝑖) interpolates between 𝒉(𝑖−1) and the tanh candidate using the gate value and its complement (the 1− branch in the diagram).]
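For comparison, one step of a GRU, which merges the internal and external state and uses two gates; the weight names are again hypothetical and the equations follow the standard formulation shown in the diagram:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, params):
    """One GRU step: (x(i), h(i-1)) -> h(i)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params

    z = sigmoid(Wz @ x + Uz @ h_prev + bz)                   # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)                   # reset gate
    h_candidate = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return (1.0 - z) * h_prev + z * h_candidate              # interpolate old and new state
```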

Page 41

Bidirectional RNNs

• In speech recognition, the correct interpretation of a given sound may depend on both the previous sounds and the next sounds.

• Bidirectional RNNs combine one RNN that moves forward through time with another RNN that moves backward.

• The output can be a representation that depends on both the past and the future, without having to specify a fixed-sized window.
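Combining the two directions can be sketched very compactly; the per-direction step functions and the combination by concatenation below are assumptions for illustration:

```python
import numpy as np

def run(xs, step, n_hidden):
    """Run one RNN direction over xs, returning the state at every position."""
    h, states = np.zeros(n_hidden), []
    for x in xs:
        h = step(x, h)
        states.append(h)
    return states

def bidirectional(xs, step_forward, step_backward, n_hidden):
    hs = run(xs, step_forward, n_hidden)                  # left-to-right states h(t)
    gs = run(xs[::-1], step_backward, n_hidden)[::-1]     # right-to-left states g(t)
    # Each position now sees both the past (through h) and the future (through g).
    return [np.concatenate([h, g]) for h, g in zip(hs, gs)]
```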

Page 42

A bidirectional RNN

[Diagram (attribution: Chris Olah): a bidirectional RNN over three timesteps. A forward RNN (F) computes states 𝒉(𝑡) from left to right and a backward RNN (B) computes states 𝒈(𝑡) from right to left over the inputs 𝒙(1), 𝒙(2), 𝒙(3); each output 𝒚(𝑡) depends on both 𝒉(𝑡) and 𝒈(𝑡).]