Page 1: EDUNEX ITB

Page 2: EDUNEX ITB

IF4074 Weeks 9-15

• Week 9 (18 October 2021): LSTM + RNN architecture + Major Assignment 2 (Tubes 2)

• Week 10 (25 October 2021): RNN exercises + BPTT

• Week 11 (1 November 2021): Guest lecture (sharing on ML applications at Gojek)

• Week 12 (8 November 2021): RNN lab session

• Week 13 (15 November 2021): Feature Engineering 1 / Assignment: experiment design

• Week 14 (22 November 2021): Quiz 2

• Week 15 (29 November 2021): Feature Engineering 2 lab session

Page 3: EDUNEX ITB

04 LSTM: What & Why

Pembelajaran Mesin Lanjut (Advanced Machine Learning)

Masayu Leylia Khodra ([email protected])

KK IF – Teknik Informatika – STEI ITB

Module 4: Recurrent Neural Network

Page 4: EDUNEX ITB

Long Short-Term Memory (LSTM): Why

https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

β„Žπ‘‘ = 𝑓(π‘ˆπ‘₯𝑑 +π‘Šβ„Žπ‘‘βˆ’1 + 𝑏xh)

𝑦𝑑 = 𝑓(π‘‰β„Žπ‘‘ + 𝑏hy)

RNN: long-term dependency problem

[Figure: RNN cell with input x_t, previous hidden state h_{t-1}, new hidden state h_t, and weight matrices U, W, V]

RNNs suffer from short-term memory (forward propagation).

RNNs suffer from the vanishing gradient problem (backward propagation): they fail to learn dependencies longer than 5-10 time steps. In the worst case, this may completely stop the neural network from further training.

Page 5: EDUNEX ITB

Long Short-Term Memory (LSTM): What

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

LSTMs are explicitly designed to avoid the long-term dependency problem.

Introduced by Hochreiter & Schmidhuber (1997)

LSTM is a special kind of RNN; the difference lies in the operations within the LSTM's cells. RNN: the repeating module has a very simple structure. LSTM: the repeating module contains four interacting layers.

Page 6: EDUNEX ITB

LSTM: Cell State & Gates

Cell State
• acts as the "memory" of the network
• acts as a transport highway that transfers relevant information throughout the processing of the sequence

Forget Gate
• decides what information should be thrown away or kept
• values closer to 0 mean forget, values closer to 1 mean keep

Input Gate
• decides what information is relevant to add from the current step
• updates the cell state using the hidden state and the current input

Output Gate
• decides what the next hidden state should be
• the hidden state contains information on previous inputs and is also used for predictions

https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

Page 7: EDUNEX ITB

Forget Gate

https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

f_t = \sigma(U_f x_t + W_f h_{t-1} + b_f)

A value of 1 represents "completely keep this," while a 0 represents "completely get rid of this."

π‘₯𝑑

β„Ž t-1

𝑐t-1

𝑓𝑑

Page 8: EDUNEX ITB

Input Gate

https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

i_t = \sigma(U_i x_t + W_i h_{t-1} + b_i)

\tilde{C}_t = \tanh(U_c x_t + W_c h_{t-1} + b_c)

[Figure: input gate i_t and candidate cell state \tilde{C}_t computed from x_t and h_{t-1}]

Page 9: EDUNEX ITB

Cell State

https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

π‘₯𝑑

β„Ž t-1

𝑐t-1

𝑓𝑑 𝑖𝑑 ǁ𝑐𝑑

𝐢𝑑 = 𝑓𝑑 βŠ™πΆπ‘‘βˆ’1 + 𝑖𝑑 βŠ™ ෩𝐢𝑑

Page 10: EDUNEX ITB

Output Gate

https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

π‘₯𝑑

β„Ž t-1

𝑐t-1

𝑓𝑑 𝑖𝑑 ǁ𝑐𝑑

π‘œπ‘‘ = 𝜎 π‘ˆπ‘œπ‘₯𝑑 +π‘Šπ‘œβ„Žπ‘‘βˆ’1 + 𝑏o

β„Žπ‘‘ = π‘œπ‘‘ βŠ™ tanh(𝐢t)

Page 11: EDUNEX ITB

LSTM Forward Propagation: Example

https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9

Τ¦π‘₯ (2)

β„Ž (1)

π‘ˆπ‘“ , π‘ˆπ‘–,

π‘ˆπ‘ , π‘ˆπ‘œ

π‘Šπ‘“ ,π‘Šπ‘–,

π‘Šπ‘ ,π‘Šπ‘œ

A1    A2    Target
1     2     0.5
0.5   3     1.25
…

f_t = \sigma(U_f x_t + W_f h_{t-1} + b_f)

i_t = \sigma(U_i x_t + W_i h_{t-1} + b_i)

\tilde{C}_t = \tanh(U_c x_t + W_c h_{t-1} + b_c)

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t

o_t = \sigma(U_o x_t + W_o h_{t-1} + b_o)

h_t = o_t \odot \tanh(C_t)

U_f: 0.700  0.450     W_f: 0.100   b_f: 0.150
U_i: 0.950  0.800     W_i: 0.800   b_i: 0.650
U_c: 0.450  0.250     W_c: 0.150   b_c: 0.200
U_o: 0.600  0.400     W_o: 0.250   b_o: 0.100

h_{t-1} = 0,  C_{t-1} = 0

Page 12: EDUNEX ITB

Computing h_t and C_t: Timestep t1

t1 = <1, 2>, target 0.5;  h_{t-1} = 0, C_{t-1} = 0

            U·x_t   W·h_{t-1}+b   net     activation
forget:     1.600   0.150         1.750   f_t = 0.852
input:      2.550   0.650         3.200   i_t = 0.961
candidate:  0.950   0.200         1.150   \tilde{C}_t = 0.818
output:     1.400   0.100         1.500   o_t = 0.818

C_t = 0.786,  h_t = 0.536

f_t = \sigma(U_f x_t + W_f h_{t-1} + b_f)

i_t = \sigma(U_i x_t + W_i h_{t-1} + b_i)

\tilde{C}_t = \tanh(U_c x_t + W_c h_{t-1} + b_c)

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t

o_t = \sigma(U_o x_t + W_o h_{t-1} + b_o)

h_t = o_t \odot \tanh(C_t)

https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9
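As a check, the lstm_step sketch from the earlier slide can be run with the example weights above (hidden size 1, 2-dimensional input); the variable layout here is only an assumption for illustration:

import numpy as np

U = {'f': np.array([[0.70, 0.45]]), 'i': np.array([[0.95, 0.80]]),
     'c': np.array([[0.45, 0.25]]), 'o': np.array([[0.60, 0.40]])}
W = {'f': np.array([[0.10]]), 'i': np.array([[0.80]]),
     'c': np.array([[0.15]]), 'o': np.array([[0.25]])}
b = {'f': np.array([0.15]), 'i': np.array([0.65]),
     'c': np.array([0.20]), 'o': np.array([0.10])}

h0, c0 = np.zeros(1), np.zeros(1)
h1, c1, _ = lstm_step(np.array([1.0, 2.0]), h0, c0, U, W, b)
print(c1, h1)   # approx. [0.786] [0.536], matching the table above
h2, c2, _ = lstm_step(np.array([0.5, 3.0]), h1, c1, U, W, b)
print(c2, h2)   # approx. [1.518] [0.772], matching the next slide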

Page 13: EDUNEX ITB

Computing h_t and C_t: Timestep t2

t2 = <0.5, 3>, target 1.25;  h_{t-1} = 0.536, C_{t-1} = 0.786

f_t = \sigma(U_f x_t + W_f h_{t-1} + b_f)

i_t = \sigma(U_i x_t + W_i h_{t-1} + b_i)

\tilde{C}_t = \tanh(U_c x_t + W_c h_{t-1} + b_c)

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t

o_t = \sigma(U_o x_t + W_o h_{t-1} + b_o)

h_t = o_t \odot \tanh(C_t)

            U·x_t   W·h_{t-1}+b   net     activation
forget:     1.700   0.204         1.904   f_t = 0.870
input:      2.875   1.079         3.954   i_t = 0.981
candidate:  0.975   0.280         1.255   \tilde{C}_t = 0.850
output:     1.500   0.234         1.734   o_t = 0.850

C_t = 1.518,  h_t = 0.772

https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9

Page 14: EDUNEX ITB

Implementing LSTM in Keras: Many to One

from keras import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(10, input_shape=(50, 1)))     # 10 units; processes 50x1 sequences
model.add(Dense(1, activation='linear'))     # linear output, as this is a regression problem

https://towardsdatascience.com/a-comprehensive-guide-to-working-with-recurrent-neural-networks-in-keras-f3b2d5e2fa7f

Τ¦π‘₯ (1)

β„Ž (10)

𝑦 (1)

U

V

W

# example task: predict Amazon stock closing prices from windows of 50 timesteps
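One possible way to train the model above on such windows (the data here is a random placeholder, only to show the expected shapes):

import numpy as np

X = np.random.rand(100, 50, 1)   # 100 windows of 50 closing prices each (placeholder data)
y = np.random.rand(100, 1)       # next-step target for each window (placeholder data)

model.compile(optimizer='adam', loss='mse')   # mean squared error suits the linear output
model.fit(X, y, epochs=10, batch_size=16)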

Page 15: EDUNEX ITB

Number of Parameters

Τ¦π‘₯ (1)

β„Ž (10)

𝑦 (1)

U

V

W

Total parameters = (1+10+1)*4*10 + (10+1)*1 = 491

A simple RNN of the same size has 131 parameters.
U: matrix of hidden neurons x (input dimension + 1)
W: matrix of hidden neurons x hidden neurons
V: matrix of output neurons x (hidden neurons + 1)

Total parameters for an LSTM with n units, m-dimensional input, and k-dimensional output = (m+n+1)*4*n + (n+1)*k
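The two counting formulas above can be checked with a few lines of Python (model.count_params() on the Keras models should report the same figures):

def lstm_params(m, n, k):
    # (m+n+1) inputs to each of the 4 gates, n units each, plus a dense (n+1) x k output layer
    return (m + n + 1) * 4 * n + (n + 1) * k

def simple_rnn_params(m, n, k):
    return (m + n + 1) * n + (n + 1) * k

print(lstm_params(1, 10, 1))        # 491, the LSTM(10) model above
print(simple_rnn_params(1, 10, 1))  # 131, the equivalent SimpleRNN model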

Page 16: EDUNEX ITB

RNN → LSTM → GRU → ReGU

1985: Recurrent nets

1997: LSTM, Bi-RNN

2014: GRU

2017: Residual LSTM

2019: Residual Gated Unit (ReGU)

GRU: no cell state, 2 gates

ReGU: shortcut connection

Page 17: EDUNEX ITB

Summary

• LSTMs avoid the long-term dependency problem

• LSTMs have a cell state and 3 gates (forget, input, output)

• Computing h_t and C_t

• Backpropagation Through Time

Page 18: EDUNEX ITB

03 RNN Architecture

Pembelajaran Mesin Lanjut (Advanced Machine Learning)

Masayu Leylia Khodra ([email protected])

KK IF – Teknik Informatika – STEI ITB

Module 4: Recurrent Neural Network

Page 19: EDUNEX ITB

General Architecture

Τ¦π‘₯ (i)

β„Ž1 (𝑗)

β„Žβ„Ž (k)

𝑦 (π‘š)

Uxh1

Uh1h…

V

…

Uh…hh

Wh1

Wh…

Whh

Τ¦π‘₯1

β„Ž (𝑗)

Τ¦π‘₯2

β„Ž (𝑗)

Τ¦π‘₯𝑛

β„Ž (𝑗)…

n timestep

return_sequences = True/False (whether a recurrent layer emits the hidden state at every timestep or only the last one); a small sketch follows below.
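A small Keras sketch of the return_sequences flag (illustrative sizes: 50 timesteps of 1 feature, 10 hidden units per layer):

from keras import Sequential
from keras.layers import SimpleRNN

model = Sequential()
model.add(SimpleRNN(10, input_shape=(50, 1), return_sequences=True))  # output shape (None, 50, 10): h_t for every timestep
model.add(SimpleRNN(10))                                              # output shape (None, 10): only the final h_n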

Page 20: EDUNEX ITB

Architecture

fixed-sized input vector xt

fixed-sized output vector ot

RNN state st

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

One to many: image captioning
Many to one: text classification
Many to many: machine translation, video frame classification, POS tagging

Page 21: EDUNEX ITB

One to Many: Image Captioning

CNN encoder (Inception) - RNN decoder (LSTM) (Vinyals et al., 2014)

Page 22: EDUNEX ITB

Many to One: Text Classification

https://www.oreilly.com/learning/perform-sentiment-analysis-with-lstms-using-tensorflow

Page 23: EDUNEX ITB

Many to Many: Sequence Tagging

https://www.depends-on-the-definition.com/guide-sequence-tagging-neural-networks-python/

The input is a sequence of words, and the output is the sequence of POS tags, one for each word; a minimal sketch follows below.
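A minimal Keras sketch of such a many-to-many tagger (the sizes are assumptions for illustration: a vocabulary of 10000 word ids, sentences padded to 50 tokens, 17 tag classes; the referenced tutorial uses a richer bidirectional model):

from keras import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense

model = Sequential()
model.add(Embedding(input_dim=10000, output_dim=64, input_length=50))   # word ids -> word vectors
model.add(LSTM(100, return_sequences=True))                             # one hidden state per token
model.add(TimeDistributed(Dense(17, activation='softmax')))             # tag distribution per token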

Page 24: EDUNEX ITB

Many to Many: Machine Translation

http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

● Machine translation: the input is a sequence of words in the source language (e.g., German); the output is a sequence of words in the target language (e.g., English).

● A key difference is that the output only starts after we have seen the complete input, because the first word of the translated sentence may require information captured from the complete input sequence.

Page 25: EDUNEX ITB

Implementing RNN in Keras: Many to One

from keras import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
model.add(SimpleRNN(10, input_shape=(50, 1)))   # simple recurrent layer: 10 units, processes 50x1 sequences
model.add(Dense(1, activation='linear'))        # linear output because this is a regression problem

https://towardsdatascience.com/a-comprehensive-guide-to-working-with-recurrent-neural-networks-in-keras-f3b2d5e2fa7f

Τ¦π‘₯ (1)

β„Ž (10)

𝑦 (1)

U

V

W

# example task: predict Amazon stock closing prices from windows of 50 timesteps

Page 26: EDUNEX ITB

Number of Parameters

Τ¦π‘₯ (1)

β„Ž (10)

𝑦 (1)

U

V

W

Total parameters = (1+10+1)*10 + (10+1)*1 = 131

Simple RNN:
U: matrix of hidden neurons x (input dimension + 1)
W: matrix of hidden neurons x hidden neurons
V: matrix of output neurons x (hidden neurons + 1)

Page 27: EDUNEX ITB

Number of Parameters: Example 2

model = Sequential()   # initialize model
model.add(SimpleRNN(64, input_shape=(50, 1), return_sequences=True))   # 64 units
model.add(SimpleRNN(32, return_sequences=True))                        # 32 units
model.add(SimpleRNN(16))                                               # 16 units
model.add(Dense(8, activation='tanh'))
model.add(Dense(1, activation='linear'))

Total parameters = 8257

SimpleRNN(64): (1+64+1)*64 = 4224
SimpleRNN(32): (64+32+1)*32 = 3104
SimpleRNN(16): (32+16+1)*16 = 784
Dense(8):      (16+1)*8 = 136
Dense(1):      (8+1)*1 = 9

Page 28: EDUNEX ITB

Bidirectional RNNs

• In many applications we want to output a prediction of y(t) that may depend on the whole input sequence, e.g., co-articulation in speech recognition, right neighbors in POS tagging, etc.

• Bidirectional RNNs combine an RNN that moves forward through time, beginning from the start of the sequence, with another RNN that moves backward through time, beginning from the end of the sequence.

https://www.cs.toronto.edu/~tingwuwang/rnn_tutorial.pdf
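A minimal Keras sketch of a bidirectional layer (illustrative sizes; in Keras the forward and backward outputs are concatenated by default):

from keras import Sequential
from keras.layers import Bidirectional, LSTM, Dense

model = Sequential()
model.add(Bidirectional(LSTM(10), input_shape=(50, 1)))   # forward + backward LSTM over the 50x1 sequence
model.add(Dense(1, activation='sigmoid'))                 # e.g., a binary tag/sentiment prediction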

Page 29: EDUNEX ITB

Bidirectional RNNs for Information Extraction

https://www.depends-on-the-definition.com/sequence-tagging-lstm-crf/

Page 30: EDUNEX ITB

Summary

• Architecture: 1-to-n, n-to-1, n-to-n

• Number of parameters

• RNN, LSTM, Bidirectional RNN

Page 31: EDUNEX ITB

05 Backpropagation Through Time

Pembelajaran Mesin Lanjut (Advanced Machine Learning)

Masayu Leylia Khodra ([email protected])

KK IF – Teknik Informatika – STEI ITB

Module 4: Recurrent Neural Network

Page 32: EDUNEX ITB

Backpropagation Through Time (BPTT)

Forward pass: get the current output for the sequence.

Backward pass: compute \delta gates_t, \delta x_t, \Delta out_{t-1}, \delta U, \delta W, \delta b.

Update weights: w_{new} = w_{old} - \eta \cdot \delta w_{old}

The BPTT learning algorithm is an extension of standard backpropagation that performs gradient descent on an unfolded network.

Page 33: EDUNEX ITB

Example

Τ¦π‘₯ (2)

β„Ž (1)

π‘ˆπ‘“ , π‘ˆπ‘–,

π‘ˆπ‘ , π‘ˆπ‘œ

π‘Šπ‘“ ,π‘Šπ‘–,

π‘Šπ‘ ,π‘Šπ‘œ

unfold

Τ¦π‘₯1 =12

[0.536]

Τ¦π‘₯2 =0.53

[0.772]

0.5 1.25

U0.7 0.95 0.5 0.6

0.45 0.8 0.3 0.4

W0.100 0.800 0.150 0.250

Page 34: EDUNEX ITB

LSTM: Backward Propagation Timestep t

f_t = \sigma(U_f x_t + W_f h_{t-1} + b_f)

i_t = \sigma(U_i x_t + W_i h_{t-1} + b_i)

\tilde{C}_t = \tanh(U_c x_t + W_c h_{t-1} + b_c)

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t

o_t = \sigma(U_o x_t + W_o h_{t-1} + b_o)

h_t = o_t \odot \tanh(C_t)

π›Ώπ‘œπ‘’π‘‘π‘‘ = βˆ†π‘‘ + βˆ†π‘œπ‘’π‘‘π‘‘

𝛿𝐢𝑑 = π›Ώπ‘œπ‘’π‘‘π‘‘ βŠ™π‘œπ‘‘ βŠ™ 1βˆ’ π‘‘π‘Žπ‘›β„Ž2 𝐢𝑑 + 𝛿𝐢𝑑+1 βŠ™π‘“π‘‘+1

𝛿 ෩𝐢𝑑 = 𝛿𝐢𝑑 βŠ™ 𝑖𝑑 βŠ™ (1 βˆ’ ෩𝐢𝑑2)

𝛿𝑖𝑑 = 𝛿𝐢𝑑 βŠ™ ෩𝐢𝑑 βŠ™ 𝑖𝑑 βŠ™ (1 βˆ’ 𝑖𝑑)

𝛿𝑓𝑑 = 𝛿𝐢𝑑 βŠ™πΆπ‘‘βˆ’1 βŠ™π‘“π‘‘ βŠ™ 1βˆ’ 𝑓𝑑

π›Ώπ‘œπ‘‘ = π›Ώπ‘œπ‘’π‘‘π‘‘ βŠ™ tanh 𝐢𝑑 βŠ™π‘œπ‘‘ βŠ™ 1βˆ’ π‘œπ‘‘

𝛿π‘₯𝑑 = π‘ˆπ‘‡ . π›Ώπ‘”π‘Žπ‘‘π‘’π‘ π‘‘

βˆ†π‘œπ‘’π‘‘π‘‘βˆ’1= π‘Šπ‘‡. π›Ώπ‘”π‘Žπ‘‘π‘’π‘ π‘‘

Page 35: EDUNEX ITB

Computing \delta gates_t for timestep t = 2

Last timestep: \Delta out_t = 0; f_{t+1} = 0; \delta C_{t+1} = 0

t2: \Delta_2 = h_2 - target_2 = 0.772 - 1.25 = -0.478  →  \delta out_2 = -0.478 + 0 = -0.478

\delta C_2 = -0.478 * 0.850 * (1 - \tanh^2(1.518)) + 0 * 0 = -0.071

\delta f_2 = -0.071 * 0.786 * 0.870 * (1 - 0.870) = -0.006
\delta i_2 = -0.071 * 0.850 * 0.981 * (1 - 0.981) = -0.001
\delta \tilde{C}_2 = -0.071 * 0.981 * (1 - 0.850^2) = -0.019
\delta o_2 = -0.478 * \tanh(1.518) * 0.850 * (1 - 0.850) = -0.055

𝛿𝑖𝑑 = 𝛿𝐢𝑑 βŠ™ ෩𝐢𝑑 βŠ™ 𝑖𝑑 βŠ™ (1 βˆ’ 𝑖𝑑)𝛿𝑓𝑑 = 𝛿𝐢𝑑 βŠ™πΆπ‘‘βˆ’1 βŠ™π‘“π‘‘ βŠ™ 1βˆ’ π‘“π‘‘π›Ώπ‘œπ‘‘ = π›Ώπ‘œπ‘’π‘‘π‘‘ βŠ™ tanh 𝐢𝑑 βŠ™π‘œπ‘‘ βŠ™ 1βˆ’ π‘œπ‘‘

π›Ώπ‘œπ‘’π‘‘π‘‘ = βˆ†π‘‘ + βˆ†π‘œπ‘’π‘‘π‘‘π›ΏπΆπ‘‘ = π›Ώπ‘œπ‘’π‘‘π‘‘ βŠ™π‘œπ‘‘ βŠ™ 1βˆ’ π‘‘π‘Žπ‘›β„Ž2 𝐢𝑑 + 𝛿𝐢𝑑+1 βŠ™π‘“π‘‘+1

𝛿 ෩𝐢𝑑 = 𝛿𝐢𝑑 βŠ™ 𝑖𝑑 βŠ™ (1 βˆ’ ෩𝐢𝑑2)

𝐸 =1

2(π‘‘π‘Žπ‘Ÿπ‘”π‘’π‘‘ βˆ’ β„Ž)2 βˆ†π‘‘=

πœ•πΈ

πœ•β„Ž= βˆ’(π‘‘π‘Žπ‘Ÿπ‘”π‘’π‘‘ βˆ’ β„Ž) = β„Ž βˆ’ π‘‘π‘Žπ‘Ÿπ‘”π‘’π‘‘

π›Ώπ‘”π‘Žπ‘‘π‘’π‘ 2 =

βˆ’0.006βˆ’0.001βˆ’0.019βˆ’0.055
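Plugging the timestep-2 values from the forward pass into the lstm_backward_step sketch above (weights as listed on the earlier slides; inputs are rounded, so the output matches the slide only approximately):

import numpy as np

U = np.array([[0.70, 0.45],    # U_f
              [0.95, 0.80],    # U_i
              [0.45, 0.25],    # U_c
              [0.60, 0.40]])   # U_o
W = np.array([[0.10], [0.80], [0.15], [0.25]])   # W_f, W_i, W_c, W_o (hidden size 1)

dgates_2, dx_2, dC_2, dout_1 = lstm_backward_step(
    delta_t=np.array([0.772 - 1.25]),            # Delta_2 = h_2 - target_2
    dout_next=np.array([0.0]), dC_next=np.array([0.0]), f_next=np.array([0.0]),
    c_prev=np.array([0.786]), c_t=np.array([1.518]),
    f_t=np.array([0.870]), i_t=np.array([0.981]),
    c_tilde=np.array([0.850]), o_t=np.array([0.850]), U=U, W=W)

print(dgates_2)   # approx. [-0.006, -0.001, -0.019, -0.055]
print(dout_1)     # approx. [-0.018], passed back to timestep 1 (next slides)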

Page 36: EDUNEX ITB

Computing \delta x_2 and \Delta out_1 for timestep t = 2

\delta x_t = U^T \cdot \delta gates_t
\Delta out_{t-1} = W^T \cdot \delta gates_t

U (rows f, i, c, o):
0.70  0.45
0.95  0.80
0.45  0.25
0.60  0.40

W (f, i, c, o): 0.100  0.800  0.150  0.250

\delta gates_2 = [-0.006, -0.001, -0.019, -0.055]
\delta x_2 = [-0.047, -0.030]
\Delta out_1 = -0.018

Page 37: EDUNEX ITB

Computing for timestep t = 1: \Delta out_1 = -0.018

\delta out_1 = 0.036 - 0.018 = 0.018
\delta C_1 = -0.053
\delta f_1 = 0
\delta i_1 = -0.0017
\delta \tilde{C}_1 = -0.017
\delta o_1 = 0.0018

\delta x_t = U^T \cdot \delta gates_t
\Delta out_{t-1} = W^T \cdot \delta gates_t

U (rows f, i, c, o):
0.70  0.45
0.95  0.80
0.45  0.25
0.60  0.40

W (f, i, c, o): 0.100  0.800  0.150  0.250

\delta gates_1 = [0.0000, -0.0017, -0.0170, 0.0018]
\delta x_1 = [-0.0082, -0.0049]
\Delta out_0 = -0.0035

Page 38: EDUNEX ITB

Computing \delta U, \delta W, \delta b

\delta U = \sum_{t=1}^{2} \delta gates_t \cdot x_t^T
         = [0.0; -0.0017; -0.0170; 0.0018] \cdot [1, 2] + [-0.006; -0.001; -0.019; -0.055] \cdot [0.5, 3]

\delta W = \sum_t \delta gates_{t+1} \cdot h_t = [-0.006; -0.001; -0.019; -0.055] \cdot [0.536]

\delta b = \sum_t \delta gates_t

\delta U:
-0.0032  -0.0189
-0.0022  -0.0067
-0.0267  -0.0922
-0.0259  -0.1626

\delta W = [-0.0034, -0.0006, -0.0104, -0.0297]

\delta b = [-0.00631, -0.00277, -0.03641, -0.05362]
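The same accumulation in NumPy (np.outer builds \delta gates_t \cdot x_t^T; gate deltas and h_1 as computed on the previous slides, so the printed values match only approximately):

import numpy as np

dgates_1 = np.array([0.0, -0.0017, -0.0170, 0.0018])
dgates_2 = np.array([-0.0063, -0.0011, -0.0193, -0.0554])
x1, x2 = np.array([1.0, 2.0]), np.array([0.5, 3.0])
h1 = np.array([0.536])

dU = np.outer(dgates_1, x1) + np.outer(dgates_2, x2)   # 4x2, rows f, i, c, o
dW = np.outer(dgates_2, h1)    # the dgates_1 * h_0 term vanishes since h_0 = 0
db = dgates_1 + dgates_2

print(dU)
print(dW.ravel())   # approx. [-0.0034, -0.0006, -0.0104, -0.0297]
print(db)           # approx. [-0.0063, -0.0028, -0.0364, -0.0536]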

Page 39: EDUNEX ITB

Update Weights (\eta = 0.1):  w_{new} = w_{old} - \eta \cdot \delta w_{old}

\delta U:
-0.0032  -0.0189
-0.0022  -0.0067
-0.0267  -0.0922
-0.0259  -0.1626

\delta W = [-0.0034, -0.0006, -0.0104, -0.0297]
\delta b = [-0.00631, -0.00277, -0.03641, -0.05362]

U_old (rows f, i, c, o):
0.70  0.45
0.95  0.80
0.45  0.25
0.60  0.40

U_new:
0.7003  0.4519
0.9502  0.8007
0.4527  0.2592
0.6026  0.4163

W_old = [0.100, 0.800, 0.150, 0.250]
W_new = [0.1003, 0.8001, 0.1510, 0.2530]

b_old = [0.1500, 0.6500, 0.2000, 0.1000]
b_new = [0.1506, 0.6503, 0.2036, 0.1054]
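The update rule applied to the bias vector, for instance (\eta = 0.1; values from the previous slide):

import numpy as np

eta = 0.1
b_old = np.array([0.15, 0.65, 0.20, 0.10])
db = np.array([-0.00631, -0.00277, -0.03641, -0.05362])
b_new = b_old - eta * db
print(b_new)   # approx. [0.1506, 0.6503, 0.2036, 0.1054]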

Page 40: EDUNEX ITB

Truncated BPTT

https://deeplearning4j.org/docs/latest/deeplearning4j-nn-recurrent


Truncated BPTT was developed in order to reduce the computational complexity of each parameter update in a recurrent neural network.

Page 41: EDUNEX ITB

Summary

• Backpropagation through time for LSTM

• Truncated BPTT

Page 42: EDUNEX ITB