EDUNEX ITB
TRANSCRIPT
IF4074 Weeks 9-15
• Week 9 (18 October 2021): LSTM + RNN architectures + Major Assignment (Tubes) 2
• Week 10 (25 October 2021): RNN exercises + BPTT
• Week 11 (1 November 2021): Guest lecture (sharing on ML applications at Gojek)
• Week 12 (8 November 2021): RNN lab session
• Week 13 (15 November 2021): Feature Engineering 1 / experiment design assignment
• Week 14 (22 November 2021): Quiz 2
• Week 15 (29 November 2021): Feature Engineering lab session 2
04 LSTM: What & Why
Pembelajaran Mesin Lanjut (Advanced Machine Learning)
Masayu Leylia Khodra ([email protected])
KK IF - Teknik Informatika - STEI ITB
Module 4: Recurrent Neural Network
Long Short-Term Memory (LSTM): Why
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
h_t = σ(U·x_t + W·h_{t-1} + b_xh)
y_t = σ(V·h_t + b_hy)

RNN: the long-term dependency problem
[Figure: an RNN cell with input x_t, previous state h_{t-1}, new state h_t, and weight matrices U, W, V]
• Suffers from short-term memory (forward propagation).
• Suffers from the vanishing gradient problem (backward propagation). RNNs typically fail to learn dependencies longer than 5-10 time steps; in the worst case, this may completely stop the network from training further.
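The 5-10 time-step figure can be motivated with simple arithmetic: the sigmoid derivative σ'(x) = σ(x)(1 − σ(x)) never exceeds 0.25, so each backward step through a saturating sigmoid can shrink the gradient by a factor of 4 or more. A minimal sketch of this bound (not part of the lecture):

```python
# Sketch: geometric upper bound on the gradient factor after k time steps.
# The sigmoid derivative sigma'(x) = sigma(x) * (1 - sigma(x)) peaks at 0.25,
# so BPTT through k saturating sigmoid steps scales the gradient by at most 0.25^k.
def gradient_bound(k):
    return 0.25 ** k

for k in (1, 5, 10):
    print(k, gradient_bound(k))  # by k = 10 the bound is already below 1e-6
```

This is why gradients from early timesteps barely influence the weight updates in a plain RNN.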
Long Short-Term Memory (LSTM): What
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTMs are explicitly designed to avoid the long-term dependency problem.
Introduced by Hochreiter & Schmidhuber (1997)
LSTM is a special kind of RNN; the difference lies in the operations within the LSTM's cells. In an RNN, the repeating module has a very simple structure; in an LSTM, the repeating module contains four interacting layers.
LSTM: Cell State & Gates

Cell State
• acts as the "memory" of the network
• acts as a transport highway that carries relevant information throughout the processing of the sequence

Forget Gate
• decides what information should be thrown away or kept
• values closer to 0 mean forget; values closer to 1 mean keep

Input Gate
• decides what information from the current step is relevant to add
• updates the cell state from the hidden state and the current input

Output Gate
• decides what the next hidden state should be
• the hidden state contains information on previous inputs and is also used for predictions
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
Forget Gate
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
f_t = σ(U_f·x_t + W_f·h_{t-1} + b_f)

A value of 1 represents "completely keep this," while a 0 represents "completely get rid of this."

[Figure: forget gate computing f_t from x_t and h_{t-1}, applied to C_{t-1}]
Input Gate
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
i_t = σ(U_i·x_t + W_i·h_{t-1} + b_i)
C̃_t = tanh(U_c·x_t + W_c·h_{t-1} + b_c)

[Figure: input gate computing i_t and candidate C̃_t from x_t and h_{t-1}]
Cell State
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
C_t = f_t ∘ C_{t-1} + i_t ∘ C̃_t

[Figure: cell state update combining C_{t-1} (scaled by f_t) with the candidate C̃_t (scaled by i_t)]
Output Gate
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
o_t = σ(U_o·x_t + W_o·h_{t-1} + b_o)
h_t = o_t ∘ tanh(C_t)

[Figure: output gate computing o_t from x_t and h_{t-1}, producing the new hidden state h_t]
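Putting the forget, input, cell-state, and output equations together, one LSTM step can be sketched in pure Python. The weights in the usage example below are arbitrary placeholders, not values from the lecture:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, U, W, b):
    """One LSTM step for scalar input; U, W, b are dicts keyed by gate f/i/c/o."""
    f = sigmoid(U['f'] * x + W['f'] * h_prev + b['f'])          # forget gate
    i = sigmoid(U['i'] * x + W['i'] * h_prev + b['i'])          # input gate
    c_tilde = math.tanh(U['c'] * x + W['c'] * h_prev + b['c'])  # candidate values
    c = f * c_prev + i * c_tilde                                # new cell state
    o = sigmoid(U['o'] * x + W['o'] * h_prev + b['o'])          # output gate
    h = o * math.tanh(c)                                        # new hidden state
    return h, c

# Usage with placeholder weights (0.5) and biases (0.0) on a 2-step sequence:
U = {g: 0.5 for g in 'fico'}
W = {g: 0.5 for g in 'fico'}
b = {g: 0.0 for g in 'fico'}
h, c = 0.0, 0.0
for x in (1.0, -1.0):
    h, c = lstm_step(x, h, c, U, W, b)
print(h, c)  # |h| < 1 always, because h = o * tanh(c) with o in (0, 1)
```

The forward-propagation example on the next slides applies exactly these six operations, just with a 2-dimensional input.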
LSTM Forward Propagation: Example
https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9
Architecture: 2-dimensional input x, one LSTM unit h, with parameters U_f, U_i, U_c, U_o, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o.

Training data:
A1   A2   Target
1    2    0.5
0.5  3    1.25
…

f_t = σ(U_f·x_t + W_f·h_{t-1} + b_f)
i_t = σ(U_i·x_t + W_i·h_{t-1} + b_i)
C̃_t = tanh(U_c·x_t + W_c·h_{t-1} + b_c)
C_t = f_t ∘ C_{t-1} + i_t ∘ C̃_t
o_t = σ(U_o·x_t + W_o·h_{t-1} + b_o)
h_t = o_t ∘ tanh(C_t)

Initial weights:
U_f = [0.700, 0.450]   W_f = 0.100   b_f = 0.150
U_i = [0.950, 0.800]   W_i = 0.800   b_i = 0.650
U_c = [0.450, 0.250]   W_c = 0.150   b_c = 0.200
U_o = [0.600, 0.400]   W_o = 0.250   b_o = 0.100
h_0 = 0, C_0 = 0
Computing ht and ct : Timestep t1
t1 = <1, 2>, with h_0 = 0 and C_0 = 0. Using the LSTM equations above:

gate   U·x_t   W·h_{t-1}+b   net     activation
f      1.600   0.150         1.750   f_1 = 0.852
i      2.550   0.650         3.200   i_1 = 0.961
c̃      0.950   0.200         1.150   C̃_1 = 0.818
o      1.400   0.100         1.500   o_1 = 0.818

C_1 = 0.786, h_1 = 0.536
https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9
Computing ht and ct : Timestep t2
t2 = <0.5, 3>, with h_1 = 0.536 and C_1 = 0.786:

gate   U·x_t   W·h_{t-1}+b   net     activation
f      1.700   0.204         1.904   f_2 = 0.870
i      2.875   1.079         3.954   i_2 = 0.981
c̃      0.975   0.280         1.255   C̃_2 = 0.850
o      1.500   0.234         1.734   o_2 = 0.850

C_2 = 1.518, h_2 = 0.772
https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9
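Both timesteps can be reproduced with a short pure-Python sketch of the forward equations, using the weights and inputs from these slides (function and variable names are mine):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Weights and biases from the slides: one LSTM unit, 2-dimensional input.
U = {'f': (0.70, 0.45), 'i': (0.95, 0.80), 'c': (0.45, 0.25), 'o': (0.60, 0.40)}
W = {'f': 0.10, 'i': 0.80, 'c': 0.15, 'o': 0.25}
b = {'f': 0.15, 'i': 0.65, 'c': 0.20, 'o': 0.10}

def lstm_step(x, h_prev, c_prev):
    def net(g):  # U_g . x_t + W_g * h_{t-1} + b_g
        return U[g][0] * x[0] + U[g][1] * x[1] + W[g] * h_prev + b[g]
    f = sigmoid(net('f'))
    i = sigmoid(net('i'))
    c_tilde = math.tanh(net('c'))
    c = f * c_prev + i * c_tilde
    o = sigmoid(net('o'))
    return o * math.tanh(c), c

states = []
h, c = 0.0, 0.0
for x in [(1.0, 2.0), (0.5, 3.0)]:
    h, c = lstm_step(x, h, c)
    states.append((round(h, 3), round(c, 3)))
print(states)  # [(0.536, 0.786), (0.772, 1.518)] -- matches the slides
```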
Implementing LSTM on Keras: Many to One
from keras import Sequential
from keras.layers import LSTM, Dense
model = Sequential()
model.add(LSTM(10, input_shape=(50, 1)))  # 10 units; processes 50x1 sequences
model.add(Dense(1, activation='linear'))  # linear output: a regression problem
https://towardsdatascience.com/a-comprehensive-guide-to-working-with-recurrent-neural-networks-in-keras-f3b2d5e2fa7f
[Figure: x (1) → h (10) → y (1), with weight matrices U, W, V]
# predicts Amazon stock closing prices; LSTM over 50 timesteps
Number of Parameters
[Figure: x (1) → h (10) → y (1), with weight matrices U, W, V]

Total parameters = (1+10+1)*4*10 + (10+1)*1 = 491
A simple RNN with the same layout has 131 parameters.
U: matrix of hidden neurons × (input dimension + 1)
W: matrix of hidden neurons × hidden neurons
V: matrix of output neurons × (hidden neurons + 1)
Total parameters for an LSTM with n units, m-dimensional input, and k-dimensional output = (m+n+1)*4*n + (n+1)*k
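The counting rule can be sanity-checked in a few lines; the helper names below are mine, for illustration:

```python
def lstm_params(m, n, k):
    """LSTM layer with n units on m-dimensional input, plus a Dense head with
    k outputs. Each of the 4 gates has n x m input weights, n x n recurrent
    weights and n biases: (m + n + 1) * 4 * n; the head adds (n + 1) * k."""
    return (m + n + 1) * 4 * n + (n + 1) * k

def simple_rnn_params(m, n, k):
    """Same network with a SimpleRNN layer: one gate instead of four."""
    return (m + n + 1) * n + (n + 1) * k

print(lstm_params(1, 10, 1))        # 491, as on the slide
print(simple_rnn_params(1, 10, 1))  # 131
```

The factor 4 is the only difference between the two formulas: an LSTM keeps one U, W, b triple per gate.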
RNN → LSTM → GRU → ReGU
• 1985: Recurrent nets
• 1997: LSTM, Bi-RNN
• 2014: GRU
• 2017: Residual LSTM
• 2019: Residual Gated Unit (ReGU)
GRU: no cell state, 2 gates
ReGU: shortcut connections
Summary
• LSTMs avoid the long-term dependency problem
• LSTMs have a cell state and 3 gates (forget, input, output)
• Computing h_t and C_t
03 RNN Architecture
General Architecture
[Figure: a stacked RNN with input x (i), hidden layers h_1 (j) through h_h (k), and output y (m); each hidden layer has input weights U and recurrent weights W, with V at the output, unfolded over n timesteps]
return_sequences = True/False
Architecture
• fixed-size input vector x_t
• fixed-size output vector o_t
• RNN state s_t
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• One to many: image captioning
• Many to one: text classification
• Many to many: machine translation, video frame classification, POS tagging
One to Many: Image Captioning
CNN Encoder (Inception) - RNN Decoder (LSTM) (Vinyals et al., 2014)
Many to One: Text Classification
https://www.oreilly.com/learning/perform-sentiment-analysis-with-lstms-using-tensorflow
Many to Many: Sequence Tagging
https://www.depends-on-the-definition.com/guide-sequence-tagging-neural-networks-python/
The input is a sequence of words, and the output is the sequence of POS tags, one for each word.
Many to Many: Machine Translation
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
• Machine translation: the input is a sequence of words in the source language (e.g., German); the output is a sequence of words in the target language (e.g., English).
• A key difference is that the output only starts after the complete input has been seen, because the first word of the translated sentence may require information captured from the complete input sequence.
Implementing RNN on Keras: Many to One
from keras import Sequential
from keras.layers import SimpleRNN, Dense
model = Sequential()
model.add(SimpleRNN(10, input_shape=(50, 1)))  # simple recurrent layer: 10 neurons, processes 50x1 sequences
model.add(Dense(1, activation='linear'))  # linear output because this is a regression problem
https://towardsdatascience.com/a-comprehensive-guide-to-working-with-recurrent-neural-networks-in-keras-f3b2d5e2fa7f
[Figure: x (1) → h (10) → y (1), with weight matrices U, W, V]
# predicts Amazon stock closing prices; RNN over 50 timesteps
Number of Parameters
[Figure: x (1) → h (10) → y (1), with weight matrices U, W, V]

Total parameters = (1+10+1)*10 + (10+1)*1 = 131
Simple RNN:
U: matrix of hidden neurons × (input dimension + 1)
W: matrix of hidden neurons × hidden neurons
V: matrix of output neurons × (hidden neurons + 1)
Number of Parameters: Example 2

model = Sequential()  # initialize model
model.add(SimpleRNN(64, input_shape=(50, 1), return_sequences=True))  # 64 neurons
model.add(SimpleRNN(32, return_sequences=True))  # 32 neurons
model.add(SimpleRNN(16))  # 16 neurons
model.add(Dense(8, activation='tanh'))
model.add(Dense(1, activation='linear'))

Total parameters = 8257
= (1+64+1)*64 = 4224
+ (64+32+1)*32 = 3104
+ (32+16+1)*16 = 784
+ (16+1)*8 = 136
+ (8+1)*1 = 9
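The same counting rule, applied layer by layer, reproduces the breakdown above (helper names are mine):

```python
def simple_rnn_layer(input_dim, units):
    # (input_dim + units + 1) * units: input weights + recurrent weights + biases
    return (input_dim + units + 1) * units

def dense_layer(input_dim, units):
    return (input_dim + 1) * units

layers = [
    simple_rnn_layer(1, 64),   # 4224
    simple_rnn_layer(64, 32),  # 3104
    simple_rnn_layer(32, 16),  # 784
    dense_layer(16, 8),        # 136
    dense_layer(8, 1),         # 9
]
print(sum(layers))  # 8257
```

Note that the sequence length (50) never enters the count: recurrent weights are shared across timesteps.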
Bidirectional RNNs
• In many applications we want to output a prediction y(t) that may depend on the whole input sequence, e.g., co-articulation in speech recognition, right-hand neighbors in POS tagging, etc.
• Bidirectional RNNs combine an RNN that moves forward through time, beginning from the start of the sequence, with another RNN that moves backward through time, beginning from the end of the sequence.
https://www.cs.toronto.edu/~tingwuwang/rnn_tutorial.pdf
Bidirectional RNNs for Information Extraction
https://www.depends-on-the-definition.com/sequence-tagging-lstm-crf/
Summary
• Architectures: 1-to-n, n-to-1, n-to-n
• Number of parameters
• RNN
• Bidirectional RNN
05 Backpropagation Through Time
Backpropagation Through Time (BPTT)
1. Forward pass: get the current output for the sequence.
2. Backward pass: compute δgates_t, δx_t, Δout_{t-1}, δU, δW, δb.
3. Update weights: w_new = w_old − η·δw_old.

The BPTT learning algorithm is an extension of standard backpropagation that performs gradient descent on an unfolded network.
Example
Unfold a one-unit LSTM with 2-dimensional input x and parameters U_f, U_i, U_c, U_o, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o:

x_1 = [1, 2] → h_1 = 0.536 (target 0.5)
x_2 = [0.5, 3] → h_2 = 0.772 (target 1.25)

U (rows f, i, c, o):
0.700  0.450
0.950  0.800
0.450  0.250
0.600  0.400
W = [0.100, 0.800, 0.150, 0.250]
b = [0.150, 0.650, 0.200, 0.100]
LSTM: Backward Propagation Timestep t
Forward:
f_t = σ(U_f·x_t + W_f·h_{t-1} + b_f)
i_t = σ(U_i·x_t + W_i·h_{t-1} + b_i)
C̃_t = tanh(U_c·x_t + W_c·h_{t-1} + b_c)
C_t = f_t ∘ C_{t-1} + i_t ∘ C̃_t
o_t = σ(U_o·x_t + W_o·h_{t-1} + b_o)
h_t = o_t ∘ tanh(C_t)

Backward:
δout_t = Δ_t + Δout_t   (Δ_t = ∂E/∂h_t; Δout_t comes from timestep t+1)
δC_t = δout_t ∘ o_t ∘ (1 − tanh²(C_t)) + δC_{t+1} ∘ f_{t+1}
δC̃_t = δC_t ∘ i_t ∘ (1 − C̃_t²)
δi_t = δC_t ∘ C̃_t ∘ i_t ∘ (1 − i_t)
δf_t = δC_t ∘ C_{t-1} ∘ f_t ∘ (1 − f_t)
δo_t = δout_t ∘ tanh(C_t) ∘ o_t ∘ (1 − o_t)
δx_t = Uᵀ·δgates_t
Δout_{t-1} = Wᵀ·δgates_t
where δgates_t = [δf_t, δi_t, δC̃_t, δo_t]ᵀ
Computing δgates_t for timestep t = 2

At the last timestep: Δout_t = 0, f_{t+1} = 0, δC_{t+1} = 0.

E = ½(target − h)², so Δ_t = ∂E/∂h = −(target − h) = h − target

t2: ∂E/∂h = 0.772 − 1.25 = −0.478 → δout_2 = −0.478 + 0 = −0.478
δC_2 = −0.478 · 0.850 · (1 − tanh²(1.518)) + 0 · 0 = −0.071
δf_2 = −0.071 · 0.786 · 0.870 · (1 − 0.870) = −0.006
δi_2 = −0.071 · 0.850 · 0.981 · (1 − 0.981) = −0.001
δC̃_2 = −0.071 · 0.981 · (1 − 0.850²) = −0.019
δo_2 = −0.478 · tanh(1.518) · 0.850 · (1 − 0.850) = −0.055

δgates_2 = [−0.006, −0.001, −0.019, −0.055]ᵀ
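These δ-values can be reproduced end to end in pure Python: rerun the forward pass with the slides' weights, then apply the backward equations at the last timestep. A self-contained sketch (function and variable names are mine):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Weights and biases from the forward-propagation example.
U = {'f': (0.70, 0.45), 'i': (0.95, 0.80), 'c': (0.45, 0.25), 'o': (0.60, 0.40)}
W = {'f': 0.10, 'i': 0.80, 'c': 0.15, 'o': 0.25}
b = {'f': 0.15, 'i': 0.65, 'c': 0.20, 'o': 0.10}

def step(x, h_prev, c_prev):
    net = lambda g: U[g][0] * x[0] + U[g][1] * x[1] + W[g] * h_prev + b[g]
    f, i = sigmoid(net('f')), sigmoid(net('i'))
    c_tilde = math.tanh(net('c'))
    c = f * c_prev + i * c_tilde
    o = sigmoid(net('o'))
    return f, i, c_tilde, c, o, o * math.tanh(c)

# Forward pass over the two timesteps.
_, _, _, c1, _, h1 = step((1.0, 2.0), 0.0, 0.0)
f2, i2, ct2, c2, o2, h2 = step((0.5, 3.0), h1, c1)

# Backward pass at the last timestep: delta_out_t = 0, dC_{t+1} = 0, f_{t+1} = 0.
d_out2 = h2 - 1.25                            # dE/dh = h - target
dC2 = d_out2 * o2 * (1 - math.tanh(c2) ** 2)
df2 = dC2 * c1 * f2 * (1 - f2)
di2 = dC2 * ct2 * i2 * (1 - i2)
dct2 = dC2 * i2 * (1 - ct2 ** 2)
do2 = d_out2 * math.tanh(c2) * o2 * (1 - o2)

print([round(v, 3) for v in (df2, di2, dct2, do2)])
# [-0.006, -0.001, -0.019, -0.055] -- matches delta_gates_2 on the slide
```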
Computing δx_2 and Δout_1 for timestep t = 2

δx_t = Uᵀ·δgates_t
Δout_{t-1} = Wᵀ·δgates_t

U (rows f, i, c, o):
0.700  0.450
0.950  0.800
0.450  0.250
0.600  0.400
δgates_2 = [−0.006, −0.001, −0.019, −0.055]ᵀ
δx_2 = Uᵀ·δgates_2 = [−0.047, −0.030]

W = [0.100, 0.800, 0.150, 0.250]
Δout_1 = Wᵀ·δgates_2 = −0.018
Computing for timestep t = 1: Δout_1 = −0.018

Δ_1 = 0.536 − 0.5 = 0.036 → δout_1 = 0.036 − 0.018 = 0.018
δC_1 = 0.018 · 0.818 · (1 − tanh²(0.786)) + (−0.071) · 0.870 = −0.053
δf_1 = 0 (since C_0 = 0)
δi_1 = −0.0017
δC̃_1 = −0.017
δo_1 = 0.0018

δgates_1 = [0.0000, −0.0017, −0.0170, 0.0018]ᵀ
δx_1 = Uᵀ·δgates_1 = [−0.0082, −0.0049]
Δout_0 = Wᵀ·δgates_1 = −0.0035
Computing δU, δW, δb

δU = Σ_{t=1..2} δgates_t · x_tᵀ
   = [0.0000, −0.0017, −0.0170, 0.0018]ᵀ·[1, 2] + [−0.006, −0.001, −0.019, −0.055]ᵀ·[0.5, 3]

δW = Σ δgates_{t+1} · h_t = [−0.006, −0.001, −0.019, −0.055]ᵀ · 0.536
   (only δgates_2·h_1 contributes, since h_0 = 0)

δb = Σ_{t=1..2} δgates_t

δU:
−0.0032  −0.0189
−0.0022  −0.0067
−0.0267  −0.0922
−0.0259  −0.1626

δW: [−0.0034, −0.0006, −0.0104, −0.0297]
δb: [−0.00631, −0.00277, −0.03641, −0.05362]
Update Weights (η = 0.1): w_new = w_old − η·δw_old

δU:
−0.0032  −0.0189
−0.0022  −0.0067
−0.0267  −0.0922
−0.0259  −0.1626
δW: [−0.0034, −0.0006, −0.0104, −0.0297]
δb: [−0.00631, −0.00277, −0.03641, −0.05362]

U_old (rows f, i, c, o):
0.700  0.450
0.950  0.800
0.450  0.250
0.600  0.400
U_new:
0.7003  0.4519
0.9502  0.8007
0.4527  0.2592
0.6026  0.4163

W_old = [0.100, 0.800, 0.150, 0.250] → W_new = [0.1003, 0.8001, 0.1510, 0.2530]
b_old = [0.1500, 0.6500, 0.2000, 0.1000] → b_new = [0.1506, 0.6503, 0.2036, 0.1054]
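The gradient-descent update can be checked for the W row in a couple of lines (pure Python, values from the slides):

```python
eta = 0.1  # learning rate from the slide

W_old = [0.100, 0.800, 0.150, 0.250]       # W_f, W_i, W_c, W_o
dW = [-0.0034, -0.0006, -0.0104, -0.0297]  # delta-W from the slide

# w_new = w_old - eta * delta-w, applied elementwise
W_new = [round(w - eta * d, 4) for w, d in zip(W_old, dW)]
print(W_new)  # [0.1003, 0.8001, 0.151, 0.253] -- matches W_new on the slide
```

The same elementwise rule produces U_new and b_new from δU and δb.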
Truncated BPTT
https://deeplearning4j.org/docs/latest/deeplearning4j-nn-recurrent
Truncated BPTT was developed in order to reduce the computational complexity of each parameter update in a recurrent neural network.
Summary
• Backpropagation through time for LSTM
• Truncated BPTT