RNN & NLP Application
Changki Lee, College of IT, Kangwon National University
Outline
• RNN
• Word Embedding
• NLP application
Recurrent Neural Network
• The “recurrent” property makes the network a dynamical system over time
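A minimal numpy sketch (added for illustration, not from the original slides): the same weights are applied at every time step, so the hidden state evolves as a dynamical system.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One step of a vanilla RNN: the new state depends on the input and the previous state.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), np.zeros(3)
h = np.zeros(3)                       # initial hidden state
for x_t in rng.normal(size=(5, 4)):   # a length-5 sequence of 4-dim inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```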
Bidirectional RNN
• Exploit future context as well as past
Long Short-Term Memory RNN
• Vanishing gradient problem in RNNs
• LSTM can preserve gradient information
LSTM Block Architecture
Gated Recurrent Unit (GRU)
• $r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)$
• $z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)$
• $\tilde{h}_t = \phi(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}) + b_h)$
• $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$
• $y_t = g(W_{hy} h_t + b_y)$
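A direct numpy transcription of these GRU equations (an illustrative sketch; $\phi = \tanh$ and a dict `p` of randomly initialized parameters are assumed):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    # p maps names like "Wxr" to arrays matching the slide's weight matrices.
    r = sigmoid(p["Wxr"] @ x_t + p["Whr"] @ h_prev + p["br"])   # reset gate
    z = sigmoid(p["Wxz"] @ x_t + p["Whz"] @ h_prev + p["bz"])   # update gate
    h_tilde = np.tanh(p["Wxh"] @ x_t + p["Whh"] @ (r * h_prev) + p["bh"])
    return z * h_prev + (1.0 - z) * h_tilde                     # new state h_t
```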
Example LSTM RNN Applications
• Handwriting generation
• Music composition
• Neural machine translation
• Image caption generation (CNN + LSTM RNN)
Outline
• RNN
• Word Embedding
• NLP application
Deep Belief Network [Hinton06]
• Key idea
– Pre-train layers with an unsupervised learning algorithm in phases
– Then, fine-tune the whole network by supervised learning
• A DBN is a stack of Restricted Boltzmann Machines (RBMs)
Restricted Boltzmann Machine
• A Restricted Boltzmann machine (RBM) is a generative stochastic neural network that can learn a probability distribution over its set of inputs
• Major applications
– Dimensionality reduction
– Topic modeling, …
Training DBN: Pre-Training
• 1. Layer-wise greedy unsupervised pre-training
– Train layers in phases, starting from the bottom layer
Training DBN: Fine-Tuning
• 2. Supervised fine-tuning for the classification task
The Back-Propagation Algorithm
Autoencoder
• An autoencoder is a neural network whose desired output is the same as its input
– It learns a compressed representation (encoding) of the data
– Find weights A and B that minimize $\sum_i (y_i - x_i)^2$ (sketched below)
<See the Winter School '14 Deep Learning materials>
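A sketch of that objective (illustrative only; a single hidden layer with encoder weights A and decoder weights B is assumed):

```python
import numpy as np

def autoencoder_loss(x, A, B):
    # Encode with A into a lower-dimensional code, decode with B,
    # and return the squared reconstruction error sum_i (y_i - x_i)^2.
    h = np.tanh(A @ x)        # compressed representation (encoding)
    y = B @ h                 # reconstruction (desired: y == x)
    return float(np.sum((y - x) ** 2))
```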
Stacked Autoencoders
• After training, the hidden nodes extract features from the input nodes
• Stacking autoencoders constructs a deep network
<See the Winter School '14 Deep Learning materials>
Text Representation
• One-hot (symbolic) representation
– Ex. [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
– Dimensionality: 50K (PTB), 500K (big vocab), 3M (Google 1T)
– Problem: Motel [0 0 0 0 0 0 0 0 1 0 0] AND Hotel [0 0 0 0 0 0 1 0 0 0 0] = 0
• Continuous representation
– Latent Semantic Analysis, random projection
– Latent Dirichlet Allocation, HMM clustering
– Neural word embedding: dense vectors
• Adding supervision from other tasks improves the representation (see the demo below)
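A small demo of the problem above (the dense vectors are made-up numbers, purely illustrative): one-hot vectors of related words are orthogonal, while learned dense embeddings can expose their similarity.

```python
import numpy as np

motel = np.zeros(11); motel[8] = 1.0          # one-hot "Motel"
hotel = np.zeros(11); hotel[6] = 1.0          # one-hot "Hotel"
print(motel @ hotel)                          # 0.0 -> no similarity signal

e_motel = np.array([0.8, 0.1, -0.3])          # hypothetical dense embeddings
e_hotel = np.array([0.7, 0.2, -0.4])
cos = e_motel @ e_hotel / (np.linalg.norm(e_motel) * np.linalg.norm(e_hotel))
print(round(float(cos), 3))                   # close to 1 -> similar words
```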
Neural Network Language Model (Bengio00,03)
Shared weights = Word embedding
• Idea
– A word and its context form a positive training sample
– A random word in that same context forms a negative training sample
– Score(positive) > Score(negative)
• Training complexity is high
– Hidden layer output
– Softmax in the output layer
• Speedups: hierarchical softmax, negative sampling, ranking (hinge loss)

| Input | Dim 1 | Dim 2 | Dim 3 | Dim 4 | Dim 5 |
|---|---|---|---|---|---|
| 1 (boy) | 0.01 | 0.2 | -0.04 | 0.05 | -0.3 |
| 2 (girl) | 0.02 | 0.22 | -0.05 | 0.04 | -0.4 |

Lookup table (LT): $|V| \times d$; input (one-hot): $|V| \times 1$
Ranking-based Model (Collobert)
(extended in this work)
• Shared weights = word embedding
• Score s for a true window w(t-2), w(t-1), w(t); score sc for a corrupted window with wc(t) (negative sampling)
• Train so that s > sc
Word2Vec: CBOW, Skip-Gram
• Removing the hidden layer → ~1000x speedup
– Negative sampling
– Frequent-word subsampling
– Multi-threaded (no locks)
• Continuous Bag-of-Words (CBOW)
– Predicts the current word given the context (a toy sketch follows)
• Skip-gram
– Predicts the surrounding words given the current word
– ≈ CBOW + DropOut/DropConnect
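A toy sketch of one CBOW update with negative sampling (illustrative, not the word2vec source code): the "hidden layer" is just the average of the context embeddings, and only the target plus k sampled negatives are updated.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 1000, 50                                   # toy vocab size and dimension
W_in = rng.normal(scale=0.1, size=(V, d))         # context (input) embeddings
W_out = rng.normal(scale=0.1, size=(V, d))        # target (output) embeddings

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cbow_step(context_ids, target_id, k=5, lr=0.025):
    h = W_in[context_ids].mean(axis=0)            # no hidden layer: just average
    pairs = [(target_id, 1.0)] + [(int(rng.integers(V)), 0.0) for _ in range(k)]
    grad_h = np.zeros(d)
    for wid, label in pairs:                      # positive + k negatives only
        g = sigmoid(W_out[wid] @ h) - label       # logistic-loss gradient
        grad_h += g * W_out[wid]
        W_out[wid] -= lr * g * h
    W_in[context_ids] -= lr * grad_h / len(context_ids)

cbow_step(context_ids=[3, 17, 42, 99], target_id=7)
```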
Korean Word Embedding: NNLM
• Data
– Sejong corpus raw text + Korean Wiki abstracts + news data
• 280M morphemes
– Vocab. size: 60,000
• All morphemes included (symbols, numbers, Chinese characters, particles)
• Number normalization + morpheme/POS units: 정부/NNG, 00/SN
• NNLM model
– Dimension: 50
– Implemented in Matlab
– Training time: 16 days
Korean Word Embedding: NNLM (cont'd)
Korean Word Embedding: Word2Vec (CBOW)
• Data
– 2012–2013 news + Korean Wiki abstracts: 9 GB of raw text, 2.9B morphemes
– Vocab. size: 100,000
• All morphemes included (symbols, numbers, Chinese characters, particles)
• Number normalization + English lowercasing + morpheme/POS units
• Standard Word2Vec (CBOW, Skip-Gram)
– Model: CBOW > Skip-Gram
– Training time: 24 minutes
Korean Word Embedding: Word2Vec (CBOW) (cont'd)
Word Embedding Application: Word Analogy
King – Man + Woman ≈ Queen
http://deeplearner.fz-qqq.net/
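A sketch of how such analogy queries are answered (illustrative; `E` is a trained |V|×d embedding matrix and `vocab` a word→row mapping, both assumed to exist):

```python
import numpy as np

def analogy(E, vocab, a, b, c, topn=1):
    # Nearest neighbors (cosine) of vec(b) - vec(a) + vec(c), queries excluded.
    q = E[vocab[b]] - E[vocab[a]] + E[vocab[c]]
    sims = (E @ q) / (np.linalg.norm(E, axis=1) * np.linalg.norm(q) + 1e-8)
    exclude = {vocab[a], vocab[b], vocab[c]}
    order = [i for i in np.argsort(-sims) if i not in exclude]
    inv = {i: w for w, i in vocab.items()}
    return [inv[i] for i in order[:topn]]

# analogy(E, vocab, "man", "king", "woman")  ->  ideally ["queen"]
```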
Bilingual Word Embedding
• Bilingual word embedding for PBMT (EMNLP13)
– Uses Chinese–English word alignments
Korean–English Bilingual Word Embedding
Korean–English Bilingual Word Embedding – cont'd
Outline
• RNN
• Word Embedding
• NLP application
Sequence Labeling – RNN, LSTM
Word embedding
Feature embedding
FFNN(or CNN), CNN+CRF (SENNA)
RNN + CRF vs. Recurrent CRF (diagrams)
LSTM RNN + CRF → LSTM-CRF (KCC 15) (diagrams)
LSTM-CRF
• $i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$
• $f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$
• $c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$
• $o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$
• $h_t = o_t \odot \tanh(c_t)$
• Output without CRF: $y_t = g(W_{hy} h_t + b_y)$
• Scores for the CRF layer: $y_t = W_{hy} h_t + b_y$
• $s(\mathbf{x}, \mathbf{y}) = \sum_{t=1}^{T} \big( A(y_{t-1}, y_t) + y_t \big)$
• $\log P(\mathbf{y} \mid \mathbf{x}) = s(\mathbf{x}, \mathbf{y}) - \log \sum_{\mathbf{y}'} \exp\!\big(s(\mathbf{x}, \mathbf{y}')\big)$ (sketched below)
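A numpy sketch of the two CRF quantities above (illustrative; `emit` holds the per-step score vectors $y_t$, `A` the label-transition matrix, with the start transition folded into t = 0):

```python
import numpy as np

def logsumexp(a, axis=None):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def crf_score(emit, A, y):
    # s(x, y) = sum_t A[y_{t-1}, y_t] + emit[t, y_t]
    s = emit[0, y[0]]
    for t in range(1, len(y)):
        s += A[y[t - 1], y[t]] + emit[t, y[t]]
    return s

def crf_log_partition(emit, A):
    # Forward algorithm: log sum_{y'} exp(s(x, y')) over all label sequences.
    alpha = emit[0]                                        # (L,) at t = 0
    for t in range(1, emit.shape[0]):
        alpha = logsumexp(alpha[:, None] + A, axis=0) + emit[t]
    return logsumexp(alpha)

# log P(y|x) = crf_score(emit, A, y) - crf_log_partition(emit, A)
```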
GRU+CRF
• $r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)$
• $z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)$
• $\tilde{h}_t = \phi(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}) + b_h)$
• $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$
• Output without CRF: $y_t = g(W_{hy} h_t + b_y)$
• Scores for the CRF layer: $y_t = W_{hy} h_t + b_y$
• $s(\mathbf{x}, \mathbf{y}) = \sum_{t=1}^{T} \big( A(y_{t-1}, y_t) + y_t \big)$
• $\log P(\mathbf{y} \mid \mathbf{x}) = s(\mathbf{x}, \mathbf{y}) - \log \sum_{\mathbf{y}'} \exp\!\big(s(\mathbf{x}, \mathbf{y}')\big)$
Bi-LSTM CRF
• Bidirectional LSTM+CRF
• Bidirectional GRU+CRF
• Stacked Bi-LSTM+CRF …
Stacked LSTM CRF
LSTM CRF with Context words = CNN + LSTM CRF
• Bi-LSTM CRF ≈ LSTM CRF with context words > LSTM CRF
English Named Entity Recognition (KCC 15, journal submitted)

| English NER (CoNLL03 data set) | F1 (dev) | F1 (test) |
|---|---|---|
| SENNA (Collobert) | - | 89.59 |
| Structural SVM (baseline + word embedding feature) | - | 85.58 |
| FFNN (Sigm + Dropout + Word embedding) | 91.58 | 87.35 |
| RNN (Sigm + Dropout + Word embedding) | 91.83 | 88.09 |
| LSTM RNN (Sigm + Dropout + Word embedding) | 91.77 | 87.73 |
| GRU RNN (Sigm + Dropout + Word embedding) | 92.01 | 87.96 |
| CNN+CRF (Sigm + Dropout + Word embedding) | 93.09 | 88.69 |
| RNN+CRF (Sigm + Dropout + Word embedding) | 93.23 | 88.76 |
| LSTM+CRF (Sigm + Dropout + Word embedding) | 93.82 | 90.12 |
| GRU+CRF (Sigm + Dropout + Word embedding) | 93.67 | 89.98 |
Korean Named Entity Recognition (journal submitted)

| Korean NER (TV domain) | F1 (test) |
|---|---|
| Structural SVM (baseline: basic + NE dic. + word cluster + morpheme features) | 89.03 |
| FFNN (ReLU + Dropout + Word embedding) | 87.70 |
| RNN (Tanh + Dropout + Word embedding) | 88.93 |
| LSTM RNN (Tanh + Dropout + Word embedding) | 89.38 (+0.35) |
| Bi-LSTM RNN (Tanh + Dropout + Word embedding) | 89.21 (+0.18) |
| CNN+CRF (ReLU + Dropout + Word embedding) | 90.06 (+1.03) |
| RNN+CRF (Sigm + Dropout + Word embedding) | 90.52 (+1.49) |
| LSTM+CRF (Sigm + Dropout + Word embedding) | 91.04 (+2.01) |
| GRU+CRF (Sigm + Dropout + Word embedding) | 91.02 (+1.99) |
NLU Experiments (ATIS data)

| NLU (ATIS: Air Travel Information System data) | F1 (dev) | F1 (test) |
|---|---|---|
| FFNN (Sigm + Dropout + Word embedding) | 93.75 | 92.13 |
| RNN (Sigm + Dropout + Word embedding) | 98.05 | 94.78 |
| LSTM RNN (Sigm + Dropout + Word embedding) | 98.27 | 94.85 |
| GRU RNN (Sigm + Dropout + Word embedding) | 98.11 | 94.82 |
| CNN4+CRF (Sigm + Dropout + Word embedding) | 96.66 | 95.56 |
| RNN+CRF (Sigm + Dropout + Word embedding) | 98.42 | 96.19 |
| LSTM+CRF (Sigm + Dropout + Word embedding) | 98.79 | 96.48 |
Training Speed
(Bar chart comparing training time of FFNN, RNN, LSTM RNN, GRU, SCRN, R-CRF, LSTM R-CRF, GRU-CRF, and SCRN-CRF; y-axis 0–250.)
Neural Architectures for NER (Arxiv16)
• LSTM-CRF model + character-based word representation
– Character model: Bi-LSTM RNN
Neural Architectures for NER (Arxiv16)
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (ACL16)
• LSTM-CRF model + character-level representation
– Character model: CNN
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (ACL16)
NER with Bidirectional LSTM-CNNs (Arxiv16)
Semantic Role Labeling
Korean PropBank
• Virginia corpus
– Military domain, 54,500 words → domain mismatch
• Newswire corpus
– Newspaper domain (1994/6/2–2000/3/20), 131,800 words
– Phrase-structure annotation converted to dependency structure → 4,882 sentences
• Frame files
– 2,749 XML files
• Verb roots (not stems)
• Deverbal nouns (nouns usable as predicates)
• Frame file example
– 덧붙.01
– English define = add
– Role set
• ARG0: adder
• ARG1: thing added
• ARG2: added to
– Mapping1
• Rel = 덧붙이다
• Src = sbj → Trg = ARG0
• Src = obj → Trg = ARG1
• Src = comp → Trg = ARG2
Korean Semantic Role Labeling (SRL)
• Predicate identification (PIC)
– 그는 르노가 3월말까지 인수제의 시한을 [갖고]갖.1 있다고 [덧붙였다]덧붙.1
• Argument identification (AIC)
– 그는 [르노가]ARG0 [3월말까지]ARGM-TMP 인수제의 [시한을]ARG1 [갖고]갖.1 [있다고]AUX 덧붙였다
– [그는]ARG0 르노가 3월말까지 인수제의 시한을 갖고 [있다고]ARG1 [덧붙였다]덧붙.1
• Pipeline: dependency parsing → semantic role labeling
Deep-Learning-Based Korean SRL (HCLT / Winter Conference 15)
• Bidirectional LSTM+CRF
• Korean word embedding
– Predicate word, argument word
– NNLM
• Feature embedding
– POS, distance, direction
– Dependency path, LCA
• Bi-LSTM+CRF performance (AIC)
– F1: 78.2% (+1.2)
– Backward LSTM+CRF: F1 77.6%
– S-SVM performance (KCC14)
• Basic features: F1 74.3%
• Basic features + word clusters: F1 77.0% (Journal of KIISE, 2015.02)
Stacked Bi-LSTM CRF for Korean SRL (submitted to Journal of KIISE)

| Model | F1 w/ syntactic info | F1 w/o |
|---|---|---|
| Structural SVM | 76.96 | 74.15 |
| FFNN | 76.01 | 73.22 |
| Backward LSTM CRFs | 76.79 | 76.37 |
| Bidirectional LSTM CRFs | 78.16 | 78.17 |
| Stacked Bidirectional LSTM CRFs (2 layers) | 78.12 | 78.57 |
| Stacked Bidirectional LSTM CRFs (3 layers) | 78.14 | 78.36 |
CNN for Sentence Classification (Sentiment Analysis)
• Convolutional NN
– Convolution layer
• Sparse connectivity
• Shared weights
• Multiple feature maps
– Sub-sampling layer
• Average/max pooling (sketched below)
• Applied to NLP (sentence classification)
– ACL14
– EMNLP14
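A minimal sketch of the convolution + max-over-time pooling used for sentence classification (illustrative; one filter width k over a (T, d) matrix of word embeddings):

```python
import numpy as np

def cnn_sentence_features(E_sent, filters, b):
    # E_sent: (T, d) word-embedding matrix for one sentence.
    # filters: (F, k, d) convolution filters; b: (F,) biases.
    T, d = E_sent.shape
    F, k, _ = filters.shape
    conv = np.empty((F, T - k + 1))
    for i in range(T - k + 1):                    # slide a window of k words
        window = E_sent[i:i + k].reshape(-1)
        conv[:, i] = filters.reshape(F, -1) @ window + b
    relu = np.maximum(conv, 0.0)
    return relu.max(axis=1)                       # max pooling per feature map
```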
Korean Sentiment Analysis – CNN
• Mobile data
– Train: 4,543; Test: 500
• EMNLP14 model (CNN) applied
– Implemented in Matlab
– Word embedding: 100K Korean words + 1,420 domain-specific words
Korean Sentiment Analysis – CNN Experiments

| Data set | Model | Accuracy |
|---|---|---|
| Mobile (Train: 4,543; Test: 500) | SVM (word features) | 85.58 |
| | CNN (relu, kernel 3, hid 50) + word embedding (word features) | 91.20 |
| | CNN (relu, kernel 3, hid 50) + random init. | 89.00 |
LSTM RNN-Based Korean Sentiment Analysis
• LSTM RNN-based encoding
– Sentence embedding as input
– Fully connected NN output
– GRU encoding works similarly

| Data set | Model | Accuracy |
|---|---|---|
| Mobile (Train: 4,543; Test: 500) | SVM (word features) | 85.58 |
| | CNN (relu, kernel 3, hid 50) + word embedding (word features) | 91.20 |
| | GRU encoding + fully connected NN | 91.12 |
| | LSTM RNN encoding + fully connected NN | 90.93 |
Attention-Based LSTM RNN Encoding for Korean Sentiment Analysis
• Bi-LSTM RNN-based encoding + attention mechanism (pooling step sketched below)

| Data set | Model | Accuracy |
|---|---|---|
| Mobile (Train: 4,543; Test: 500) | SVM (word features) | 85.58 |
| | CNN (relu, kernel 3, hid 50) + word embedding (word features) | 91.20 |
| | GRU encoding + fully connected NN | 91.12 |
| | Attention-based GRU encoding + fully connected NN | 90.73 |
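A sketch of the attention pooling used here (illustrative; `H` is the (T, d) matrix of Bi-RNN hidden states and `w` a learned scoring vector):

```python
import numpy as np

def attention_pool(H, w):
    # Score each hidden state, softmax the scores, return the weighted sum.
    scores = H @ w                        # (T,) one score per time step
    a = np.exp(scores - scores.max())
    a /= a.sum()                          # attention weights (sum to 1)
    return a @ H                          # (d,) sentence encoding
```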
Attention Mechanism Results
• Positive examples (attention weight after each morpheme):
– 반응/nng/2 속도/nng/2 터치감/nng/2 만족/nng/2 하/xsa/19 고요/ec/25 </s>/44
– 카메라/nng/1 가/jks/2 깨끗이/mag/1 잘/mag/0 나오/vv/0 는/etm/1 듯/nnb/1 하/xsa/2 아/ec/1 참/mag/0 마음/nng/0 에/jkb/10 들/vv/12 었/ep/12 습니다/ef/12 ./sf/15 </s>/21
– 검정/nng/0 보다/jkb/0 화이트/nng/0 를/jko/1 선호/nng/1 하/xsv/3 는데/ec/3 다행/nng/2 이/jks/4 화이트/nng/2 가/jks/6 있/vv/4 어서/ec/5 디자인/nng/4 도/jx/8 맘/nng/8 에/jkb/10 들/vv/9 구요/ec/9 </s>/11
• Negative examples:
– 화면/nng/10 이나/jc/12 해상도/nng/6 는/jx/12 않/nng/25 좋/va/1 아서/ec/8 </s>/23
– 화면/nng/5 자체/nng/9 가/jks/13 작/va/4 은데/ec/3 편하/va/1 겠/ep/7 냐/ec/16 </s>/38
– 메뉴/nng/5 무지/nng/0 느리/va/4 고요/ec/3 인터넷/nng/1 심심/xr/0 하/xsa/3 면/ec/4 렉먹/vv/7 고/ec/12 문자/nng/8 전송/nng/4 속도/nng/2 gg/sl/12 </s>/28
– 요금/nng/12 넘/vv/9 비싸/va/15 아/ec/19 </s>/43
GRU Encoding + Deep Output
• GRU encoding + deep output
– Sentence embedding as input
– Fully connected NN output
• Experiments showed no performance improvement
Attention-Based GRU Encoding + Deep Output
• Attention-based GRU encoding + deep output
– Sentence embedding as input
– Fully connected NN output
• Experiments showed no performance improvement
Stacked GRU Encoding + MLP
• Stacked GRU encoding
– Sentence embedding as input
– Fully connected NN output
• Experiments in progress
– Good performance on some tasks so far
Neural Machine Translation
Example attention (soft alignment) matrix: target words (rows) vs. source morphemes (columns).

| T\S | 777 | 항공편 | 은 | 3 | 시간 | 동안 | 지상 | 에 | 있 | 겠 | 습니다 | . | </s> |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| flight | 0.5 | 0.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 777 | 0.3 | 0.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| is | 0 | 0.1 | 0 | 0 | 0.1 | 0.2 | 0 | 0.4 | 0 | 0.1 | 0 | 0 | 0 |
| on | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.7 | 0.2 | 0.1 | 0 | 0 | 0 |
| the | 0 | 0 | 0 | 0.2 | 0.3 | 0.3 | 0.1 | 0 | 0 | 0 | 0 | 0 | |
| ground | 0 | 0 | 0 | 0.1 | 0.2 | 0.5 | 0.3 | 0 | 0 | 0 | 0 | 0 | 0 |
| for | 0 | 0 | 0 | 0.1 | 0.2 | 0.5 | 0.1 | 0.1 | 0 | 0 | 0 | 0 | 0 |
| three | 0 | 0 | 0 | 0.2 | 0.2 | 0.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| hours | 0 | 0 | 0 | 0.1 | 0.3 | 0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| . | 0 | 0 | 0 | 0.4 | 0 | 0.1 | 0.2 | 0.1 | 0.1 | 0.1 | 0 | 0 | 0 |
| </s> | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1 | 0 | 0.1 | 0.1 | 0.3 | 0.3 |
Recurrent NN Encoder–Decoder for Statistical Machine Translation (EMNLP14)
• GRU RNN encoding, GRU RNN decoding
• Vocab: 15,000 (src, tgt)
Sequence to Sequence Learning with Neural Networks (NIPS14 – Google)
• Source vocab: 160,000; target vocab: 80,000
• Deep LSTMs with 4 layers
• Training: 7.5 epochs (12M sentences, 10 days on an 8-GPU machine)
Neural MT by Jointly Learning to Align and Translate (ICLR15)
• GRU RNN + alignment (attention) encoding, GRU RNN decoding
• Vocab: 30,000 (src, tgt); training: 5 days
Pros and Cons of NMT Models
• Advantages of NMT
– No feature engineering required
– A single end-to-end neural network
• No separate word alignment, translation model, or language model
– Simple decoder; no syntactic parsing (pre-ordering, syntax-based SMT) needed
• Disadvantages of NMT
– The output vocabulary size is limited
• Training/decoding time grows with the vocabulary size
• Out-of-vocabulary (OOV) words must be handled separately
• Most recent NMT papers focus on this OOV problem
Proposed Character-Level NMT (WAT15, HCLT 15)
• Conventional NMT: word-level encoding and decoding
– Requires OOV post-processing or modifications to the NMT model
• Character-level NMT
– The source language is still encoded at the word level
• Encoding the source at the character level hurts performance
– The target language is decoded at the character level
• Word units: その/UN 結果/NCA を/PS 詳細/NCD …
• Character units: そ/B の/I 結/B 果/I を/B 詳/B 細/I … (conversion sketched below)
• Advantages of character-level NMT
– Every character fits in the vocabulary → the OOV problem disappears
– Faster training and decoding
– No modification of the existing NMT model
– No OOV post-processing step
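A small helper showing the word-to-character conversion with B/I tags used for the target side (a sketch matching the example above):

```python
def to_char_bi(tokens):
    # ["その/UN", "結果/NCA"] -> ["そ/B", "の/I", "結/B", "果/I"]
    out = []
    for tok in tokens:
        word = tok.split("/")[0]              # drop the POS tag
        for i, ch in enumerate(word):
            out.append(ch + "/" + ("B" if i == 0 else "I"))
    return out

print(to_char_bi(["その/UN", "結果/NCA", "を/PS"]))
```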
NMT Model Extended in This Work
• Encoding (source language)
– Bidirectional GRU RNN
– $h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]$
• Decoding (target language; the attention step is sketched below)
– $c = h_0$ (global context)
– $c_t = \sum_i a_{ti} h_i$ (local context)
– $z_t = \mathrm{sigm}(W_{zx} E(y_{t-1}) + W_{zs} s_{t-1} + b_z)$
– $r_t = \mathrm{sigm}(W_{rx} E(y_{t-1}) + W_{rs} s_{t-1} + b_r)$
– $\tilde{s}_t = \tanh(W_{sx} E(y_{t-1}) + W_{ss}(r_t \odot s_{t-1}) + W_{sc} c_t + b_s)$
– $s_t = (1 - z_t) \odot s_{t-1} + z_t \odot \tilde{s}_t$
– $s'_t = \mathrm{relu}(W_{s's} s_t + b_{s'})$
– $y_t = \mathrm{softmax}(W_{ys'} s'_t + W_{ys} s_t + W_{yy} E(y_{t-1}) + W_{yc} c + b_y)$
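A sketch of the local-context computation $c_t = \sum_i a_{ti} h_i$ above (illustrative shapes only: `h_enc` is the (T, 2d) matrix of bidirectional encoder states, `s_prev` the previous decoder state, and `v`, `W_a`, `U_a` assumed attention parameters):

```python
import numpy as np

def local_context(h_enc, s_prev, v, W_a, U_a):
    # Alignment scores e_i = v . tanh(W_a s_{t-1} + U_a h_i), softmaxed into a_ti.
    e = np.tanh(h_enc @ U_a.T + (W_a @ s_prev)) @ v    # (T,)
    a = np.exp(e - e.max())
    a /= a.sum()                                       # attention weights a_ti
    return a @ h_enc                                   # context vector c_t
```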
ASPEC J-to-E Experiments
• ASPEC J-to-E data
• BLEU (with Juman segmentation)
– PB SMT: 18.45
– HPB SMT: 18.72
– Tree-to-string SMT: 22.16
– NMT (word-level decoding): 21.63
– NMT (character-level decoding): 21.72
– NMT (word + character-level decoding): 23.76
ASPEC E-to-J Experiments (WAT15)
• ASPEC E-to-J data
• BLEU (with Juman segmentation)
– PB SMT: 27.48
– HPB SMT: 30.19
– Tree-to-string SMT: 32.63
– NMT (word-level decoding): 29.78
– NMT (character-level decoding): 33.14 (4th place); RIBES 0.8073 (2nd place)
– Tree-to-string + NMT (character-level) re-ranking: BLEU 34.60 (2nd place); human evaluation 53.25 (2nd place)
JPO K-to-J Experiments (WAT15)
• JPO patent K-to-J data
• BLEU (with Juman segmentation)
– PB SMT: 69.22
– HPB SMT: 67.41
– NMT (word-level): 61.52
– NMT (character-level): 65.72
– PB SMT + NMT (character-level) re-ranking: BLEU 71.38 (2nd place); human evaluation 14.75 (1st place)
ASPEC E-to-J Experiments – Example
• This/DT:0 paper/NN:1 explaines/NNS:2 experimental/JJ:3 result/NN:4 according/VBG:5 to/TO:6 the/DT:7 model/NN:8 ./.:9 </s>:10
• こ/B:0 の/I:1 モ/B:2 デ/I:3 ル/I:4 に/B:5 よ/B:6 る/I:7 実/B:8 験/I:9 結/B:10 果/I:11 を/B:12 説/B:13 明/I:14 し/B:15 た/B:16 。/B:17 </s>:18
ASPEC J-to-E Experiments – Example
• 又/CJ:0 メルト/NQC:1 フラクチャー/NCC:2 防止/NCS:3 策/NCC:4 に/PS:5 つい/VC:6 て/PJ:7 も/PC:8 述べ/VC:9 た/VX:10 </s>:11
• M/B:0 e/I:1 l/I:2 t/I:3 i/I:4 n/I:5 g/I:6 prevention/NN:7 measures/NNS:8 are/VBP:9 also/RB:10 described/VBN:11 ./.:12 </s>:13
JPO K-to-J Experiments – Example
• ○/ETC:0 :/ETC:1 필름/NOUN:2 면/NOUN:3 과의/JOSA:4 접착/NOUN:5 이/JOSA:6 없/NORMALVERB:7 다/EOMI:8 ./ETC:9 </s>:10
• ○/B:0 :/B:1 フ/B:2 ィ/I:3 ル/I:4 ム/I:5 面/B:6 と/B:7 の/B:8 接/B:9 着/I:10 が/B:11 な/B:12 い/I:13 。/B:14 </s>:15
Input-feeding Approach (EMNLP15)
The attentional decisions are made independently, which is suboptimal.
In standard MT, a coverage set is often maintained during the translation process to keep track of which source words have been translated.
Effect:
– We hope to make the model fully aware of previous alignment choices
– We create a very deep network spanning both horizontally and vertically (sketched below)
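A one-line sketch of input feeding (illustrative): the previous attentional vector is concatenated to the next target embedding, so earlier alignment choices flow into later decoding steps.

```python
import numpy as np

def decoder_input(E_y_prev, h_tilde_prev):
    # Next decoder input = [embedding of previous word ; previous attentional vector]
    return np.concatenate([E_y_prev, h_tilde_prev])
```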
Copying Mechanism or CopyNet (ACL16)
Abstractive Text Summarization (HCLT 16)
Example generated summary: 로드킬로 숨진 친구의 곁을 지키는 길고양이의 모습이 포착되었다. (“A stray cat was spotted keeping watch over its friend killed in a roadkill accident.”)
RNN_search+input_feeding+CopyNet
End-to-End Korean Morphological Analysis (Winter Conference 16)
Attention + Input-feeding + Copying mechanism
Sequence-to-Sequence Korean Phrase-Structure Parsing (HCLT 16)
• Input example 1 (morpheme units): 43/SN 국/NNG <sp> 참가/NNG
• Input example 2 (syllable units): 4 3 <SN> 국 <NNG> <sp> 참 가 <NNG>
• Output tree for 43/SN 국/NNG + 참가/NNG: (NP (NP 43/SN + 국/NNG) (NP 참가/NNG))
Attention + Input-feeding
Input: 선 생 <NNG> 님 <XSN> 의 <JKG> <sp> 이 야 기 <NNG> <sp> 끝 나 <VV> 자 <EC> <sp> 마 치 <VV> 는 <ETM> <sp> 종 <NNG> 이 <JKS> <sp> 울 리 <VV> 었 <EP> 다 <EF> . <SF>
Gold: (S (S (NP_SBJ (NP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) (S (NP_SBJ (VP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) )
RNN-search [7] (beam size 10): (S (VP (NP_OBJ (NP_MOD XX ) (NP_OBJ XX ) ) (VP XX ) ) (S (NP_SBJ (VP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) )
RNN-search + Input-feeding + Dropout (beam size 10): (S (S (NP_SBJ (NP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) (S (NP_SBJ (VP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) )
Sequence-to-Sequence Korean Phrase-Structure Parsing (HCLT 16) – Results

| Input units | Model | F1 |
|---|---|---|
| | Stanford parser [13] | 74.65 |
| | Berkeley parser [13] | 78.74 |
| Morphemes + <sp> | RNN-search [7] (beam size 10) | 87.34 (baseline), 87.65* (+0.31) |
| Morpheme syllables + POS tags + <sp> | RNN-search [7] (beam size 10) | 87.69 (+0.35), 88.00* (+0.66) |
| Morpheme syllables + POS tags + <sp> | RNN-search + Input-feeding (beam size 10) | 88.23 (+0.89), 88.68* (+1.34) |
| Morpheme syllables + POS tags + <sp> | RNN-search + Input-feeding + Dropout (beam size 10) | 88.78 (+1.44), 89.03* (+1.69) |
Pointer Network (NIPS15)
• Travelling Salesman Problem: NP-hard
• A pointer network can learn approximate solutions: O(n²) (output layer sketched below)
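A sketch of the pointer-network output layer (illustrative; `W1`, `W2`, `v` are assumed attention parameters): attention scores over the encoder states are themselves the output distribution, so the model "points" at input positions instead of fixed vocabulary items.

```python
import numpy as np

def pointer_distribution(enc, dec_state, W1, W2, v):
    # enc: (T, d) encoder states; returns P(pointer = i) over input positions.
    u = np.tanh(enc @ W1.T + (W2 @ dec_state)) @ v   # (T,) one score per position
    p = np.exp(u - u.max())
    return p / p.sum()
```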
Pointer Network Model for Coreference Resolution (journal submitted)
• Input: word (morpheme) sequence and a starting point (a pronoun or definite NP, e.g., “이 별자리” (this constellation))
– X = {A:0, B:1, C:2, D:3, <EOS>:4}, Start_Point = A:0
• Output: a sequence of positions (pointers) into the input, i.e., an entity chain
– Y = {A:0, C:2, D:3, <EOS>:4}
• Feature: end-to-end pronoun coreference resolution (no separate mention-detection step)
Korean Dependency Parsing with Pointer Networks (Winter Conference 16)
Neural Responding Machine for Short-Text Conversation (ACL 15)
Neural Responding Machine – cont’d
Experimental Results (ACL 15)
Related Work: Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models (arXiv 15) – HRED
Related Work: Attention with Intention for a Neural Network Conversation Model (arXiv 15) – AWI
Related Work: A Diversity-Promoting Objective Function for Neural Conversation Models (arXiv 16) – MMI
Related Work: A Persona-Based Neural Conversation Model (arXiv 16)
Speaker model + MMI
Kangwon National University Deep-Learning Chatbot Models
• Assumes short-text conversation
– A reply is generated from the user's current utterance only
• GRU_encdec model
– Gated Recurrent Unit (GRU) encoder–decoder
• Stacked_GRU_encdec model
– A 2-layer GRU_encdec
• GRU_search model
– Attention-based model
• GRU_DO_search model
– Attention-based model + deep output
Subtitle Data Experiments
• Training data: 1M sentences of movie dialogue
• Model: GRU encoder–decoder
• what is two plus two ?
– A) one , two , three .
• how old are you ?
– A) 18 years .
• are you a follower or a leader ?
– A) yes , i am .
• my name is david . what is my name ?
– A) david palmer .
• what is the purpose of being intelligent ?
– A) it is difficult to explain .
• what happens if machines can think ?
– A) if they find anything , they 'll kill us .
Call-Center Data Experiments
• Training data: 180K sentences of call-center dialogue
• Model: stacked GRU encoder–decoder
• 세탁기가 전원이 안 들어와요? (My washing machine won't power on.)
– A) 네 아 그러세요 (Oh, I see.)
– A) 많이 불편하셨겠습니다 (That must have been very inconvenient.)
– A) 드럼이세요 아니면 일반 세탁기세요 (Is it a drum washer or a regular one?)
• 언제 방문해 줄수 있나요? (When can you visit?)
– A) 언제 방문 드리면 편하실까요 (When would a visit be convenient for you?)
– A) 언제 방문 드리면 편하시겠어요 (When would a visit suit you?)
• 출장비가 얼마죠? (How much is the service-call fee?)
– A) 출장 점검료는 15000원이구요 (The on-site inspection fee is 15,000 won.)
– A) 출장비가 만오천원이구요 (The call-out fee is 15,000 won.)
Short-Text Conversation (Winter Conference 16)
– Data: Clien “아무거나 질문 게시판” (“ask anything” board)
– 77,346 question–answer pairs
– train : dev : test = 8 : 1 : 1
Introduction to Image Caption Generation
• Understand the image content → automatically generate a caption describing it
– Image recognition (understanding) + natural language processing (generation)
• Applications
– Image search
– Photo descriptions and navigation aids for the blind
– Early childhood education, …
Previous Work
• Multimodal RNN (M-RNN) [2] (Baidu)
– CNN + vanilla RNN; CNN: VGGNet
• Neural Image Caption generator (NIC) [4] (Google)
– CNN + LSTM RNN; CNN: GoogLeNet
• Deep Visual-Semantic Alignments (DeepVS) [5] (Stanford)
– RCNN + Bi-RNN alignment (training); CNN + vanilla RNN; CNN: AlexNet
AlexNet, VGGNet
Image Caption Generation with RNNs (Winter Conference 15)
• CNN + RNN
– CNN: VGGNet, 15th layer (4096-dim)
– RNN: GRU (an LSTM RNN variant)
• Hidden layer units: 500, 1000 (best)
• Multimodal layer units: 500, 1000 (best)
– Word embedding
• SENNA: 50-dim (best)
• Word2Vec: 300-dim
– Data sets
• Flickr 8K: 8,000 images × 5 captions each (6,000 train / 1,000 dev / 1,000 test)
• Flickr 30K: 31,783 images × 5 captions each (29,000 train / 1,014 dev / 1,000 test)
– Four model variants: GRU-DO1, GRU-DO2, GRU-DO3, GRU-DO4
(Model diagrams for the four variants GRU-DO1, GRU-DO2, GRU-DO3, and GRU-DO4: each feeds CNN image features and word embeddings into a GRU with a multimodal layer and softmax output.)
Image Caption Generation with RNNs (Winter Conference 15) – Results

| Flickr 30K | B-1 | B-2 | B-3 | B-4 |
|---|---|---|---|---|
| m-RNN (Baidu) [2] | 60.0 | 41.2 | 27.8 | 18.7 |
| DeepVS (Stanford) [5] | 57.3 | 36.9 | 24.0 | 15.7 |
| NIC (Google) [4] | 66.3 | 42.3 | 27.7 | 18.3 |
| Ours-GRU-DO1 | 63.01 | 43.60 | 29.74 | 20.14 |
| Ours-GRU-DO2 | 63.24 | 44.25 | 30.45 | 20.58 |
| Ours-GRU-DO3 | 62.19 | 43.23 | 29.50 | 19.91 |
| Ours-GRU-DO4 | 63.03 | 43.94 | 30.13 | 20.21 |

| Flickr 8K | B-1 | B-2 | B-3 | B-4 |
|---|---|---|---|---|
| m-RNN (Baidu) [2] | 56.5 | 38.6 | 25.6 | 17.0 |
| DeepVS (Stanford) [5] | 57.9 | 38.3 | 24.5 | 16.0 |
| NIC (Google) [4] | 63.0 | 41.0 | 27.0 | - |
| Ours-GRU-DO1 | 63.12 | 44.27 | 29.82 | 19.34 |
| Ours-GRU-DO2 | 61.89 | 43.86 | 29.99 | 19.85 |
| Ours-GRU-DO3 | 62.63 | 44.16 | 30.03 | 19.83 |
| Ours-GRU-DO4 | 63.14 | 45.14 | 31.09 | 20.94 |
Flickr30K Example Results
A black and white dog is jumping in the grass
A group of people in the snow
Two men are working on a roof
New Data (examples)
A large clock tower in front of a building
A man in a field throwing a frisbee
A little boy holding a white frisbee
A man and a woman are playing with a sheep
Korean Image Caption Generation
• 한 어린 소녀가 풀로 덮인 들판에 서 있다 (A young girl is standing in a grass-covered field)
• 건물 앞에 서 있는 한 남자 (A man standing in front of a building)
• 분홍색 개를 데리고 있는 한 여자와 한 여자 (A woman and a woman with a pink dog)
• 구명조끼를 입은 한 작은 소녀가 웃고 있다 (A little girl wearing a life jacket is smiling)
Residual Network + Korean Image Caption Generation (Winter Conference 16)
Visual Question Answering
Facebook: Visual Q&A
Toronto COCO-QA dataset (ArXiv 15)
• Automatically generated QA dataset
– Built from the MS COCO (image caption) data set
– Questions and answers are generated from image captions using a parser
• Answers are assumed to be single words (a classification problem)
• Question types: Object, Number, Color, Location
– Object: 70%, Color: 17%, Number: 7%, Location: 6%
– 123,287 images; 78,736 training questions; 38,948 test questions
– Ex. What is there in front of the sofa? → Answer: table
VIS+LSTM model (Arxiv 15)
DPPnet (POSTECH)
Stacked Attention Networks for Image QA (ArXiv 16)