EN.601.467/667 Introduction to Human Language Technology
Deep Learning I
Shinji Watanabe
Today’s agenda
• Big impact in HLT technologies including speech recognition
  • Before/after deep neural networks
• Brief introduction to deep neural networks
• Why deep neural networks after 2010?
• Success in other HLT areas (image and text processing)
Today’s agenda
No math, no theory, based on my personal experience
Short bio
• Research interests: automatic speech recognition (ASR), speech enhancement, and applications of machine learning to speech processing
• Around 20 years of ASR experience (since 2001)
Automatic speech recognition?
Speech recognition evaluation metric
• Word error rate (WER): based on the word-by-word edit distance between reference and recognition result
  Reference: "I want to go to the Johns Hopkins campus"
  Recognition result: "I want to go to the 10 top kids campus"
• # insertion errors = 1, # substitution errors = 2, # deletion errors = 0 → edit distance = 3
• WER (%) = edit distance (3) / # reference words (9) × 100 = 33.3%
• How do we compute WERs for languages that do not have word boundaries? Chunking, or use the character error rate (CER)
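The WER computation above can be sketched with a standard dynamic-programming edit distance (a minimal illustration; real scoring tools also report the alignment and the error breakdown):

```python
def word_error_rate(reference, hypothesis):
    """Compute WER (%) via word-level Levenshtein (edit) distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref) * 100

wer = word_error_rate(
    "I want to go to the Johns Hopkins campus",
    "I want to go to the 10 top kids campus",
)
print(f"{wer:.1f}%")  # edit distance 3 over 9 reference words -> 33.3%
```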
2001: when I started speech recognition….
[Figure: WER on the Switchboard task (telephone conversation speech), log scale from 100% down to 1%, 1995–2016 (Pallett'03, Saon'15, Xiong'16)]
A really bad age…
• No applications
• No breakthrough technologies
• Everyone outside speech research criticized it…
• The general public didn't know what speech recognition was
2001: when I started speech recognition….
[Figure: the same WER plot for the Switchboard task (telephone conversation speech), 1995–2016 (Pallett'03, Saon'15, Xiong'16)]
Now we are at
[Figure: same WER plot; with deep learning, Switchboard WER drops sharply, reaching 5.9% in 2016 (Pallett'03, Saon'15, Xiong'16)]
Everything changed
• No applications → voice search, smart speakers
• No breakthrough technologies → deep neural networks
• Everyone outside speech research criticized it… → many people outside speech research now know and respect it
• The general public didn't know what speech recognition was → now my kids know what I'm doing
Today’s agenda
• Big impact in HLT technologies including speech recognition
  • Before/after deep neural networks
• Brief introduction to deep neural networks
• Why deep neural networks after 2010?
• Success in other HLT areas (image and text processing)
Supervised training
• Classification
  • Phoneme or word recognition given speech
  • Word prediction given history (language modeling)
  • Image classification
• Binary classification
  • A special case of classification (only two classes)
  • Segmentation
• Regression
  • Predict a clean speech spectrogram from a noisy speech spectrogram
Please list an HLT-related example for each of these problem types
Supervised training
• Classification (most common in HLT applications)
  • Phoneme or word recognition given speech
  • Word prediction given history (language modeling)
  • Image classification
• Binary classification
  • A special case of classification (only two classes)
  • Segmentation
• Regression
  • Predict a clean speech spectrogram from a noisy speech spectrogram
Learning-based (data-driven) HLT techniques
Speech recognition as a classification problem
• Example classes from the Google Speech Commands dataset: "yes", "no", "one", "four", …
  https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html
• Use the probability distribution 𝑝(𝑌|𝑋) and pick the most probable word
• Note that this is a very simplified setting; real speech recognition must handle sequences
Language model as a classification problem
• "I want to go to my office": a correct sentence
• Language modeling predicts "office" given the history "I want to go to my"
• As a classification problem:
  • 𝑋: "I want to go to my"
  • 𝑌: "office"
• Use the probability distribution 𝑝(𝑌|𝑋) and pick the most probable word
• 𝑝(𝑌|𝑋) is obtained by a deep neural network
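The "pick the most probable word" step is just an argmax over the model's output distribution. A minimal sketch (the vocabulary and probabilities here are invented for illustration):

```python
# Hypothetical next-word distribution p(Y | X = "I want to go to my");
# in practice these probabilities come from a deep neural network.
p_next = {"office": 0.41, "house": 0.27, "school": 0.22, "banana": 0.10}

# Classification decision: pick the most probable word under p(Y|X)
prediction = max(p_next, key=p_next.get)
print(prediction)  # "office"
```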
Binary classification
• Two-category classification problems
• Example: paper submission (e.g., accept or reject)
Binary classification (we will have more discussion later)
From http://cs.jhu.edu/~kevinduh/a/deep2014/140114-ResearchSeminar.pdf
• We can use a linear classifier:
  • ○: a₁o₁ + a₂o₂ + b > 0
  • ×: a₁o₁ + a₂o₂ + b < 0
• We can also obtain a probability with the sigmoid function σ:
  • p(○|o₁, o₂) = σ(a₁o₁ + a₂o₂ + b)
  • p(×|o₁, o₂) = 1 − σ(a₁o₁ + a₂o₂ + b)
• Sigmoid function: σ(x) = 1 / (1 + e⁻ˣ)
• Decision boundary: o₁ = −(a₂/a₁) o₂ − b/a₁ (a₁: scaling)
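The sigmoid classifier above can be written directly; the parameters a₁, a₂, b below are hand-picked for illustration, not learned:

```python
import math

def sigmoid(x):
    """Sigmoid: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Hand-picked linear-classifier parameters (illustrative only)
a1, a2, b = 1.0, 1.0, -1.0

def p_circle(o1, o2):
    """p(circle | o1, o2) = sigma(a1*o1 + a2*o2 + b), as on the slide."""
    return sigmoid(a1 * o1 + a2 * o2 + b)

def classify(o1, o2):
    # p > 0.5 exactly when a1*o1 + a2*o2 + b > 0: same linear boundary
    return "circle" if p_circle(o1, o2) > 0.5 else "cross"

print(classify(2.0, 2.0))  # linear score 3 > 0 -> "circle"
print(classify(0.0, 0.0))  # linear score -1 < 0 -> "cross"
```

Note that thresholding the probability at 0.5 recovers exactly the linear decision rule, since σ(x) > 0.5 if and only if x > 0.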
Binary classification
• We can use a GMM (although not ideal) to model the data distribution:
  • p(o₁, o₂|○) = Σₖ ωₖ N(𝒐|μₖ, Σₖ)
  • p(o₁, o₂|×) = Σₖ ω′ₖ N(𝒐|μ′ₖ, Σ′ₖ)
• We can build a classifier from Bayes' theorem (speech recognition before 2010; here assuming equal class priors):
  • p(○|o₁, o₂) = p(o₁, o₂|○) / (p(o₁, o₂|○) + p(o₁, o₂|×))
  • p(×|o₁, o₂) = p(o₁, o₂|×) / (p(o₁, o₂|○) + p(o₁, o₂|×))
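A sketch of the GMM-based Bayes classifier above, with hand-picked mixture parameters (diagonal covariances, equal class priors; purely illustrative):

```python
import math

def gauss_pdf(o, mu, var):
    """Diagonal-covariance 2-D Gaussian density N(o | mu, Sigma)."""
    p = 1.0
    for x, m, v in zip(o, mu, var):
        p *= math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    return p

def gmm_pdf(o, weights, mus, vars_):
    """Mixture density p(o | class) = sum_k w_k N(o | mu_k, Sigma_k)."""
    return sum(w * gauss_pdf(o, m, v) for w, m, v in zip(weights, mus, vars_))

# Hand-picked two-class model: the circle class has two mixture components
circle = dict(weights=[0.5, 0.5], mus=[(0, 0), (1, 1)], vars_=[(0.2, 0.2)] * 2)
cross = dict(weights=[1.0], mus=[(3, 3)], vars_=[(0.2, 0.2)])

def posterior_circle(o):
    """Bayes' theorem with equal class priors, as on the slide."""
    pc = gmm_pdf(o, **circle)
    px = gmm_pdf(o, **cross)
    return pc / (pc + px)

print(posterior_circle((0.5, 0.5)))  # near 1: clearly in the circle region
print(posterior_circle((3.0, 3.0)))  # near 0: clearly in the cross region
```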
Getting more difficult with the GMM classifier or linear classifier
Neural network
• A combination of linear classifiers can classify complicated patterns
• More layers → more complicated patterns
Neural network
• A combination of linear classifiers can classify complicated patterns
• More layers or more classifiers → more complicated patterns
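A concrete way to see that a combination of linear classifiers can classify patterns a single one cannot is XOR: each hidden unit below is a linear classifier (hard thresholds for clarity, hand-set weights), and stacking two layers solves a problem no single linear boundary can:

```python
def step(z):
    """Hard-threshold activation (a stand-in for the sigmoid)."""
    return 1 if z > 0 else 0

def tiny_mlp(x1, x2):
    """Two-layer network of linear classifiers computing XOR."""
    h1 = step(x1 + x2 - 0.5)    # hidden unit 1: fires on OR
    h2 = step(x1 + x2 - 1.5)    # hidden unit 2: fires on AND
    return step(h1 - h2 - 0.5)  # output: OR and not AND = XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", tiny_mlp(a, b))
```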
Going deeper → more accurate
Neural network used in speech recognition
• Very large combination of linear classifiers
[Figure: a feed-forward network. Input: speech features (~50 to ~1000 dimensions). ~7 hidden layers with 2048 units each. Output: HMM state or phoneme (30 to ~10,000 units, e.g., a, i, u, w, N)]
Difficulties of training
• Blue: accuracy on the training data (higher is better)
• Orange: accuracy on the validation data (higher is better)
Which one is better?
Today’s agenda
• Big impact in HLT technologies including speech recognition
  • Before/after deep neural networks
• Brief introduction to deep neural networks
• Why deep neural networks after 2010?
• Success in other HLT areas (image and text processing)
Why neural networks were not in focus
1. Very difficult to train
   • Batch? Online? Mini-batch?
   • Stochastic gradient descent
   • Learning rate? Scheduling?
   • What kind of topologies?
   • Large computational cost
2. The amount of training data is very critical
3. CPU → GPU
[Figure: the same feed-forward network: speech features (~50 to ~1000 dimensions) in, ~7 hidden layers of 2048 units, 30 to ~10,000 output units (a, i, u, w, N)]
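To make point 1 concrete, here is a toy stochastic gradient descent loop fitting a single weight; the data and learning rates are invented, but they show why the learning-rate choice listed above is so delicate:

```python
import random

def sgd_fit(lr, epochs=50, seed=0):
    """Fit y = 2*x with one weight w via stochastic gradient descent."""
    rng = random.Random(seed)
    data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # "stochastic": visit samples in random order
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of the squared error (w*x - y)^2
            w -= lr * grad
    return w

print(sgd_fit(lr=0.1))            # converges to ~2.0
print(sgd_fit(lr=1.0, epochs=5))  # too large a learning rate: diverges
```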
![Page 36: EN. 601.467/667 Introductionto Human Language Technology ... · •Brief introduction of deep neural network •Why deep neural network after 2010? •Success in the other HLT areas](https://reader033.vdocuments.mx/reader033/viewer/2022060900/609dc71595dfcc3add3c1c37/html5/thumbnails/36.jpg)
Before deep learning (2002–2009)
• The successes of neural networks dated from a much earlier period
• People believed that GMMs were better
• Neural networks gave only very small gains over standard GMMs
When I noticed deep learning (2010)
• A. Mohamed, G. E. Dahl, and G. E. Hinton, “Deep belief networks for phone recognition,” in NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, 2009.
• This still did not fully convince me (I introduced it at NTT’s reading group)
• Use a deep belief network for pre-training
• Then fine-tune the deep neural network → provides stable estimation
Pre-training and fine-tuning
• First, train neural-network-like parameters with a deep belief network or an autoencoder
• Then, run standard deep neural network training (fine-tuning)
[Figure: the same feed-forward network: speech features (~50 to ~1000 dimensions) in, ~7 hidden layers of 2048 units, 30 to ~10,000 output units (a, i, u, w, N)]
Interspeech 2011 at Florence
• The following three papers convinced me:
  • Feature extraction: F. Valente, M. Magimai-Doss, and W. Wang, "Analysis and comparison of recent MLP features for LVCSR systems," Interspeech 2011, pp. 1245–1248.
  • Acoustic model: F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," Interspeech 2011, pp. 437–440.
  • Language model: T. Mikolov, A. Deoras, S. Kombrink, L. Burget, and J. Černocký, "Empirical evaluation and combination of advanced language modeling techniques," Interspeech 2011, pp. 605–608.
• I discussed this potential with my NLP colleagues at NTT, but they did not believe it (they preferred SVMs and log-linear models)
Late 2012
• My first deep learning system (Kaldi nnet)
• Kaldi has supported DNNs since 2012 (mainly developed by Karel Veselý)
  • Deep belief network based pre-training
  • Feed-forward neural network
  • Sequence-discriminative training
WER (%):

| Model | Hub5 '00 (SWB) | WSJ |
| --- | --- | --- |
| GMM | 18.6 | 5.6 |
| DNN | 14.2 | 3.6 |
| DNN + sequence-discriminative training | 12.6 | 3.2 |
Build speech recognition with public tools and resources
• TED-LIUM (~100 hours)
• LibriSpeech (~1000 hours)
• We can combine Kaldi + DNN + TED-LIUM to build an English speech recognition system on a single machine (GPU + many-core machine)
• Before this, only big companies could do that
The same thing happened in computer vision
ImageNet challenge (Large scale data)
L. Fei-Fei and O. Russakovsky, Analysis of Large-Scale Visual Recognition, Bay Area Vision Meeting, October 2013
ImageNet challenge: AlexNet, GoogLeNet, VGG, ResNet, …
[Figure: ImageNet classification error rate by year; deep learning marks the sharp drop]
The same thing happened in text processing
• Recurrent neural network language model (RNNLM) [Mikolov+ (2010)]
[Figure: RNNLM unrolled over "<s> I want to go to my …": at each step, the previous word wₜ and the hidden state predict the next word yₜ]
| Model | Perplexity |
| --- | --- |
| N-gram (conventional) | 336 |
| RNNLM | 156 |
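Perplexity, the metric in the table above, is the exponentiated average negative log-probability that the model assigns to each word; lower is better. A minimal sketch (the per-word probabilities below are invented for illustration; only the comparison matters):

```python
import math

def perplexity(probs):
    """Perplexity = exp of the average negative log-probability
    that the model assigns to each word of the test text."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# Hypothetical per-word probabilities from two models on the same text
ngram_probs = [0.004, 0.002, 0.003, 0.004]
rnnlm_probs = [0.010, 0.005, 0.006, 0.008]

print(round(perplexity(ngram_probs)))  # higher perplexity = worse model
print(round(perplexity(rnnlm_probs)))  # the better model scores lower
```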
Word embedding example
https://www.tensorflow.org/tutorials/word2vec
Neural machine translation (New York Times, December 2016)
From https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html
Deep neural network toolkit (2013-)
• Theano
• Caffe
• Torch
• CNTK
• Chainer
• Keras

Later:
• TensorFlow
• PyTorch (used in this course)
• MXNet
• etc.
Summary
• Before 2011
  • GMM/HMM; limited performance; a bit boring
  • This is because they are linear models…
• After 2011
  • DNN/HMM
  • Toolkits
  • Large public datasets
  • GPUs
  • NLP and image/vision also moved to DNNs
  • Always something exciting
Any questions?