Deep Learning for NLP
Post on 14-Jan-2017
@ODSC
Thomas Delteil: https://www.linkedin.com/in/thomasdelteil
Miguel Fierro: @miguelgfierro, https://miguelgfierro.com
ODSC 2016 London – Thomas Delteil linkedin.com/in/thomasdelteil & Miguel Fierro @miguelgfierro
NLP | NLP with CNN | Operationalization
Interaction between computers
and human language
NLP:
- Machine translation
- OCR
- Q&A
- Sentiment analysis
- Speech recognition
- T2S
- Topic modelling
- Information retrieval
- Natural language understanding
- Document classification
£1.3T: value of company data (source: IDC, 2014)
10% of organizations expect to commercialise their data by 2020 (source: Gartner, 2016)
8.4PB of information per second as of 2020 (source: business2community, 2016)
70% of companies use customer feedback (source: business2community, 2016)
TOPIC 1: Spaghetti, Milk, Eating, Broccoli
TOPIC 2: Kitten, Puppy, Hamster, Eating
… my favourite dish is spaghetti …
… the cute hamster is eating broccoli …
… I love kittens …
Generative models: joint distribution
source: https://en.wikipedia.org/wiki/Hidden_Markov_model
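As a reminder of what "joint distribution" means for the hidden Markov model cited here, one common way to write it is sketched below. The symbols are assumptions that do not appear on the slide: $x_t$ is the observation at step $t$, $y_t$ the hidden state, and $T$ the sequence length.

```latex
P(x_{1:T}, y_{1:T}) = P(y_1)\, P(x_1 \mid y_1) \prod_{t=2}^{T} P(y_t \mid y_{t-1})\, P(x_t \mid y_t)
```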
Conditional models: conditional distribution
source: John Lafferty, Andrew McCallum, Fernando C.N. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML, 2001.
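For comparison, a linear-chain conditional random field models the labels given the observations directly. The form below is a sketch in assumed notation (feature functions $f_k$, weights $\lambda_k$, normaliser $Z$), not a transcription of the slide.

```latex
P(y_{1:T} \mid x_{1:T}) = \frac{1}{Z(x_{1:T})} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x_{1:T}, t) \right)
```

Here $Z(x_{1:T})$ sums the exponential term over all possible label sequences, so the probabilities add up to one.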
source: https://en.wikipedia.org/wiki/Tf%E2%80%93idf
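The tf-idf weighting referenced above can be sketched in a few lines of plain Python. This is one common variant (term frequency normalised by document length, unsmoothed logarithmic idf), chosen here as an assumption; the function name `tf_idf` and the toy documents are illustrative only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "my favourite dish is spaghetti".split(),
    "the cute hamster is eating broccoli".split(),
    "i love kittens".split(),
]
scores = tf_idf(docs)
```

A word that occurs in several documents ("is") ends up with a lower weight than a word specific to one document ("spaghetti").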
Bag of n-grams instead of bag of words
source: A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification, 2016
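A bag of n-grams keeps short word sequences in addition to single words, which preserves some local word order. A minimal sketch follows; the helper name `bag_of_ngrams` and the choice of `max_n=2` are assumptions for illustration.

```python
from collections import Counter

def bag_of_ngrams(tokens, max_n=2):
    """Count all n-grams of length 1..max_n in a token list."""
    bag = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag

bag = bag_of_ngrams("the cute hamster is eating broccoli".split())
```

With `max_n=2`, the bag holds the six unigrams plus bigrams such as "cute hamster" and "eating broccoli".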
Feature generation
Great performance
Needs GPUs and lots of data
wait, wait, wait…
What makes deep learning
deep?
input hidden output
input, hidden, hidden, hidden, …, output
source: R. Rojas: Neural Networks, Springer-Verlag, Berlin, 1996
source: https://en.wikipedia.org/wiki/Maxima_and_minima
[Figure: input, hidden, and output layers across time steps t_i, t_i+1, t_i+2, t_i+3]
number of layers
source: https://en.wikipedia.org/wiki/Long_short-term_memory
[Figure: Input image -> Convolution -> Pooling -> Convolution -> Pooling -> Fully connected -> Fully connected -> Output predictions: 7]
Sharpening filter
Laplacian filter
Sobel x-axis filter
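Each of these filters is a small kernel slid over the image. The sketch below uses the textbook 3x3 versions of the three kernels, which are assumed rather than read from the slide images, and applies them without padding. Like CNN layers, it computes a cross-correlation (the kernel is not flipped).

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Textbook kernels (assumed; the slide's exact values are not in the text).
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

# Toy image with a vertical edge: dark on the left, bright on the right.
image = [[0, 0, 10, 10]] * 4
edges = convolve2d(image, SOBEL_X)

# A constant image has no edges, so the Laplacian response is zero.
flat = convolve2d([[5] * 4] * 4, LAPLACIAN)
```

The Sobel x-axis kernel responds strongly to the vertical edge, while the Laplacian returns zero on the constant image.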
Max pooling with 2x2 kernel and stride of 2x2
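A minimal sketch of max pooling with a 2x2 kernel and a stride of 2 follows. The function name `max_pool2d` and the toy matrix are illustrative assumptions.

```python
def max_pool2d(matrix, size=2, stride=2):
    """Max pooling over size x size windows moved by `stride`."""
    rows = range(0, len(matrix) - size + 1, stride)
    cols = range(0, len(matrix[0]) - size + 1, stride)
    return [
        [
            max(matrix[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in cols
        ]
        for r in rows
    ]

pooled = max_pool2d([
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
])
```

Each 2x2 block is reduced to its largest value, so the 4x4 input becomes a 2x2 output.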
input hidden output
tanh, ReLU, Softmax
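The two element-wise activations can be written directly with the standard library. This is a sketch with illustrative input values, not code from the talk.

```python
import math

def tanh(x):
    """Hyperbolic tangent: squashes any input into (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, x)

values = [6.4, 1.1, 3.2, 0.1, -0.4]
rectified = [relu(v) for v in values]
squashed = [tanh(v) for v in values]
```

ReLU leaves positive values untouched and clips negative ones to zero, while tanh keeps the sign and bounds the magnitude.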
"When I read some of the rules for speaking the English language correctly, I think any fool can make a rule, and every fool will mind it"
Henry David Thoreau
?
122 122 112 90 5 10 21
121 122 112 11 6 11 21
120 118 6 10 11 12 23
118 4 6 5 23 23 23
4 6 1 23 23 21 23
4 5 20 24 23 21 23
Classification
source: Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu. Convolutional Neural Networks for Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014
O D S C - U K (space) N L P
space 0 0 0 0 0 0 0 1 0 0 0
- 0 0 0 0 1 0 0 0 0 0 0
. 0 0 0 0 0 0 0 0 0 0 0
A 0 0 0 0 0 0 0 0 0 0 0
B 0 0 0 0 0 0 0 0 0 0 0
C 0 0 0 1 0 0 0 0 0 0 0
D 0 1 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 0 0
K 0 0 0 0 0 0 1 0 0 0 0
L 0 0 0 0 0 0 0 0 0 1 0
M 0 0 0 0 0 0 0 0 0 0 0
N 0 0 0 0 0 0 0 0 1 0 0
O 1 0 0 0 0 0 0 0 0 0 0
P 0 0 0 0 0 0 0 0 0 0 1
Q 0 0 0 0 0 0 0 0 0 0 0
R 0 0 0 0 0 0 0 0 0 0 0
S 0 0 1 0 0 0 0 0 0 0 0
T 0 0 0 0 0 0 0 0 0 0 0
U 0 0 0 0 0 1 0 0 0 0 0
V 0 0 0 0 0 0 0 0 0 0 0
W 0 0 0 0 0 0 0 0 0 0 0
X 0 0 0 0 0 0 0 0 0 0 0
Y 0 0 0 0 0 0 0 0 0 0 0
Z 0 0 0 0 0 0 0 0 0 0 0
One-hot encoding over a
vocabulary of characters.
Encoding:
Text = “ODSC-UK NLP”
Vocab: [ ‘ ‘, ‘-’, ‘.’, ‘A’, ‘B’, ‘C’, …, ‘Z’ ]
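The encoding above can be reproduced with a short sketch. The vocabulary follows the slide (space, '-', '.', then 'A' to 'Z'); the function name `one_hot` and the choice to leave unknown characters as all-zero columns are assumptions.

```python
import string

# Vocabulary as on the slide: space, '-', '.', then 'A'..'Z'.
VOCAB = [" ", "-", "."] + list(string.ascii_uppercase)

def one_hot(text, vocab=VOCAB):
    """Return a len(vocab) x len(text) matrix; column j encodes text[j]."""
    index = {ch: i for i, ch in enumerate(vocab)}
    matrix = [[0] * len(text) for _ in vocab]
    for j, ch in enumerate(text):
        if ch in index:  # unknown characters stay all-zero (assumption)
            matrix[index[ch]][j] = 1
    return matrix

encoded = one_hot("ODSC-UK NLP")
```

Each column contains a single 1, placed in the row of the character found at that position of the text.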
For images:
For text:
- Humans to rephrase the examples
- Synonyms
- Similar semantic meaning
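Synonym replacement is the easiest of these options to automate. The sketch below uses a hypothetical two-entry thesaurus (`THESAURUS`) purely for illustration; a real pipeline would rely on a proper lexical resource and would usually replace only some of the words.

```python
import random

# Hypothetical toy thesaurus, for illustration only.
THESAURUS = {"cute": ["adorable", "sweet"], "love": ["adore"]}

def augment(sentence, thesaurus=THESAURUS, rng=None):
    """Replace every word that has a thesaurus entry with a random synonym."""
    if rng is None:
        rng = random.Random(0)
    return " ".join(
        rng.choice(thesaurus[word]) if word in thesaurus else word
        for word in sentence.split()
    )

augmented = augment("i love kittens")
```

The augmented sentence keeps the label of the original example, which is what makes it usable as extra training data.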
source: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. NIPS 2015
O D S C - U K (space) N L P … 1013
space 0 0 0 0 0 0 0 1 0 0 0 … …
- 0 0 0 0 1 0 0 0 0 0 0 … …
. 0 0 0 0 0 0 0 0 0 0 0 … …
A 0 0 0 0 0 0 0 0 0 0 0 … …
B 0 0 0 0 0 0 0 0 0 0 0 … …
C 0 0 0 1 0 0 0 0 0 0 0 … …
D 0 1 0 0 0 0 0 0 0 0 0 … …
E 0 0 0 0 0 0 0 0 0 0 0 … …
F 0 0 0 0 0 0 0 0 0 0 0 … …
G 0 0 0 0 0 0 0 0 0 0 0 … …
H 0 0 0 0 0 0 0 0 0 0 0 … …
I 0 0 0 0 0 0 0 0 0 0 0 … …
J 0 0 0 0 0 0 0 0 0 0 0 … …
K 0 0 0 0 0 0 1 0 0 0 0 … …
L 0 0 0 0 0 0 0 0 0 1 0 … …
M 0 0 0 0 0 0 0 0 0 0 0 … …
N 0 0 0 0 0 0 0 0 1 0 0 … …
O 1 0 0 0 0 0 0 0 0 0 0 … …
P 0 0 0 0 0 0 0 0 0 0 1 … …
Q 0 0 0 0 0 0 0 0 0 0 0 … …
R 0 0 0 0 0 0 0 0 0 0 0 … …
S 0 0 1 0 0 0 0 0 0 0 0 … …
T 0 0 0 0 0 0 0 0 0 0 0 … …
U 0 0 0 0 0 1 0 0 0 0 0 … …
V 0 0 0 0 0 0 0 0 0 0 0 … …
W 0 0 0 0 0 0 0 0 0 0 0 … …
X 0 0 0 0 0 0 0 0 0 0 0 … …
Y 0 0 0 0 0 0 0 0 0 0 0 … …
Z 0 0 0 0 0 0 0 0 0 0 0 … …
0 1 2 3 4 … 1007
0 6.4 1.1 3.2 0.1 -0.4 … 3.1
… … … … … … … …
255 1.2 3.4 -1 1.2 3.2 … -1
Input: 69x1014x1
Output: 1x1008x256 (256 rows x 1008 columns)
0 1 2 3 4 … 1007
0 6.4 1.1 3.2 0.1 -0.4 … 3.1
… … … … … … … …
255 1.2 3.4 -1 1.2 3.2 … -1
0 1 2 3 4 … 1007
0 6.4 1.1 3.2 0.1 0 … 3.1
… … … … … … … …
255 1.2 3.4 0 1.2 3.2 … 0
1x1008x256
1x1008x256
0 1 2 3 4 … … … 1007
0 6.4 1.1 3.2 0.1 0 … … … 3.1
… … … … … … … … … …
255 1.2 3.4 0 1.2 3.2 … … … 0
0 1 … 335
0 6.4 0.1 … …
… … … … …
255 3.4 3.2
Input: 1x1008x256
Output: 1x336x256 (256 rows x 336 columns)
0 1 2 3 4 5 6 7 8 … 335
0 6.4 0.1 … … … … … … … … …
… … … … … … … … … … … …
255 3.4 3.2 … … … … … … … … …
0 1 2 3 4 5 6 … 329
0 -2.4 3.2 … … … … … … …
… … … … … … … … … …
255 … … … … … … … … …
Input: 1x336x256
Output: 1x330x256 (256 rows x 330 columns)
1x330x256 <- after 2 convolutions (7x1/1) and 1 max-pooling (3x1/3)
1x110x256 <- 1 max-pooling (3x1/3)
1x102x256 <- 4 convolutions (3x1/1)
1x34x256 <- 1 max-pooling (3x1/3)
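These sizes follow from the usual output-length rule for unpadded layers, (length - kernel) // stride + 1. The sketch below traces the sequence length through the layers in the order the slides describe; the list name `LAYERS` and the helper `out_length` are illustrative.

```python
def out_length(length, kernel, stride):
    """Output length of an unpadded 1D convolution or pooling layer."""
    return (length - kernel) // stride + 1

# (name, kernel, stride) in the order given on the slides.
LAYERS = [
    ("conv 7/1", 7, 1), ("max-pool 3/3", 3, 3),
    ("conv 7/1", 7, 1), ("max-pool 3/3", 3, 3),
    ("conv 3/1", 3, 1), ("conv 3/1", 3, 1),
    ("conv 3/1", 3, 1), ("conv 3/1", 3, 1),
    ("max-pool 3/3", 3, 3),
]

length = 1014  # characters per input text
trace = [length]
for name, kernel, stride in LAYERS:
    length = out_length(length, kernel, stride)
    trace.append(length)
    print(f"{name:>12}: 1x{length}x256")
```

The final length of 34, multiplied by 256 feature maps, gives the 8704 values fed to the fully connected layers.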
0 1 2 3 4 5 6 7 8 … 33
0 6.4 0.1 … … … … … … … … …
1 2.1 24.9 … … … … … … … … …
… … … … … … … … … … … …
255 … … … … … … … … … … 9.9
0
0 6.4
1 0.1
… …
34 2.1
35 24.9
… …
8703 9.9
Input: 1x34x256 (256 rows x 34 columns)
Output: 8704x1x1
0
0 6.4
1 0.1
… …
… …
… …
… …
… …
… …
… …
8703 9.9
Input: 8704x1x1
Fully connected layer: 1024 units (k = 0 … 1023)
Output: 1024x1x1
$f_k(X) = \sum_{i=0}^{8703} w_{ki} \, x_i + b_k$
0
0 8.7
1 -2.1
… …
… …
… …
… …
… …
… …
… …
1023 32.1
0
0 6.4
1 0.1
… …
… …
… …
… …
… …
… …
… …
1023 9.9
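A fully connected layer computes one weighted sum plus bias per output unit. The sketch below uses a tiny 3-input, 2-output layer with made-up weights so it runs instantly; the layer on the slide has 8704 inputs and 1024 outputs.

```python
def dense(x, weights, biases):
    """Compute f_k(x) = sum_i w_ki * x_i + b_k for every output unit k."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]

# Illustrative values only: 3 inputs, 2 output units.
x = [1.0, 2.0, 3.0]
weights = [
    [0.5, -1.0, 0.0],  # weights of unit k = 0
    [1.0, 1.0, 1.0],   # weights of unit k = 1
]
biases = [0.1, -6.0]
y = dense(x, weights, biases)
```

Every output unit sees the whole input vector, which is why the weight matrix has one row per unit and one column per input.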
Input: 1024x1x1
Fully connected layer: 1024 units (k = 0 … 1023)
Output: 1024x1x1
$f_k(X) = \sum_{i=0}^{1023} w_{ki} \, x_i + b_k$
0
0 8.7
1 -2.1
… …
… …
… …
… …
… …
… …
… …
1023 32.1
ignored
0
0 6.4
1 0.1
… …
… …
… …
… …
… …
… …
… …
1023 9.9
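The "ignored" units on this slide read like dropout, where a random subset of activations is switched off during training. The sketch below is written under that assumption; the rate of 0.5 and the inverted scaling of the kept units are also assumptions, not details stated on the slide.

```python
import random

def dropout(values, rate=0.5, rng=None, training=True):
    """Inverted dropout: zero each unit with probability `rate` while training."""
    if not training or rate == 0.0:
        return list(values)
    if rng is None:
        rng = random.Random()
    keep = 1.0 - rate
    # Kept units are scaled by 1/keep so the expected sum stays the same.
    return [v / keep if rng.random() < keep else 0.0 for v in values]

activations = [8.7, -2.1, 32.1]
dropped = dropout(activations, rate=0.5, rng=random.Random(0))
unchanged = dropout(activations, training=False)
```

At scoring time the layer is left untouched (`training=False`), so no units are ignored.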
Input: 1024x1x1
Fully connected layer: N units (0 … N-1)
Output: Nx1x1
0
0 2.7
1 0.1
… …
… …
N-1 12.5
ignored
Softmax
0
0 0.1
1 0.01
… …
… …
N-1 0.8
Nx1x1
$\sigma(\mathbf{z})_i = \dfrac{e^{z_i}}{\sum_{j=0}^{N-1} e^{z_j}}$
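Softmax turns the N raw scores into probabilities that sum to one. A minimal sketch follows; subtracting the maximum before exponentiating is a standard numerical-stability trick added here and does not change the result.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative scores for N = 3 classes.
probs = softmax([2.7, 0.1, 12.5])
```

The class with the largest score receives the largest probability, and the ordering of the scores is preserved.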
• MXNet using Python bindings
• Training on Azure N-Series, on Tesla K80 GPU
• 3 days of training on 2.5M examples for sentiment polarity
Amazon Review Polarity dataset (1.8M training, 200k testing):
- Crepe model + thesaurus augmentation: 95.07%
- TFIDF + n-grams: 91.64%
AG's news corpus dataset (4 classes, 120k training, 7.6k testing):
- Crepe model + thesaurus augmentation: 85.20%
- TFIDF + n-grams: 92.36%
CNNs are no silver bullet, but they perform best on very large datasets
source: Alexis Conneau, Holger Schwenk, Loïc Barrault, Yann LeCun. Very Deep Convolutional Networks for Natural Language Processing, 2016
source: Sergey Ioffe and Christian Szegedy Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015.
6.4 1.1 3.2 0.1 0 5 3.1 10 21 3.1 0.2 1.8 0 1
6.4 3.2 5 10 21
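The two rows above are consistent with k-max pooling with k = 5: the five largest values are kept in their original order. That reading is an inference from the numbers, not something the slide text states. A minimal sketch:

```python
def k_max_pooling(values, k):
    """Keep the k largest values, preserving their original order."""
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]

sequence = [6.4, 1.1, 3.2, 0.1, 0, 5, 3.1, 10, 21, 3.1, 0.2, 1.8, 0, 1]
pooled = k_max_pooling(sequence, k=5)
```

Unlike fixed-window max pooling, the output length is always k, whatever the length of the input sequence.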
source: A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification. 2016
NLP APIs from major cloud providers and market places
- Language detection
- Sentiment Analysis
- Topic detection
- Translation
- Content moderation
- Text to speech
- Speech to text
- Intent modelling
+ Scalable
+ Managed
+ Pay per use pricing
+ Documentation and sample code
- Generic solutions
- Limited customizability
- Performance
- Latency
- Limited batch processing
Single Machine
Training Data, Testing Data, Sample Production Data
Model Development
Data pipeline?
Retraining?
Scalability?
Real time / Batch scoring?
Multiple teams / frameworks?
Production
[Diagram: Training Data -> Training instance(s) (GPU) -> Serialized model -> Scoring instances (CPU)]
Orchestration Layer (CI/CD / Job scheduling / Monitoring)
+ Auto-scale and load balancing
+ Managed
+ Domain specific training data
+ Latency
- Pricing less flexible
- Deployment pipeline to monitor
- Performance
The code of this application is published at:
https://github.com/ilkarman/Bangalore_Sentiment
Part of our code is based on:
https://github.com/zhangxiangxiao/Crepe
Attribution of some images:
• http://morguefile.com
• https://unsplash.com
• Ana Corrales Photography
• http://wikipedia.org
Amazon dataset citation:
• J. McAuley, C. Targett, J. Shi, A. van den Hengel. Image-based recommendations on styles and substitutes. SIGIR, 2015.
• J. McAuley, R. Pandey, J. Leskovec. Inferring networks of substitutable and complementary products. Knowledge Discovery and Data Mining, 2015.
Open Data Science Conference London,
8 & 9 October, 2016
© 2016 Microsoft Corporation. All rights reserved