Deep Learning for NLP

@ODSC Thomas Delteil https://www.linkedin.com/in/thomasdelteil Miguel Fierro @miguelgfierro https://miguelgfierro.com

Upload: miguel-gonzalez-fierro

Post on 14-Jan-2017


TRANSCRIPT

Page 1: Deep Learning for NLP

@ODSC

Thomas Delteil https://www.linkedin.com/in/thomasdelteil

Miguel Fierro @miguelgfierro

https://miguelgfierro.com

Page 2: Deep Learning for NLP

ODSC 2016 London – Thomas Delteil linkedin.com/in/thomasdelteil & Miguel Fierro @miguelgfierro

NLP
NLP with CNN
Operationalization

Page 3: Deep Learning for NLP
Page 4: Deep Learning for NLP

Interaction between computers and human language

Page 5: Deep Learning for NLP

NLP
- Machine translation
- OCR
- Q&A
- Sentiment Analysis
- Speech Recognition
- T2S
- Topic Modelling
- Information Retrieval
- Natural Language Understanding
- Document Classification

Page 6: Deep Learning for NLP

£1.3T value of company data
source: IDC, 2014

10% of organizations expect to commercialise their data by 2020
source: Gartner, 2016

Page 7: Deep Learning for NLP

8.4PB of information per second as of 2020
source: business2community, 2016

70% of companies use customer feedback
source: business2community, 2016

Page 8: Deep Learning for NLP

TOPIC 1: Spaghetti, Milk, Eating, Broccoli

TOPIC 2: Kitten, Puppy, Hamster, Eating

… my favourite dish is spaghetti …
… the cute hamster is eating broccoli …
… I love kittens …

Page 9: Deep Learning for NLP

Generative models: joint distribution

source: https://en.wikipedia.org/wiki/Hidden_Markov_model

Page 10: Deep Learning for NLP

Conditional models: conditional distribution

source: John Lafferty, Andrew McCallum, Fernando C.N. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML, 2001.

Page 11: Deep Learning for NLP


source: https://en.wikipedia.org/wiki/Tf%E2%80%93idf

Page 12: Deep Learning for NLP


Bag of n-grams instead of bag of words

source: A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification, 2016
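
As a rough illustration of the bag of n-grams idea, the sketch below counts word unigrams and bigrams in a sentence. The function name `bag_of_ngrams`, the whitespace tokenizer and the choice of `n_max=2` are assumptions made for this example; they are not taken from the cited paper.

```python
from collections import Counter


def bag_of_ngrams(text, n_max=2):
    """Count all word n-grams of length 1..n_max (hypothetical helper)."""
    tokens = text.lower().split()  # naive whitespace tokenizer, an assumption
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


features = bag_of_ngrams("the cute hamster is eating broccoli")
print(features["hamster"], features["cute hamster"])
```

Unlike a plain bag of words, the bigram "cute hamster" keeps a little of the local word order.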

Page 13: Deep Learning for NLP

Feature generation
Great performance
Needs GPUs and lots of data

Page 14: Deep Learning for NLP

wait, wait, wait…

What makes deep learning deep?

input hidden output

Page 15: Deep Learning for NLP


input hidden hidden hidden output

Page 16: Deep Learning for NLP


source: R. Rojas: Neural Networks, Springer-Verlag, Berlin, 1996

Page 17: Deep Learning for NLP


source: https://en.wikipedia.org/wiki/Maxima_and_minima

Page 18: Deep Learning for NLP

[Figure: layers labelled input, hidden, hidden, hidden, output across time steps t_i, t_i+1, t_i+2, t_i+3]

Page 19: Deep Learning for NLP


number of layers

Page 20: Deep Learning for NLP


source: https://en.wikipedia.org/wiki/Long_short-term_memory

Page 21: Deep Learning for NLP

Input image → Convolution → Pooling → Convolution → Pooling → Fully connected → Fully connected → Output predictions: 7

Page 22: Deep Learning for NLP


Page 23: Deep Learning for NLP


Sharpening filter

Laplacian filter

Sobel x-axis filter
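
The three filters named on this slide are usually written as small 3x3 kernels that slide over the image. The coefficients used in the talk are not in the transcript, so the kernels below are common textbook values assumed for illustration, and `correlate2d` is a hypothetical helper. Like most CNN libraries, it applies the kernel without flipping it (cross-correlation) and without padding.

```python
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]


def correlate2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 4x4 toy image with a vertical edge between the dark and bright halves.
image = [[0, 0, 10, 10]] * 4

edges = correlate2d(image, SOBEL_X)
laplace = correlate2d(image, LAPLACIAN)
sharp = correlate2d(image, SHARPEN)
print(edges, laplace, sharp)
```

In a CNN the kernel values are not fixed by hand like this; they are learned during training.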

Page 24: Deep Learning for NLP


Max pooling with 2x2 kernel and stride of 2x2
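
A minimal sketch of max pooling with a 2x2 kernel and a 2x2 stride, written in plain Python. The helper name `max_pool2d` and the toy 4x4 input are assumptions for this example.

```python
def max_pool2d(image, size=2, stride=2):
    """Keep the maximum of each size x size window, moving by `stride`."""
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    return [
        [
            max(image[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool2d(image)
print(pooled)
```

Each 2x2 block collapses to a single value, so the 4x4 input becomes a 2x2 output.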

Page 25: Deep Learning for NLP


input hidden output

Page 26: Deep Learning for NLP

tanh, ReLU, Softmax

Page 27: Deep Learning for NLP
Page 28: Deep Learning for NLP

When I read some of the rules for speaking the English language correctly, I think any fool can make a rule, and every fool will mind it

Henry David Thoreau

?

122 122 112 90 5 10 21
121 122 112 11 6 11 21
120 118 6 10 11 12 23
118 4 6 5 23 23 23
4 6 1 23 23 21 23
4 5 20 24 23 21 23

Classification

source: Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu. Convolutional Neural Networks for Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014

Page 29: Deep Learning for NLP

       O D S C - U K ␣ N L P
space  0 0 0 0 0 0 0 1 0 0 0
-      0 0 0 0 1 0 0 0 0 0 0
.      0 0 0 0 0 0 0 0 0 0 0
A      0 0 0 0 0 0 0 0 0 0 0
B      0 0 0 0 0 0 0 0 0 0 0
C      0 0 0 1 0 0 0 0 0 0 0
D      0 1 0 0 0 0 0 0 0 0 0
E      0 0 0 0 0 0 0 0 0 0 0
F      0 0 0 0 0 0 0 0 0 0 0
G      0 0 0 0 0 0 0 0 0 0 0
H      0 0 0 0 0 0 0 0 0 0 0
I      0 0 0 0 0 0 0 0 0 0 0
J      0 0 0 0 0 0 0 0 0 0 0
K      0 0 0 0 0 0 1 0 0 0 0
L      0 0 0 0 0 0 0 0 0 1 0
M      0 0 0 0 0 0 0 0 0 0 0
N      0 0 0 0 0 0 0 0 1 0 0
O      1 0 0 0 0 0 0 0 0 0 0
P      0 0 0 0 0 0 0 0 0 0 1
Q      0 0 0 0 0 0 0 0 0 0 0
R      0 0 0 0 0 0 0 0 0 0 0
S      0 0 1 0 0 0 0 0 0 0 0
T      0 0 0 0 0 0 0 0 0 0 0
U      0 0 0 0 0 1 0 0 0 0 0
V      0 0 0 0 0 0 0 0 0 0 0
W      0 0 0 0 0 0 0 0 0 0 0
X      0 0 0 0 0 0 0 0 0 0 0
Y      0 0 0 0 0 0 0 0 0 0 0
Z      0 0 0 0 0 0 0 0 0 0 0

One-hot encoding over a vocabulary of characters.

Encoding:
Text = “ODSC-UK NLP”
Vocab: [ ‘ ’, ‘-’, ‘.’, ‘A’, ‘B’, ‘C’, …, ‘Z’ ]
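
The encoding on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `one_hot` is hypothetical, and leaving out-of-vocabulary characters as all-zero columns is an assumption rather than something stated on the slide.

```python
import string

# Vocabulary from the slide: space, '-', '.', then 'A'..'Z' (29 symbols).
VOCAB = [" ", "-", "."] + list(string.ascii_uppercase)


def one_hot(text, vocab=VOCAB):
    """Return a len(vocab) x len(text) matrix with a single 1 per known character."""
    index = {ch: i for i, ch in enumerate(vocab)}
    matrix = [[0] * len(text) for _ in vocab]
    for col, ch in enumerate(text):
        row = index.get(ch)
        if row is not None:  # unknown characters stay all-zero (assumption)
            matrix[row][col] = 1
    return matrix


encoded = one_hot("ODSC-UK NLP")
print(len(encoded), len(encoded[0]))
```

Each column holds exactly one 1, in the row of the character found at that position of the text.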

Page 30: Deep Learning for NLP

For images:

For text:
- Humans to rephrase the examples
- Synonyms
- Similar semantic meaning

Page 31: Deep Learning for NLP


source: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. NIPS 2015

Page 32: Deep Learning for NLP


Page 33: Deep Learning for NLP

Input (69x1014x1):

       O D S C - U K ␣ N L P … 1013
space  0 0 0 0 0 0 0 1 0 0 0 … …
-      0 0 0 0 1 0 0 0 0 0 0 … …
.      0 0 0 0 0 0 0 0 0 0 0 … …
A      0 0 0 0 0 0 0 0 0 0 0 … …
B      0 0 0 0 0 0 0 0 0 0 0 … …
C      0 0 0 1 0 0 0 0 0 0 0 … …
D      0 1 0 0 0 0 0 0 0 0 0 … …
E      0 0 0 0 0 0 0 0 0 0 0 … …
F      0 0 0 0 0 0 0 0 0 0 0 … …
G      0 0 0 0 0 0 0 0 0 0 0 … …
H      0 0 0 0 0 0 0 0 0 0 0 … …
I      0 0 0 0 0 0 0 0 0 0 0 … …
J      0 0 0 0 0 0 0 0 0 0 0 … …
K      0 0 0 0 0 0 1 0 0 0 0 … …
L      0 0 0 0 0 0 0 0 0 1 0 … …
M      0 0 0 0 0 0 0 0 0 0 0 … …
N      0 0 0 0 0 0 0 0 1 0 0 … …
O      1 0 0 0 0 0 0 0 0 0 0 … …
P      0 0 0 0 0 0 0 0 0 0 1 … …
Q      0 0 0 0 0 0 0 0 0 0 0 … …
R      0 0 0 0 0 0 0 0 0 0 0 … …
S      0 0 1 0 0 0 0 0 0 0 0 … …
T      0 0 0 0 0 0 0 0 0 0 0 … …
U      0 0 0 0 0 1 0 0 0 0 0 … …
V      0 0 0 0 0 0 0 0 0 0 0 … …
W      0 0 0 0 0 0 0 0 0 0 0 … …
X      0 0 0 0 0 0 0 0 0 0 0 … …
Y      0 0 0 0 0 0 0 0 0 0 0 … …
Z      0 0 0 0 0 0 0 0 0 0 0 … …

Output (1x1008x256), 256 rows x 1008 columns:

       0    1    2    3    4    …  1007
0    6.4  1.1  3.2  0.1 -0.4    …   3.1
…      …    …    …    …    …    …     …
255  1.2  3.4   -1  1.2  3.2    …    -1

Page 34: Deep Learning for NLP

Input (1x1008x256):

       0    1    2    3    4    …  1007
0    6.4  1.1  3.2  0.1 -0.4    …   3.1
…      …    …    …    …    …    …     …
255  1.2  3.4   -1  1.2  3.2    …    -1

Output (1x1008x256):

       0    1    2    3    4    …  1007
0    6.4  1.1  3.2  0.1    0    …   3.1
…      …    …    …    …    …    …     …
255  1.2  3.4    0  1.2  3.2    …     0
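
The two tables on this slide differ only where the first has negative entries, which become 0 in the second. That matches a ReLU activation applied element-wise, although the slide itself does not name it. A minimal sketch using the visible values:

```python
def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


# Visible entries of rows 0 and 255 from the slide (elided columns omitted).
row_0 = [6.4, 1.1, 3.2, 0.1, -0.4, 3.1]
row_255 = [1.2, 3.4, -1, 1.2, 3.2, -1]

print(relu(row_0))
print(relu(row_255))
```

The shape stays 1x1008x256 because the activation changes values, not dimensions.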

Page 35: Deep Learning for NLP

Input (1x1008x256):

       0    1    2    3    4    …    …    …  1007
0    6.4  1.1  3.2  0.1    0    …    …    …   3.1
…      …    …    …    …    …    …    …    …     …
255  1.2  3.4    0  1.2  3.2    …    …    …     0

Output (1x336x256), 256 rows x 336 columns:

       0    1    …  335
0    6.4  0.1    …    …
…      …    …    …    …
255  3.4  3.2

Page 36: Deep Learning for NLP

Input (1x336x256):

       0    1    2    3    4    5    6    7    8    …  335
0    6.4  0.1    …    …    …    …    …    …    …    …    …
…      …    …    …    …    …    …    …    …    …    …    …
255  3.4  3.2    …    …    …    …    …    …    …    …    …

Output (1x330x256), 256 rows x 330 columns:

        0    1    2    3    4    5    6    …  329
0    -2.4  3.2    …    …    …    …    …    …    …
…       …    …    …    …    …    …    …    …    …
255     …    …    …    …    …    …    …    …    …

Page 37: Deep Learning for NLP

1x330x256 <- after 2 convolutions (7x1/1) and 1 max-pooling (3x1/3)
1x110x256 <- 1 max-pooling (3x1/3)
1x102x256 <- 4 convolutions (3x1/1)
1x34x256 <- 1 max-pooling (3x1/3)
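
The sequence lengths listed above follow from the usual formula for an unpadded window, `(length - kernel) // stride + 1`. The sketch below replays the layer order implied by the slides, starting from the 1014-character input; the helper names and the `LAYERS` list are assumptions made for this example.

```python
def out_length(length, kernel, stride):
    """Output length of an unpadded convolution or pooling window."""
    return (length - kernel) // stride + 1


# (kind, kernel, stride): two kernel-7 convolutions, each followed by pooling,
# then four kernel-3 convolutions and a final pooling.
LAYERS = [
    ("conv", 7, 1), ("pool", 3, 3),
    ("conv", 7, 1), ("pool", 3, 3),
    ("conv", 3, 1), ("conv", 3, 1), ("conv", 3, 1), ("conv", 3, 1),
    ("pool", 3, 3),
]


def trace_lengths(input_length=1014):
    """Return the sequence length after every layer, starting with the input."""
    lengths = [input_length]
    for _kind, kernel, stride in LAYERS:
        lengths.append(out_length(lengths[-1], kernel, stride))
    return lengths


lengths = trace_lengths()
flattened = lengths[-1] * 256  # 256 feature maps at each remaining position
print(lengths, flattened)
```

The trace passes through 1008, 336, 330, 110, 102 and 34, and the final 34 positions times 256 feature maps give the 8704 values fed to the first fully connected layer.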

Page 38: Deep Learning for NLP

Input (1x34x256):

       0     1    2    3    4    5    6    7    8    …   33
0    6.4   0.1    …    …    …    …    …    …    …    …    …
1    2.1  24.9    …    …    …    …    …    …    …    …    …
…      …     …    …    …    …    …    …    …    …    …    …
255    …     …    …    …    …    …    …    …    …    …  9.9

Output (8704x1x1), 34 x 256 values:

0     6.4
1     0.1
…     …
34    2.1
35    24.9
…     …
8703  9.9
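
Flattening reads the 256 x 34 table row by row into one long vector. The sketch below assumes row-major order, which is consistent with the first and last entries shown on the slide (index 0 is the top-left cell, index 8703 the bottom-right cell). Both helper names are hypothetical.

```python
def flat_index(row, col, n_cols=34):
    """Position of cell (row, col) in the row-major flattened vector."""
    return row * n_cols + col


def flatten(matrix):
    """Concatenate the rows of a 2D list into a single list."""
    return [value for row in matrix for value in row]


# A dummy 256 x 34 feature map, only used to check the resulting size.
feature_map = [[0.0] * 34 for _ in range(256)]
vector = flatten(feature_map)
print(len(vector), flat_index(255, 33))
```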

Page 39: Deep Learning for NLP

Input (8704x1x1):

0     6.4
1     0.1
…     …
8703  9.9

x 1024 units (0 … k … 1023):

$$f_k(X) = \sum_{i=0}^{8703} w_{ki} \, x_i + b_k$$

Output (1024x1x1):

0     8.7
1     -2.1
…     …
1023  32.1
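
The formula on this slide is a weighted sum plus a bias for each of the 1024 output units, which is what a fully connected layer computes. A minimal sketch with toy sizes follows; the function name `dense` and the example numbers are assumptions, and the real layer would use an 8704-long input and 1024 weight rows.

```python
def dense(x, weights, biases):
    """Compute f_k(x) = sum_i w[k][i] * x[i] + b[k] for every unit k."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]


# Toy example: 3 inputs, 2 units.
x = [1.0, 2.0, 3.0]
weights = [[0.5, 0.0, -1.0], [1.0, 1.0, 1.0]]
biases = [0.1, -2.0]
outputs = dense(x, weights, biases)

# Number of learnable parameters for the sizes on the slide.
n_params = 8704 * 1024 + 1024
print(outputs, n_params)
```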

Page 40: Deep Learning for NLP

Input (1024x1x1):

0     6.4
1     0.1
…     …
1023  9.9

x 1024 units (0 … k … 1023):

$$f_k(X) = \sum_{i=0}^{1023} w_{ki} \, x_i + b_k$$

Output (1024x1x1):

0     8.7
1     -2.1
…     …
1023  32.1

ignored
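
The "ignored" label plausibly refers to dropout, where a random subset of units is switched off at each training step; the transcript does not confirm this. Under that assumption, a minimal sketch of inverted dropout looks as follows. The rate of 0.5, the helper name `dropout` and the seeded generator are choices made for this example.

```python
import random


def dropout(values, rate=0.5, rng=None):
    """Zero each value with probability `rate`; rescale the rest by 1 / (1 - rate)."""
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = [1.0] * 1000
dropped = dropout(activations, rate=0.5, rng=random.Random(0))
print(dropped.count(0.0), "of", len(dropped), "units ignored")
```

At scoring time no units are dropped, which corresponds to calling the function with `rate=0.0`.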

Page 41: Deep Learning for NLP

Input (1024x1x1):

0     6.4
1     0.1
…     …
1023  9.9

x N units:

Output (Nx1x1):

0    2.7
1    0.1
…    …
N-1  12.5

ignored

Softmax output (Nx1x1):

0    0.1
1    0.01
…    …
N-1  0.8

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=0}^{N-1} e^{z_j}}$$
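
The softmax formula turns the N raw scores into probabilities that sum to 1. A minimal sketch follows; subtracting the maximum before exponentiating is a common numerical-stability step added here, not something shown on the slide, and the three input scores are only the visible values from the slide with the elided ones left out.

```python
import math


def softmax(z):
    """Return exp(z_i) / sum_j exp(z_j), computed in a numerically stable way."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.7, 0.1, 12.5])
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```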

Page 42: Deep Learning for NLP

• MXNet using Python bindings

• Training on Azure N-Series, on Tesla K80 GPU

• 3 days of training on 2.5M examples for sentiment polarity

Page 43: Deep Learning for NLP

Amazon Review Polarity dataset (1.8M training, 200k testing):
- Crepe model + thesaurus augmentation: 95.07%
- TFIDF + n-grams: 91.64%

AG’s news corpus dataset (4 classes, 120k training, 7.6k testing):
- Crepe model + thesaurus augmentation: 85.20%
- TFIDF + n-grams: 92.36%

CNNs are no silver bullet, but they perform best on very large datasets

Page 44: Deep Learning for NLP

source: Alexis Conneau, Holger Schwenk, Loïc Barrault, Yann LeCun. Very Deep Convolutional Networks for Natural Language Processing, 2016

Page 45: Deep Learning for NLP

source: Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015.

Page 46: Deep Learning for NLP

6.4 1.1 3.2 0.1 0 5 3.1 10 21 3.1 0.2 1.8 0 1

6.4 3.2 5 10 21
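
The second row keeps the five largest values of the first row in their original order. That is consistent with k-max pooling with k = 5, although the slide does not name the operation, so treat this reading as an assumption. A minimal sketch with a hypothetical helper name:

```python
def k_max_pooling(values, k):
    """Keep the k largest values, preserving their original order."""
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]


sequence = [6.4, 1.1, 3.2, 0.1, 0, 5, 3.1, 10, 21, 3.1, 0.2, 1.8, 0, 1]
pooled = k_max_pooling(sequence, k=5)
print(pooled)
```

Unlike fixed-window pooling, the output length is always k regardless of the input length.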

Page 47: Deep Learning for NLP


source: A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification. 2016

Page 48: Deep Learning for NLP
Page 49: Deep Learning for NLP

NLP APIs from major cloud providers and marketplaces

- Language detection

- Sentiment Analysis

- Topic detection

- Translation

- Content moderation

- Text to speech

- Speech to text

- Intent modelling

Page 50: Deep Learning for NLP

+ Scalable
+ Managed
+ Pay-per-use pricing
+ Documentation and sample code

- Generic solutions
- Limited customizability
- Performance
- Latency
- Limited batch processing

Page 51: Deep Learning for NLP

[Diagram: Single Machine: Training Data, Testing Data, Sample Production Data, Model Development]

Page 52: Deep Learning for NLP

Data pipeline?

Retraining?

Scalability?

Real time / batch scoring?

Multiple teams / frameworks?

Page 53: Deep Learning for NLP


Production

Page 54: Deep Learning for NLP

[Diagram labels: Training Data; Training instance(s) (GPU), shown twice; Serialized model, shown twice; Scoring instance (CPU), shown four times; Orchestration Layer (CI/CD / Job scheduling / Monitoring)]
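
The diagram separates training instances from scoring instances, with a serialized model passed between them. The sketch below shows one simple way such a hand-off could work, using a JSON file and a toy linear model. The file format, the function names and the model are all assumptions for illustration and do not describe the authors' deployment.

```python
import json
import os
import tempfile


def export_model(weights, bias, path):
    """Training side: write the learned parameters to disk."""
    with open(path, "w") as fh:
        json.dump({"weights": weights, "bias": bias}, fh)


def load_scorer(path):
    """Scoring side: load the parameters once and return a scoring function."""
    with open(path) as fh:
        model = json.load(fh)

    def score(features):
        return sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]

    return score


# Hypothetical hand-off through a shared file location.
model_path = os.path.join(tempfile.mkdtemp(), "model.json")
export_model([0.5, -1.0], 0.25, model_path)

score = load_scorer(model_path)
result = score([2.0, 1.0])
print(result)
```

Because the scoring side only needs the file, several CPU instances can load the same model while the GPU instance is free to retrain.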

Page 55: Deep Learning for NLP


Page 56: Deep Learning for NLP


+Auto-scale and load balancing

Managed

Domain specific training data

Latency

-Pricing less flexible

Deployment pipeline to monitor

Performance

Page 57: Deep Learning for NLP

@ODSC

Thomas Delteil https://www.linkedin.com/in/thomasdelteil

Miguel Fierro @miguelgfierro

https://miguelgfierro.com

Page 58: Deep Learning for NLP

The code of this application is published at:

https://github.com/ilkarman/Bangalore_Sentiment

Part of our code is based on:

https://github.com/zhangxiangxiao/Crepe

Attribution of some images:

• http://morguefile.com

• https://unsplash.com

• Ana Corrales Photography

• http://wikipedia.org

Amazon dataset citation:

• J. McAuley, C. Targett, J. Shi, A. van den Hengel. Image-based recommendations on styles and substitutes. SIGIR, 2015.

• J. McAuley, R. Pandey, J. Leskovec. Inferring networks of substitutable and complementary products. Knowledge Discovery and Data Mining, 2015.

Open Data Science Conference London, 8 & 9 October, 2016

© 2016 Microsoft Corporation. All rights reserved.