@zelrosai @parisnlp matthieu bizien christophe bourguignat · 2017. 6. 4. · available datasets:...

Adding Neurons to Your Assistants Christophe Bourguignat Matthieu Bizien @zelrosAI @ParisNLP


Page 1

Adding Neurons to Your Assistants

Christophe Bourguignat

Matthieu Bizien

@zelrosAI @ParisNLP

Page 2

What we want to solve at Zelros

Page 3

Datasource Connection

Predictive Modeling

Dialogs & NLP Configuration

AI Education

INTELLIGENT VIRTUAL ASSISTANT PLATFORM

Page 4

Understanding Natural Language Understanding

Source: http://nlp.stanford.edu/~wcmac/papers/20140716-UNLU.pdf

Page 5

Our playground

Today: retrieval-based systems, which are what works in practice for conversational agents

Given a user sentence and context, find the best answer among a pre-defined set of intents

Tomorrow: generative models, self-learning

Page 6

Bots are not born equal

Success

Page 7

Bots are not born equal

Success

Error

Page 8

Bots are not born equal

Success

Error

Fallback

Page 9

A bit of history

http://disi.unitn.it/~riccardi/papers/specom97.pdf

Page 10

Things are going fast

[Timeline figure: 01/15, 06/16, 09/16, 11/16, 12/16]

Page 11

Why we want to build our own NLU system

More fun!

Data Privacy

Performance on our use cases

Our own roadmap

...

Page 12

BUNT, the first public benchmark for NLU APIs

https://github.com/zelros/bunt

Page 13

Approach 1: Supervised N-grams

Intent | Utterance | X (features) | Y (label)
NAME | What is your name? | N-gram | 1
NEED | What do you need? | N-gram | 2
NEED | Do you need anything? | N-gram | 2

✔ Works in practice
❌ Out-of-vocabulary words?
❌ Fallback detection?
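A minimal sketch of the supervised n-gram approach in plain Python, trained on the three rows of the table above. The slide does not say which classifier sits on top of the n-gram features; multinomial naive Bayes with add-one smoothing is an assumption made here only so the sketch runs, and `ngrams` and `NgramNaiveBayes` are hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Lowercased word n-grams for n = 1..n_max (unigrams and bigrams by default)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over n-gram counts with add-one smoothing (assumed classifier)."""

    def fit(self, utterances, intents):
        self.class_counts = Counter(intents)
        self.ngram_counts = defaultdict(Counter)
        self.vocab = set()
        for text, intent in zip(utterances, intents):
            grams = ngrams(text)
            self.ngram_counts[intent].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_intent, best_score = None, -math.inf
        for intent, count in self.class_counts.items():
            score = math.log(count / total)  # log prior of the intent
            denom = sum(self.ngram_counts[intent].values()) + len(self.vocab)
            for gram in ngrams(text):
                if gram in self.vocab:  # unseen n-grams carry no evidence at all
                    score += math.log((self.ngram_counts[intent][gram] + 1) / denom)
            if score > best_score:
                best_intent, best_score = intent, score
        return best_intent  # always some intent: there is no fallback


TRAIN = [("What is your name?", "NAME"),
         ("What do you need?", "NEED"),
         ("Do you need anything?", "NEED")]
clf = NgramNaiveBayes().fit([u for u, _ in TRAIN], [y for _, y in TRAIN])
print(clf.predict("What's your name?"))
```

An utterance made only of unseen words, such as "Tell me yours", contributes no n-gram evidence, so the prediction reduces to the class prior (NEED here). The model has no handle on out-of-vocabulary words and no way to say "none of the above", which matches the two ❌ on the slide.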

Page 14

Approach 2: Word2Vec, Step 1: Distance function

[Figure: the word vectors of the utterance "What is your name" are averaged (MEAN) into one utterance vector; the word vectors of the sentence "What was your mood" are averaged (MEAN) into one sentence vector; the two vectors are compared with cosine similarity.]
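A minimal sketch of this distance function in plain Python: average the word vectors of each sentence (MEAN), then compare the two averages with cosine similarity. The 3-dimensional vectors in `WORD_VECTORS` are invented for illustration; a real system would load pre-trained Word2Vec vectors (typically 300 dimensions). Skipping words that have no vector is an assumption of this sketch.

```python
import math
import re

# Hypothetical toy vectors; a real system would load pre-trained Word2Vec vectors.
WORD_VECTORS = {
    "what": [0.3, 0.1, 0.0],
    "is":   [0.1, 0.2, 0.1],
    "was":  [0.1, 0.3, 0.1],
    "your": [0.2, 0.2, 0.2],
    "name": [0.9, 0.1, 0.3],
    "mood": [0.2, 0.8, 0.4],
}


def sentence_vector(sentence, vectors=WORD_VECTORS):
    """MEAN of the vectors of the known words; words without a vector are skipped."""
    words = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w in vectors]
    if not words:
        return None
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def sentence_similarity(s1, s2):
    u, v = sentence_vector(s1), sentence_vector(s2)
    if u is None or v is None:
        return 0.0  # nothing to compare
    return cosine_similarity(u, v)


print(sentence_similarity("What is your name?", "What was your mood?"))
```

Because the vectors come from a large unsupervised corpus rather than from the bot's own examples, a word never seen in the intent examples can still contribute to the similarity, as long as it has a Word2Vec vector.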

Page 15

Approach 2: Word2Vec, Step 2: Classifier

Distance Function + Nearest Neighbors → Best Intent

⚠ Works in practice
✔ Out-of-vocabulary words?
❌ Fallback detection?
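The classifier step can be sketched as a k-nearest-neighbors vote with a pluggable similarity function. To keep the sketch self-contained, Jaccard word overlap stands in for the Word2Vec cosine similarity of Step 1; `overlap_similarity` and `nearest_intent` are hypothetical names.

```python
import re
from collections import Counter


def words(text):
    return set(re.findall(r"[a-z']+", text.lower()))


def overlap_similarity(a, b):
    """Stand-in similarity (Jaccard word overlap); Approach 2 would plug in the
    cosine similarity of averaged Word2Vec vectors here instead."""
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def nearest_intent(utterance, examples, similarity, k=1):
    """Majority intent among the k examples most similar to `utterance`.

    `examples` is a list of (example_sentence, intent) pairs.
    """
    ranked = sorted(examples, key=lambda ex: similarity(utterance, ex[0]), reverse=True)
    votes = Counter(intent for _, intent in ranked[:k])
    return votes.most_common(1)[0][0]


EXAMPLES = [("What is your name?", "NAME"),
            ("What do you need?", "NEED"),
            ("Do you need anything?", "NEED")]
print(nearest_intent("Tell me your name", EXAMPLES, overlap_similarity))
```

Note that `nearest_intent` returns an intent even when every similarity is zero: nearest neighbors alone cannot detect a fallback.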

Page 16

Going further: ML without data

Distance Function + Nearest Neighbors → Best Intent

Learn the Distance Function · Better algorithm · Fallback detection

Page 17

ML without data

Available datasets: QA/QC

15 500 Questions

Supervised: label = type of intent
● 6 coarse classes: abbreviation, description, entity, human, location, numeric value
● 50 fine classes


Sentence | Coarse class | Fine class
How did serfdom develop in and then leave Russia ? | DESC | manner
What films featured the character Popeye Doyle ? | ENTY | cremat
How can I find a list of celebrities ' real names ? | DESC | manner
What is the full form of .com ? | ABBR | exp
What contemptible scoundrel stole the cork from my lunch ? | HUM | ind

https://cogcomp.cs.illinois.edu/Data/QA/QC/
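A sketch of reading this dataset into (coarse class, fine class, sentence) records, assuming the distributed files hold one question per line in the form `COARSE:fine question`. That line format is our understanding of the download, not something stated on the slide; `parse_qc_line` is a hypothetical helper, and the sample reuses the five rows above.

```python
from collections import Counter, namedtuple

Question = namedtuple("Question", ["coarse", "fine", "text"])


def parse_qc_line(line):
    """Split a line assumed to look like 'DESC:manner How did serfdom ... ?'."""
    label, text = line.strip().split(" ", 1)
    coarse, fine = label.split(":", 1)
    return Question(coarse, fine, text)


SAMPLE = """DESC:manner How did serfdom develop in and then leave Russia ?
ENTY:cremat What films featured the character Popeye Doyle ?
DESC:manner How can I find a list of celebrities ' real names ?
ABBR:exp What is the full form of .com ?
HUM:ind What contemptible scoundrel stole the cork from my lunch ?"""

questions = [parse_qc_line(line) for line in SAMPLE.splitlines() if line.strip()]
print(Counter(q.coarse for q in questions))
```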

Page 18

ML without data

Available datasets: SNLI

570k human-written English sentence pairs

Label = entailment, contradiction, or neutral (judgments of five Mechanical Turk workers)

Text | Judgments | Hypothesis
A man inspects the uniform of a figure in some East Asian country. | contradiction (C C C C C) | The man is sleeping
An older and younger man smiling. | neutral (N N E N N) | Two men are smiling and laughing at the cats playing on the floor.
A black race car starts up in front of a crowd of people. | contradiction (C C C C C) | A man is driving down a lonely road.
A soccer game with multiple males playing. | entailment (E E E E E) | Some men are playing a sport.

https://nlp.stanford.edu/projects/snli/
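The single label of each pair is derived from the five judgments, as in the "N N E N N" → neutral row above. A sketch of a majority rule, assuming (as we understand the SNLI release) that a label needs at least three of the five votes; pairs without such a majority get `None` here. `gold_label` is a hypothetical name.

```python
from collections import Counter

LABEL_NAMES = {"E": "entailment", "N": "neutral", "C": "contradiction"}


def gold_label(judgments, min_votes=3):
    """Majority label among annotator judgments, or None without a clear majority."""
    label, votes = Counter(judgments).most_common(1)[0]
    return LABEL_NAMES[label] if votes >= min_votes else None


print(gold_label("N N E N N".split()))
```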

Page 19

ML without data

Available datasets: Quora

Sentence 1 | Duplicate | Sentence 2
What is the step by step guide to invest in share market in india? | False | What is the step by step guide to invest in share market?
Why am I mentally very lonely? How can I solve it? | False | Find the remainder when [math]23^{24}[/math] is divided by 24,23?
How do we prepare for UPSC? | True | How do I prepare for civil service?
How can I be a good geologist? | True | What should I do to be a great geologist?

400 000 pairs of questions

Supervised: label = are they duplicates?
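A sketch of loading such question pairs with Python's `csv` module. The tab-separated layout and the column names `question1`, `question2` and `is_duplicate` are assumptions based on the public Quora release, not something stated on the slide; the two sample rows reuse sentences from the table above with made-up ids.

```python
import csv
import io


def load_quora_pairs(tsv_file):
    """Yield (question1, question2, is_duplicate) triples from a tab-separated file.

    The header names are assumed, not taken from the slides.
    """
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        yield row["question1"], row["question2"], row["is_duplicate"] == "1"


# Illustrative sample built from the table above; the ids are made up.
SAMPLE = (
    "id\tqid1\tqid2\tquestion1\tquestion2\tis_duplicate\n"
    "0\t1\t2\tHow do we prepare for UPSC?\tHow do I prepare for civil service?\t1\n"
    "1\t3\t4\tWhy am I mentally very lonely? How can I solve it?\t"
    "Find the remainder when [math]23^{24}[/math] is divided by 24,23?\t0\n"
)
pairs = list(load_quora_pairs(io.StringIO(SAMPLE)))
print(pairs[0])
```

Each triple is one training example for learning a distance function: duplicate questions should end up close, non-duplicates far apart.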

Page 20

ML without data

Available datasets: Quora

Page 21

Siamese Network

Page 22

Siamese Network

Creation of a sentence embedding

[Figure: the word vectors of "What is your name" go through the layers below and come out as a single sentence embedding.]

Embedding(nb_tokens + 1, 300, input_length=25, trainable=False)
TimeDistributed(Dense(300, activation='relu'))
Lambda(lambda x: K.max(x, axis=1), output_shape=(300,))
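To make the three layers concrete, here is a dependency-free sketch of the same forward pass: embedding lookup, one Dense + ReLU applied to every time step, then an element-wise max over the time axis. Sizes are shrunk (`EMB_DIM = 4`, `MAX_LEN = 6` instead of 300 and 25), weights are random rather than pre-trained or learned, and reserving row 0 as an all-zero padding vector is our assumption about the `nb_tokens + 1`. `make_tower` is a hypothetical name.

```python
import random

EMB_DIM = 4  # 300 on the slide
MAX_LEN = 6  # input_length=25 on the slide


def make_tower(nb_tokens, seed=0):
    """Toy encoder: Embedding -> TimeDistributed(Dense + ReLU) -> max over time."""
    rng = random.Random(seed)
    # Row 0 is an all-zero padding vector (assumed reason for nb_tokens + 1 rows).
    embedding = [[0.0] * EMB_DIM] + [
        [rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(nb_tokens)
    ]
    weights = [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(EMB_DIM)]
    bias = [0.0] * EMB_DIM

    def dense_relu(x):
        # The same Dense layer is applied to every time step (TimeDistributed).
        return [max(0.0, sum(x[k] * weights[k][j] for k in range(EMB_DIM)) + bias[j])
                for j in range(EMB_DIM)]

    def encode(token_ids):
        ids = (list(token_ids) + [0] * MAX_LEN)[:MAX_LEN]  # pad or truncate
        per_token = [dense_relu(embedding[i]) for i in ids]
        # Equivalent of Lambda(K.max(x, axis=1)): element-wise max over time steps.
        return [max(h[j] for h in per_token) for j in range(EMB_DIM)]

    return encode


encode = make_tower(nb_tokens=10)
print(encode([1, 2, 3]))
```

Because the max is taken element-wise over time steps, this embedding ignores word order: `encode([3, 2, 1])` equals `encode([1, 2, 3])`. Order-aware layers such as a BiLSTM or GRU would change that.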

Page 23

Siamese Network

A simple architecture

[Figure: "What is your name" and "What was your mood" are each turned into an embedding by an identical tower, and the two embeddings are compared.]

Each tower (same weights for both sentences):

Embedding(nb_tokens + 1, 300, input_length=25, trainable=False)
TimeDistributed(Dense(300, activation='relu'))
Lambda(lambda x: K.max(x, axis=1), output_shape=(300,))

SIMILARITY: $s(h, h') = \dfrac{1}{1 + \lVert h - h' \rVert^2}$, where $h$ and $h'$ are the two sentence embeddings
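The similarity on this slide fits in a few lines. In the sketch below, a hypothetical `toy_encode` (max-pooling fixed 2-d word vectors) stands in for the trained tower; what it illustrates is that one encoder, i.e. the same weights, is applied to both sentences before computing 1 / (1 + ||h - h'||²).

```python
def similarity(h, h_prime):
    """1 / (1 + ||h - h'||^2): 1 for identical embeddings, tends to 0 as they move apart."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(h, h_prime))
    return 1.0 / (1.0 + sq_dist)


def siamese_score(sentence_a, sentence_b, encode):
    # Same weights: a single encoder function is applied to both sentences.
    return similarity(encode(sentence_a), encode(sentence_b))


# Hypothetical toy tower: element-wise max over fixed 2-d word vectors.
TOY_VECTORS = {"what": [0.1, 0.0], "is": [0.0, 0.1], "your": [0.2, 0.1],
               "name": [0.9, 0.1], "was": [0.0, 0.2], "mood": [0.1, 0.8]}


def toy_encode(sentence):
    vecs = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [max(v[i] for v in vecs) for i in range(2)]


print(siamese_score("what is your name", "what was your mood", toy_encode))
```

With shared weights the score is symmetric in its two sentences, and it reaches 1 only when both map to the same embedding.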

Page 25

Siamese Network

Going further

Dropout?

BiLSTM?

GRU?

MeanPooling?

BatchNorm?

More Layers?

N-char?

Maxout?

Page 26

Siamese Network

Conclusion

[Figure: "What is your name" and "What was your mood" are each mapped to an embedding, and the two embeddings are compared with a SIMILARITY score.]

Page 27

Non-siamese network: an example

Bilateral Multi-Perspective Matching for Natural Language Sentences

Page 28

Better Algorithm: SVM

Distance Function + Nearest Neighbors → Best Intent

Learn the Distance Function · Better algorithm · Fallback detection

Page 29

Better Algorithm: SVM

J.P. Vert

Page 30

Better Algorithm: SVM

J.P. Vert

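So Siamese networks are kernels! The similarity $1/(1 + \lVert h - h' \rVert^2)$ is a positive-definite kernel on embeddings, so composing it with the shared encoder gives a valid kernel on sentences.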

So we can use an SVM (J.P. Vert)
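Since the siamese similarity is a kernel, it can be handed to an SVM as a precomputed Gram matrix. A sketch with NumPy and scikit-learn's `SVC(kernel="precomputed")`; the 2-d embeddings and intent labels are invented, standing in for what a trained siamese tower would output for the bot's example sentences, and `siamese_kernel` is a hypothetical name.

```python
import numpy as np
from sklearn.svm import SVC


def siamese_kernel(A, B):
    """Gram matrix K[i, j] = 1 / (1 + ||A[i] - B[j]||^2) between two sets of embeddings."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return 1.0 / (1.0 + sq_dists)


# Hypothetical sentence embeddings standing in for the siamese tower's outputs.
X_train = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y_train = np.array(["NAME", "NAME", "NEED", "NEED"])

svm = SVC(kernel="precomputed")
svm.fit(siamese_kernel(X_train, X_train), y_train)  # train-vs-train Gram matrix

X_new = np.array([[0.85, 0.15]])
print(svm.predict(siamese_kernel(X_new, X_train)))  # test-vs-train Gram matrix
```

At prediction time the kernel matrix must be computed between the new sentences and the training sentences, in that order (shape: new × train).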

Page 31

Fallback Detection

Distance Function + Nearest Neighbors → Best Intent

Learn the Distance Function · Better algorithm · Fallback detection

✅✅

Page 32

Fallback Detection

If probability < threshold, then fallback

External dataset of unrelated sentences
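The slide does not say how the threshold is chosen. One plausible reading, sketched here as an assumption, is to run the intent classifier on the external unrelated sentences and set the threshold at a high quantile of their best-intent probabilities, so that most off-topic inputs fall below it. `choose_threshold` and `intent_or_fallback` are hypothetical names, and the probabilities are invented.

```python
def choose_threshold(unrelated_max_probs, quantile=0.95):
    """Threshold = the given quantile of the best-intent probabilities that the
    classifier assigns to known off-topic sentences (assumed calibration rule)."""
    ranked = sorted(unrelated_max_probs)
    index = min(int(quantile * len(ranked)), len(ranked) - 1)
    return ranked[index]


def intent_or_fallback(intent_probs, threshold, fallback="FALLBACK"):
    """Return the most probable intent, or `fallback` if its probability is below threshold."""
    best_intent = max(intent_probs, key=intent_probs.get)
    if intent_probs[best_intent] < threshold:
        return fallback
    return best_intent


# Invented best-intent probabilities for five unrelated sentences.
threshold = choose_threshold([0.31, 0.42, 0.38, 0.55, 0.47], quantile=0.8)
print(intent_or_fallback({"NAME": 0.9, "NEED": 0.1}, threshold))
print(intent_or_fallback({"NAME": 0.5, "NEED": 0.5}, threshold))
```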

Page 33

Learned Distance Function + SVM → Best Intent or Fallback

Learn the Distance Function · Better algorithm · Fallback detection

✅✅

Page 34

Thanks!

@chris_bour

@MatthieuBizien

@ParisNLP