@zelrosAI @ParisNLP · Matthieu Bizien, Christophe Bourguignat · 2017-06-04
TRANSCRIPT
Adding Neurons to Your Assistants
Christophe Bourguignat
Matthieu Bizien
@zelrosAI @ParisNLP
What we want to solve at Zelros
Datasource Connection
Predictive Modeling
Dialogs & NLP Configuration
AI Education
INTELLIGENT VIRTUAL ASSISTANT PLATFORM
Understanding Natural Language Understanding
Source : http://nlp.stanford.edu/~wcmac/papers/20140716-UNLU.pdf
Our playground
Today: retrieval-based systems, what works in practice for conversational agents today.
Given a user sentence and a context, find the best answer among a pre-defined set of intents.
Tomorrow: generative models, self-learning.
Bots are not born equal
Success
Error
Fallback
A bit of history
http://disi.unitn.it/~riccardi/papers/specom97.pdf
Things are going fast
Timeline: 01/15 · 06/16 · 09/16 · 11/16 · 12/16
Why we want to build our own NLU system
More fun!
Data Privacy
Performance on our use cases
Our own roadmap
...
BUNT, the first public benchmark for NLU APIs
https://github.com/zelros/bunt
Approach 1: Supervised N-grams

Intent | Utterance             | X       | Y
NAME   | What is your name?    | N-grams | 1
NEED   | What do you need?     | N-grams | 2
NEED   | Do you need anything? | N-grams | 2
✔ Works in practice
❌ Out-of-vocabulary words?
❌ Fallback detection?
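A minimal sketch of how the supervised n-gram approach can work, using the three utterances from the table above; the overlap scoring is an illustrative stand-in, not the actual implementation:

```python
# Featurize utterances as bags of word n-grams, then pick the intent
# whose training utterances share the most n-grams with the input.

def ngrams(sentence, n=2):
    """Return the set of word unigrams and n-grams of a sentence."""
    words = sentence.lower().rstrip("?").split()
    grams = set(words)
    for i in range(len(words) - n + 1):
        grams.add(tuple(words[i:i + n]))
    return grams

TRAINING = {
    "NAME": ["What is your name?"],
    "NEED": ["What do you need?", "Do you need anything?"],
}

def predict_intent(utterance):
    """Intent whose best training utterance shares the most n-grams."""
    scores = {
        intent: max(len(ngrams(utterance) & ngrams(u)) for u in examples)
        for intent, examples in TRAINING.items()
    }
    return max(scores, key=scores.get)

print(predict_intent("What do you need today?"))  # overlaps most with NEED
```

Note the weakness the slide points out: an utterance made only of out-of-vocabulary words shares no n-grams with anything, and nothing here detects when a fallback would be the right answer.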
Approach 2: Word2Vec
Step 1: Distance function
[Figure: each word of the utterance "What is your name" and of the sentence "What was your mood" is mapped to its Word2Vec vector; the word vectors of each sentence are MEAN-pooled into one sentence vector, and the two sentence vectors are compared with cosine similarity.]
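Step 1 can be sketched in NumPy with toy word vectors (the 4-d values below are made up for illustration; real Word2Vec vectors are e.g. 300-d):

```python
import numpy as np

# Toy stand-in for a Word2Vec lookup table.
word_vectors = {
    "what": np.array([0.34, 0.21, 0.10, 0.50]),
    "is":   np.array([0.32, 0.21, 0.05, 0.12]),
    "was":  np.array([0.12, 0.51, 0.07, 0.22]),
    "your": np.array([0.01, 0.04, 0.30, 0.50]),
    "name": np.array([0.31, 0.42, 0.20, 0.44]),
    "mood": np.array([0.21, 0.52, 0.25, 0.24]),
}

def sentence_vector(sentence):
    """MEAN-pool the word vectors into a single sentence vector."""
    return np.mean([word_vectors[w] for w in sentence.lower().split()], axis=0)

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim = cosine_similarity(sentence_vector("What is your name"),
                        sentence_vector("What was your mood"))
print(round(sim, 3))
```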
Approach 2: Word2Vec
Step 2: Classifier
Distance Function + Nearest Neighbors → Best Intent
⚠ Works in practice
✔ Out-of-vocabulary words?
❌ Fallback detection?
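A sketch of Step 2, assuming the sentence vectors have already been computed; the labelled vectors below are made up for illustration:

```python
import numpy as np

# Training utterances, each reduced to a sentence vector (here toy 3-d
# vectors; in practice mean-pooled Word2Vec embeddings as in Step 1).
labelled = [
    ("NAME", np.array([0.9, 0.1, 0.0])),
    ("NEED", np.array([0.1, 0.8, 0.1])),
    ("NEED", np.array([0.2, 0.7, 0.2])),
]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def best_intent(query_vec):
    """1-nearest-neighbour: intent of the closest training utterance."""
    return max(labelled, key=lambda pair: cosine(pair[1], query_vec))[0]

print(best_intent(np.array([0.15, 0.75, 0.1])))  # closest to a NEED vector
```

The nearest neighbour always returns *some* intent, which is exactly why fallback detection is still marked as unsolved on this slide.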
Going further: ML without data
Pipeline: Distance Function + Nearest Neighbors → Best Intent
To improve: learn the Distance Function · better algorithm · fallback detection
ML without data
Available datasets: QA/QC
15,500 questions
Supervised: label = type of intent
● 6 coarse classes: abbreviation, description, entity, human, location, numeric value
● 50 fine classes

Sentence | Coarse class | Fine class
How did serfdom develop in and then leave Russia ? | DESC | manner
What films featured the character Popeye Doyle ? | ENTY | cremat
How can I find a list of celebrities ' real names ? | DESC | manner
What is the full form of .com ? | ABBR | exp
What contemptible scoundrel stole the cork from my lunch ? | HUM | ind
https://cogcomp.cs.illinois.edu/Data/QA/QC/
ML without data
Available datasets: SNLI
570k human-written English sentence pairs
Label = entailment, contradiction, or neutral (judgments of five turkers)

Text | Judgments | Hypothesis
A man inspects the uniform of a figure in some East Asian country. | contradiction (C C C C C) | The man is sleeping.
An older and younger man smiling. | neutral (N N E N N) | Two men are smiling and laughing at the cats playing on the floor.
A black race car starts up in front of a crowd of people. | contradiction (C C C C C) | A man is driving down a lonely road.
A soccer game with multiple males playing. | entailment (E E E E E) | Some men are playing a sport.
https://nlp.stanford.edu/projects/snli/
ML without data
Available datasets: Quora
400,000 pairs of questions
Supervised: label = are they duplicates?

Sentence 1 | Duplicate | Sentence 2
What is the step by step guide to invest in share market in india? | False | What is the step by step guide to invest in share market?
Why am I mentally very lonely? How can I solve it? | False | Find the remainder when [math]23^{24}[/math] is divided by 24,23?
How do we prepare for UPSC? | True | How do I prepare for civil service?
How can I be a good geologist? | True | What should I do to be a great geologist?
Siamese Network
Creation of a sentence embedding
[Figure: each word of "What is your name" is mapped to a 300-d word vector; the stack below turns the resulting matrix into a single 300-d sentence embedding. Layers listed in order of application:]

Embedding(nb_tokens + 1, 300, input_length=25, trainable=False)
TimeDistributed(Dense(300, activation='relu'))
Lambda(lambda x: K.max(x, axis=1), output_shape=(300,))
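The three Keras layers can be emulated in plain NumPy to check the shapes; the weights are random and the vocabulary size is made up, so this is a shape sketch, not a trained encoder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the Keras layers on the slide.
VOCAB, DIM = 50, 300
embedding_table = rng.normal(size=(VOCAB + 1, DIM))   # Embedding(nb_tokens+1, 300)
W = rng.normal(size=(DIM, DIM)) * 0.01                # TimeDistributed(Dense(300))
b = np.zeros(DIM)

def encode(token_ids):
    """token ids -> word vectors -> per-timestep Dense+ReLU -> max over time."""
    x = embedding_table[np.array(token_ids)]          # (len, 300)
    h = np.maximum(x @ W + b, 0.0)                    # Dense + relu per word
    return h.max(axis=0)                              # Lambda: max over the time axis

h = encode([3, 17, 42, 8])   # e.g. "what is your name" as token ids
print(h.shape)               # (300,)
```

The max-over-time pooling makes the embedding independent of sentence length, which is why a single fixed-size vector comes out regardless of how many tokens go in.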
Siamese Network
A simple architecture
[Figure: the two sentences "What is your name" and "What was your mood" are each encoded by the same stack, with shared weights. Layers in order of application:]

Embedding(nb_tokens + 1, 300, input_length=25, trainable=False)
TimeDistributed(Dense(300, activation='relu'))
Lambda(lambda x: K.max(x, axis=1), output_shape=(300,))

SIMILARITY = 1 / (1 + |h - h'|²), same weights on both branches
Siamese Network
Learned Similarity
https://github.com/bradleypallen/keras-quora-question-pairs
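The similarity head from the architecture slide, 1 / (1 + |h - h'|²), is simple to compute directly; the embedding values below are illustrative:

```python
import numpy as np

def similarity(h, h_prime):
    """Similarity head from the slides: 1 / (1 + ||h - h'||^2).
    Equals 1 when the two sentence embeddings coincide and
    decays towards 0 as they move apart."""
    return 1.0 / (1.0 + float(np.sum((h - h_prime) ** 2)))

h = np.array([0.27, 0.22, 0.41])
print(similarity(h, h))        # identical embeddings -> 1.0
print(similarity(h, h + 1.0))  # distance pushes the score down
```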
Siamese Network
Going further
Dropout?
BiLSTM?
GRU?
MeanPooling?
BatchNorm?
More Layers?
N-char?
Maxout?
Siamese Network
Conclusion
[Figure: "What is your name" and "What was your mood" are each turned into a sentence embedding by the shared encoder, and a SIMILARITY score is computed between the two embeddings.]
Non-siamese network
Example
Bilateral Multi-Perspective Matching for Natural Language Sentences
Better Algorithm: SVM
Pipeline: Distance Function + Nearest Neighbors → Best Intent
Learn the Distance Function ✅ · Better algorithm · Fallback detection
Better Algorithm: SVM
J.P. Vert
So the Siamese networks are kernels!
So we can use an SVM (J.P. Vert)
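If the learned similarity is used as a kernel, a precomputed Gram matrix can be handed to an SVM (e.g. scikit-learn's SVC(kernel="precomputed")); a sketch with made-up embeddings, without an actual SVM fit:

```python
import numpy as np

def siamese_kernel(h, h_prime):
    """The slides' learned similarity, used as a kernel."""
    return 1.0 / (1.0 + float(np.sum((h - h_prime) ** 2)))

# Toy sentence embeddings; real ones come from the Siamese encoder.
H = np.array([[0.2, 0.7],
              [0.9, 0.1],
              [0.3, 0.6]])

# Gram matrix K[i, j] = kernel(H[i], H[j]), the object an SVM with a
# precomputed kernel consumes instead of raw feature vectors.
K = np.array([[siamese_kernel(a, b) for b in H] for a in H])
print(K.round(3))
```

Being symmetric with ones on the diagonal is the minimum a kernel matrix must satisfy, which is easy to check on the sketch.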
Fallback Detection
Pipeline: Distance Function + Nearest Neighbors → Best Intent
Learn the Distance Function ✅ · Better algorithm ✅ · Fallback detection
Fallback Detection
If probability < threshold, then fallback.
External dataset of unrelated sentences
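The fallback rule on this slide can be sketched as follows; the 0.5 threshold and the probabilities are illustrative:

```python
# If the best intent's probability is below a threshold, answer with a
# fallback instead of forcing an intent.
FALLBACK_THRESHOLD = 0.5

def decide(intent_probabilities):
    """intent_probabilities: dict intent -> probability (e.g. SVM output)."""
    best = max(intent_probabilities, key=intent_probabilities.get)
    if intent_probabilities[best] < FALLBACK_THRESHOLD:
        return "FALLBACK"
    return best

print(decide({"NAME": 0.8, "NEED": 0.2}))  # confident -> NAME
print(decide({"NAME": 0.3, "NEED": 0.3}))  # unsure -> FALLBACK
```

The external dataset of unrelated sentences mentioned above serves to calibrate such a threshold: those sentences should all land below it.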
Final pipeline: Learned Distance Function + SVM → Best Intent or Fallback
Learn the Distance Function ✅ · Better algorithm ✅ · Fallback detection ✅
Thanks!
@chris_bour
@MatthieuBizien
@ParisNLP