Deep Learning for NLP Applications
TRANSCRIPT
Deep Learning for NLP Applications
What is Deep Learning?
Just a Neural Network!
"Deep learning" refers toDeep Neural Networks
A Deep Neural Network is simply a Neural Network with multiple hidden layers.
Neural Networks have been around since the 1970s.
So why now?
Large Networks are hard to train
- Vanishing gradients make backpropagation harder
- Overfitting becomes a serious issue
- So we settled (for the time being) for simpler, more useful variations of Neural Networks
Then, suddenly ...
- We realized we can stack these simpler Neural Networks, making them easier to train
- We derived more efficient parameter estimation and model regularization methods
- Also, Moore's law kicked in and GPU computation became viable
So what's the big deal?
MASSIVE improvements in Computer Vision
Speech Recognition
- Baidu (with Andrew Ng as their chief scientist) has built a state-of-the-art speech recognition system with Deep Learning
- Their dataset: 7,000 hours of conversation coupled with background noise synthesis for a total of 100,000 hours
- They processed this through a massive GPU cluster
Cross Domain Representations
What if you wanted to take an image and generate a description of it?
The beauty of representation learning is its ability to be distributed across tasks.
This is the real power of Neural Networks.
But Samiur, what about NLP?
Deep Learning NLP
- Distributed word representations
- Dependency Parsing
- Sentiment Analysis
- And many others ...
Standard Bag of Words
- A one-hot encoding
- 20k to 50k dimensions
- Can be improved by factoring in document frequency
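As a minimal sketch of the two representations just described (scikit-learn is an assumed choice here; the talk doesn't name a library for this step):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["mattermark raised a funding round",
        "the startup announced a new funding round today"]

# Plain bag of words: one dimension per vocabulary word (20k-50k on real corpora)
bow = CountVectorizer()
print(bow.fit_transform(docs).toarray())

# Factoring in document frequency: TF-IDF down-weights words common to all docs
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(docs).toarray())
```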
Neural Word Embeddings
- Uses a vector space that attempts to predict a word given a context window
- 200-400 dimensions
motel [0.06, -0.01, 0.13, 0.07, -0.06, -0.04, 0, -0.04]
hotel [0.07, -0.03, 0.07, 0.06, -0.06, -0.03, 0.01, -0.05]
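Using the two example vectors above, a quick numpy check shows why related words end up near each other in the embedding space:

```python
import numpy as np

motel = np.array([0.06, -0.01, 0.13, 0.07, -0.06, -0.04, 0.00, -0.04])
hotel = np.array([0.07, -0.03, 0.07, 0.06, -0.06, -0.03, 0.01, -0.05])

# Cosine similarity is ~0.94: the embeddings place the two words close together
cos = motel @ hotel / (np.linalg.norm(motel) * np.linalg.norm(hotel))
print(cos)
```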
Word Representations
Word embeddings make semantic similarity and synonym detection possible.
Word embeddings have cool properties:
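The best-known of these properties is vector arithmetic on analogies (e.g. king - man + woman ≈ queen). A sketch using Gensim's KeyedVectors, assuming a pretrained vector file on disk (the path here is hypothetical):

```python
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)  # hypothetical path

# nearest neighbours capture synonymy: hotel ~ motel, inn, ...
print(kv.most_similar("hotel", topn=3))

# analogy arithmetic: king - man + woman ~ queen
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```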
Dependency Parsing
Converting sentences to a dependency-based grammar
Simplifying this to the verbs and their agents is called Semantic Role Labeling.
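The talk doesn't name a parsing library, but as an illustrative sketch, spaCy exposes dependency relations and a crude verb-and-agents view like the one described:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes: python -m spacy download en_core_web_sm
doc = nlp("Mattermark raised a round of $6.5 million")

for token in doc:
    # each word, its dependency label, and the head it attaches to
    print(f"{token.text:<10} {token.dep_:<8} -> {token.head.text}")

# crude SRL-style view: each verb with its subject/object children as agents
for verb in (t for t in doc if t.pos_ == "VERB"):
    agents = [c.text for c in verb.children if c.dep_ in ("nsubj", "dobj")]
    print(verb.text, agents)
```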
Sentiment Analysis
Recursive Neural Networks
- Can model tree structures very well
- This makes them great for other NLP tasks too (such as parsing)
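The core idea is a single composition function applied bottom-up over a parse tree. A minimal numpy sketch (dimensions, weights, and the toy tree are illustrative assumptions, not the talk's model):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                        # toy embedding dimension
W = rng.normal(scale=0.1, size=(d, 2 * d))   # shared composition matrix
b = np.zeros(d)

def compose(left, right):
    # parent = tanh(W [left; right] + b), reused at every node of the tree
    return np.tanh(W @ np.concatenate([left, right]) + b)

very, good, movie = (rng.normal(size=d) for _ in range(3))
phrase = compose(good, movie)      # ("good" "movie")
root = compose(very, phrase)       # ("very" ("good" "movie"))
# the root vector could then feed a softmax sentiment classifier
```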
Get to the applications part already!
Tools
Python
- Theano/PyLearn2
- Gensim (for word2vec)
- nolearn (uses scikit-learn)
Java/Clojure/Scala
- DeepLearning4j
- neuralnetworks by Ivan Vasilev
APIs
- Alchemy API
- MetaMind
Problem: Funding Sentence Classifier
Build a binary classifier that can take any sentence from a news article and tell whether it's about funding or not.
eg. "Mattermark is today announcing that it has raised a
round of $6.5 million"
Word Vectors
- Used Gensim's Word2Vec implementation to train unsupervised word vectors on the UMBC Webbase Corpus (~100M documents, ~48GB of text)
- Then iterated 20 times on text from news articles in the tech news domain (~1M documents, ~300MB of text)
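That two-stage setup maps directly onto Gensim's API. A sketch with tiny stand-in corpora (gensim >= 4 parameter names; older versions use size= instead of vector_size=):

```python
from gensim.models import Word2Vec

# tiny stand-ins for the UMBC Webbase and tech-news corpora from the slide
webbase_sentences = [["the", "hotel", "was", "near", "the", "motel"],
                     ["she", "booked", "a", "room", "at", "the", "hotel"]]
technews_sentences = [["mattermark", "raised", "a", "funding", "round"]]

# stage 1: unsupervised training on the large general corpus
model = Word2Vec(webbase_sentences, vector_size=300, window=5,
                 min_count=1, workers=4)

# stage 2: keep iterating on the in-domain news text (20 passes on the slide)
model.build_vocab(technews_sentences, update=True)
model.train(technews_sentences, total_examples=len(technews_sentences), epochs=20)
```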
Sentence Vectors
How can you compose word vectors to make sentence vectors?
- Use the paragraph vector model proposed by Quoc Le
- Feed them into an RNN constructed from the dependency tree of the sentence
- Use some heuristic function to combine the string of word vectors
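The simplest heuristic functions are an unweighted or TF-IDF-weighted average of the word vectors; a minimal sketch (weights and dimensions are illustrative):

```python
import numpy as np

def average_compose(word_vecs):
    # simplest heuristic: the sentence vector is the mean of its word vectors
    return np.mean(word_vecs, axis=0)

def tfidf_weighted_compose(word_vecs, tfidf_weights):
    # rarer, more informative words contribute more to the sentence vector
    w = np.asarray(tfidf_weights)[:, None]
    return (w * np.asarray(word_vecs)).sum(axis=0) / w.sum()

vecs = [np.random.rand(300) for _ in range(4)]     # stand-in word vectors
sentence_vec = tfidf_weighted_compose(vecs, [2.1, 0.3, 0.4, 1.5])
```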
What did we try?
- TF-IDF + Naive Bayes
- Word2Vec + Composition Methods
- Word2Vec + TF-IDF + Composition Methods
- Word2Vec + TF-IDF + Semantic Role Labeling (SRL) + Composition Methods
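The first baseline is a few lines in scikit-learn (an assumed implementation; the toy labels just mirror the funding/non-funding task):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

sentences = ["Mattermark is today announcing that it has raised a round of $6.5 million",
             "The conference takes place next week in San Francisco"]
labels = [1, 0]  # 1 = funding sentence, 0 = not

baseline = make_pipeline(TfidfVectorizer(), MultinomialNB())
baseline.fit(sentences, labels)
print(baseline.predict(["The startup raised $10 million in Series A"]))
```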
Composition Methods

[Composition formulas shown on slide] where wᵢ represents the i-th word vector, wᵥ the word vector for the verb, and a₀ and a₁ are the agents.
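From the next slide, the operators involved are addition and circular convolution over the verb and agent vectors. A sketch of both operations (the exact pairing of wᵥ, a₀, a₁ on the slide isn't recoverable, so the pairing below is an assumption):

```python
import numpy as np

def circular_convolution(a, b):
    # (a * b)[k] = sum_i a[i] * b[(k - i) mod d], computed via FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

d = 8
w_v = np.random.rand(d)   # verb vector
a0 = np.random.rand(d)    # first agent (e.g. subject)
a1 = np.random.rand(d)    # second agent (e.g. object)

additive = w_v + a0 + a1                         # additive composition
convolved = circular_convolution(w_v, a0 + a1)   # assumed convolution pairing
```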
What worked?
Word2Vec + TF-IDF + SRL + Circular Convolution/Additive
- The first method with simple TF-IDF/Naive Bayes performed extremely poorly because of its large dimensionality
- Combining TF-IDF with Word2Vec provided a small but noticeable improvement
- Adding SRL and a more sophisticated composition method increased performance by almost 5%
What else could we try?
Can we apply this method to generate general-purpose document vectors?
- We are currently using LDA (a topic analysis method) or simple TF-IDF to create document vectors
- How will this method compare to the already-proposed paragraph vector method by Quoc Le?
Can we associate these document vectors with much smaller query strings?
e.g. Search for "artificial intelligence" against our companies and get better results than keyword search
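Quoc Le's paragraph vector model is available in Gensim as Doc2Vec, and it supports exactly this query-against-documents comparison; a sketch with toy documents (gensim >= 4 names: model.dv was model.docvecs in older versions):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(["mattermark", "builds", "machine", "learning", "tools"], ["d0"]),
        TaggedDocument(["a", "new", "phone", "was", "released", "today"], ["d1"])]

# paragraph vector model (Quoc Le), as implemented in gensim
model = Doc2Vec(docs, vector_size=100, min_count=1, epochs=40)

# a short query string is embedded into the same space as the documents
query_vec = model.infer_vector(["artificial", "intelligence"])
print(model.dv.most_similar([query_vec], topn=1))
```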
Who's doing ML at Mattermark?
We need more people! Refer anyone you know that does Data Science/ML.