Dependency-Based Word Embeddings
Omer Levy, Yoav Goldberg
Bar-Ilan University, Israel

DESCRIPTION
Our Main Contribution: Generalizing Skip-Gram with Negative Sampling

TRANSCRIPT
Dependency-Based Word Embeddings
Omer Levy, Yoav Goldberg
Bar-Ilan University, Israel
Neural Embeddings
• Dense vectors
• Each dimension is a latent feature
• word2vec (Mikolov et al., 2013)
• State-of-the-Art: Skip-Gram with Negative Sampling
• “Linguistic Regularities”
king − man + woman ≈ queen
Linguistic Regularities in Sparse and Explicit Word Representations
Friday, 2:00 PM, CoNLL 2014
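As a concrete illustration of that regularity, here is a minimal sketch of the king − man + woman ≈ queen arithmetic over a toy vocabulary; the random vectors are placeholders, so only trained embeddings would actually rank "queen" first:

```python
import numpy as np

# Hypothetical toy vectors; real regularities only emerge in trained embeddings.
rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=300) for w in ["king", "man", "woman", "queen"]}
vecs = {w: v / np.linalg.norm(v) for w, v in vecs.items()}

# king - man + woman should land near queen in a trained space.
target = vecs["king"] - vecs["man"] + vecs["woman"]
target /= np.linalg.norm(target)

# Rank candidates by cosine similarity, excluding the query words themselves.
candidates = [w for w in vecs if w not in ("king", "man", "woman")]
best = max(candidates, key=lambda w: vecs[w] @ target)
print(best)  # "queen" with trained vectors; arbitrary with these random ones
```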
Our Main Contribution:
Generalizing Skip-Gram with Negative Sampling
Skip-Gram with Negative Sampling v2.0
• Original implementation assumes bag-of-words contexts
• We generalize to arbitrary contexts
• Dependency contexts create qualitatively different word embeddings
• Provide a new tool for linguistically analyzing embeddings
Context Types

Example:
“Australian scientist discovers star with telescope”

Target Word:
discovers

Bag of Words (BoW) Context:
the words in a window around the target, e.g. with a window of 5: Australian, scientist, star, with, telescope

Syntactic Dependency Context:
the words syntactically connected to the target
[Figure: dependency parse of the example sentence, with arcs nsubj(discovers, scientist), dobj(discovers, star), and prep_with(discovers, telescope)]
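For concreteness, a minimal sketch of how such contexts can be read off a parse, following the pairing described in the paper: the head receives modifier/label and the modifier receives head/label-1 (the inverse relation). The hand-written arc list and function name below are illustrative stand-ins for real parser output:

```python
# Collapsed Stanford-style arcs for the example sentence, written by hand
# here in place of real parser output: (head, label, modifier).
arcs = [
    ("discovers", "nsubj", "scientist"),
    ("discovers", "dobj", "star"),
    ("discovers", "prep_with", "telescope"),  # preposition collapsed into the label
    ("scientist", "amod", "Australian"),
]

def dependency_pairs(arcs):
    """Head is paired with modifier/label; modifier with head/label-1."""
    for head, label, mod in arcs:
        yield head, f"{mod}/{label}"
        yield mod, f"{head}/{label}-1"

for word, ctx in dependency_pairs(arcs):
    print(word, ctx)  # e.g. "discovers star/dobj", "star discovers/dobj-1"
```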
Generalizing Skip-Gram with Negative Sampling
How does Skip-Gram work?
• Skip-gram represents each word as a vector
• Skip-gram represents each context word as a different vector
• Same word has 2 different embeddings (as “word”, as “context”)
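A minimal numpy sketch of the objective behind these bullets, with separate word (W) and context (C) tables; the sizes are arbitrary and the uniform negative sampler is a simplification (word2vec actually samples negatives from a smoothed unigram distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, k = 1000, 100, 5                    # vocab size, dimension, negatives (illustrative)
W = rng.normal(scale=0.1, size=(V, D))    # "word" embeddings
C = rng.normal(scale=0.1, size=(V, D))    # "context" embeddings, a second table

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(word_id, ctx_id):
    """Negative log-likelihood: push w.c up for the observed context,
    down for k randomly sampled ("negative") contexts."""
    w = W[word_id]
    negatives = rng.integers(0, V, size=k)  # simplification: uniform sampling
    loss = -np.log(sigmoid(w @ C[ctx_id]))
    loss -= np.log(sigmoid(-(C[negatives] @ w))).sum()
    return loss

print(sgns_loss(3, 7))
```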
How does Skip-Gram work?
Text → Bag of Words Contexts → Word-Context Pairs → Learning
Our Modification
Text → Arbitrary Contexts → Word-Context Pairs → Learning
Modified word2vec publicly available!
Our Modification: Example
Text (Wikipedia) → Syntactic Contexts (Stanford Dependencies) → Word-Context Pairs → Learning
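Since the generalized model consumes nothing but word-context pairs, the pipeline's hand-off is just a pair file. A sketch of that step, assuming a one-pair-per-line, whitespace-separated format; the format is our assumption, so check the released tool's README:

```python
# Sketch: serialize (word, context) pairs for the modified word2vec.
# One whitespace-separated "word context" pair per line is our assumption
# about the expected input format; verify against the released tool's README.
pairs = [
    ("discovers", "scientist/nsubj"),
    ("scientist", "discovers/nsubj-1"),
    ("discovers", "star/dobj"),
    ("star", "discovers/dobj-1"),
]
with open("dep_pairs.txt", "w", encoding="utf-8") as f:
    for word, ctx in pairs:
        f.write(f"{word} {ctx}\n")
```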
What is the effect of different context types?
• Thoroughly studied in explicit (distributional) representations
• Lin (1998), Padó and Lapata (2007), and many others…
General Conclusion:
• Bag-of-words contexts induce topical similarities
• Dependency contexts induce functional similarities
  • words that share the same semantic type
  • cohyponyms
• Does this hold for embeddings as well?
Embedding Similarity with Different Contexts
Target Word: Hogwarts (Harry Potter’s school)
Bag of Words (k=5): Dumbledore, hallows, half-blood, Malfoy, Snape (related to Harry Potter)
Dependencies: Sunnydale, Collinwood, Calarts, Greendale, Millfield (schools)
Embedding Similarity with Different Contexts
Target Word: Turing (computer scientist)
Bag of Words (k=5): nondeterministic, non-deterministic, computability, deterministic, finite-state (related to computability)
Dependencies: Pauling, Hotelling, Heting, Lessing, Hamming (scientists)
Online Demo!
Embedding Similarity with Different Contexts
Target Word: dancing (dance gerund)
Bag of Words (k=5): singing, dance, dances, dancers, tap-dancing (related to dance)
Dependencies: singing, rapping, breakdancing, miming, busking (gerunds)
Embedding Similarity with Different Contexts
• Dependency-based embeddings have more functional similarities
• This phenomenon goes beyond these examples
• Quantitative analysis (in the paper)

Quantitative Analysis
[Figure: precision-recall curves comparing Dependencies, BoW (k=2), and BoW (k=5) embeddings; recall on the x-axis (0 to 1), precision on the y-axis (0.3 to 1)]
Why do dependencies induce functional similarities?
Dependency Contexts & Functional Similarity
• Thoroughly studied in explicit (distributional) representations
• Lin (1998), Padó and Lapata (2007), and many others…
• In explicit representations, we can look at the features and analyze them
• But embeddings are a black box!
• Dimensions are latent and don’t necessarily have any meaning
Analyzing Embeddings
Peeking into Skip-Gram’s Black Box
• Skip-Gram allows a peek…
• Contexts are embedded in the same space!
• Given a word w, find the contexts c it “activates” most, i.e. the contexts maximizing w · c
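A minimal sketch of that query: because both tables live in one space, ranking contexts reduces to a matrix-vector product and a sort. Everything below is an illustrative placeholder; the Associated Contexts slides that follow show what this retrieves from real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 100))             # word embeddings (illustrative)
C = rng.normal(size=(800, 100))              # context embeddings, same space
contexts = [f"ctx_{i}" for i in range(800)]  # placeholder context labels

def top_contexts(word_vec, n=5):
    """Contexts c maximizing w . c, i.e. the ones the word 'activates' most."""
    scores = C @ word_vec
    best = np.argsort(-scores)[:n]
    return [(contexts[i], float(scores[i])) for i in best]

print(top_contexts(W[42]))
```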
Associated Contexts
Target Word: Hogwarts
Dependencies: students/prep_at-1, educated/prep_at-1, student/prep_at-1, stay/prep_at-1, learned/prep_at-1
Associated Contexts
Target Word: Turing
Dependencies: machine/nn-1, test/nn-1, theorem/poss-1, machines/nn-1, tests/nn-1
Associated Contexts
Target Word: dancing
Dependencies: dancing/conj, dancing/conj-1, singing/conj-1, singing/conj, ballroom/nn
Analyzing Embeddings
• We found a way to linguistically analyze embeddings
• Together with the ability to engineer contexts…
• …we now have the tools to create task-tailored embeddings!
Conclusion
• Generalized Skip-Gram with Negative Sampling to arbitrary contexts
• Different contexts induce different similarities
• Suggested a way to peek inside the black box of embeddings
• Code, demo, and word vectors available from our websites
• Make linguistically-motivated, task-tailored embeddings today!

Thank you for listening :)