lda2vec Text by the Bay 2016
![Page 1: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/1.jpg)
lda2vec (word2vec, and lda)
Christopher Moody @ Stitch Fix
![Page 2: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/2.jpg)
About
@chrisemoody
Caltech Physics
PhD in astrostats supercomputing
sklearn t-SNE contributor
Data Labs at Stitch Fix
github.com/cemoody
Gaussian Processes t-SNE
chainer deep learning
Tensor Decomposition
![Page 3: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/3.jpg)
1. word2vec
2. lda
3. lda2vec
![Page 4: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/4.jpg)
word2vec
1. king - man + woman = queen
2. Huge splash in NLP world
3. Learns from raw text
4. Pretty simple algorithm
5. Comes pretrained
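The "king - man + woman = queen" arithmetic can be sketched with a handful of toy vectors. The 2-D vectors below are invented for illustration (they are not pretrained word2vec vectors), and `analogy` is a hypothetical helper name.

```python
import math

# Toy 2-D vectors, invented for illustration only.
# Axis 0 loosely stands for "royalty", axis 1 for "gender".
vectors = {
    "king":  [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man":   [0.1, 0.8],
    "woman": [0.1, -0.8],
    "apple": [-0.7, 0.0],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))
```

With real pretrained vectors the nearest neighbour is found the same way, only over a much larger vocabulary.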
![Page 5: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/5.jpg)
word2vec
1. Set up an objective function
2. Randomly initialize vectors
3. Do gradient descent
![Page 6: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/6.jpg)
word2vec
word2vec: learn word vector w from its surrounding context
![Page 7: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/7.jpg)
word2vec
“The fox jumped over the lazy dog”
Maximize the likelihood of seeing the words given the word over.
P(the|over) P(fox|over) P(jumped|over) P(the|over) P(lazy|over) P(dog|over)
…instead of maximizing the likelihood of co-occurrence counts.
![Page 8: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/8.jpg)
word2vec
P(fox|over)
What should this be?
![Page 9: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/9.jpg)
word2vec
P(fox|over)
Should depend on the word vectors.
P(v_fox|v_over)
![Page 10: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/10.jpg)
word2vec
“The fox jumped over the lazy dog”
P(w|c)
Extract pairs from context window around every input word.
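Pair extraction can be sketched in a few lines. The function name `skipgram_pairs` and the window size of 3 are assumptions chosen so that "over" is paired with every other word in the example sentence, as on the earlier slide.

```python
def skipgram_pairs(tokens, window=3):
    """Pair every input word with each neighbour within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the fox jumped over the lazy dog".split()
pairs = skipgram_pairs(sentence, window=3)

# Neighbours extracted for the input word "over"
print([neighbour for center, neighbour in pairs if center == "over"])
```

Each pair becomes one training example for P(w|c).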
![Page 11: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/11.jpg)
![Page 12: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/12.jpg)
![Page 13: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/13.jpg)
![Page 14: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/14.jpg)
![Page 15: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/15.jpg)
![Page 16: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/16.jpg)
![Page 17: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/17.jpg)
![Page 18: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/18.jpg)
![Page 19: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/19.jpg)
![Page 20: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/20.jpg)
![Page 21: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/21.jpg)
![Page 22: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/22.jpg)
![Page 23: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/23.jpg)
![Page 24: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/24.jpg)
objective
Measure loss between w and c?
How should we define P(w|c)?
![Page 25: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/25.jpg)
objective
How should we define P(w|c)?
Measure loss between w and c?
w · c
![Page 26: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/26.jpg)
word2vec
objective
w · c ~ 1
v_canada · v_snow ~ 1
![Page 27: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/27.jpg)
word2vec
objective
w · c ~ 0
v_canada · v_desert ~ 0
![Page 28: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/28.jpg)
word2vec
objective
w · c ~ -1
![Page 29: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/29.jpg)
word2vec
objective
w · c ∈ [-1,1]
![Page 30: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/30.jpg)
word2vec
objective
w · c ∈ [-1,1]
But we’d like to measure a probability.
![Page 31: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/31.jpg)
word2vec
objective
But we’d like to measure a probability.
σ(c·w) ∈ [0,1]
![Page 32: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/32.jpg)
word2vec
objective
But we’d like to measure a probability.
σ(c·w) ∈ [0,1]
[Figure: w, c vector pairs labeled Similar and Dissimilar]
![Page 33: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/33.jpg)
word2vec
objective
Loss function:
L = σ(c·w)
Logistic (binary) choice. Is the (context, word) combination from our dataset?
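A small sketch of how the sigmoid turns a dot product into a number between 0 and 1. The vectors are made up for illustration: `v_snow` points roughly the same way as `v_canada`, while `v_desert` is chosen to be orthogonal to it.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Made-up 2-D vectors, for illustration only.
v_canada = [1.0, 0.5]
v_snow = [0.9, 0.6]     # similar direction to v_canada
v_desert = [0.5, -1.0]  # orthogonal to v_canada, so the dot product is 0

p_snow = sigmoid(dot(v_canada, v_snow))
p_desert = sigmoid(dot(v_canada, v_desert))
print(p_snow, p_desert)
```

A pair with a large positive dot product scores close to 1, an orthogonal pair scores exactly 0.5, and a pair pointing in opposite directions scores close to 0.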
![Page 34: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/34.jpg)
word2vec
objective
The skip-gram negative-sampling model
L = σ(c·w)
Trivial solution is that context = word for all vectors
![Page 35: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/35.jpg)
word2vec
objective
The skip-gram negative-sampling model
L = σ(c·w) + σ(-c·w_neg)
Draw random words in vocabulary.
![Page 36: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/36.jpg)
word2vec
objective
The skip-gram negative-sampling model
Discriminate positive from negative samples
Multiple negatives
L = σ(c·w) + σ(-c·w_neg) + … + σ(-c·w_neg)
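The negative-sampling objective and a gradient step can be sketched as below. One assumption to flag: this sketch uses the log form, log σ(c·w) + Σ log σ(-c·w_neg), which is how the SGNS objective is usually written, whereas the slide drops the logs for brevity. The helper names, vector size, learning rate and number of negatives are all illustrative choices.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sgns_objective(c, w, negatives):
    """log sigma(c.w) for the observed pair plus log sigma(-c.w_neg) per negative."""
    positive = math.log(sigmoid(dot(c, w)))
    negative = sum(math.log(sigmoid(-dot(c, n))) for n in negatives)
    return positive + negative

def sgns_step(c, w, negatives, lr=0.1):
    """One gradient-ascent step; returns updated copies of all vectors."""
    g_pos = 1.0 - sigmoid(dot(c, w))  # derivative of log sigma(x) at x = c.w
    grad_c = [g_pos * wi for wi in w]
    new_w = [wi + lr * g_pos * ci for wi, ci in zip(w, c)]
    new_negatives = []
    for n in negatives:
        g_neg = -sigmoid(dot(c, n))   # derivative of log sigma(-x) at x = c.n
        grad_c = [gc + g_neg * ni for gc, ni in zip(grad_c, n)]
        new_negatives.append([ni + lr * g_neg * ci for ni, ci in zip(n, c)])
    new_c = [ci + lr * gc for ci, gc in zip(c, grad_c)]
    return new_c, new_w, new_negatives

random.seed(0)

def random_vector(dim=5):
    """Step 2 of the recipe: randomly initialize vectors."""
    return [random.uniform(-0.5, 0.5) for _ in range(dim)]

c, w = random_vector(), random_vector()
negatives = [random_vector() for _ in range(3)]  # randomly drawn "words"

before = sgns_objective(c, w, negatives)
for _ in range(100):
    c, w, negatives = sgns_step(c, w, negatives)
after = sgns_objective(c, w, negatives)
print(before, after)
```

Training pushes c·w up for the observed pair and pushes c·w_neg down for the random draws, which is what rules out the trivial solution where every vector is identical.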
![Page 37: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/37.jpg)
word2vec
The SGNS Model
L = σ(c·w) + σ(-c·w_neg)
…is extremely similar to matrix factorization!
PMI
c_i·w_j = PMI(M_ij) - log k
Levy & Goldberg 2014
![Page 38: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/38.jpg)
word2vec
The SGNS Model
L = σ(c·w) + σ(-c·w_neg)
…is extremely similar to matrix factorization!
PMI
c_i·w_j = PMI(M_ij) - log k
Levy & Goldberg 2014
‘traditional’ NLP
![Page 39: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/39.jpg)
word2vec
The SGNS Model
L = σ(c·w) + Σ σ(-c·w_neg)
PMI
c_i·w_j = log[ (#(c_i,w_j)/n) / (k · #(w_j)/n · #(c_i)/n) ]
Levy & Goldberg 2014
‘traditional’ NLP
![Page 40: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/40.jpg)
word2vec
The SGNS Model
L = σ(c·w) + Σ σ(-c·w_neg)
PMI
c_i·w_j = log[ (popularity of c,w) / (k · (popularity of c) · (popularity of w)) ]
Levy & Goldberg 2014
‘traditional’ NLP
![Page 41: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/41.jpg)
word2vec
PMI
99% of word2vec is counting.
And you can count words in SQL
![Page 42: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/42.jpg)
word2vec
PMI
Count how many times you saw c·w
Count how many times you saw c
Count how many times you saw w
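The three counts are enough to build the shifted PMI table. A minimal sketch, assuming a tiny invented list of (word, context) pairs; `shifted_pmi` is a hypothetical helper name and `k` is the number of negative samples from the formula log[ (#(c,w)/n) / (k · #(w)/n · #(c)/n) ].

```python
import math
from collections import Counter

# Invented (word, context) pairs, for illustration only.
pairs = (
    [("canada", "snow")] * 3
    + [("canada", "sand")]
    + [("sahara", "sand")] * 3
    + [("sahara", "snow")]
)

def shifted_pmi(pairs, k=1):
    """PMI minus log k, computed from pair, word and context counts."""
    n = len(pairs)
    pair_counts = Counter(pairs)                      # how often we saw (w, c)
    word_counts = Counter(w for w, _ in pairs)        # how often we saw w
    context_counts = Counter(c for _, c in pairs)     # how often we saw c
    table = {}
    for (w, c), count in pair_counts.items():
        joint = count / n
        expected = (word_counts[w] / n) * (context_counts[c] / n)
        table[(w, c)] = math.log(joint / (k * expected))
    return table

pmi = shifted_pmi(pairs, k=1)
for pair, value in sorted(pmi.items()):
    print(pair, round(value, 3))
```

Pairs that co-occur more often than chance get a positive score, and pairs that co-occur less often than chance get a negative one.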
![Page 43: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/43.jpg)
word2vec
PMI
…and this takes ~5 minutes to compute on a single core.
Computing the SVD is a completely standard math library operation.
![Page 44: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/44.jpg)
word2vec
![Page 45: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/45.jpg)
![Page 46: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/46.jpg)
![Page 47: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/47.jpg)
![Page 48: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/48.jpg)
![Page 49: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/49.jpg)
![Page 50: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/50.jpg)
![Page 51: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/51.jpg)
![Page 52: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/52.jpg)
![Page 53: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/53.jpg)
![Page 54: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/54.jpg)
![Page 55: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/55.jpg)
![Page 56: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/56.jpg)
![Page 57: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/57.jpg)
![Page 58: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/58.jpg)
![Page 59: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/59.jpg)
![Page 60: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/60.jpg)
![Page 61: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/61.jpg)
ITEM_3469 + ‘Pregnant’
![Page 62: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/62.jpg)
+ ‘Pregnant’
![Page 63: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/63.jpg)
= ITEM_701333 = ITEM_901004 = ITEM_800456
![Page 64: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/64.jpg)
![Page 65: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/65.jpg)
what about LDA?
![Page 66: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/66.jpg)
LDA on Client Item Descriptions
![Page 67: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/67.jpg)
LDA on Item Descriptions (with Jay)
![Page 68: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/68.jpg)
LDA on Item Descriptions (with Jay)
![Page 69: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/69.jpg)
LDA on Item Descriptions (with Jay)
![Page 70: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/70.jpg)
lda vs word2vec
![Page 71: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/71.jpg)
lda: Bayesian Graphical Model
word2vec: ML Neural Model
![Page 72: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/72.jpg)
word2vec is local: one word predicts a nearby word
“I love finding new designer brands for jeans”
![Page 73: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/73.jpg)
“I love finding new designer brands for jeans”
But text is usually organized.
![Page 74: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/74.jpg)
“I love finding new designer brands for jeans”
But text is usually organized.
![Page 75: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/75.jpg)
“I love finding new designer brands for jeans”
In LDA, documents globally predict words.
doc 7681
![Page 76: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/76.jpg)
typical word2vec vector
[ -0.75, -1.25, -0.55, -0.12, +2.2]
All real values
typical LDA document vector
[ 0%, 9%, 78%, 11%]
All sum to 100%
![Page 77: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/77.jpg)
5D word2vec vector
[ -0.75, -1.25, -0.55, -0.12, +2.2]
Dense
All real values
Dimensions relative
5D LDA document vector
[ 0%, 9%, 78%, 11%]
Sparse
All sum to 100%
Dimensions are absolute
![Page 78: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/78.jpg)
100D word2vec vector
[ -0.75, -1.25, -0.55, -0.27, -0.94, 0.44, 0.05, 0.31 … -0.12, +2.2]
Dense
All real values
Dimensions relative
100D LDA document vector
[ 0%, 0%, 0%, 0%, 0%, … 0%, 9%, 78%, 11%]
Sparse
All sum to 100%
Dimensions are absolute
![Page 79: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/79.jpg)
100D word2vec vector
[ -0.75, -1.25, -0.55, -0.27, -0.94, 0.44, 0.05, 0.31 … -0.12, +2.2]
Similar in 100D ways (very flexible)
100D LDA document vector
[ 0%, 0%, 0%, 0%, 0%, … 0%, 9%, 78%, 11%]
Similar in fewer ways (more interpretable)
+mixture +sparse
![Page 80: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/80.jpg)
can we do both? lda2vec
![Page 81: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/81.jpg)
[Figure: word2vec diagram. Skip-grams from the sentence “Lufthansa is a German airline and when”; pivot word “German”; word vector (#hidden units: -1.9 0.85 -0.6 -0.3 -0.5); negative sampling loss.]
word2vec predicts locally: one word predicts a nearby word
![Page 82: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/82.jpg)
[Figure: lda2vec architecture diagram. Document weight (#topics: 0.34 -0.1 0.17) → document proportion (#topics: 41% 26% 34%) × topic matrix (#topics × #hidden units) = document vector (#hidden units: -0.7 -0.4 -0.7 -0.3 -0.3); document vector + word vector (-1.9 0.85 -0.6 -0.3 -0.5) = context vector (-2.6 0.45 -1.3 -0.6 -0.8) → negative sampling loss over skip-grams from the sentence “Lufthansa is a German airline and when”, pivot word “German”.]
Document vector predicts a word from a global context
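The way the pieces of the diagram combine can be sketched as follows. Two assumptions to flag: the softmax linking document weight to document proportion is inferred from the slide's numbers (it roughly reproduces 41% 26% 34%), and `topic_matrix` is invented for illustration because the slide's matrix is only partly legible. The document weight, word vector, document vector and context vector values are the ones shown on the slide.

```python
import math

def softmax(xs):
    """Turn unconstrained weights into proportions that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Values shown on the slide.
doc_weight = [0.34, -0.1, 0.17]                  # one entry per topic
word_vector = [-1.9, 0.85, -0.6, -0.3, -0.5]     # one entry per hidden unit
slide_doc_vector = [-0.7, -0.4, -0.7, -0.3, -0.3]
slide_context_vector = [-2.6, 0.45, -1.3, -0.6, -0.8]

# Assumed link: document proportion = softmax(document weight).
doc_proportion = softmax(doc_weight)

# Invented topic matrix (#topics x #hidden units), for illustration only.
topic_matrix = [
    [-1.4, -1.9, 0.96, -0.2, 0.3],
    [-0.5, -1.7, -0.7, -1.1, 0.1],
    [-1.4, 0.75, -1.9, 0.6, -0.4],
]

# Document vector = proportion-weighted mixture of topic vectors.
doc_vector = [
    sum(p * topic[h] for p, topic in zip(doc_proportion, topic_matrix))
    for h in range(len(word_vector))
]

# Context vector = document vector + word vector.
context_vector = [d + w for d, w in zip(doc_vector, word_vector)]

# The slide's own numbers obey the same sum.
slide_sum = [d + w for d, w in zip(slide_doc_vector, word_vector)]
print([round(p, 2) for p in doc_proportion])
print([round(x, 2) for x in slide_sum])
```

The context vector then takes the place of the plain word vector in the negative-sampling loss.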
![Page 83: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/83.jpg)
[Figure: lda2vec architecture diagram]
We’re missing mixtures & sparsity!
![Page 84: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/84.jpg)
[Figure: lda2vec architecture diagram]
We’re missing mixtures & sparsity!
![Page 85: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/85.jpg)
[Figure: lda2vec architecture diagram]
Now it’s a mixture.
![Page 86: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/86.jpg)
[Figure: lda2vec architecture diagram, highlighting the document weight (#topics: 0.34 -0.1 0.17)]
Trinitarian, baptismal, Pentecostals, Bede, schismatics, excommunication
![Page 87: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/87.jpg)
[Figure: lda2vec architecture diagram, highlighting the document weight (#topics: 0.34 -0.1 0.17)]
topic 1 = “religion”: Trinitarian, baptismal, Pentecostals, Bede, schismatics, excommunication
![Page 88: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/88.jpg)
[Figure: lda2vec architecture diagram, highlighting the document weight (#topics: 0.34 -0.1 0.17)]
Milosevic, absentee, Indonesia, Lebanese, Israelis, Karadzic
![Page 89: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/89.jpg)
[Figure: lda2vec architecture diagram, highlighting the document weight (#topics: 0.34 -0.1 0.17)]
topic 2 = “politics”: Milosevic, absentee, Indonesia, Lebanese, Israelis, Karadzic
![Page 90: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/90.jpg)
[Figure: lda2vec architecture diagram]
![Page 91: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/91.jpg)
[Figure: lda2vec architecture diagram]
![Page 92: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/92.jpg)
[Figure: lda2vec architecture diagram]
![Page 93: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/93.jpg)
Sparsity!
Document proportions over time:
t=0: 34% 32% 34%
t=10: 41% 26% 34%
t=∞: 99% 1% 0%
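The proportions on the slide are close to what a softmax over the document weights would give, and the drift toward 99% / 1% / 0% is what a Dirichlet-style penalty with alpha below 1 would reward. The sketch below illustrates both ideas in plain Python. The `alpha` value, the function names, and the nudged `sparse` proportions are assumptions for illustration, not values from the talk.

```python
import math

def softmax(weights):
    """Map unconstrained document weights to proportions that sum to 1."""
    top = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - top) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def dirichlet_log_density(proportions, alpha):
    """Unnormalized symmetric Dirichlet log-density: (alpha - 1) * sum(log p)."""
    return (alpha - 1.0) * sum(math.log(p) for p in proportions)

# Document weights as shown in the diagram.
doc_weights = [0.34, -0.1, 0.17]
proportions = softmax(doc_weights)
print([round(p, 2) for p in proportions])

# Hypothetical stand-ins for early and late training.
dense = [0.34, 0.32, 0.34]
sparse = [0.98, 0.01, 0.01]  # 99% / 1% / 0%, nudged to avoid log(0)
alpha = 0.5                  # assumed; any alpha < 1 favors sparse mixtures
print(dirichlet_log_density(dense, alpha) < dirichlet_log_density(sparse, alpha))
```

With these inputs the softmax gives roughly 40%, 26%, 34%, and the sparse mixture scores higher than the dense one.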
![Page 95: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/95.jpg)
+ API docs + Examples + GPU + Tests
@chrisemoody
lda2vec.com
![Page 96: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/96.jpg)
@chrisemoody
Example: Hacker News comments
Topics: http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/hacker_news/lda2vec/lda2vec.ipynb
Word vectors: https://github.com/cemoody/lda2vec/blob/master/examples/hacker_news/lda2vec/word_vectors.ipynb
![Page 97: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/97.jpg)
@chrisemoody
lda2vec.com
If you want…
human-interpretable doc topics, use LDA.
machine-useable word-level features, use word2vec.
if you like to experiment a lot, and have topics over user / doc / region / etc. features, use lda2vec. (and you have a GPU)
![Page 100: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/100.jpg)
Credit
Large swathes of this talk are from previous presentations by:
• Tomas Mikolov
• David Blei
• Christopher Olah
• Radim Rehurek
• Omer Levy & Yoav Goldberg
• Richard Socher
• Xin Rong
• Tim Hopper
![Page 101: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/101.jpg)
“PS! Thank you for such an awesome idea”
@chrisemoody
doc_id=1846
Can we model topics to sentences? lda2lstm
![Page 102: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/102.jpg)
Can we model topics to images? lda2ae
TJ Torres
![Page 103: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/103.jpg)
4 Fun Stuff
and now for something completely crazy
![Page 104: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/104.jpg)
translation
(using just a rotation matrix)
Mikolov 2013
English
Spanish
Matrix Rotation
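As a rough illustration of translating with a rotation matrix, the sketch below rotates toy English vectors into a toy Spanish space and picks the nearest neighbour. The 2-D vectors, the 40 degree angle, and the helper names are all hypothetical. In practice the mapping would be fit from a seed dictionary of known word pairs rather than assumed.

```python
import math

def rotate(vec, theta):
    """Apply the 2-D rotation matrix [[cos, -sin], [sin, cos]] to vec."""
    x, y = vec
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical 2-D embeddings; real word vectors have hundreds of dimensions.
english = {"one": (1.0, 0.0), "two": (0.8, 0.6), "three": (0.0, 1.0)}
theta = math.radians(40)  # assumed known here for illustration
spanish = {
    "uno": rotate(english["one"], theta),
    "dos": rotate(english["two"], theta),
    "tres": rotate(english["three"], theta),
}

def translate(word):
    """Rotate the English vector, then return the nearest Spanish word."""
    mapped = rotate(english[word], theta)
    return max(spanish, key=lambda w: cosine(mapped, spanish[w]))

print([translate(w) for w in english])
```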
![Page 105: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/105.jpg)
deepwalk
Perozzi et al 2014
word2vec: learn word vectors from sentences
“The fox jumped over the lazy dog”
‘words’ are graph vertices
‘sentences’ are random walks on the graph
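A minimal sketch of the deepwalk idea, assuming a small hypothetical graph stored as an adjacency dict: each random walk becomes one 'sentence' whose 'words' are vertex names, ready to be fed to any word2vec trainer. The graph, function names, and walk parameters are made up for illustration.

```python
import random

# Hypothetical undirected graph as an adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

def random_walk(graph, start, length, rng):
    """Walk `length` vertices, stepping to a uniformly chosen neighbour."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

def build_corpus(graph, walks_per_node, length, seed=0):
    """Start several walks from every vertex; each walk is one 'sentence'."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(walks_per_node):
        for node in graph:
            corpus.append(random_walk(graph, node, length, rng))
    return corpus

corpus = build_corpus(graph, walks_per_node=2, length=5)
for sentence in corpus:
    print(" ".join(sentence))
```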
![Page 106: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/106.jpg)
Playlists at Spotify
context
sequence learning
‘words’ are song indices ‘sentences’ are playlists
![Page 107: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/107.jpg)
Playlists at Spotify
context
Erik Bernhardsson
Great performance on ‘related artists’
![Page 108: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/108.jpg)
Fixes at Stitch Fix
sequence learning
Let’s try: ‘words’ are items ‘sentences’ are fixes
![Page 109: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/109.jpg)
Fixes at Stitch Fix
context
Learn similarity between styles because they co-occur
Learn ‘coherent’ styles
sequence learning
![Page 110: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/110.jpg)
Fixes at Stitch Fix?
context
sequence learning
Got lots of structure!
![Page 111: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/111.jpg)
Fixes at Stitch Fix?
context
sequence learning
![Page 112: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/112.jpg)
Fixes at Stitch Fix?
context
sequence learning
Nearby regions are consistent ‘closets’
![Page 114: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/114.jpg)
context dependent
Levy & Goldberg 2014
Australian scientist discovers star with telescope
context +/- 2 words
![Page 115: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/115.jpg)
context dependent
context
Australian scientist discovers star with telescope
Levy & Goldberg 2014
![Page 116: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/116.jpg)
context dependent
context
Australian scientist discovers star with telescope
context
Levy & Goldberg 2014
![Page 117: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/117.jpg)
context dependent
context
BoW DEPS
topically-similar vs ‘functionally’ similar
Levy & Goldberg 2014
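To make the BoW versus DEPS contrast concrete, the sketch below builds both kinds of context for the word "discovers" in the example sentence. The dependency arcs are written by hand (a parser would normally supply them), and the relation names and the `-1` inverse marker are assumptions for illustration.

```python
sentence = "Australian scientist discovers star with telescope".split()

def window_contexts(tokens, index, size=2):
    """Bag-of-words context: the words within +/- `size` positions."""
    lo = max(0, index - size)
    hi = min(len(tokens), index + size + 1)
    return [tokens[i] for i in range(lo, hi) if i != index]

# Hand-written (head, relation, dependent) arcs; assumed, not parser output.
arcs = [
    ("scientist", "amod", "Australian"),
    ("discovers", "nsubj", "scientist"),
    ("discovers", "dobj", "star"),
    ("discovers", "prep_with", "telescope"),
]

def dependency_contexts(word, arcs):
    """Dependency context: syntactic neighbours tagged with their relation."""
    contexts = []
    for head, rel, dep in arcs:
        if head == word:
            contexts.append(f"{dep}/{rel}")
        if dep == word:
            contexts.append(f"{head}/{rel}-1")  # inverse relation
    return contexts

print(window_contexts(sentence, sentence.index("discovers")))
print(dependency_contexts("discovers", arcs))
```

The window picks up "Australian" and "with" but misses "telescope", while the dependency contexts reach "telescope" and skip "Australian".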
![Page 119: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/119.jpg)
![Page 120: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/120.jpg)
Crazy Approaches
Paragraph Vectors (Just extend the context window)
Content dependency (Change the window grammatically)
Social word2vec (deepwalk) (Sentence is a walk on the graph)
Spotify (Sentence is a playlist of song_ids)
Stitch Fix (Sentence is a shipment of five items)
![Page 121: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/121.jpg)
![Page 122: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/122.jpg)
CBOW
“The fox jumped over the lazy dog”
Guess the word given the context
~20x faster. (this is the alternative.)
[Diagram: one vIN per context word, one vOUT for the target word]
SkipGram
“The fox jumped over the lazy dog”
Guess the context given the word
Better at syntax. (this is the one we went over)
[Diagram: one vIN for the word, one vOUT per context word]
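The difference between the two setups shows up in how training examples are cut from a sentence. A minimal sketch, assuming a symmetric window of 3 words (the window size and function names are illustrative choices, not from the talk):

```python
tokens = "The fox jumped over the lazy dog".split()

def window(tokens, i, size):
    """Words within `size` positions of index i, excluding position i."""
    lo = max(0, i - size)
    hi = min(len(tokens), i + size + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]

def skipgram_examples(tokens, size):
    """SkipGram: one (input word, context word to predict) pair per neighbour."""
    return [(word, ctx)
            for i, word in enumerate(tokens)
            for ctx in window(tokens, i, size)]

def cbow_examples(tokens, size):
    """CBOW: one (all context words, word to predict) example per position."""
    return [(window(tokens, i, size), word) for i, word in enumerate(tokens)]

print(cbow_examples(tokens, 3)[3])
print([pair for pair in skipgram_examples(tokens, 3) if pair[0] == "over"])
```

CBOW produces one example per word while SkipGram produces one per word-context pair, which is one way to see why CBOW does less work per sentence.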
![Page 123: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/123.jpg)
lda2vec
vDOC = a vtopic1 + b vtopic2 +…
Let’s make vDOC sparse
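A small sketch of the mixture, assuming hypothetical 2-D topic vectors: the document vector is a weighted sum of topic vectors, and when the weights are sparse it sits almost on top of a single topic. All numbers and names are made up for illustration.

```python
def mix_topics(proportions, topic_vectors):
    """vDOC = a * vtopic1 + b * vtopic2 + ... as a weighted sum."""
    dim = len(topic_vectors[0])
    return [sum(p * t[d] for p, t in zip(proportions, topic_vectors))
            for d in range(dim)]

# Hypothetical topic vectors (one row per topic).
topic_vectors = [
    [2.0, -1.0],
    [0.0, 3.0],
    [-1.0, 0.5],
]

dense_doc = mix_topics([0.40, 0.26, 0.34], topic_vectors)
sparse_doc = mix_topics([0.99, 0.01, 0.00], topic_vectors)

print(dense_doc)   # a blend that resembles no single topic
print(sparse_doc)  # nearly identical to the first topic vector
```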
![Page 124: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/124.jpg)
lda2vec
This works! 😀 But vDOC isn’t as interpretable as the topic vectors. 😔
vDOC = topic0 + topic1
Let’s say that vDOC adds
![Page 125: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/125.jpg)
lda2vec
softmax(vOUT * (vIN+ vDOC))
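The expression can be read as: add the document vector to the input word vector, score every candidate output word by a dot product, and normalize with a softmax. A minimal sketch with a hypothetical three-word vocabulary and 2-D vectors (all values and names are assumptions):

```python
import math

def predict_context(v_in, v_doc, v_out_table):
    """softmax(vOUT * (vIN + vDOC)) over a small vocabulary."""
    context = [a + b for a, b in zip(v_in, v_doc)]
    scores = {word: sum(o * c for o, c in zip(v_out, context))
              for word, v_out in v_out_table.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - top) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical vectors for illustration only.
v_in = [0.5, 0.0]    # input word vector
v_doc = [1.0, -0.5]  # document vector
v_out_table = {
    "German": [1.0, 0.0],
    "fox": [0.0, 1.0],
    "lazy": [-1.0, 0.0],
}

probs = predict_context(v_in, v_doc, v_out_table)
print(max(probs, key=probs.get))
```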
![Page 126: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/126.jpg)
![Page 127: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/127.jpg)
theory of lda2vec
lda2vec
![Page 128: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/128.jpg)
pyLDAvis of lda2vec
lda2vec
![Page 129: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/129.jpg)
LDA Results
context
History
I loved every choice in this fix!! Great job!
Great Stylist Perfect
![Page 130: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/130.jpg)
LDA Results
context
History
Body Fit
My measurements are 36-28-32. If that helps. I like wearing some clothing that is fitted.
Very hard for me to find pants that fit right.
![Page 131: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/131.jpg)
LDA Results
context
History
Sizing
Really enjoyed the experience and the pieces, sizing for tops was too big.
Looking forward to my next box!
Excited for next
![Page 132: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/132.jpg)
LDA Results
context
History
Almost Bought
It was a great fix. Loved the two items I kept and the three I sent back were close!
Perfect
![Page 133: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/133.jpg)
All of the following ideas will change what ‘words’ and ‘context’ represent.
![Page 134: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/134.jpg)
paragraph vector
What about summarizing documents?
On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that
![Page 135: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/135.jpg)
On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that
The framework nuclear agreement he reached with Iran on Thursday did not provide the definitive answer to whether Mr. Obama’s audacious gamble will pay off. The fist Iran has shaken at the so-called Great Satan since 1979 has not completely relaxed.
paragraph vector
Normal skipgram extends C words before, and C words after.
IN
OUT OUT
![Page 136: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/136.jpg)
paragraph vector
A document vector simply extends the context to the whole document.
IN
OUT OUT OUT OUT
doc_1347
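One way to picture this is as extra training pairs: alongside the usual windowed word pairs, a document token such as `doc_1347` is paired with every word in the document, however far apart the words are. The sketch below is an illustration under that assumption, with hypothetical names, not the exact paragraph vector training procedure.

```python
def paragraph_vector_pairs(doc_id, tokens, window):
    """Windowed word pairs plus (doc_id, word) pairs covering the whole document."""
    pairs = []
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Normal skipgram: C words before and C words after.
        pairs.extend((word, tokens[j]) for j in range(lo, hi) if j != i)
        # Document token: its 'window' is the entire document.
        pairs.append((doc_id, word))
    return pairs

tokens = "President Obama reached out".split()
pairs = paragraph_vector_pairs("doc_1347", tokens, window=1)
print(pairs)
```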
![Page 137: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/137.jpg)
from gensim.models import Doc2Vec
fn = "item_document_vectors"
model = Doc2Vec.load(fn)
# Nearest neighbours of the word 'pregnant'
matches = model.most_similar('pregnant')
# Keep only the sentence labels, whose keys contain 'SENT_'
matches = list(filter(lambda x: 'SENT_' in x[0], matches))
# ['...I am currently 23 weeks pregnant...',
#  '...I'm now 10 weeks pregnant...',
#  '...not showing too much yet...',
#  '...15 weeks now. Baby bump...',
#  '...6 weeks postpartum!...',
#  '...12 weeks postpartum and am nursing...',
#  '...I have my baby shower that...',
#  '...am still breastfeeding...',
#  '...I would love an outfit for a baby shower...']
sentence search
![Page 138: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/138.jpg)
![Page 139: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/139.jpg)
![Page 140: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/140.jpg)
![Page 141: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/141.jpg)
![Page 142: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/142.jpg)
![Page 143: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/143.jpg)
![Page 144: lda2vec Text by the Bay 2016](https://reader035.vdocuments.mx/reader035/viewer/2022062904/587a52b41a28ab520b8b4b47/html5/thumbnails/144.jpg)