lda2vec text by the bay 2016 with notes

Upload: christopher-moody

Post on 15-Apr-2017

TRANSCRIPT

Page 1: Lda2vec text by the bay 2016 with notes

lda2vec (word2vec, and lda)

Christopher Moody @ Stitch Fix

Welcome: thanks for coming, and thanks to the organizers for having me.

NLP can be a messy affair because you have to teach a computer about the irregularities and ambiguities of the English language

and teach it the hierarchical, sparse nature of English grammar and vocabulary

3rd trimester, pregnant; “wears scrubs”: medicine; taking a trip: a fix for vacation clothing

power and promise of word vectors is to sweep away a lot of issues

Page 2: Lda2vec text by the bay 2016 with notes

About

@chrisemoody. Caltech Physics; PhD in astro-statistics; supercomputing; sklearn t-SNE contributor; Data Labs at Stitch Fix. github.com/cemoody

Gaussian Processes t-SNE

chainer deep learning

Tensor Decomposition

Page 3: Lda2vec text by the bay 2016 with notes

1. word2vec
2. lda
3. lda2vec

Page 4: Lda2vec text by the bay 2016 with notes

1. king - man + woman = queen
2. Huge splash in NLP world
3. Learns from raw text
4. Pretty simple algorithm
5. Comes pretrained

word2vec

1. Learns what words mean; can solve analogies cleanly. Not treating words as blocks, but instead modeling relationships

2. Distributed representations form the basis of more complicated deep learning systems

Page 5: Lda2vec text by the bay 2016 with notes

1. Set up an objective function
2. Randomly initialize vectors
3. Do gradient descent

word2vec

Page 6: Lda2vec text by the bay 2016 with notes

word2vec

word2vec: learn word vector w from its surrounding context

w

1. not mentioning neural networks
2. Let’s talk about training first
3. n-grams transition probability vs tf-idf / LSI co-occurrence matrices
4. Here we will learn the embedded representation directly, with no intermediates, updating it with every example

Page 7: Lda2vec text by the bay 2016 with notes

word2vec

“The fox jumped over the lazy dog”
Maximize the likelihood of seeing the words given the word over.

P(the|over) P(fox|over)

P(jumped|over) P(the|over) P(lazy|over) P(dog|over)

…instead of maximizing the likelihood of co-occurrence counts.

1. Context: the words surrounding the training word
2. Naively assume BoW; not recurrent, no state
3. Still a pretty simple assumption!

Conditioning on just *over*; no other secret parameters or anything

Page 8: Lda2vec text by the bay 2016 with notes

word2vec

P(fox|over)

What should this be?

Page 9: Lda2vec text by the bay 2016 with notes

word2vec

P(v_fox | v_over)

Should depend on the word vectors.

P(fox|over)

Trying to learn the word vectors, so let’s start with those (we’ll randomly initialize them to begin with)

Page 10: Lda2vec text by the bay 2016 with notes

word2vec

“The fox jumped over the lazy dog”

P(w|c)

Extract pairs from context window around every input word.

Page 11: Lda2vec text by the bay 2016 with notes

word2vec

“The fox jumped over the lazy dog”

c

P(w|c)

Extract pairs from context window around every input word.

IN = training word = context

Page 12: Lda2vec text by the bay 2016 with notes

word2vec

“The fox jumped over the lazy dog”

w

P(w|c)

c

Extract pairs from context window around every input word.

Page 13: Lda2vec text by the bay 2016 with notes

word2vec

P(w|c)

w c

“The fox jumped over the lazy dog”

Extract pairs from context window around every input word.

Page 14: Lda2vec text by the bay 2016 with notes

word2vec

“The fox jumped over the lazy dog”

P(w|c)

w c

Extract pairs from context window around every input word.

Page 15: Lda2vec text by the bay 2016 with notes

word2vec

P(w|c)

c w

“The fox jumped over the lazy dog”

Extract pairs from context window around every input word.


Page 17: Lda2vec text by the bay 2016 with notes

word2vec

P(w|c)

c w

“The fox jumped over the lazy dog”

Extract pairs from context window around every input word.

innermost for-loop: v_in was fixed to over during the loop; increment v_in to point to the next word

Page 18: Lda2vec text by the bay 2016 with notes

word2vec

P(w|c)

w c

“The fox jumped over the lazy dog”

Extract pairs from context window around every input word.

Page 19: Lda2vec text by the bay 2016 with notes

word2vec

P(w|c)

cw

“The fox jumped over the lazy dog”

Extract pairs from context window around every input word.

…So that at a high level is what we want word2vec to do.


Page 22: Lda2vec text by the bay 2016 with notes

word2vec

P(w|c)

c w

“The fox jumped over the lazy dog”

Extract pairs from context window around every input word.

…So that at a high level is what we want word2vec to do.

Page 23: Lda2vec text by the bay 2016 with notes

word2vec

P(w|c)

c w

“The fox jumped over the lazy dog”

Extract pairs from context window around every input word.

called skip grams

…So that at a high level is what we want word2vec to do.

two for loops
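The "two for loops" above can be sketched in plain Python; the tokenization and window size here are assumptions for illustration, not the talk's actual code:

```python
# A minimal sketch of extracting skip-gram (word, context) pairs with
# two for loops over a sentence.
def skip_grams(tokens, window=2):
    pairs = []
    for i, pivot in enumerate(tokens):          # outer loop: pivot word
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):                 # inner loop: context window
            if j != i:
                pairs.append((pivot, tokens[j]))
    return pairs

sentence = "The fox jumped over the lazy dog".lower().split()
pairs = skip_grams(sentence, window=2)
# pairs like ("over", "jumped") and ("over", "lazy") appear
```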

Page 24: Lda2vec text by the bay 2016 with notes

objective

Measure loss between w and c?

How should we define P(w|c)?

Now we’ve defined the high-level update path for the algorithm.

Need to define this prob exactly in order to define our updates.

Boils down to diff between word & context — want to make as similar as possible, and then the probability will go up.

Page 25: Lda2vec text by the bay 2016 with notes

objective

w . c

How should we define P(w|c)?

Measure loss between w and c?

Use cosine sim.

could imagine Euclidean distance, Mahalanobis distance

Page 26: Lda2vec text by the bay 2016 with notes

word2vec

w . c ~ 1

objective

w

c

v_canada · v_snow ~ 1

The dot product has these properties: similar vectors have similarity near 1 (if normed)

Page 27: Lda2vec text by the bay 2016 with notes

word2vec

w . c ~ 0

objective

w

c

v_canada · v_desert ~ 0

Orthogonal vectors have similarity near 0

Page 28: Lda2vec text by the bay 2016 with notes

word2vec

w . c ~ -1

objective

w

c

Opposite vectors have similarity near -1

Page 29: Lda2vec text by the bay 2016 with notes

word2vec

w . c ∈ [-1,1]

objective

But the inner product ranges from -1 to 1 (when normalized)

Page 30: Lda2vec text by the bay 2016 with notes

word2vec

But we’d like to measure a probability.

w . c ∈ [-1,1]

objective

…and we’d like a probability

Page 31: Lda2vec text by the bay 2016 with notes

word2vec

But we’d like to measure a probability.

objective

σ(c·w) ∈ [0,1]

Transform again using sigmoid
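The sigmoid transform can be sketched in a few lines; the sample values are illustrative:

```python
import math

# Squash the dot product c·w into a probability with the sigmoid.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Similar (aligned) vectors: dot product near +1, probability above 0.5;
# dissimilar vectors: dot product near -1, probability below 0.5.
p_similar = sigmoid(1.0)
p_dissimilar = sigmoid(-1.0)
```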

Page 32: Lda2vec text by the bay 2016 with notes

word2vec

But we’d like to measure a probability.

objective

σ(c·w) ∈ [0,1]

w c

w c

Similar / Dissimilar

Transform again using sigmoid

Page 33: Lda2vec text by the bay 2016 with notes

word2vec

Loss function:

objective

L=σ(c·w)

Logistic (binary) choice. Is the (context, word) combination from our dataset?

Are these 2 similar?

This is optimized with identical vectors… everything looks exactly the same

Page 34: Lda2vec text by the bay 2016 with notes

word2vec

The skip-gram negative-sampling model

objective

Trivial solution is that context = word for all vectors

L = σ(c·w)

w

c

no contrast! add negative samples

Page 35: Lda2vec text by the bay 2016 with notes

word2vec

The skip-gram negative-sampling model

L = σ(c·w) + σ(-c·w_neg)

objective

Draw random words in vocabulary.

no contrast! add negative samples

discriminate: this (w, c) pair is from the observed corpus, while that one uses a randomly drawn word

example: (fox, jumped) but not (fox, career)

no popularity guessing as in softmax

Page 36: Lda2vec text by the bay 2016 with notes

word2vec

The skip-gram negative-sampling model

objective

Discriminate positive from negative samples

Multiple Negatives

L = σ(c·w) + σ(-c·w_neg) + … + σ(-c·w_neg)

multiple samples: (fox, jumped)

not: (fox, career), (fox, android), (fox, sunlight)

reminder: this L is being computed, and we’ll nudge all of the values of c & w via gradient descent to optimize L

so that’s SGNS / word2vec. That’s it! It’s a bit of a disservice to draw this as a giant network
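The SGNS loss for one pair plus its negative samples can be sketched as follows; the vector sizes and random draws are assumptions, not the talk's code:

```python
import numpy as np

# Hedged sketch of the SGNS loss for one (word, context) pair and k
# randomly drawn negative words.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(w, c, negatives):
    # maximize sigma(c·w) for the observed pair and sigma(-c·w_neg)
    # for each negative sample; negate so gradient descent minimizes
    loglik = np.log(sigmoid(c @ w))
    loglik += sum(np.log(sigmoid(-c @ wn)) for wn in negatives)
    return -loglik

rng = np.random.default_rng(0)
w, c = rng.normal(size=5), rng.normal(size=5)
negatives = [rng.normal(size=5) for _ in range(3)]
loss = sgns_loss(w, c, negatives)
```

In training, every value of c and w would be nudged by the gradient of this loss over many (pivot, context) pairs.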

Page 37: Lda2vec text by the bay 2016 with notes

word2vec

The SGNS Model: PMI

c_i·w_j = PMI(M_ij) - log k

…is extremely similar to matrix factorization!

Levy & Goldberg 2014

L = σ(c·w) + σ(-c·w_neg)

explain what matrix is, row, col, entries

e.g. can solve word2vec via SVD if we want deterministically

One of the most cited NLP papers of 2014

Page 38: Lda2vec text by the bay 2016 with notes

word2vec

The SGNS Model: PMI

Levy & Goldberg 2014

‘traditional’ NLP

L = σ(c·w) + σ(-c·w_neg)

c_i·w_j = PMI(M_ij) - log k

…is extremely similar to matrix factorization!

Dropped indices for readability

most cited because of the connection to PMI, an info-theoretic measure of the association of w and c

Page 39: Lda2vec text by the bay 2016 with notes

word2vec

The SGNS Model

L = σ(c·w) + Σ σ(-c·w_neg)

PMI

c_i·w_j = log [ (#(c_i,w_j)/n) / (k · (#(w_j)/n) · (#(c_i)/n)) ]

Levy & Goldberg 2014

‘traditional’ NLP

instead of looping over all words, you can count and roll them up this way

probabilities are just counts divided by the number of observations

word2vec adds this bias term

cool that all of the for-looping we did before can be reduced to this form

Page 40: Lda2vec text by the bay 2016 with notes

word2vec

The SGNS Model

L = σ(c·w) + Σ σ(-c·w_neg)

PMI

c_i·w_j = log [ (popularity of c,w) / (k · (popularity of c) · (popularity of w)) ]

Levy & Goldberg 2014

‘traditional’ NLP

probabilities are just counts divided by the number of observations

word2vec adds this bias term

down weights rare terms

More frequent words are weighted more so than infrequent words. Rare words are downweighted.

Makes sense given that the word distribution follows Zipf’s law.

This weighting is what makes SGNS so powerful.

theoretical is nice, but practical is even better

Page 41: Lda2vec text by the bay 2016 with notes

word2vec

PMI

99% of word2vec is counting.

And you can count words in SQL

Page 42: Lda2vec text by the bay 2016 with notes

word2vec

PMI

Count how many times you saw c·w

Count how many times you saw c

Count how many times you saw w
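The counting above can be sketched in Python (SQL GROUP BY would work the same way); the toy corpus and window size are assumptions for illustration:

```python
from collections import Counter
from math import log

# Sketch: PMI from plain counts, matching c_i·w_j = PMI(M_ij) - log k.
corpus = "the fox jumped over the lazy dog".split()
window = 2
pair_counts, word_counts = Counter(), Counter(corpus)
for i, pivot in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            pair_counts[(pivot, corpus[j])] += 1  # count c·w together
n = sum(pair_counts.values())

def pmi(c, w):
    # log of the joint probability over the product of the marginals
    p_joint = pair_counts[(c, w)] / n
    p_c, p_w = word_counts[c] / len(corpus), word_counts[w] / len(corpus)
    return log(p_joint / (p_c * p_w))
```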

Page 43: Lda2vec text by the bay 2016 with notes

word2vec

PMI

…and this takes ~5 minutes to compute on a single core. Computing the SVD is completely standard in any math library.
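Solving for word vectors deterministically via SVD, as Levy & Goldberg suggest, can be sketched like this; the small symmetric matrix here is a toy stand-in for the shifted PMI matrix, not real counts:

```python
import numpy as np

# Hedged sketch: factorize a (shifted) PMI matrix M with SVD to recover
# word vectors. M is a toy matrix standing in for PMI(M_ij) - log k.
M = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.0]])
U, S, Vt = np.linalg.svd(M)
word_vectors = U * np.sqrt(S)      # one common way to split the factors
# the two factors reconstruct M exactly
recon = (U * np.sqrt(S)) @ (np.sqrt(S)[:, None] * Vt)
```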

Page 44: Lda2vec text by the bay 2016 with notes

word2vec

explain table

Page 45: Lda2vec text by the bay 2016 with notes

intuition about word vectors

Page 46: Lda2vec text by the bay 2016 with notes

Showing just 2 of the ~500 dimensions (effectively we’ve PCA’d it), and only 4 of 100k words

Page 57: Lda2vec text by the bay 2016 with notes

If we only had locality and not regularity, this wouldn’t necessarily be true

Page 60: Lda2vec text by the bay 2016 with notes

So we live in a vector space where operations like addition and subtraction are semantically meaningful.

So here’s a few examples of this working.

Really get the idea of these vectors as being ‘mixes’ of other ideas & vectors
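The "vectors as mixes of ideas" arithmetic can be sketched with toy vectors; these 2-D values are invented for illustration only, not real embeddings:

```python
import numpy as np

# Toy sketch of analogy arithmetic in a vector space where addition and
# subtraction are semantically meaningful.
vecs = {
    "king":  np.array([1.0, 1.0]),
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([0.9, 0.1]),
    "queen": np.array([0.9, 1.1]),
    "apple": np.array([0.0, -1.0]),   # unrelated distractor
}

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman lands nearest to queen
target = vecs["king"] - vecs["man"] + vecs["woman"]
best = max((w for w in vecs if w not in ("king", "man", "woman")),
           key=lambda w: cos(vecs[w], target))
```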

Page 61: Lda2vec text by the bay 2016 with notes

ITEM_3469 + ‘Pregnant’

Stitch Fix is a personal styling service

Box

Page 62: Lda2vec text by the bay 2016 with notes

+ ‘Pregnant’

I love the stripes and the cut around my neckline was amazing

someone else might write ‘grey and black’

subtlety and nuance in that language

For some items, we have several times the collected works of shakespeare

Page 63: Lda2vec text by the bay 2016 with notes

= ITEM_701333 = ITEM_901004 = ITEM_800456

Page 64: Lda2vec text by the bay 2016 with notes

Stripes are safe for maternity. And also similar tones and flowy cuts: still great for expecting mothers

Page 65: Lda2vec text by the bay 2016 with notes

what about LDA?

Page 66: Lda2vec text by the bay 2016 with notes

LDA on Client Item Descriptions

This shows the incredible amount of structure

Page 67: Lda2vec text by the bay 2016 with notes

LDA on Item

Descriptions (with Jay)

clunky jewelry here; dangling, delicate jewelry elsewhere

Page 68: Lda2vec text by the bay 2016 with notes

LDA on Item

Descriptions (with Jay)

topics on patterns, styles — this cluster is similarly described as high contrast tops with popping colors

Page 69: Lda2vec text by the bay 2016 with notes

LDA on Item

Descriptions (with Jay)

bright dresses for a warm summer

LDA helps us model topics over documents in an interpretable way

Page 70: Lda2vec text by the bay 2016 with notes

lda vs word2vec

Page 71: Lda2vec text by the bay 2016 with notes

Bayesian Graphical Model vs ML Neural Model

Page 72: Lda2vec text by the bay 2016 with notes

word2vec is local: one word predicts a nearby word

“I love finding new designer brands for jeans”

as if the world were one very long text string: no end of documents, no end of sentences, etc.

and a window across words

Page 73: Lda2vec text by the bay 2016 with notes

“I love finding new designer brands for jeans”

But text is usually organized.

as if the world were one very long text string: no end of documents, no end of sentences, etc.


Page 75: Lda2vec text by the bay 2016 with notes

“I love finding new designer brands for jeans”

In LDA, documents globally predict words.

doc 7681

these are client comments, which are short; they only predict dozens of words

but could be legal documents, or medical documents, 10k words

Page 76: Lda2vec text by the bay 2016 with notes

typical word2vec vector

[ -0.75, -1.25, -0.55, -0.12, +2.2 ]

All real values

typical LDA document vector

[ 0%, 9%, 78%, 11% ]

All sum to 100%

Page 77: Lda2vec text by the bay 2016 with notes

5D word2vec vector

[ -0.75, -1.25, -0.55, -0.12, +2.2 ]

Dense; all real values; dimensions relative

5D LDA document vector

[ 0%, 9%, 78%, 11% ]

Sparse; all sum to 100%; dimensions are absolute

much easier to say to another human 78% than it is +2.2 of something and -1.25 of something else

w2v an address — 200 main st. — figure out from neighbors

LDA is a *mixture* model

78% of some ingredient

but

w2v isn’t -1.25 of some ingredient

ingredient = topics

Page 78: Lda2vec text by the bay 2016 with notes

100D word2vec vector

[ -0.75, -1.25, -0.55, -0.27, -0.94, 0.44, 0.05, 0.31 … -0.12, +2.2 ]

Dense; all real values; dimensions relative

100D LDA document vector

[ 0% 0% 0% 0% 0% … 0%, 9%, 78%, 11% ]

Sparse; all sum to 100%; dimensions are absolute

lda is sparse, has 95 dims close to zero

Page 79: Lda2vec text by the bay 2016 with notes

100D word2vec vector

[ -0.75, -1.25, -0.55, -0.27, -0.94, 0.44, 0.05, 0.31 … -0.12, +2.2 ]

Similar in 100D ways (very flexible)

100D LDA document vector

[ 0% 0% 0% 0% 0% … 0%, 9%, 78%, 11% ]

Similar in fewer ways (more interpretable)

+mixture +sparse

Page 80: Lda2vec text by the bay 2016 with notes

can we do both? lda2vec

a series of experiments; take with a grain of salt

Page 81: Lda2vec text by the bay 2016 with notes

[Diagram: skip-grams from sentences; a word vector (#hidden units) for the pivot word feeds the negative sampling loss to predict a nearby word in “Lufthansa is a German airline and when”.]

word2vec predicts locally: one word predicts a nearby word

read example

We extract pairs of pivot and target words that occur in a moving window that scans across the corpus. For every pair, the pivot word is used to predict the nearby target word.

Page 82: Lda2vec text by the bay 2016 with notes

[Diagram: skip-grams from sentences; the pivot word vector (#hidden units) is added to a document vector (#hidden units) to form a context vector, which feeds the negative sampling loss to predict the target word in “Lufthansa is a German airline and when”. The document vector is built from a document weight vector (#topics), softmaxed into a document proportion (#topics), multiplied by the topic matrix (#topics × #hidden units).]

Document vector predicts a word from a global context

German: similar to French or Spanish. Add airlines: a document vector similar to the word vector for airline.

we know we can add vectors

German + airline: like Lufthansa, Condor Flugdienst, and Aero Lloyd.

A latent vector is randomly initialized for every document in the corpus. Very similar to doc2vec and paragraph vectors.
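The composition above can be sketched in a couple of lines; the dimensionality and random values are placeholders, not trained vectors:

```python
import numpy as np

# Sketch of the lda2vec context vector: the pivot word vector plus a
# per-document latent vector, used together to predict the target word.
rng = np.random.default_rng(0)
word_vec = rng.normal(size=5)       # e.g. the vector for the pivot word
doc_vec = rng.normal(size=5)        # latent vector for this document
context_vec = word_vec + doc_vec    # this sum predicts the target word
```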

Page 83: Lda2vec text by the bay 2016 with notes

[Diagram repeated: skip-grams from sentences; word vector + document vector form the context vector that feeds the negative sampling loss.]

We’re missing mixtures & sparsity!

German

Good for training sentiment models, great scores.

interpretability

Page 84: Lda2vec text by the bay 2016 with notes

[Diagram repeated: skip-grams from sentences; word vector + document vector form the context vector that feeds the negative sampling loss.]

We’re missing mixtures & sparsity!

German

Too many documents.

about as interpretable as a hash

Page 85: Lda2vec text by the bay 2016 with notes

[Diagram repeated: document weights (0.34, -0.1, 0.17) are softmaxed into proportions (41%, 26%, 34%) that mix the topic matrix into the document vector.]

Now it’s a mixture.

document X is +0.34 in topic 0, -0.1 in topic 1, and 0.17 in topic 2

model with 3 topics.

before had 500 DoF; now has just a few.

better choose really good topics, because I only have a few available to summarize the entire document
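The mixture above can be sketched directly; the topic vectors and proportions here are illustrative numbers, not trained values:

```python
import numpy as np

# Sketch: with a 3-topic model, the document vector is a weighted sum
# (a mixture) of topic vectors living in the word-vector space.
topic_matrix = np.array([[-1.4, -1.9, 0.96, -0.2, -0.7],   # topic 0
                         [-0.5, -1.7, -0.7, -1.1, -0.4],   # topic 1
                         [-1.4, 0.75, -1.9, 0.6, -0.7]])   # topic 2
proportions = np.array([0.41, 0.26, 0.33])  # sums to 100%
doc_vec = proportions @ topic_matrix        # mixture of topic vectors
```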

Page 86: Lda2vec text by the bay 2016 with notes

[Diagram repeated; one column of the topic matrix highlighted, with nearest words: Trinitarian, baptismal, Pentecostals, Bede, schismatics, excommunication.]

Each topic has a distributed representation that lives in the same space as the word vectors. While each topic is not literally a token present in the corpus, it is similar to other tokens.

Page 87: Lda2vec text by the bay 2016 with notes

[Diagram repeated; topic 1 = “religion”: Trinitarian, baptismal, Pentecostals, Bede, schismatics, excommunication.]

Page 88: Lda2vec text by the bay 2016 with notes

[Diagram repeated; another topic column highlighted, with nearest words: Milosevic, absentee, Indonesia, Lebanese, Israelis, Karadzic.]

notice one column over in the topic matrix

Page 89: Lda2vec text by the bay 2016 with notes

[Diagram repeated; topic 2 = “politics”: Milosevic, absentee, Indonesia, Lebanese, Israelis, Karadzic.]

topic vectors, document vectors, and word vectors all live in the same space

Page 90: Lda2vec text by the bay 2016 with notes

[Diagram repeated.]

Page 91: Lda2vec text by the bay 2016 with notes

[Diagram repeated; the document weight → softmax → document proportion step highlighted.]

The document weights are softmax transformed weights to yield the document proportions.

similar to the logistic function, but instead of a single 0-1 value, now a vector of percentages

sums to 100% and indicates the topic proportions of a single document

one document might be 41% in topic 0, 26% in topic 1, and 34% in topic 2

very close to LDA-like representations
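The softmax step described above can be sketched as follows, using the document weights from the slide:

```python
import numpy as np

# Sketch: softmax turns free-valued document weights into document
# proportions that are positive and sum to 100%.
def softmax(x):
    e = np.exp(x - np.max(x))   # subtract max for numerical stability
    return e / e.sum()

weights = np.array([0.34, -0.1, 0.17])   # document weights from the slide
proportions = softmax(weights)           # roughly the 41% / 26% / 34% split
```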

Page 92: Lda2vec text by the bay 2016 with notes

[Diagram repeated.]

The first time I did this, it was still very dense in percentages: with 100 topics, each had ~1%

might be, but still dense!

a zillion categories, 1% of this, 1% of this, 1% of this…

mathematically works, addition of lots of little bits (distributed)

Page 93: Lda2vec text by the bay 2016 with notes

[Same architecture diagram, repeated]

Sparsity!

t=0: 34% 32% 34%

t=10: 41% 26% 34%

t=∞: 99% 1% 0%

(time →)

init balanced, but dense

Dirichlet likelihood loss encourages proportions vectors to become sparser over time.

relatively simple

start pink, go to white and sparse
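A sketch of that Dirichlet likelihood term (the hyperparameter values here are illustrative; lda2vec exposes a tunable strength λ and concentration α, with α < 1 driving sparsity):

```python
import numpy as np

def dirichlet_loss(proportions, alpha=0.7, lam=200.0):
    """Dirichlet likelihood term added to the total loss.
    Minimizing it with alpha < 1 rewards spiky, sparse proportion
    vectors over dense, balanced ones."""
    return -lam * np.sum((alpha - 1.0) * np.log(proportions + 1e-12))

dense  = np.array([0.34, 0.32, 0.34])    # t=0: balanced but dense
sparse = np.array([0.99, 0.009, 0.001])  # t=inf: nearly one-hot
assert dirichlet_loss(sparse) < dirichlet_loss(dense)  # sparser is cheaper
```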

Page 94: Lda2vec text by the bay 2016 with notes

[Same architecture diagram, repeated, with the word “German” highlighted in “Lufthansa is a German airline and when”]

end up with something that’s quite a bit more complicated

but it achieves our goals: mixes word vectors w/ sparse interpretable document representations

Page 95: Lda2vec text by the bay 2016 with notes

@chrisemoody Example Hacker News comments

Word vectors: https://github.com/cemoody/

lda2vec/blob/master/examples/hacker_news/lda2vec/

word_vectors.ipynb

read examples

play around at home — on

Page 96: Lda2vec text by the bay 2016 with notes

@chrisemoody Example Hacker News comments

Topics: http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/

hacker_news/lda2vec/lda2vec.ipynb

http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/hacker_news/lda2vec/lda2vec.ipynb

topic 16 — sci, phys

topic 1 — housing

topic 8 — finance, bitcoin

topic 23 — programming languages

topic 6 — transportation

topic 3 — education

Page 97: Lda2vec text by the bay 2016 with notes

@chrisemoody

lda2vec.com

http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb

Page 98: Lda2vec text by the bay 2016 with notes

+ API docs + Examples + GPU + Tests

@chrisemoody

lda2vec.com

http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb

Page 99: Lda2vec text by the bay 2016 with notes

@chrisemoody

lda2vec.com

If you want…

human-interpretable doc topics: use LDA.

machine-useable word-level features: use word2vec.

topics over user / doc / region / etc. features, and you like to experiment a lot (and have a GPU): use lda2vec.

http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb

Page 100: Lda2vec text by the bay 2016 with notes

?@chrisemoody

Multithreaded Stitch Fix

Page 101: Lda2vec text by the bay 2016 with notes

@chrisemoody

lda2vec.com

http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb

Page 102: Lda2vec text by the bay 2016 with notes

Credit

Large swathes of this talk are from previous presentations by:

• Tomas Mikolov
• David Blei
• Christopher Olah
• Radim Rehurek
• Omer Levy & Yoav Goldberg
• Richard Socher
• Xin Rong
• Tim Hopper

Richard Socher & Xin Rong both gave lucid explanations of the word2vec gradient

Page 103: Lda2vec text by the bay 2016 with notes

“PS! Thank you for such an awesome idea”

@chrisemoody

doc_id=1846

Can we model topics to sentences? lda2lstm

Data Labs @ SF is all about mixing cutting edge algorithms but we absolutely need interpretability.

initial vector is a dirichlet mixture — moves us from bag of words to sentence-level LDA

give a sent that’s 80% religion, 10% politics

word2vec on word level, LSTM on the sentence level, LDA on document level

Dirichlet-squeezing internal states and manipulations will help us understand the science of LSTM dynamics

Page 104: Lda2vec text by the bay 2016 with notes

Can we model topics to sentences? lda2lstm

“PS! Thank you for such an awesome idea” doc_id=1846

@chrisemoody

Can we model topics to images? lda2ae

TJ Torres

Page 105: Lda2vec text by the bay 2016 with notes

4 Fun Stuff

and now for something completely crazy

Page 106: Lda2vec text by the bay 2016 with notes

translation

(using just a rotation matrix)

Mikolov 2013

English

Spanish

Matrix Rotation

Blow mind

Explain plot

Not a complicated NN here

Still have to learn the rotation matrix — but it generalizes very nicely.

Have analogies for every linalg op as a linguistic operator

Robust framework and tools to do science on words
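One way to learn such a rotation is the orthogonal Procrustes solution via SVD (a sketch of the idea; Mikolov et al. originally fit a general linear map by gradient descent on paired translations):

```python
import numpy as np

def learn_rotation(X, Y):
    """Find the orthogonal W minimizing ||X @ W - Y|| for paired
    word vectors X (source language) and Y (target language)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt  # orthogonal: a pure rotation/reflection

# Toy check: recover a known 2-D rotation from 50 paired vectors.
theta = 0.5
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = np.random.default_rng(0).standard_normal((50, 2))
W = learn_rotation(X, X @ R)
```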

Page 107: Lda2vec text by the bay 2016 with notes

deepwalk

Perozzi et al. 2014

learn word vectors from sentences

“The fox jumped over the lazy dog”

vOUT vOUT vOUT vOUT vOUT vOUT

‘words’ are graph vertices ‘sentences’ are random walks on the graph

word2vec
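Generating those graph “sentences” takes only a few lines (a toy sketch; `graph` is a hypothetical adjacency dict, and the resulting walks can be fed to any word2vec trainer):

```python
import random

def random_walks(graph, walk_len=10, walks_per_node=5, seed=42):
    """DeepWalk's trick: vertices are 'words', random walks are
    'sentences'. Returns lists of node ids as strings."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(walks_per_node):
        for start in graph:
            walk = [start]
            while len(walk) < walk_len:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break  # dead end: emit a shorter sentence
                walk.append(rng.choice(neighbors))
            sentences.append([str(node) for node in walk])
    return sentences

# Tiny triangle graph: 3 nodes, everyone connected to everyone.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
sents = random_walks(g)
```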

Page 108: Lda2vec text by the bay 2016 with notes

Playlists at Spotify

context

sequence

learning

‘words’ are song indices ‘sentences’ are playlists

Page 109: Lda2vec text by the bay 2016 with notes

Playlists at Spotify

context

Erik Bernhardsson

Great performance on ‘related artists’

Page 110: Lda2vec text by the bay 2016 with notes

Fixes at Stitch Fix

sequence

learning

Let’s try: ‘words’ are items ‘sentences’ are fixes

Page 111: Lda2vec text by the bay 2016 with notes

Fixes at Stitch Fix

context

Learn similarity between styles because they co-occur

Learn ‘coherent’ styles

sequence

learning

Page 112: Lda2vec text by the bay 2016 with notes

Fixes at Stitch Fix?

context

sequence

learning

Got lots of structure!

Page 113: Lda2vec text by the bay 2016 with notes

Fixes at Stitch Fix?

context

sequence

learning

Page 114: Lda2vec text by the bay 2016 with notes

Fixes at Stitch Fix?

context

sequence

learning

Nearby regions are consistent ‘closets’

What sorts of sequences do you have at Quora? What kinds of things can you learn from context?

Page 115: Lda2vec text by the bay 2016 with notes

?@chrisemoody

Multithreaded Stitch Fix

Page 116: Lda2vec text by the bay 2016 with notes

context dependent

Levy & Goldberg 2014

Australian scientist discovers star with telescope

context +/- 2 words

Page 117: Lda2vec text by the bay 2016 with notes

context dependent

context

Australian scientist discovers star with telescope

Levy & Goldberg 2014

What if we

Page 118: Lda2vec text by the bay 2016 with notes

context dependent

context

Australian scientist discovers star with telescope

context

Levy & Goldberg 2014

Page 119: Lda2vec text by the bay 2016 with notes

context dependent

context

BoW DEPS

topically-similar vs ‘functionally’ similar

Levy & Goldberg 2014

Page 120: Lda2vec text by the bay 2016 with notes

?@chrisemoody

Multithreaded Stitch Fix

Page 121: Lda2vec text by the bay 2016 with notes
Page 122: Lda2vec text by the bay 2016 with notes

Crazy Approaches

Paragraph Vectors (Just extend the context window)

Context dependency (Change the window grammatically)

Social word2vec (deepwalk) (Sentence is a walk on the graph)

Spotify (Sentence is a playlist of song_ids)

Stitch Fix (Sentence is a shipment of five items)

Page 123: Lda2vec text by the bay 2016 with notes

See previous

Page 124: Lda2vec text by the bay 2016 with notes

CBOW

“The fox jumped over the lazy dog”

Guess the word given the context

~20x faster. (this is the alternative.)

vOUT

vIN vIN vIN vIN vIN vIN

SkipGram

“The fox jumped over the lazy dog”

vOUT vOUT

vIN

vOUT vOUT vOUT vOUT

Guess the context given the word

Better at syntax. (this is the one we went over)

CBOW sums word vectors, losing the order in the sentence.

Both are good at semantic relationships: child and kid are nearby, or gender in man, woman.

If you blur words over the scale of context (5-ish words), you lose a lot of grammatical nuance.

But skipgram preserves order: it preserves the relationship in pluralizing, for example.
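The two training schemes differ only in how (input, output) pairs are cut from a sentence; a stdlib-only sketch (hypothetical helper names, ignoring subsampling and negative sampling):

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (center IN, context OUT) pair per context word,
    so positional relationships survive as distinct examples."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW: one (context bag IN, center OUT) example per position;
    the context vectors get summed, so order inside the window is lost."""
    examples = []
    for i, center in enumerate(tokens):
        ctx = [tokens[j]
               for j in range(max(0, i - window), min(len(tokens), i + window + 1))
               if j != i]
        examples.append((ctx, center))
    return examples

sent = "the fox jumped over the lazy dog".split()
```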

Page 125: Lda2vec text by the bay 2016 with notes

lda2

vec

vDOC = a vtopic1 + b vtopic2 +…

Let’s make vDOC sparse

Too many documents. I really like that document X is 70% in topic 0, 30% in topic1, …

Page 126: Lda2vec text by the bay 2016 with notes

lda2

vec

This works! 😀 But vDOC isn’t as interpretable as the topic vectors. 😔

vDOC = topic0 + topic1

Let’s say that vDOC adds

Too many documents. I really like that document X is 70% in topic 0, 30% in topic1, …

Page 127: Lda2vec text by the bay 2016 with notes

lda2

vec

softmax(vOUT * (vIN+ vDOC))

we want k *sparse* topics
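Putting the pieces together, the modified context can be sketched as follows (random vectors stand in for trained parameters; the dimensions are arbitrary):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

n_topics, dim = 3, 8
rng = np.random.default_rng(0)
topic_matrix = rng.standard_normal((n_topics, dim))  # one vector per topic
doc_weights = np.array([0.34, -0.1, 0.17])           # free parameters per doc

proportions = softmax(doc_weights)        # mixture weights, sum to 100%
v_doc = proportions @ topic_matrix        # vDOC = a*vtopic1 + b*vtopic2 + ...
v_in = rng.standard_normal(dim)           # pivot word vector
context = v_in + v_doc                    # scored against vOUT via softmax
```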

Page 128: Lda2vec text by the bay 2016 with notes

Shows that many words similar to vacation actually come in lots of flavors:
— wedding words (bachelorette, rehearsals)
— holiday/event words (birthdays, brunch, christmas, thanksgiving)
— seasonal words (spring, summer)
— trip words (getaway)
— destinations

Page 129: Lda2vec text by the bay 2016 with notes

theory of lda2vec

lda2

vec

Page 130: Lda2vec text by the bay 2016 with notes

pyLDAvis of lda2vec

lda2

vec

Page 131: Lda2vec text by the bay 2016 with notes

LDA Results

context

History

I loved every choice in this fix!! Great job!

Great Stylist Perfect

There are k tags

Issues
"Cancel Disappointed"
Delivery
"Profile, Pinterest"
"Weather Vacation"
"Corrections for Next"
"Wardrobe Mix"
"Requesting Specific"
"Requesting Department"
"Requesting Style"
"Style, Positive"
"Style, Neutral"

Page 132: Lda2vec text by the bay 2016 with notes

LDA Results

context

History

Body Fit

My measurements are 36-28-32. If that helps. I like wearing some clothing that is fitted.

Very hard for me to find pants that fit right.

There are k tags

Issues
"Cancel Disappointed"
Delivery
"Profile, Pinterest"
"Weather Vacation"
"Corrections for Next"
"Wardrobe Mix"
"Requesting Specific"
"Requesting Department"
"Requesting Style"
"Style, Positive"
"Style, Neutral"

Page 133: Lda2vec text by the bay 2016 with notes

LDA Results

context

History

Sizing

Really enjoyed the experience and the pieces, sizing for tops was too big.

Looking forward to my next box!

Excited for next

There are k tags

Issues
"Cancel Disappointed"
Delivery
"Profile, Pinterest"
"Weather Vacation"
"Corrections for Next"
"Wardrobe Mix"
"Requesting Specific"
"Requesting Department"
"Requesting Style"
"Style, Positive"
"Style, Neutral"

Page 134: Lda2vec text by the bay 2016 with notes

LDA Results

context

History

Almost Bought

It was a great fix. Loved the two items I kept and the three I sent back were close!

Perfect

There are k tags

Issues
"Cancel Disappointed"
Delivery
"Profile, Pinterest"
"Weather Vacation"
"Corrections for Next"
"Wardrobe Mix"
"Requesting Specific"
"Requesting Department"
"Requesting Style"
"Style, Positive"
"Style, Neutral"

Page 135: Lda2vec text by the bay 2016 with notes

All of the following ideas will change what ‘words’ and ‘context’ represent.

But we’ll still use the same w2v algo

Page 136: Lda2vec text by the bay 2016 with notes

paragraph vector

What about summarizing documents?

On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that

Page 137: Lda2vec text by the bay 2016 with notes

On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that

The framework nuclear agreement he reached with Iran on Thursday did not provide the definitive answer to whether Mr. Obama’s audacious gamble will pay off. The fist Iran has shaken at the so-called Great Satan since 1979 has not completely relaxed.

paragraph vector

Normal skipgram extends C words before, and C words after.

IN

OUT OUT

Except we stay inside a sentence

Page 138: Lda2vec text by the bay 2016 with notes

On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that

The framework nuclear agreement he reached with Iran on Thursday did not provide the definitive answer to whether Mr. Obama’s audacious gamble will pay off. The fist Iran has shaken at the so-called Great Satan since 1979 has not completely relaxed.

paragraph vector

A document vector simply extends the context to the whole document.

IN

OUT OUT

OUT OUT doc_1347

Page 139: Lda2vec text by the bay 2016 with notes

from gensim.models import Doc2Vec
fn = "item_document_vectors"
model = Doc2Vec.load(fn)
matches = model.most_similar('pregnant')
matches = list(filter(lambda x: 'SENT_' in x[0], matches))

# ['...I am currently 23 weeks pregnant...',
#  "...I'm now 10 weeks pregnant...",
#  '...not showing too much yet...',
#  '...15 weeks now. Baby bump...',
#  '...6 weeks postpartum!...',
#  '...12 weeks postpartum and am nursing...',
#  '...I have my baby shower that...',
#  '...am still breastfeeding...',
#  '...I would love an outfit for a baby shower...']

sentence search

Page 140: Lda2vec text by the bay 2016 with notes
Page 141: Lda2vec text by the bay 2016 with notes
Page 142: Lda2vec text by the bay 2016 with notes
Page 143: Lda2vec text by the bay 2016 with notes
Page 144: Lda2vec text by the bay 2016 with notes
Page 145: Lda2vec text by the bay 2016 with notes
Page 146: Lda2vec text by the bay 2016 with notes