Word2Vec with TensorFlow
Department of Computer Science, National Tsing Hua University, Taiwan
Deep Learning
Word embedding
• How to represent words in machine learning?
- Treat words as discrete atomic symbols, and therefore 'cat' may be represented as '1' and 'pet' as '2.'
- However, this representation carries no semantics, which makes models that learn from it hard to train.
- Better solutions?

Example: My favorite pet is cat.
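To make the contrast concrete, here is a minimal sketch (the vocabulary size, embedding dimension, and IDs are illustrative, not from the lecture): integer IDs carry no notion of similarity, while an embedding layer maps each ID to a trainable dense vector, so related words can end up close together after training.

```python
import tensorflow as tf

# Discrete atomic symbols: 'cat' -> 1, 'pet' -> 2.
# The integers are arbitrary; |1 - 2| says nothing about relatedness.
word_ids = tf.constant([1, 2])

# An embedding layer instead maps each ID to a trainable dense vector.
embedding = tf.keras.layers.Embedding(input_dim=10_000, output_dim=128)
vectors = embedding(word_ids)  # shape: (2, 128)

# After training, relatedness can be measured, e.g. by cosine similarity.
cat, pet = vectors[0], vectors[1]
cosine = tf.reduce_sum(cat * pet) / (tf.norm(cat) * tf.norm(pet))
```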
Word embedding
• Intuition: modeling relationships between words.
- Semantic similarity
๏ CBOW (Continuous Bag-of-Words model)
๏ Skip-Gram
- Grammar / Syntax
CBOW & Skip-Gram
• Assumptions in CBOW & Skip-Gram:
- Words with similar meanings or domains appear in similar, close contexts.

Example: My favorite pet is ___ . → cat, dog
CBOW & Skip-Gram
• CBOW: predicting the target word from its context words.
• Skip-Gram: predicting the context words from the target word.
CBOW & Skip-Gram
• For example, given the sentence:
the quick brown fox jumped over the lazy dog.
• CBOW will be trained on the dataset: ([the, brown], quick), ([quick, fox], brown), …
• Skip-Gram will be trained on the dataset: (quick, the), (quick, brown), (brown, quick), (brown, fox), …
Picture from https://kavita-ganesan.com/
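As a concrete sketch of how these training pairs can be generated (plain Python for illustration; the lecture's actual preprocessing may differ), the following builds both datasets from the example sentence with a window of one word on each side:

```python
sentence = "the quick brown fox jumped over the lazy dog".split()
window = 1  # one context word on each side, matching the pairs above

cbow_pairs, skipgram_pairs = [], []
for i, target in enumerate(sentence):
    context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
    cbow_pairs.append((context, target))                  # CBOW: context -> target
    skipgram_pairs.extend((target, c) for c in context)   # Skip-Gram: target -> context

print(cbow_pairs[1])        # (['the', 'brown'], 'quick')
print(skipgram_pairs[1:4])  # [('quick', 'the'), ('quick', 'brown'), ('brown', 'quick')]
```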
Skip-Gram
[Figure: the Skip-Gram network. Picture from https://lilianweng.github.io/]
Softmax & word classification
• Recap the softmax function:

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

• While training the embedding, we need to sum over the entire vocabulary in the denominator, which is costly:

$$p(w = i \mid x) = \frac{\exp(h^{\top} v'_{w_i})}{\sum_{j \in V} \exp(h^{\top} v'_{w_j})}$$

where |V| (the vocabulary size) ranges from thousands to millions of words.
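To see the cost concretely, here is a naive sketch of the classification step (sizes and names are illustrative): every training step multiplies the hidden vector against all |V| output vectors just to normalize one probability.

```python
import tensorflow as tf

vocab_size, hidden_dim = 100_000, 128  # illustrative sizes
h = tf.random.normal([hidden_dim])                  # context representation h
W_out = tf.random.normal([vocab_size, hidden_dim])  # output vectors v'_w

logits = tf.linalg.matvec(W_out, h)  # h^T v'_w for every word: O(|V|) work
p = tf.nn.softmax(logits)            # denominator sums over the whole vocabulary
```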
Noise Contrastive Estimation
• Intuition: estimate the parameters by learning to discriminate between the data w and some artificially generated noise w̃ (maximum likelihood estimation → noise contrastive estimation).
• Objective function:

$$\arg\max \; \log p(d = 1 \mid w_t, x) + \sum_{\tilde{w}_i \sim Q}^{N} \log p(d = 0 \mid \tilde{w}_i, x)$$

• Derived form:

$$\arg\max \; \log \frac{p(w_t \mid x)}{p(w_t \mid x) + N\,q(w_t)} + \sum_{\tilde{w}_i \sim Q}^{N} \log \frac{N\,q(\tilde{w}_i)}{p(\tilde{w}_i \mid x) + N\,q(\tilde{w}_i)}$$

Here Q is the noise distribution, q(·) its density, and N the number of noise samples per data word.
Noise Contrastive Estimation
• After training: evaluate with the softmax function.
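TensorFlow packages this objective as tf.nn.nce_loss (the API reference is listed below). A minimal sketch of how it might be wired into a skip-gram model; the variable names and sizes are assumptions, not the lecture's code:

```python
import tensorflow as tf

vocab_size, embed_dim, num_sampled = 10_000, 128, 64  # illustrative sizes

embeddings = tf.Variable(tf.random.uniform([vocab_size, embed_dim], -1.0, 1.0))
nce_weights = tf.Variable(
    tf.random.truncated_normal([vocab_size, embed_dim], stddev=0.1))
nce_biases = tf.Variable(tf.zeros([vocab_size]))

def nce_loss_step(center_ids, context_ids):
    # center_ids: (batch,) int word IDs; context_ids: (batch, 1) int64 labels.
    h = tf.nn.embedding_lookup(embeddings, center_ids)
    # nce_loss draws `num_sampled` noise words per example and trains a
    # binary classifier to separate data from noise -- no full softmax.
    return tf.reduce_mean(tf.nn.nce_loss(
        weights=nce_weights, biases=nce_biases,
        labels=context_ids, inputs=h,
        num_sampled=num_sampled, num_classes=vocab_size))
```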
Homework
• Goal: practice model subclassing
- Train your word2vec via model subclassing
- Deadline: 2020-11-05 (Thu) 23:59
Homework
• Functional API:
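The code from this slide did not survive the transcript; as a stand-in, a minimal word2vec-style model in the Functional API might look like this (the full-softmax head and all sizes are assumptions):

```python
import tensorflow as tf

vocab_size, embed_dim = 10_000, 128  # illustrative sizes

# Functional API: build the model by calling layers on symbolic tensors.
center = tf.keras.Input(shape=(), dtype=tf.int32, name="center_word")
h = tf.keras.layers.Embedding(vocab_size, embed_dim)(center)
logits = tf.keras.layers.Dense(vocab_size)(h)

model = tf.keras.Model(inputs=center, outputs=logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```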
Homework
• Model subclassing:
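Again the slide's code is missing; an equivalent subclassed version, the pattern the homework asks you to practice, might look like this (same assumptions as above):

```python
import tensorflow as tf

class Word2Vec(tf.keras.Model):
    """Word2vec via model subclassing: layers live in __init__,
    the forward pass in call()."""

    def __init__(self, vocab_size=10_000, embed_dim=128):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.out = tf.keras.layers.Dense(vocab_size)

    def call(self, center_ids):
        return self.out(self.embedding(center_ids))

model = Word2Vec()
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```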
Reference
• https://ruder.io/word-embeddings-softmax/index.html#noisecontrastiveestimation
• https://lilianweng.github.io/lil-log/2017/10/15/learning-word-embedding.html
• https://aegis4048.github.io/optimize_computational_efficiency_of_skip-gram_with_negative_sampling
• http://proceedings.mlr.press/v9/gutmann10a/gutmann10a.pdf
• https://papers.nips.cc/paper/5165-learning-word-embeddings-efficiently-with-noise-contrastive-estimation.pdf
• https://www.tensorflow.org/api_docs/python/tf/nn/nce_loss