Word2Vec with TensorFlow
Department of Computer Science, National Tsing Hua University, Taiwan
Deep Learning
Word embedding
• How to represent words in machine learning?
- Treat words as discrete atomic symbols, and therefore 'cat' may be represented as '1' and 'pet' as '2.'
- However, this representation carries no semantics, which makes models that learn from it hard to train.
- Better solutions?

Example: My favorite pet is cat.
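To make the contrast concrete, here is a minimal sketch (the vocabulary size, embedding dimension, and IDs are illustrative, not from the lecture): integer IDs carry no notion of similarity, while an embedding layer maps each ID to a trainable dense vector, so related words can end up close together after training.

```python
import tensorflow as tf

# Discrete atomic symbols: 'cat' -> 1, 'pet' -> 2.
# The integers are arbitrary; |1 - 2| says nothing about relatedness.
word_ids = tf.constant([1, 2])

# An embedding layer instead maps each ID to a trainable dense vector.
embedding = tf.keras.layers.Embedding(input_dim=10_000, output_dim=128)
vectors = embedding(word_ids)  # shape: (2, 128)

# After training, relatedness can be measured, e.g. by cosine similarity.
cat, pet = vectors[0], vectors[1]
cosine = tf.reduce_sum(cat * pet) / (tf.norm(cat) * tf.norm(pet))
```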
Word embedding
• Intuition: modeling relationships between words.
- Semantic similarity
๏ CBOW (Continuous Bag-of-Words model)
๏ Skip-Gram
- Grammar / Syntax
CBOW & Skip-Gram
• Assumptions in CBOW & Skip-Gram:
- Words with similar meanings or domains appear in similar, close contexts.

Example: My favorite pet is ___ . → cat, dog
CBOW & Skip-Gram
• CBOW: predicting the target word from its context words.
• Skip-Gram: predicting the context words from the target word.
CBOW & Skip-Gram
• For example, given the sentence:
the quick brown fox jumped over the lazy dog.
• CBOW will be trained on the dataset: ([the, brown], quick), ([quick, fox], brown), …
• Skip-Gram will be trained on the dataset: (quick, the), (quick, brown), (brown, quick), (brown, fox), …
Picture from https://kavita-ganesan.com/
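As a concrete sketch of how these training pairs can be generated (plain Python for illustration; the lecture's actual preprocessing may differ), the following builds both datasets from the example sentence with a window of one word on each side:

```python
sentence = "the quick brown fox jumped over the lazy dog".split()
window = 1  # one context word on each side, matching the pairs above

cbow_pairs, skipgram_pairs = [], []
for i, target in enumerate(sentence):
    context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
    cbow_pairs.append((context, target))                  # CBOW: context -> target
    skipgram_pairs.extend((target, c) for c in context)   # Skip-Gram: target -> context

print(cbow_pairs[1])        # (['the', 'brown'], 'quick')
print(skipgram_pairs[1:4])  # [('quick', 'the'), ('quick', 'brown'), ('brown', 'quick')]
```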
Skip-Gram
[Figure: the Skip-Gram network. Picture from https://lilianweng.github.io/]
Softmax & word classification
• Recap the softmax function:

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

• While training the embedding, we need to sum over the entire vocabulary in the denominator, which is costly:

$$p(w = i \mid x) = \frac{\exp(h^{\top} v'_{w_i})}{\sum_{j \in V} \exp(h^{\top} v'_{w_j})}$$

where |V| (the vocabulary size) ranges from thousands to millions of words.
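To see the cost concretely, here is a naive sketch of the classification step (sizes and names are illustrative): every training step multiplies the hidden vector against all |V| output vectors just to normalize one probability.

```python
import tensorflow as tf

vocab_size, hidden_dim = 100_000, 128  # illustrative sizes
h = tf.random.normal([hidden_dim])                  # context representation h
W_out = tf.random.normal([vocab_size, hidden_dim])  # output vectors v'_w

logits = tf.linalg.matvec(W_out, h)  # h^T v'_w for every word: O(|V|) work
p = tf.nn.softmax(logits)            # denominator sums over the whole vocabulary
```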
Noise Contrastive Estimation
• Intuition: estimate the parameters by learning to discriminate between the data w and some artificially generated noise w̃ (maximum likelihood estimation → noise contrastive estimation).
• Objective function:

$$\arg\max \; \log p(d = 1 \mid w_t, x) + \sum_{\tilde{w}_i \sim Q}^{N} \log p(d = 0 \mid \tilde{w}_i, x)$$

• Derived form:

$$\arg\max \; \log \frac{p(w_t \mid x)}{p(w_t \mid x) + N\,q(w_t)} + \sum_{\tilde{w}_i \sim Q}^{N} \log \frac{N\,q(\tilde{w}_i)}{p(\tilde{w}_i \mid x) + N\,q(\tilde{w}_i)}$$

Here Q is the noise distribution, q(·) its density, and N the number of noise samples per data word.
Noise Contrastive Estimation
• After training: evaluate with the softmax function.
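TensorFlow packages this objective as tf.nn.nce_loss (the API reference is listed below). A minimal sketch of how it might be wired into a skip-gram model; the variable names and sizes are assumptions, not the lecture's code:

```python
import tensorflow as tf

vocab_size, embed_dim, num_sampled = 10_000, 128, 64  # illustrative sizes

embeddings = tf.Variable(tf.random.uniform([vocab_size, embed_dim], -1.0, 1.0))
nce_weights = tf.Variable(
    tf.random.truncated_normal([vocab_size, embed_dim], stddev=0.1))
nce_biases = tf.Variable(tf.zeros([vocab_size]))

def nce_loss_step(center_ids, context_ids):
    # center_ids: (batch,) int word IDs; context_ids: (batch, 1) int64 labels.
    h = tf.nn.embedding_lookup(embeddings, center_ids)
    # nce_loss draws `num_sampled` noise words per example and trains a
    # binary classifier to separate data from noise -- no full softmax.
    return tf.reduce_mean(tf.nn.nce_loss(
        weights=nce_weights, biases=nce_biases,
        labels=context_ids, inputs=h,
        num_sampled=num_sampled, num_classes=vocab_size))
```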
Homework
• Goal: practice model subclassing
- Train your word2vec via model subclassing
- Deadline: 2020-11-05 (Thu) 23:59
Homework
• Functional API:
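The code from this slide did not survive the transcript; as a stand-in, a minimal word2vec-style model in the Functional API might look like this (the full-softmax head and all sizes are assumptions):

```python
import tensorflow as tf

vocab_size, embed_dim = 10_000, 128  # illustrative sizes

# Functional API: build the model by calling layers on symbolic tensors.
center = tf.keras.Input(shape=(), dtype=tf.int32, name="center_word")
h = tf.keras.layers.Embedding(vocab_size, embed_dim)(center)
logits = tf.keras.layers.Dense(vocab_size)(h)

model = tf.keras.Model(inputs=center, outputs=logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```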
Homework
• Model subclassing:
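Again the slide's code is missing; an equivalent subclassed version, the pattern the homework asks you to practice, might look like this (same assumptions as above):

```python
import tensorflow as tf

class Word2Vec(tf.keras.Model):
    """Word2vec via model subclassing: layers live in __init__,
    the forward pass in call()."""

    def __init__(self, vocab_size=10_000, embed_dim=128):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.out = tf.keras.layers.Dense(vocab_size)

    def call(self, center_ids):
        return self.out(self.embedding(center_ids))

model = Word2Vec()
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```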
Reference
• https://ruder.io/word-embeddings-softmax/index.html#noisecontrastiveestimation
• https://lilianweng.github.io/lil-log/2017/10/15/learning-word-embedding.html
• https://aegis4048.github.io/optimize_computational_efficiency_of_skip-gram_with_negative_sampling
• http://proceedings.mlr.press/v9/gutmann10a/gutmann10a.pdf
• https://papers.nips.cc/paper/5165-learning-word-embeddings-efficiently-with-noise-contrastive-estimation.pdf
• https://www.tensorflow.org/api_docs/python/tf/nn/nce_loss