Word2Vec with TensorFlow
Deep Learning, Department of Computer Science, National Tsing Hua University, Taiwan


Page 1:

Word2Vec with TensorFlow
Department of Computer Science, National Tsing Hua University, Taiwan

Deep Learning

Page 2:

Word embedding
• How to represent words in machine learning?

- Treat words as discrete atomic symbols, and therefore ‘cat’ may be represented as ‘1’ and ‘pet’ as ‘2’.

- However, this embedding includes no semantics, so models that learn from this representation are hard to train.

- Better solutions?

My favorite pet is cat .
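As a minimal sketch of this discrete encoding (the vocabulary and sentence here are illustrative, not from the course code): every word becomes an arbitrary integer id, and the corresponding one-hot vectors are all mutually orthogonal, so nothing in the representation says that ‘cat’ and ‘pet’ are related.

```python
import numpy as np

# Illustrative vocabulary: each word is an arbitrary atomic id.
vocab = {"my": 0, "favorite": 1, "pet": 2, "is": 3, "cat": 4, ".": 5}

def one_hot(word, size=len(vocab)):
    v = np.zeros(size)
    v[vocab[word]] = 1.0
    return v

# One-hot vectors of any two distinct words are orthogonal, so 'cat' is
# no closer to 'pet' than to '.' under this encoding.
print(one_hot("cat") @ one_hot("pet"))  # 0.0
print(one_hot("cat") @ one_hot("."))    # 0.0
```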

Page 3:

Word embedding
• Intuition: Modeling relationships between words.

- Semantic similarity

๏ CBOW (Continuous Bag-of-Words model),

๏ Skip-Gram

- Grammar/Syntax

Page 4:

CBOW & Skip-Gram
• Assumptions in CBOW & Skip-Gram:

- Words with similar meaning/domain have close contexts.

My favorite pet is ___ . cat, dog

Page 5:

CBOW & Skip-Gram
• CBOW: Predicting the target word with context words.
• Skip-Gram: Predicting context words from the target word.

Page 6:

CBOW & Skip-Gram
• For example, given the sentence:

the quick brown fox jumped over the lazy dog.

• CBOW will be trained on the dataset: ([the, brown], quick), ([quick, fox], brown), …

• Skip-Gram will be trained on the dataset: (quick, the), (quick, brown), (brown, quick), (brown, fox), … (see the sketch below)
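A minimal sketch (not the course's exact preprocessing) of how these CBOW and Skip-Gram training pairs could be generated with a context window of 1:

```python
# Build CBOW and Skip-Gram training pairs from the example sentence.
sentence = "the quick brown fox jumped over the lazy dog .".split()

cbow_pairs, skipgram_pairs = [], []
for i in range(1, len(sentence) - 1):
    context = [sentence[i - 1], sentence[i + 1]]   # one word on each side
    target = sentence[i]
    cbow_pairs.append((context, target))           # ([the, brown], quick), ...
    for c in context:
        skipgram_pairs.append((target, c))         # (quick, the), (quick, brown), ...

print(cbow_pairs[:2])
print(skipgram_pairs[:4])
```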

Page 7:

Picture from https://kavita-ganesan.com/

Page 8:

Skip-Gram

Picture from https://lilianweng.github.io/

Page 9:

Softmax & Word Classification

• Recap the softmax function:

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

• While training the embedding, we need to sum over the entire vocabulary in the denominator, which is costly:

$$p(w = i \mid x) = \frac{\exp(h^\top v'_{w_i})}{\sum_{j \in V} \exp(h^\top v'_{w_j})}$$

where |V| (the vocabulary size) is typically in the tens of thousands to millions.
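As a rough sketch of why this is costly: the full softmax scores every word in the vocabulary for every training example. The sizes and variable names below are illustrative only.

```python
import tensorflow as tf

# Illustrative sizes; a real vocabulary is far larger, which is what
# makes the full softmax denominator expensive.
vocab_size, embed_dim, batch = 10000, 128, 32

h = tf.random.normal([batch, embed_dim])                        # hidden/context vectors
v_out = tf.Variable(tf.random.normal([vocab_size, embed_dim]))  # output embeddings v'_w

logits = tf.matmul(h, v_out, transpose_b=True)   # h^T v'_w for every word in V
probs = tf.nn.softmax(logits, axis=-1)           # normalizes over all |V| words
print(probs.shape)                               # (32, 10000)
```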

Page 10:

Noise Contrastive Estimation

• Intuition: Estimate the parameters by learning to discriminate between the data w and some artificially generated noise w̃ (maximum likelihood estimation → noise contrastive estimation).

• Objective function:

$$\arg\max \; \log p(d = 1 \mid w_t, x) + \sum_{\tilde{w}_i \sim Q}^{N} \log p(d = 0 \mid \tilde{w}_i, x)$$

• Derived form:

$$\arg\max \; \log \frac{p(w_t \mid x)}{p(w_t \mid x) + N\,q(w_t)} + \sum_{\tilde{w}_i \sim Q}^{N} \log \frac{N\,q(\tilde{w}_i)}{p(\tilde{w}_i \mid x) + N\,q(\tilde{w}_i)}$$
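TensorFlow ships this idea as tf.nn.nce_loss (listed in the references). A minimal sketch of how it could be wired up for Skip-Gram, with illustrative sizes and randomly generated word ids standing in for a real batch:

```python
import tensorflow as tf

# Illustrative hyperparameters (not from the slides).
vocab_size, embed_dim, batch, num_sampled = 10000, 128, 32, 64

# Input-side embeddings and output-side NCE weights/biases.
embeddings = tf.Variable(tf.random.uniform([vocab_size, embed_dim], -1.0, 1.0))
nce_weights = tf.Variable(tf.random.truncated_normal([vocab_size, embed_dim], stddev=0.1))
nce_biases = tf.Variable(tf.zeros([vocab_size]))

# Fake batch of (center word, context word) ids, just to show the call.
center_ids = tf.random.uniform([batch], maxval=vocab_size, dtype=tf.int64)
context_ids = tf.random.uniform([batch, 1], maxval=vocab_size, dtype=tf.int64)

h = tf.nn.embedding_lookup(embeddings, center_ids)   # hidden vectors h

# tf.nn.nce_loss draws `num_sampled` noise words (by default from a
# log-uniform distribution, playing the role of Q) and trains the binary
# data-vs-noise classifier above, so the full-vocabulary softmax
# denominator is never computed during training.
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,
                   biases=nce_biases,
                   labels=context_ids,
                   inputs=h,
                   num_sampled=num_sampled,
                   num_classes=vocab_size))
print(loss)
```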

Page 11:

Noise Contrastive Estimation

• Intuition: Estimate the parameters by learning to discriminate between the data w and some artificially generated noise w̃ (maximum likelihood estimation → noise contrastive estimation).

• After training: evaluate with the softmax function.
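A hedged sketch of that evaluation step, reusing the same illustrative shapes as the NCE sketch above (random tensors stand in for the trained parameters):

```python
import tensorflow as tf

# Illustrative shapes; the trained values would come from the model.
vocab_size, embed_dim, batch = 10000, 128, 32
h = tf.random.normal([batch, embed_dim])                 # hidden vectors for a batch
nce_weights = tf.random.normal([vocab_size, embed_dim])  # learned output weights
nce_biases = tf.zeros([vocab_size])                      # learned output biases

# At evaluation time the full softmax over |V| is computed only once,
# not inside every training step, so its cost is acceptable.
logits = tf.matmul(h, nce_weights, transpose_b=True) + nce_biases
probs = tf.nn.softmax(logits, axis=-1)
print(tf.argmax(probs, axis=-1).shape)  # (32,) most likely word per example
```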

Page 12:
Page 13:
Page 14:
Page 15:

Homework
• Goal: practice model subclassing

- Train your word2vec via model subclassing

- Deadline: 2020-11-05 (Thu) 23:59

Page 16:

Homework
• Functional API:
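The original slide showed a code screenshot that is not captured in this transcript; the following is an illustrative Functional API version of a Skip-Gram-style classifier (full softmax is used here for simplicity, and all sizes and names are assumptions, not the course's code):

```python
import tensorflow as tf

# Illustrative sizes.
vocab_size, embed_dim = 10000, 128

# Functional API: build the graph by calling layers on tensors.
center = tf.keras.Input(shape=(), dtype=tf.int64, name="center_word")
emb = tf.keras.layers.Embedding(vocab_size, embed_dim, name="embedding")(center)
logits = tf.keras.layers.Dense(vocab_size, name="output")(emb)

model = tf.keras.Model(inputs=center, outputs=logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.summary()
```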

Page 17:

Homework
• Model subclassing:
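Likewise, the slide's screenshot is missing; below is an illustrative model-subclassing version of the same model, which is the style the homework asks you to practice (sizes and names are assumptions):

```python
import tensorflow as tf

# Illustrative sizes.
vocab_size, embed_dim = 10000, 128

class Word2Vec(tf.keras.Model):
    """Skip-Gram-style word2vec written with model subclassing."""

    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.output_layer = tf.keras.layers.Dense(vocab_size)

    def call(self, center_word_ids):
        h = self.embedding(center_word_ids)   # (batch, embed_dim)
        return self.output_layer(h)           # logits over the vocabulary

model = Word2Vec(vocab_size, embed_dim)
print(model(tf.constant([1, 2, 3], dtype=tf.int64)).shape)  # (3, 10000)
```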

Page 18:

Reference
• https://ruder.io/word-embeddings-softmax/index.html#noisecontrastiveestimation

• https://lilianweng.github.io/lil-log/2017/10/15/learning-word-embedding.html

• https://aegis4048.github.io/optimize_computational_efficiency_of_skip-gram_with_negative_sampling

• http://proceedings.mlr.press/v9/gutmann10a/gutmann10a.pdf

• https://papers.nips.cc/paper/5165-learning-word-embeddings-efficiently-with-noise-contrastive-estimation.pdf

• https://www.tensorflow.org/api_docs/python/tf/nn/nce_loss