NIPS 2013 Reading Group: Distributed Representations of Words and Phrases and their Compositionality

Distributed Representations of Words and Phrases and their Compositionality. Preferred Infrastructure, Inc., Yuya Unno (@unnonouno). 2014/01/23, NIPS 2013 Reading Group @ University of Tokyo

Upload: yuya-unno

Post on 15-Jan-2015


TRANSCRIPT

  • 1. 2014/01/23, NIPS 2013 Reading Group @ University of Tokyo. Distributed Representations of Words and Phrases and their Compositionality. Preferred Infrastructure, Inc., Yuya Unno (@unnonouno)

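  • 2. Yuya Unno (@unnonouno). Preferred Infrastructure (PFI). Jubatus: http://jubat.us
  • 3. Mikolov et al.'s word2vec (ICLR 2013). Berlin - Germany + France = Paris!! Before / After: 15~30
  • 4. word2vec [Mikolov+13]. $\mathrm{vec}(\text{Berlin}) - \mathrm{vec}(\text{Germany}) + \mathrm{vec}(\text{France}) \approx \mathrm{vec}(\text{Paris})$. Figure: the word vectors for Berlin, Germany, France, and Paris.
  • 5. Skip-gram [Mikolov+13]. Input: a word sequence $w_1, w_2, \ldots, w_T$. For each word $w_i$, the context is a window of size $c$ (5). $v_w$ is the vector for word $w$.
  • 6. [Mikolov+13]. Vocabulary size $W = 10^5 \sim 10^7$.
  • 7. Hierarchical Softmax (HS) [Morin+05]. Figure: a binary tree with inner nodes $n_1$, $n_2$, $n_3$ and leaf word $w$. $\sigma(x) = 1 / (1 + \exp(-x))$.
  • 8. Noise Contrastive Estimation (NCE) [Gutmann+12]. Softmax.
  • 9. Negative Sampling (NEG). $\log P(w_O \mid w_I) = \log \sigma({v'_{w_O}}^{\top} v_{w_I}) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P(w)} \left[ \log \sigma(-{v'_{w_i}}^{\top} v_{w_I}) \right]$. A simplification of NCE. $k$ is the number of negative samples: 5~20 for small training sets, 2~5 for large ones. The noise distribution $P(w)$ is the unigram (1-gram) distribution raised to the 3/4 power.
  • 10. Subsampling of frequent words such as "a" and "the". With threshold $t = 10^{-5}$ and word frequency $f(w)$, each occurrence of $w$ is discarded with probability $P(w) = 1 - \sqrt{t / f(w)}$.
  • 11. [Mikolov+13] analogical reasoning task: $\mathrm{vec}(\text{Berlin}) - \mathrm{vec}(\text{Germany}) + \mathrm{vec}(\text{France}) \approx \mathrm{vec}(\text{Paris})$. Methods: NEG, Hierarchical Softmax, NCE.
  • 12.
  • 13. NEG and HS; 72%.
  • 14. 2, 2, AND
  • 15. Softmax. Distributional Hypothesis: "words that occur in the same contexts tend to have similar meanings" (Wikipedia).
  • 16. References
    [Mikolov+13] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. ICLR 2013.
    [Morin+05] Frederic Morin and Yoshua Bengio. Hierarchical probabilistic neural network language model. AISTATS 2005.
    [Gutmann+12] Michael U. Gutmann and Aapo Hyvärinen. Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics. JMLR 2012.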
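
The quantities named on the Negative Sampling (NEG) slide and the frequent-word subsampling slide can be sketched in a few lines of Python. This is a minimal illustration, not the word2vec implementation. The function names and `TOY_VECTORS` are hypothetical, and the toy vectors are made up by hand rather than trained. The formulas are assumed to be the NEG objective and the discard probability as stated in the NIPS 2013 paper by Mikolov et al.

```python
import math


def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def neg_objective(v_in, v_out, v_negs):
    """NEG objective for one (input, output) pair.

    v_in   : input vector of w_I
    v_out  : output vector of the observed word w_O
    v_negs : output vectors of the k sampled noise words
    """
    score = math.log(sigmoid(dot(v_out, v_in)))
    for v_neg in v_negs:
        score += math.log(sigmoid(-dot(v_neg, v_in)))
    return score


def noise_distribution(counts, power=0.75):
    """Unigram counts raised to the 3/4 power, then normalized."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


def discard_probability(freq, t=1e-5):
    """Probability of dropping one occurrence of a word with relative frequency freq."""
    return max(0.0, 1.0 - math.sqrt(t / freq))


def analogy(vectors, a, b, c):
    """Word whose vector is closest (by cosine) to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]

    def cosine(u, v):
        return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

    # The three query words themselves are excluded from the candidates.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hand-made toy vectors (an assumption for illustration only, not trained).
TOY_VECTORS = {
    "Germany": [1.0, 0.0, 0.0],
    "France": [0.0, 1.0, 0.0],
    "Berlin": [1.0, 0.0, 1.0],
    "Paris": [0.0, 1.0, 1.0],
    "Tokyo": [0.0, 0.0, 1.0],
}
```

With these toy vectors, `analogy(TOY_VECTORS, "Berlin", "Germany", "France")` selects `"Paris"`, because vec(Berlin) - vec(Germany) + vec(France) equals the toy vector for Paris exactly. With real trained vectors the match is only approximate.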