Word Sense Determination from Wikipedia Data Using Neural Networks

TRANSCRIPT
[Page 1]

Word Sense Determination from Wikipedia Data Using Neural Networks

Advisor: Dr. Chris Pollett
Committee Members: Dr. Jon Pearce, Dr. Suneuy Kim
By Qiao Liu
[Page 2]

Agenda

• Introduction
• Background
• Model Architecture
• Data Sets and Data Preprocessing
• Implementation
• Experiments and Discussions
• Conclusion and Future Work
[Page 3]

Introduction

• Word sense disambiguation is the task of identifying which sense of an ambiguous word is used in a sentence.

  in 1890, he became custodian of the Milwaukee public museum where he collected plant specimens for their greenhouse

  … send collected fluid to a municipal sewage treatment plant or a commercial wastewater treatment facility

• Word sense disambiguation is useful in natural language processing tasks, such as speech synthesis, question answering, and machine translation.
[Page 4]

Introduction

[Diagram: word sense disambiguation, its two task variants, and the project's purpose within the two subtasks]

• Two variants of the word sense disambiguation task:
  - lexical sample task
  - all-words task
• Two subtasks:
  - sense discrimination
  - sense labeling
[Page 6]

Background

Existing Work
[Page 7]

Background

Approach 1: Dictionary-based

Given a target word t to be disambiguated in context c:
1. Retrieve all the sense definitions for t from a dictionary.
2. Select the sense whose definition has the most overlap with the context c of t.

• This approach requires a hand-built, machine-readable semantic sense dictionary.
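The overlap idea can be sketched in a few lines of simplified Lesk-style Python. The sense definitions below are toy strings written for illustration, not entries from a real dictionary:

```python
# Simplified Lesk sketch: score each sense by how many of its definition
# words also appear in the context, and pick the highest-scoring sense.

def lesk(target_senses, context_words):
    """Return the sense whose definition overlaps most with the context."""
    context = set(context_words)
    best_sense, best_overlap = None, -1
    for sense, definition in target_senses.items():
        overlap = len(context & set(definition.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Toy sense inventory for "plant" (hand-written for this example).
senses_of_plant = {
    "factory": "an industrial building where goods are made or sewage is treated",
    "organism": "a living organism such as a tree flower or greenhouse specimen",
}

context = "collected fluid is sent to a municipal sewage treatment facility".split()
print(lesk(senses_of_plant, context))  # the "factory" sense wins on overlap
```
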
[Page 8]

Background

Approach 2: Supervised machine learning

1. Extract a set of features from the context of the target word.
2. Use the features to train classifiers that can label ambiguous words in new text.

• This approach requires costly, large hand-built resources, because each ambiguous word needs to be labelled in the training data.
• A semi-supervised approach was proposed in 1995 by Yarowsky. It does not rely on large hand-built data; instead, it uses bootstrapping to grow a sense dictionary from a small hand-labeled seed set.
[Page 9]

Background

Approach 3: Unsupervised machine learning

Interpret the senses of an ambiguous word as clusters of similar contexts. Contexts and words are represented by high-dimensional, real-valued vectors built from co-occurrence counts.

• In our project, we use a modification of this approach:
  - Word embeddings are trained using Wikipedia pages.
  - Word vectors of contexts computed by these embeddings are then clustered.
  - Given a new word to disambiguate, we use its context and the word embedding to find a word vector corresponding to this context, then determine the cluster it belongs to.
• In related work, Schütze used a data set taken from the New York Times News Service and did clustering, but with a different kind of word vector.
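The three bullets above form a small pipeline, which can be sketched end to end. The 3-dimensional embedding below is made up for illustration; the project trains real embeddings on Wikipedia and represents a context by combining its word vectors (averaging is assumed here):

```python
# Sketch of the unsupervised pipeline: embed contexts, cluster them with
# k-means, then assign a new context to the nearest cluster.
import numpy as np
from sklearn.cluster import KMeans

embedding = {  # toy 3-d vectors standing in for trained embeddings
    "sewage": [0.9, 0.1, 0.0], "factory": [0.8, 0.2, 0.1],
    "greenhouse": [0.1, 0.9, 0.0], "flower": [0.0, 0.8, 0.2],
}

def context_vector(words):
    """Represent a context as the average of its word vectors."""
    vecs = [embedding[w] for w in words if w in embedding]
    return np.mean(vecs, axis=0)

# Training contexts for an ambiguous word like "plant".
train_contexts = [["sewage", "factory"], ["factory"],
                  ["greenhouse", "flower"], ["flower"]]
X = np.array([context_vector(c) for c in train_contexts])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# A new occurrence: its context vector falls into one of the sense clusters.
cluster = km.predict([context_vector(["greenhouse"])])[0]
```
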
[Page 10]

Background

• Word embeddings

A word embedding is a parameterized function mapping words in some language to high-dimensional vectors (perhaps 200 to 500 dimensions):

  word → ℝⁿ
  W("plant") = [0.3, -0.2, 0.7, …]
  W("crane") = [0.5, 0.4, -0.6, …]
[Page 11]

Model Architecture

• Many NLP tasks take the approach of first learning a good word representation on one task and then using that representation for other tasks. We used this approach for the word sense determination task.
[Page 12]

Model Architecture

• Learn a good word representation on one task, then use that representation for other tasks.
• We used the Skip-gram model as the neural network language model layer.
[Page 13]

Model Architecture

Skip-gram Model Architecture
• The training objective was to learn word embeddings good at predicting the context words in a sentence.
• We trained the neural network by feeding it word pairs of target word and context word found in our training data set.

$$J'(\theta) = \prod_{t=1}^{T} \prod_{-m \le j \le m,\ j \ne 0} p(w_{t+j} \mid w_t; \theta)$$

$$J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{-m \le j \le m,\ j \ne 0} \log p(w_{t+j} \mid w_t; \theta)$$

$$p(w_o \mid w_t) = \frac{\exp(w_o^{\top} w_t)}{\sum_{j=1}^{V} \exp(w_j^{\top} w_t)}$$

(T is the number of words in the training corpus, m the window size, and V the vocabulary size.)
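The softmax $p(w_o \mid w_t)$ above can be checked numerically. This sketch uses a made-up 4-word vocabulary and random 3-dimensional vectors (real runs use a vocabulary of tens of thousands of words and, as noted later, a sampled loss rather than the full softmax):

```python
# Numerical sketch of the skip-gram softmax: the probability of each
# context word given a target word is a softmax over dot products.
import numpy as np

V, d = 4, 3  # toy vocabulary size and embedding dimension
rng = np.random.default_rng(0)
W_in = rng.normal(size=(V, d))   # target-word vectors (w_t)
W_out = rng.normal(size=(V, d))  # context-word vectors (w_j)

def p_context_given_target(t):
    """p(w_j | w_t) for every j in the vocabulary."""
    scores = W_out @ W_in[t]
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

probs = p_context_given_target(t=0)
loss_term = -np.log(probs[2])  # one -log p(w_{t+j} | w_t) term of J(theta)
```
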
[Page 14]

Model Architecture

• k-means clustering

k-means is a simple unsupervised classification algorithm. The aim of the k-means algorithm is to divide m points in n dimensions into k clusters so that the within-cluster sum of squares is minimized.

The distributional hypothesis says that similar words appear in similar contexts [9, 10]. Thus, we can use k-means to divide all vectors of context into k clusters.
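A minimal illustration of the objective described above: in scikit-learn (the `sklearn.cluster` package listed under Implementation), the fitted model exposes the within-cluster sum of squares as `inertia_`, the quantity k-means minimizes. The points here are made up:

```python
# k-means on four toy 2-d points forming two obvious tight groups.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

labels = km.labels_          # cluster assignment for each point
wcss = km.inertia_           # within-cluster sum of squares (minimized)
```
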
[Page 15]

Data Sets and Data Preprocessing

• Data source: https://dumps.wikimedia.org/enwiki/20170201/
  The pages-articles.xml of the Wikipedia data dump contains the current version of all article pages, templates, and other pages.

• Training data for the model
  Word pairs: (target word, context word)

| Sentence | Training samples (window size = 2) |
| --- | --- |
| **natural** language processing projects are fun | (natural, language), (natural, processing) |
| natural **language** processing projects are fun | (language, natural), (language, processing), (language, projects) |
| natural language **processing** projects are fun | (processing, natural), (processing, language), (processing, projects), (processing, are) |
| natural language processing **projects** are fun | (projects, language), (projects, processing), (projects, are), (projects, fun) |
| natural language processing projects **are** fun | (are, processing), (are, projects), (are, fun) |
| natural language processing projects are **fun** | (fun, projects), (fun, are) |
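The table above can be reproduced mechanically. This sketch assumes a symmetric window truncated at the sentence boundaries (subsampling of context words via NUM_SKIPS, mentioned under Implementation, is omitted):

```python
# Generate (target word, context word) training pairs for the skip-gram
# model: pair each word with every word within `window` positions of it.

def skip_gram_pairs(tokens, window=2):
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = "natural language processing projects are fun".split()
pairs = skip_gram_pairs(sentence, window=2)
```
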
[Page 16]

Data Sets and Data Preprocessing

Steps to process data:
• Extracted 90M sentences
• Counted words, created a dictionary and a reversed dictionary
• Regenerated sentences
• Created 5B word pairs
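The dictionary step can be sketched as follows. The details here (frequency-ordered ids, an UNK entry for out-of-vocabulary words) follow the common word2vec preprocessing convention and are assumptions, since the slides do not spell them out:

```python
# Build a word -> id dictionary from word counts, plus the reversed
# id -> word dictionary used to map training data back to words.
from collections import Counter

def build_dictionaries(sentences, vocab_size=50000):
    counts = Counter(w for s in sentences for w in s)
    dictionary = {"UNK": 0}  # id 0 reserved for out-of-vocabulary words
    for word, _ in counts.most_common(vocab_size - 1):
        dictionary[word] = len(dictionary)
    reversed_dictionary = {i: w for w, i in dictionary.items()}
    return dictionary, reversed_dictionary

sents = [["the", "plant", "grew"], ["the", "plant", "closed"]]
d, rd = build_dictionaries(sents)
```

Sentences are then "regenerated" by replacing each word with its id, which is what the word-pair generation step consumes.
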
[Page 17]

Implementation

The optimizer:
• Gradient descent finds the minimum of a function by taking steps proportional to the negative of the gradient. In each iteration of gradient descent, we need to calculate the gradient over all examples.
• Instead of computing the gradient on the whole training set, each iteration of stochastic gradient descent only estimates this gradient based on a batch of randomly picked examples.

We used stochastic gradient descent to optimize the vector representation during training.
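The difference between the two update rules can be seen on a toy problem. This sketch fits a one-dimensional least-squares model with mini-batch SGD; the data, learning rate, and batch size are made up for illustration:

```python
# Mini-batch SGD sketch: each step estimates the gradient from a small
# random batch instead of the full data set, then steps against it.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 3.0 * x  # true weight is 3; minimize mean (w*x - y)^2

w, lr, batch = 0.0, 0.1, 32
for _ in range(500):
    idx = rng.integers(0, len(x), size=batch)  # randomly picked examples
    xb, yb = x[idx], y[idx]
    grad = 2 * np.mean((w * xb - yb) * xb)     # gradient on the batch only
    w -= lr * grad                             # step along the negative gradient
```
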
[Page 18]

Implementation

The parameters:

| Parameter | Meaning |
| --- | --- |
| VOC_SIZE | The vocabulary size. |
| SKIP_WINDOW | The window size of context words around the target word. |
| NUM_SKIPS | The number of context words randomly chosen to generate word pairs. |
| EMBEDDING_SIZE | The number of parameters in the word embedding; the size of the word vector. |
| LR | The learning rate of gradient descent. |
| BATCH_SIZE | The size of each batch in stochastic gradient descent. Running one batch is one step. |
| NUM_STEPS | The number of training steps. |
| NUM_SAMPLE | The number of negative samples. |
[Page 19]

Implementation

Tools and packages:
• TensorFlow r1.4
• TensorBoard 0.1.6
• Python 2.7.10
• Wikipedia Extractor v2.55
• sklearn.cluster [15]
• numpy
[Page 20]

Experiments and Discussions

The experimental results are compared with Schütze's 1998 unsupervised learning approach:
• Schütze used a data set (435M) taken from the New York Times News Service. We used the data set extracted from Wikipedia pages (12G).
• Schütze used co-occurrence counts to generate vectors, which had a large number of dimensions (1,000/2,000). We used the Skip-gram model to learn a distributed word representation with a dimension of 250.
• Schütze applied singular-value decomposition because of the large number of vector dimensions. Taking advantage of a smaller number of dimensions, we did not need to perform matrix decomposition.
[Page 21]

Experiments and Discussions

• We experimented with the Skip-gram model under different parameters and selected one word embedding for clustering.
• Skip-gram model parameters
[Page 22]

Experiments and Discussions

Experiment with the Skip-gram model
• Used "average loss" to estimate the loss over every 100K batches.
• Visualized some words' nearest words.
[Page 23]

Experiments and Discussions

Experiment with classifying word senses
• Clustered the contexts of the occurrences of a given ambiguous word into two/three coherent groups.
• Manually assigned labels to the occurrences of ambiguous words in the test corpus, and compared them with the machine-learned labels to calculate accuracy.
• Before word sense determination, we assigned all occurrences to the most frequent meaning, and used that fraction as the baseline.

$$\text{accuracy} = \frac{\text{number of instances with the correct machine-learned sense label}}{\text{total number of test instances}}$$
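The accuracy measure and the most-frequent-sense baseline can be computed as below. The labels are toy values for illustration, not the project's actual results:

```python
# Accuracy = correct machine-learned labels / total test instances, and
# the baseline = fraction of the most frequent sense in the gold labels.

def accuracy(predicted, gold):
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

def most_frequent_sense_baseline(gold):
    return max(gold.count(s) for s in set(gold)) / len(gold)

gold = ["factory", "factory", "organism", "factory"]       # hand-assigned
predicted = ["factory", "organism", "organism", "factory"]  # machine-learned

acc = accuracy(predicted, gold)
baseline = most_frequent_sense_baseline(gold)
```
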
[Page 24]

Experiments and Discussions

• The "Schütze's baseline" column gives the fraction of the most frequent sense in his data sets.
• The "Schütze's accuracy" column gives the results of his disambiguation experiments with local term frequency, if applicable.
• We got better accuracy in the experiments with "capital" and "plant".
• However, the model cannot determine the senses of the words "interest" and "sake", which have a baseline over 85% in our data sets.
[Page 25]

Experiments and Discussions

Discussions
• Our data sets (12G) are much larger than Schütze's data sets (435M). For example, the size of his training set for the word "capital" is 13,015, and ours is 179,793. The larger data sets might have helped to increase the accuracy for some words.
• We also observed that when the baseline is high (>= 85%), the model cannot determine the senses of the word. The performance of unsupervised learning relies on sufficient information from the training data; here the model did not get trained with sufficient data carrying the less frequent meanings.
• The size of the training data and the distribution of the senses of the target word have a significant influence on the performance of the model.
[Page 26]

Conclusion and Future Work

Conclusion
• In this project, we utilized the distributional word representation and the distributional hypothesis to build a modular model to classify the senses of ambiguous words.
• Our experiments showed our model performed well when each sense of an ambiguous word accounted for more than 20% of its occurrences in the training data set.
[Page 27]

Conclusion and Future Work

Future Work
• Optimize the classifier. One possible approach might be using a weighted sum of contexts, taking IDF into account.
• Extend and experiment with this approach in other models with different classifiers. A classifier that works well when occurrences are skewed to one class might improve the accuracy for words where a large portion of the occurrences use the most frequent sense.
• Tokenize the corpus; we could reduce the time cost of training by reducing the vocabulary size.
[Page 28]

References

• Y. Bengio, R. Ducharme, P. Vincent. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137-1155, 2003.
• Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. ICLR Workshop, 2013.
• G. E. Hinton, J. L. McClelland, D. E. Rumelhart. Distributed representations. In: Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations, MIT Press, 1986.
• T. Brants, A. C. Popat, P. Xu, F. J. Och, and J. Dean. Large language models in machine translation. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Language Learning, 2007.
• David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533-536, 1986.
• H. Schwenk. Continuous space language models. Computer Speech and Language, vol. 21, 2007.
• T. Mikolov, A. Deoras, S. Kombrink, L. Burget, J. Černocký. Empirical evaluation and combination of advanced language modeling techniques. In: Proceedings of Interspeech, 2011.
[Page 29]

References

• Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 2013a.
• James R. Curran and Marc Moens. Improvements in automatic thesaurus extraction. In Proceedings of the ACL-02 Workshop on Unsupervised Lexical Acquisition, pages 59-66, 2002.
• Patrick Pantel and Dekang Lin. Discovering word senses from text. In Proc. of SIGKDD-02, pages 613-619, New York, NY, USA. ACM, 2002.
• Michael Lesk. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of SIGDOC, pages 24-26, 1986.
• Olah, Christopher. Deep Learning, NLP, and Representations. Retrieved from http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/, 2014.
• Hartigan, J. A. and Wong, M. A. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 28(1): pages 100-108, 1979.
• Schütze, Hinrich. Dimensions of meaning. In Proceedings of Supercomputing '92, pages 787-796, 1992.
[Page 30]

References

• Pedregosa et al. Scikit-learn: Machine learning in Python. JMLR 12, pp. 2825-2830, 2011.
• Michael U. Gutmann and Aapo Hyvärinen. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. The Journal of Machine Learning Research, 13:307-361, 2012.
• Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In: Lechevallier Y., Saporta G. (eds) Proceedings of COMPSTAT'2010. Physica-Verlag HD.
• TensorFlow Tutorial, tf.nn.nce_loss. Retrieved from https://www.tensorflow.org/api_docs/python/tf/nn/nce_loss, 2017.
• McCormick, C. Word2Vec Tutorial Part 2 - Negative Sampling. Retrieved from http://www.mccormickml.com, 2017, January 11.
• D. Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. Proc. 33rd Annual Meeting of the ACL, Cambridge, MA, USA, pp. 189-196, 1995.
• Schütze, Hinrich. Automatic word sense discrimination. Computational Linguistics, v. 24 n. 1, March 1998.
[Page 31]

Questions

Thank You!
[Page 32]

Appendix: Model Architecture

Skip-gram model architecture
• We trained the neural network by feeding it word pairs of target word and context word found in our training data set.