Ontology-based Word Sense Disambiguation for Scientific Literature
Roman Prokofyev, Gianluca Demartini, Philippe Cudré-Mauroux, Alexey Boyarsky and Oleg Ruchayskiy
eXascale Infolab, University of Fribourg, Switzerland
March 25, ECIR 2013, Moscow
Problem definition: the acronym SSM is ambiguous. It can stand for State Space Model, Sequential Standard Model, or Supersymmetric Standard Model.
• Machine translation: correct lexical choice.
• Information retrieval: ambiguity in queries, result diversification, etc.
• Knowledge extraction: proper text analysis and classification (our case).
Datasets
• ScienceWISE abstract dataset + ScienceWISE (SW) ontology: http://sciencewise.info
• MSH abstract dataset + ontology from bioontology.org
Available at http://exascale.info/papers/ecir2013disambig
Our contribution: leveraging the structure of a community-based ontology to improve the identification of the correct sense.
• Concept Context Vectors
• Document Concept Context Vectors
Base models
Star formation efficiency (SFE): (Instability, 4), (Supernova, 2), (Milky Way, 3), …
Min distance
1: (Milky Way, 1), (Electron neutrino, 1), (Electron antineutrino, 1), …
2: (Local analysis, 1), (White dwarf, 3), (Poynting-Robertson effect, 1), …
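The concept context vectors in the SFE example pair each co-occurring concept with a count. A minimal sketch of how such vectors could be built and used to pick a sense, assuming that counts are per document, that "Binary CCV" means presence/absence of each context concept, and that the sense with the largest overlap wins. The toy documents and the helper names (build_ccv, binary_ccv_score, disambiguate) are hypothetical, not the authors' code.

```python
from collections import Counter

# Hypothetical toy training data: each document is reduced to the list of
# ontology concepts found in it (senses of ambiguous terms already resolved).
TRAIN_DOCS = [
    ["Supersymmetric Standard Model", "Neutralino", "LHC"],
    ["Supersymmetric Standard Model", "Higgs boson", "LHC"],
    ["State Space Model", "Kalman filter", "Time series"],
]


def build_ccv(docs, concept):
    """Concept context vector: for every other concept, the number of
    documents in which it co-occurs with `concept` (assumed counting unit)."""
    ccv = Counter()
    for doc in docs:
        concepts = set(doc)
        if concept in concepts:
            ccv.update(concepts - {concept})  # +1 per co-occurring concept
    return ccv


def binary_ccv_score(ccv, doc_concepts):
    """Binary variant (assumed): ignore the counts and count how many
    concepts of the new document appear in the sense's context vector."""
    return len(set(ccv) & set(doc_concepts))


def disambiguate(sense_ccvs, doc_concepts):
    """Return the candidate sense whose CCV overlaps most with the document.
    Ties go to the first sense in dict order (behavior of max)."""
    return max(sense_ccvs, key=lambda s: binary_ccv_score(sense_ccvs[s], doc_concepts))


SENSE_CCVS = {
    sense: build_ccv(TRAIN_DOCS, sense)
    for sense in ("Supersymmetric Standard Model", "State Space Model")
}

if __name__ == "__main__":
    print(SENSE_CCVS["Supersymmetric Standard Model"])
    print(disambiguate(SENSE_CCVS, ["Higgs boson", "Neutralino"]))
```

Dropping the counts makes the score insensitive to how often a context concept was seen in training; a weighted variant (for example cosine similarity over the counts) would be the natural alternative to compare against.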
Graph models
• Min distance: minimum over the ontological paths to other concepts in the document
• Ontology shortest path: average distance to other concepts in the document
• Nearest neighbors: co-occurring 1-hop neighbors from the ontology
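The three graph models can be sketched with a breadth-first search over the ontology. This sketch assumes the ontology is an undirected, unweighted graph, that document concepts unreachable from a sense are ignored, and that the sense with the lowest distance (or the most 1-hop neighbors) is chosen. The toy edges and function names are hypothetical.

```python
from collections import deque
from math import inf

# Hypothetical toy ontology, treated as an undirected, unweighted graph.
EDGES = [
    ("Supersymmetric Standard Model", "Supersymmetry"),
    ("Supersymmetry", "Neutralino"),
    ("Supersymmetric Standard Model", "Higgs boson"),
    ("State Space Model", "Kalman filter"),
    ("Kalman filter", "Time series"),
]


def build_graph(edges):
    """Adjacency sets; each edge is added in both directions."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, source):
    """Shortest-path length (in hops) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def _reachable(graph, sense, doc_concepts):
    """Distances from `sense` to the reachable concepts of the document."""
    dist = bfs_distances(graph, sense)
    return [dist[c] for c in set(doc_concepts) if c != sense and c in dist]


def min_distance(graph, sense, doc_concepts):
    """Min distance: path length to the closest document concept (lower wins)."""
    return min(_reachable(graph, sense, doc_concepts), default=inf)


def avg_distance(graph, sense, doc_concepts):
    """Ontology shortest path: mean distance to document concepts (lower wins)."""
    d = _reachable(graph, sense, doc_concepts)
    return sum(d) / len(d) if d else inf


def nearest_neighbors(graph, sense, doc_concepts):
    """Nearest neighbors: document concepts that are 1-hop neighbors (higher wins)."""
    return len(graph.get(sense, set()) & set(doc_concepts))


GRAPH = build_graph(EDGES)
DOC = ["Neutralino", "Higgs boson"]
SENSES = ["Supersymmetric Standard Model", "State Space Model"]

if __name__ == "__main__":
    print(min(SENSES, key=lambda s: min_distance(GRAPH, s, DOC)))
    print(max(SENSES, key=lambda s: nearest_neighbors(GRAPH, s, DOC)))
```

BFS gives exact hop counts on an unweighted graph; if the ontology had typed or weighted relations, Dijkstra's algorithm would be the usual replacement.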
Graph models evaluation
Approach                  Precision (ScienceWISE)   Precision (MSH)
Min Distance              0.8882                    0.6728
Ontology Shortest Path    0.8646                    0.5677
Nearest Neighbors         0.7393                    0.7237
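The slides do not define precision. In word sense disambiguation it is commonly the fraction of attempted sense assignments that are correct. A minimal sketch under that assumption, where a prediction of None marks an abstention; the labels in the usage line are hypothetical.

```python
def precision(predicted, gold):
    """Precision = correct sense assignments / sense assignments attempted.

    A prediction of None is treated as an abstention and excluded
    (an assumption; the slides do not say whether the systems abstain).
    """
    attempted = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    if not attempted:
        return 0.0
    correct = sum(1 for p, g in attempted if p == g)
    return correct / len(attempted)


if __name__ == "__main__":
    # Hypothetical predictions for four ambiguous occurrences.
    print(precision(["SUSY", "SUSY", None, "SSM-state"],
                    ["SUSY", "SSM-state", "SUSY", "SSM-state"]))
```

When every occurrence receives a sense, this precision coincides with plain accuracy.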
Combined approaches
Approach                  Precision (ScienceWISE)   Precision (MSH)
Naïve Bayes               0.8513                    0.6731
Binary CCV                0.9334                    0.9077
+ Ontology Shortest Path  0.9444                    0.8077
+ Nearest Neighbors       0.9453                    0.9060
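The slides do not state how Binary CCV is combined with a graph model. One simple possibility, shown only as an assumption and not as the authors' method, is to rank senses by the CCV overlap and use the graph score to break ties. The function name and the toy scores below are hypothetical.

```python
def combined_choice(senses, primary, secondary):
    """Rank senses by `primary` (e.g. a binary CCV overlap) and break ties
    with `secondary` (e.g. a negated average ontology distance, or a
    nearest-neighbor count). Both callables follow "higher is better"."""
    return max(senses, key=lambda s: (primary(s), secondary(s)))


# Hypothetical scores for one ambiguous occurrence with three candidate senses.
CCV_OVERLAP = {"A": 2, "B": 2, "C": 1}
AVG_DISTANCE = {"A": 3.0, "B": 1.5, "C": 0.5}

if __name__ == "__main__":
    # A and B tie on CCV overlap; B is closer in the ontology, so B is chosen.
    print(combined_choice(CCV_OVERLAP, CCV_OVERLAP.get, lambda s: -AVG_DISTANCE[s]))
```

A tie-breaker needs no extra parameter; a weighted sum of the two scores is the obvious alternative, at the cost of a weight that would have to be tuned per dataset.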