Evaluating a Generalization of the Winkler Extension in the Context of Ontology Mapping
Maurice Hermans
Bachelor Conference 2
◦ Ontologies
◦ Ontology Mapping
◦ Research Question
◦ String Similarities
◦ Winkler Extension
◦ Proposed Extension
◦ Evaluation
◦ Results
◦ Conclusion
Outline
22-6-2012
Provide a vocabulary of terms that describe a domain of interest
There are several ways in which ontologies can differ:
◦ Encoding
◦ Lexical
◦ Syntactic
◦ Semantic
◦ Semiotic
Ontologies
Knowledge systems used in the same domain can be built according to different specifications and requirements
This makes it very hard to exchange data between multiple knowledge systems which do not use the same ontology
Ontology mapping frameworks provide knowledge systems with the capacity to exchange information with other knowledge systems which use different ontologies.
Ontology Mapping
To what extent can string similarities, applied to concept names, be improved such that they are better suited for ontology mapping?
Research Question
Levenshtein
◦ Uses the number of edit operations required to convert one string into another
Jaro
◦ Uses the number of matching characters between two strings and their relative positions
Jaccard
◦ Compares the sets of tokens of two strings
SoftTFIDF
◦ Includes tokens which are similar according to a secondary similarity function
String Similarities
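As a rough illustration of the first and third measures above, here is a minimal Python sketch. The normalisation of the edit distance by the longer string length and the whitespace tokenisation for Jaccard are assumptions made for this example; the slides do not specify either choice.

```python
def levenshtein(s: str, t: str) -> int:
    """Count the insertions, deletions and substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def levenshtein_similarity(s: str, t: str) -> float:
    """Map the distance to [0, 1]; dividing by the longer length is an assumption."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


def jaccard(s: str, t: str) -> float:
    """Compare token sets; lower-casing and whitespace splitting are assumptions."""
    a, b = set(s.lower().split()), set(t.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For concept names, Jaccard ignores token order, so "conference paper" and "paper conference" are treated as identical, whereas the character-level Levenshtein measure penalises the reordering.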
Uses the length of the longest common prefix of s and t to assign a more favourable rating
Most commonly used with the Jaro similarity
◦ Where: Sim is the base similarity and P’ is the length of the common prefix, bounded at 4
Winkler Extension
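The formula on this slide is not reproduced in the text. The form usually given for the Winkler extension is Sim + P’ · p · (1 − Sim), with a scaling constant p of 0.1. The Python sketch below follows that commonly cited form together with a standard Jaro implementation; the function names and the default `scale=0.1` are assumptions of this example, not taken from the slides.

```python
def jaro(s: str, t: str) -> float:
    """Jaro similarity: matching characters within a window, minus transpositions."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, c in enumerate(s):
        lo, hi = max(0, i - window), min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == c:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def winkler(sim: float, s: str, t: str, max_prefix: int = 4, scale: float = 0.1) -> float:
    """Raise a base similarity using the common prefix length, bounded at max_prefix."""
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * scale * (1.0 - sim)
```

For example, `winkler(jaro("MARTHA", "MARHTA"), "MARTHA", "MARHTA")` raises the Jaro score of about 0.944 to about 0.961, because the two strings share the three-character prefix "MAR".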
Uses the length of the longest common substring (LCS) of s and t to assign more favourable ratings
◦ Where: Sim is the base similarity, LCS the length of the longest common substring and S the scaling for the bonus
Proposed Extension
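The exact formula of the proposed extension is not preserved in this text, so the following Python sketch is only an illustration. The LCS computation is standard dynamic programming; the way the bonus is combined with the base similarity (a Winkler-style form, with the bonus capped at 1) is purely an assumption and may differ from the author's definition. The function names and the default scaling value are hypothetical.

```python
def longest_common_substring_length(s: str, t: str) -> int:
    """Length of the longest contiguous run of characters shared by s and t."""
    best = 0
    prev = [0] * (len(t) + 1)  # run lengths ending at the previous character of s
    for cs in s:
        curr = [0]
        for j, ct in enumerate(t, start=1):
            run = prev[j - 1] + 1 if cs == ct else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best


def lcs_bonus_similarity(sim: float, s: str, t: str, scale: float = 0.1) -> float:
    """Hypothetical Winkler-style bonus driven by the LCS length.

    ASSUMPTION: the combination sim + bonus * (1 - sim) and the cap at 1.0
    are chosen for illustration; they are not taken from the slides.
    """
    lcs = longest_common_substring_length(s, t)
    bonus = min(lcs * scale, 1.0)
    return sim + bonus * (1.0 - sim)
```

Unlike a prefix, a common substring may occur anywhere in the names: "hasAuthor" and "author" share no prefix, but they do share the substring "uthor" of length 5, which a bonus of this kind can reward.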
Two datasets are used:
◦ 2010 Ontology Alignment Evaluation Initiative
◦ Dataset created by Cohen et al. 2000
Similarities are evaluated using precision and recall values
Evaluation
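Precision and recall for a mapping can be sketched as follows, assuming that both the produced alignment and the reference alignment are represented as sets of concept-name pairs. That representation and the function name are assumptions of this example; the concept pairs in the usage note are made up for illustration.

```python
def precision_recall(found, reference):
    """Return (precision, recall) of a produced alignment against a reference.

    Precision: fraction of produced correspondences that are correct.
    Recall: fraction of reference correspondences that were produced.
    """
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall
```

If a similarity produces three correspondences of which two appear in a reference alignment of four, the precision is 2/3 and the recall is 1/2.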