Evaluating a Generalization of the Winkler Extension in the Context of Ontology Mapping

Maurice Hermans

Bachelor Conference 2

Outline

◦ Ontologies
◦ Ontology Mapping
◦ Research Question
◦ String Similarities
◦ Winkler Extension
◦ Proposed Extension
◦ Evaluation
◦ Results
◦ Conclusion

22-6-2012

Ontologies

Provide a vocabulary of terms that describe a domain of interest

There are several ways in which ontologies can differ:
◦ Encoding
◦ Lexical
◦ Syntactic
◦ Semantic
◦ Semiotic

Ontology Mapping

Knowledge systems used in the same domain can be built according to different specifications and requirements.

This makes it very hard to exchange data between knowledge systems which do not use the same ontology.

Ontology mapping frameworks provide knowledge systems with the capacity to exchange information with other knowledge systems which use different ontologies.

Research Question

To what extent can string similarities, applied to concept names, be improved such that they are better suited for ontology mapping?

String Similarities

Levenshtein
◦ Uses the number of edit operations required to convert one string into another

Jaro
◦ Uses the number of matching characters between two strings and their relative positions

Jaccard
◦ Compares the sets of tokens of two strings

SoftTFIDF
◦ Includes tokens which are similar according to a secondary similarity function
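
As a rough illustration of two of the measures on this slide, the sketch below implements a normalised Levenshtein similarity and a token-based Jaccard similarity. The function names, the normalisation by the longer string's length, and the lower-cased whitespace tokenisation are assumptions made for this example; the talk does not specify them.

```python
def levenshtein_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    previous = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # delete cs
                               current[j - 1] + 1,       # insert ct
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def levenshtein_similarity(s: str, t: str) -> float:
    """Map the edit distance to [0, 1]; dividing by the longer length is an assumption."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(s, t) / longest


def jaccard_similarity(s: str, t: str) -> float:
    """Size of the shared token set divided by the size of the combined token set."""
    a, b = set(s.lower().split()), set(t.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(levenshtein_distance("kitten", "sitting"))
    print(jaccard_similarity("conference paper", "paper review"))
```

Concept names in ontologies are often written in camelCase or with underscores, so a real tokeniser would likely need to split on more than whitespace.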

Winkler Extension

Uses the length of the longest common prefix of s and t to assign a more favourable rating

Most commonly used with the Jaro similarity

$$\mathrm{Sim}_W(s,t) = \mathrm{Sim}(s,t) + \frac{P'}{10}\,\bigl(1 - \mathrm{Sim}(s,t)\bigr)$$

◦ Where: Sim is the base similarity and P' is the length of the common prefix, bounded at 4
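
To make the prefix bonus concrete, here is a minimal sketch of the Jaro similarity combined with the Winkler extension in its commonly cited form, Sim + P' · p · (1 − Sim), with the prefix length bounded at 4 and a scaling factor p = 0.1. The function names and the `p` and `max_prefix` parameters are choices made for this example and are not taken from the talk.

```python
from typing import Callable


def jaro_similarity(s: str, t: str) -> float:
    """Jaro similarity: based on matching characters and their relative order."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that line up in a different order count as transpositions.
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3.0


def common_prefix_length(s: str, t: str) -> int:
    """Number of leading characters that s and t share."""
    n = 0
    for a, b in zip(s, t):
        if a != b:
            break
        n += 1
    return n


def winkler(sim: Callable[[str, str], float], s: str, t: str,
            max_prefix: int = 4, p: float = 0.1) -> float:
    """Apply the prefix bonus to any base similarity with values in [0, 1]."""
    base = sim(s, t)
    prefix = min(common_prefix_length(s, t), max_prefix)
    return base + prefix * p * (1.0 - base)


if __name__ == "__main__":
    print(round(jaro_similarity("MARTHA", "MARHTA"), 3))           # 0.944
    print(round(winkler(jaro_similarity, "MARTHA", "MARHTA"), 3))  # 0.961
```

Because the bonus is scaled by (1 − Sim) and the prefix term is at most 4 × 0.1, the result stays within [0, 1].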

Proposed Extension

Uses the length of the longest common substring (LCS) of s and t to assign more favourable ratings

◦ Here Sim is the base similarity, LCS is the length of the longest common substring, and S is the scaling for the bonus
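
The exact formula of the proposed extension is not reproduced in this transcript. The sketch below shows one hypothetical form, built by analogy with the Winkler extension: the bonus is the LCS length divided by the length of the shorter string, multiplied by the scaling S and by (1 − Sim). That normalisation, the function names, and the use of `difflib.SequenceMatcher` as a stand-in base similarity are all assumptions for illustration and may differ from the method presented in the talk.

```python
from difflib import SequenceMatcher
from typing import Callable


def longest_common_substring_length(s: str, t: str) -> int:
    """Length of the longest contiguous run of characters shared by s and t."""
    best = 0
    previous = [0] * (len(t) + 1)
    for cs in s:
        current = [0]
        for j, ct in enumerate(t, start=1):
            # Extend the run ending at the previous characters, or reset it.
            run = previous[j - 1] + 1 if cs == ct else 0
            current.append(run)
            best = max(best, run)
        previous = current
    return best


def lcs_extension(sim: Callable[[str, str], float], s: str, t: str,
                  scale: float) -> float:
    """Hypothetical LCS bonus: base + scale * (LCS / shorter length) * (1 - base)."""
    base = sim(s, t)
    shorter = min(len(s), len(t))
    if shorter == 0:
        return base
    ratio = longest_common_substring_length(s, t) / shorter
    return base + scale * ratio * (1.0 - base)


def stand_in_similarity(s: str, t: str) -> float:
    """Placeholder base similarity for the demo; not one of the talk's measures."""
    return SequenceMatcher(None, s, t).ratio()


if __name__ == "__main__":
    # The concept names and the scale value are illustrative only.
    print(lcs_extension(stand_in_similarity, "ConferencePaper", "Paper", scale=0.5))
```

Unlike the prefix bonus, this form also rewards a shared substring in the middle or at the end of a concept name, which is the generalisation the slide describes. With a base similarity in [0, 1] and a scale of at most 1, the result stays within [0, 1].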

Example

[Figure: two partial ontologies from the OAEI dataset]

Evaluation

Two datasets are used:
◦ 2010 Ontology Alignment Evaluation Initiative
◦ Dataset created by Cohen et al. 2000

Similarities are evaluated using precision and recall values
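
For reference, precision and recall of a produced mapping against a reference alignment can be computed as in the sketch below. Representing correspondences as pairs of concept names, and the example names themselves, are assumptions made for illustration.

```python
from typing import Iterable, Tuple

Correspondence = Tuple[str, str]


def precision_recall(found: Iterable[Correspondence],
                     reference: Iterable[Correspondence]) -> Tuple[float, float]:
    """Precision: share of found correspondences that are correct.
    Recall: share of reference correspondences that were found."""
    found_set, reference_set = set(found), set(reference)
    correct = len(found_set & reference_set)
    precision = correct / len(found_set) if found_set else 0.0
    recall = correct / len(reference_set) if reference_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical correspondences between concept names of two ontologies.
    found = [("Paper", "Article"), ("Author", "Writer"), ("Review", "Reviewer")]
    reference = [("Paper", "Article"), ("Author", "Writer"), ("Chair", "Chairman")]
    print(precision_recall(found, reference))
```

In this example two of the three found correspondences are in the reference alignment, and two of the three reference correspondences were found, so both values are 2/3.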

Results

[Figures: result plots for the OAEI and Cohen datasets]

Optimal weight for both datasets is around 0.8

Conclusion
