Linked Open Vocabulary Ranking and Terms Discovery
Ioannis Stavrakantonakis, PhD candidate
University of Innsbruck - STI Innsbruck
Ioannis Stavrakantonakis, Anna Fensel, Dieter Fensel
Discovering vocabulary terms
[Diagram: webpage → (extract) → n keywords → (perform) → n searches → (extract) → n*m search results → (filter) → n result terms. Or: use vocab-recommender.]
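The discovery steps listed above (extract keywords, perform searches, extract results, filter) can be sketched in a few lines. This is a minimal illustration only: `extract_keywords`, `search_terms`, the stop-word list and the in-memory `INDEX` are hypothetical stand-ins (a real run would query the LOV Search), the `ex:` terms are invented, and the filter step simply keeps the first hit per keyword.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "with", "for", "in", "to"}

# Hypothetical stand-in for a vocabulary search index (not real LOV data).
INDEX = {
    "recipe": ["schema:Recipe", "ex:Recipe"],
    "pizza": ["ex:Pizza"],
    "ingredients": ["schema:recipeIngredient", "ex:ingredient"],
}


def extract_keywords(text, n):
    """Return the n most frequent non-stop-words of the page text."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(n)]


def search_terms(keyword, m):
    """Return up to m candidate terms for one keyword (one search)."""
    return INDEX.get(keyword, [])[:m]


def discover(text, n=3, m=2):
    keywords = extract_keywords(text, n)                  # webpage -> n keywords
    results = {k: search_terms(k, m) for k in keywords}   # n searches -> up to n*m results
    # Filter: keep the first hit per keyword -> at most n result terms.
    return {k: hits[0] for k, hits in results.items() if hits}


page = "Pizza recipe: a pizza recipe with ingredients for the pizza"
terms = discover(page)
```

Even in this toy form, every keyword costs one search and one manual choice among up to m results, which is the effort the survey measures.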
Outline
• Survey of vocabulary terms discovery.
• State of the art.
• Vocab-recommender: The vocabulary terms discovery assistant.
• Outlook - What is next?
Survey of vocabulary terms discovery
• 64 of 66 participants provided valid submissions.
• Participants were familiar with computer science but had no experience with semantic annotations.
• 4 use cases: article (NASA), exhibition (Louvre), hotel (room description), recipe (pizza).
• One week to submit the answers.
• Completion time was self-reported by the participants.
• “Use solely the LOV Search for the discovery of terms.”
Measured selection time
• A few outliers in the terms selection time.
• Least skewed measurements for the exhibition and recipe cases.
• Selection takes 1 hour on average.
• The exhibition and recipe cases have the lowest selection time.
Distribution of participants and schema.org
47% of the proposed terms belong to the schema.org namespace.
Is this due to a specific use case? No.
General observations
• 2 out of 66 participants failed to provide a valid submission (3%).
• Static parts (media) were considered for inclusion by only 10% of the participants.
• The terms discovery process itself did not come with a guideline to follow.
• Participants encountered vocabulary term descriptions that were hard to follow.
Linked Open Vocabularies
Inclusion rules
• Be written in RDF and be dereferenceable (URI).
• Be parseable without errors.
• Terms should have an rdfs:label.
• Reuse relevant existing vocabularies.
• Provide metadata about itself.
No guarantee about the effectiveness.
Ranking
• Atemezing and Troncy: Information Content (IC) in LOV. a) Terms occurrence in comparison to the maximum term occurrence in the set of vocabularies. b) Centrality of the vocabulary.
• DWRank: a) Hub score. b) Authority score. (no LOD usage)
• TermPicker: Suggests types and properties from vocabularies that other LOD providers have combined together with the one the engineer has used to model the given part (using Schema Level Patterns).
Methodology
Aim: Assist the exploration of the vocabulary space for a given input webpage W.
Output: A set of vocabulary terms covering the needs of a given W.
Requires:
• Perform all the discovery steps in an automatic manner.
• Rank result terms to provide the best matches.
• Describe the output in a transparent way that helps the user educate herself about the vocabulary space.
Methodology
[Architecture diagram: a webpage is the input to the vocabulary terms set generator, which comprises the Extractor, Searcher, Ranker, Recommender and Static recommender, and produces the result set T.]
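One way to picture how these components could fit together is a small pipeline in which each part is a replaceable function. This is only a sketch under assumptions: the order of the data flow, the class name `TermsSetGenerator` and the toy components are hypothetical and are not taken from the actual vocab-recommender code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TermsSetGenerator:
    """Hypothetical composition of the components named in the diagram."""
    extractor: Callable[[str], List[str]]           # webpage -> keywords
    searcher: Callable[[str], List[str]]            # keyword -> candidate terms
    ranker: Callable[[List[str]], List[str]]        # candidates -> ordered terms
    static_recommender: Callable[[str], List[str]]  # webpage -> fixed media terms

    def recommend(self, webpage: str) -> List[str]:
        candidates: List[str] = []
        for keyword in self.extractor(webpage):
            candidates.extend(self.searcher(keyword))
        result = self.ranker(candidates)
        # Append static recommendations that are not already in the result.
        for term in self.static_recommender(webpage):
            if term not in result:
                result.append(term)
        return result


# Toy components, for illustration only.
generator = TermsSetGenerator(
    extractor=lambda page: [w for w in page.lower().split() if w.isalpha()],
    searcher=lambda kw: {"hotel": ["schema:Hotel"], "room": ["schema:HotelRoom"]}.get(kw, []),
    ranker=lambda terms: sorted(set(terms)),
    static_recommender=lambda page: ["schema:ImageObject"] if "<img" in page else [],
)

T = generator.recommend("Hotel room <img src='r.jpg'>")
```

Because every component is passed in as a plain callable, any one of them can be swapped without touching the others.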
Ranking dimensions
• Vocabularies ranking (backlinks, inactive vocabs).
• Vocabulary authors profile.
• Vocabulary terms ranking (LOD usage: LODStats, vocab.cc).
• Vocabulary terms result set for similar webpages.
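The four dimensions above have to be combined into one ordering somehow. The sketch below assumes a simple weighted sum of per-dimension scores normalised to [0, 1]; the weights, the dimension keys, the example scores and the `ex:Recipe` term are all hypothetical, and the slides do not state how the actual approach combines them.

```python
# Hypothetical weights per ranking dimension (assumed, not from the slides).
WEIGHTS = {
    "vocabulary": 0.3,     # vocabulary ranking (backlinks, inactive vocabs)
    "author": 0.2,         # vocabulary authors profile
    "term_usage": 0.3,     # term usage in LOD (e.g. LODStats, vocab.cc)
    "similar_pages": 0.2,  # terms chosen for similar webpages
}


def combined_score(scores, weights=WEIGHTS):
    """Weighted sum of the per-dimension scores; missing dimensions count as 0."""
    return sum(weight * scores.get(dim, 0.0) for dim, weight in weights.items())


# Invented example scores for two candidate terms.
candidates = {
    "schema:Recipe": {"vocabulary": 1.0, "author": 1.0, "term_usage": 1.0, "similar_pages": 1.0},
    "ex:Recipe": {"vocabulary": 0.2, "author": 0.5},
}

ranked = sorted(candidates, key=lambda t: combined_score(candidates[t]), reverse=True)
```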
Vocabulary authors
How can we address the low ranking scores of new vocabularies?
Promote newly created vocabularies by authors who have previously provided vocabularies of the desired quality level (as reflected in their ranking).
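A minimal sketch of this promotion idea, assuming that an author's reputation is the mean ranking score of their earlier vocabularies and that a fixed quality threshold decides whether a new vocabulary gets promoted. Both the aggregation and the `quality_threshold` value are assumptions made for illustration.

```python
from statistics import mean


def author_reputation(previous_scores):
    """Mean ranking score of the author's earlier vocabularies (0 if none)."""
    return mean(previous_scores) if previous_scores else 0.0


def effective_score(own_score, previous_scores, is_new, quality_threshold=0.6):
    """Lift a new vocabulary to its author's reputation if that is high enough."""
    reputation = author_reputation(previous_scores)
    if is_new and reputation >= quality_threshold:
        return max(own_score, reputation)
    return own_score


promoted = effective_score(0.1, [0.5, 1.0], is_new=True)     # reputable author
not_promoted = effective_score(0.1, [0.25], is_new=True)     # low reputation
established = effective_score(0.1, [0.5, 1.0], is_new=False)  # not a new vocabulary
```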
Static recommendations
• Refers to images, videos, audio objects.
• Important aspect of the webpage interpretation by the search engines.
• Static mappings to some well defined schema.org terms.
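Such a static mapping can be as simple as a lookup table from HTML media elements to schema.org types. The sketch below assumes the mapping `img` → `schema:ImageObject`, `video` → `schema:VideoObject`, `audio` → `schema:AudioObject`; these are existing schema.org types, but the slides do not list which terms the approach actually maps to, so the table and the `MediaFinder` helper are illustrative.

```python
from html.parser import HTMLParser

# Assumed mapping from HTML media tags to schema.org types.
STATIC_MAPPING = {
    "img": "schema:ImageObject",
    "video": "schema:VideoObject",
    "audio": "schema:AudioObject",
}


class MediaFinder(HTMLParser):
    """Collects the mapped term of every media element in the page, once each."""

    def __init__(self):
        super().__init__()
        self.terms = []

    def handle_starttag(self, tag, attrs):
        term = STATIC_MAPPING.get(tag)
        if term and term not in self.terms:
            self.terms.append(term)


def static_recommendations(html):
    finder = MediaFinder()
    finder.feed(html)
    return finder.terms


page = '<p>Pizza</p><img src="p.jpg"><video src="v.mp4"></video><img src="q.jpg">'
media_terms = static_recommendations(page)
```

Since only 10% of the survey participants considered media at all, a fixed lookup like this covers a step that is easy to miss by hand.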
Implementation
• As a Web service.
• Modular, i.e. any part can be substituted.
• Input: URL or Keywords.
• Output: Set of vocabulary terms described using the vSearch vocabulary.
Comparison with survey
Approach:
• Recall: 71% (missed: cooking time, cooking method).
• Precision: 100% (no irrelevant terms).
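For readers unfamiliar with the two measures, the sketch below shows how they are computed from a recommended set and a reference set. The reference set is hypothetical: the slide names only the two missed items, so the other five labels and the size of seven are assumptions chosen so that the arithmetic matches the reported figures (5 of 7 found, nothing irrelevant).

```python
def precision_recall(recommended, reference):
    """Precision and recall of a recommended set against a reference set."""
    recommended, reference = set(recommended), set(reference)
    hits = recommended & reference
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference set; only the last two labels come from the slide.
reference = ["name", "ingredients", "instructions", "author", "image",
             "cooking time", "cooking method"]
recommended = ["name", "ingredients", "instructions", "author", "image"]

precision, recall = precision_recall(recommended, reference)
```

With these sets, precision is 5/5 and recall is 5/7, which rounds to 71%.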
Approach accomplishments
• Provide vocabulary terms recommendations for a given webpage.
• Simplify the discovery process.
• Address the cold start problem for new vocabularies in the search.
• Educate the users around the semantic annotations topic.
• Provide a general vocabulary that can be used to describe search results.
What is next?
• For the presented approach: Recommendation of actions given the entities that have been proposed by the approach.
• For the Web: Utilise the structured and semantically annotated data to assist our daily lives. Interpret the annotated websites as APIs.
References
I. Stavrakantonakis, A. Fensel, and D. Fensel. Linked Open Vocabulary ranking and terms discovery. In Proceedings of SEMANTiCS 2016.
I. Stavrakantonakis, A. Fensel, and D. Fensel. Towards a vocabulary terms discovery assistant. In Proceedings of SEMANTiCS 2016 Posters & Demos.
G. A. Atemezing and R. Troncy. Information content based ranking metric for Linked Open Vocabularies. In Proceedings of the 10th International Conference on Semantic Systems, 2014.
P.-Y. Vandenbussche, G. A. Atemezing, M. Poveda-Villalón, and B. Vatant. Linked Open Vocabularies (LOV): a gateway to reusable semantic vocabularies on the web. Semantic Web, 2015.
J. Schaible, T. Gottron, and A. Scherp. TermPicker: Enabling the reuse of vocabulary terms by exploiting data from the Linked Open Data cloud. In European Semantic Web Conference (ESWC), 2016.
Photo credits for the sections’ photos:
https://unsplash.com/photos/7m2gkYUDfFE
https://unsplash.com/photos/DJ_kOgH5u0o
https://unsplash.com/photos/o4-YyGi5JBc
https://unsplash.com/photos/a8YV2C3yBMk
https://unsplash.com/photos/s9XMNEm-M9c