Linked Open Vocabulary Ranking and Terms Discovery

Ioannis Stavrakantonakis, PhD candidate
University of Innsbruck - STI Innsbruck

Ioannis Stavrakantonakis, Anna Fensel, Dieter Fensel

Post on 14-Apr-2017

Discovering vocabulary terms

[Figure: manual discovery workflow. webpage → (extract) → n keywords → (perform) → n searches → (extract) → n*m search results → (filter) → n result terms → use. Additional label: vocab-recommender.]
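The workflow named on this slide (extract keywords, perform searches, extract results, filter) can be sketched as follows. This is a minimal illustration, not the authors' tooling: the in-memory `INDEX`, the stop-word list, the `ex:` prefix, and the functions `extract_keywords`, `search`, and `discover` are hypothetical stand-ins for reading a webpage and querying a vocabulary search such as LOV.

```python
import re
from collections import Counter

# Hypothetical toy index standing in for a vocabulary search service.
INDEX = {
    "recipe": ["schema:Recipe", "schema:recipeIngredient"],
    "pizza": ["schema:MenuItem"],
    "ingredient": ["schema:recipeIngredient", "ex:Ingredient"],
}
STOP_WORDS = {"the", "a", "of", "and", "with", "for", "is"}


def extract_keywords(text, n):
    """Return the n most frequent non-stop-words of the page text."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    return [word for word, _ in Counter(words).most_common(n)]


def search(keyword, m):
    """Return at most m result terms for one keyword."""
    return INDEX.get(keyword, [])[:m]


def discover(text, n=3, m=2):
    """n keywords -> n searches -> up to n*m results -> one term per keyword."""
    keywords = extract_keywords(text, n)
    results = {keyword: search(keyword, m) for keyword in keywords}
    # Filter step (assumed): keep the first result of every non-empty search.
    return [hits[0] for hits in results.values() if hits]


terms = discover("Pizza recipe: the pizza ingredient list and the recipe steps")
```

Every step here is done by hand in the manual case, which is why the number of searches and results to inspect grows with the number of keywords.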

Outline

• Survey of vocabulary terms discovery.

• State of the art.

• Vocab-recommender: The vocabulary terms discovery assistant.

• Outlook - What is next?

Survey of vocabulary terms discovery

• 64 participants with valid submissions out of 66.

• Familiar with Computer Science but without any experience in semantic annotation.

• 4 use cases: article (NASA), exhibition (Louvre), hotel (room description), recipe (pizza).

• One week to submit the answers.

• Completion time reported by the participants themselves.

• “Use solely the LOV Search for the discovery of terms.”

Number of selected terms

[Figure: box plot of the number of selected terms. Legend: median line; box = 50% of data points; whiskers = minimum and maximum.]

Measured selection time

• A few outliers in the terms selection time.

• Least skewed measurements for the exhibition and recipe cases.

• Takes 1 hour on average.

• Exhibition & recipe cases have the lowest time.

Distribution of participants and schema.org

47% of the proposed terms belong to the schema.org namespace.

Is it due to a specific use case? - No.

General observations

• 2 out of 66 participants failed to provide a valid submission (3%).

• Static parts (media) were considered for inclusion by only 10% of the participants.

• The terms discovery process itself did not come with a guideline to follow.

• Participants faced cases in which the description of a vocabulary term was hard to follow.

State of the art

Linked Open Vocabularies

Inclusion rules

• Be written in RDF and be dereferenceable (URI).

• Be parseable without errors.

• Terms should have an rdfs:label.

• Reuse relevant existing vocabularies.

• Provide metadata about itself.

No guarantee about the effectiveness.
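One of the inclusion rules, that terms should have an rdfs:label, lends itself to a mechanical check. The sketch below assumes the vocabulary has already been parsed into plain (subject, predicate, object) string tuples; the function `terms_missing_label` and the `http://example.org/vocab#` namespace are hypothetical and are not part of LOV.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def terms_missing_label(triples, namespace):
    """Return the subjects in `namespace` that have no rdfs:label triple."""
    subjects = {s for s, _, _ in triples if s.startswith(namespace)}
    labelled = {s for s, p, _ in triples if p == RDFS_LABEL}
    return sorted(subjects - labelled)


# Hypothetical example vocabulary with one labelled and one unlabelled term.
NS = "http://example.org/vocab#"
triples = [
    (NS + "Recipe", RDF_TYPE, RDFS_CLASS),
    (NS + "Recipe", RDFS_LABEL, "Recipe"),
    (NS + "Dish", RDF_TYPE, RDFS_CLASS),
]
missing = terms_missing_label(triples, NS)
```

A check of this kind only confirms that the rule is met formally; it says nothing about whether the labels are helpful to someone searching for a term.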

Ranking

• Atemezing and Troncy: Information Content (IC) in LOV. a) Occurrence of a term relative to the maximum term occurrence in the set of vocabularies. b) Centrality of the vocabulary.

• DWRank: a) Hub score. b) Authority score. It does not take LOD usage into account.

• TermPicker: Suggests types and properties from vocabularies that other LOD providers have combined with the one the engineer has used to model the given part (using Schema Level Patterns).
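The first criterion of the IC-based ranking, term occurrence relative to the maximum occurrence, can be illustrated with the generic information-content form IC = log2(max occurrence / occurrence). This is an assumption made for illustration; the exact formula used by Atemezing and Troncy may differ, and the occurrence counts below are invented.

```python
import math


def information_content(occurrences):
    """Map each term to log2(max_occurrence / occurrence).

    Assumed generic form: the most frequent term gets IC 0 and
    rarer terms get higher values. Terms with no occurrence are skipped.
    """
    max_occurrence = max(occurrences.values())
    return {
        term: math.log2(max_occurrence / count)
        for term, count in occurrences.items()
        if count > 0
    }


# Invented occurrence counts across a set of vocabularies.
ic = information_content({"ex:name": 8, "ex:author": 2, "ex:cookTime": 1})
```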

Vocab-recommender: The vocabulary terms discovery assistant

Methodology

Aim: Assist the exploration of the vocabulary space for a given input webpage W.

Output: A set of vocabulary terms covering the needs of a given W.

Requires:

• Perform all the discovery steps in an automatic manner.

• Rank result terms to provide the best matches.

• Describe the output in a transparent way that helps the user educate herself about the vocabulary space.

Methodology

[Figure: architecture of the vocabulary terms set generator. Input: webpage. Components: Extractor, Searcher, Ranker, Recommender, Static recommender. Output: result set T.]
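The components named on this slide can be wired together roughly as below. The slide gives only the component names, so the order of the calls, the callable signatures, and the class name `TermSetGenerator` are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TermSetGenerator:
    """Hypothetical wiring of extractor, searcher, ranker, static recommender."""

    extractor: Callable[[str], List[str]]           # webpage -> keywords
    searcher: Callable[[str], List[str]]            # keyword -> candidate terms
    ranker: Callable[[List[str]], List[str]]        # candidates -> ranked terms
    static_recommender: Callable[[str], List[str]]  # webpage -> media terms

    def generate(self, webpage: str) -> List[str]:
        keywords = self.extractor(webpage)
        candidates = [term for kw in keywords for term in self.searcher(kw)]
        ranked = self.ranker(candidates)
        static = self.static_recommender(webpage)
        # Deduplicate while preserving order to obtain the result set T.
        return list(dict.fromkeys(ranked + static))


# Toy components; each one can be swapped for a real implementation.
TOY_INDEX = {"hotel": ["schema:Hotel"], "room": ["schema:HotelRoom", "schema:Room"]}
generator = TermSetGenerator(
    extractor=lambda page: page.lower().split(),
    searcher=lambda keyword: TOY_INDEX.get(keyword, []),
    ranker=sorted,
    static_recommender=lambda page: ["schema:ImageObject"],
)
result_set = generator.generate("Hotel room")
```

Passing the components in as plain callables keeps each part replaceable without touching the others.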

Ranking dimensions

• Vocabularies ranking (backlinks, inactive vocabs).

• Vocabulary authors profile.

• Vocabulary terms ranking (LOD usage: LODStats, vocab.cc).

• Vocabulary terms result set for similar webpages.
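One simple way to combine the four ranking dimensions is a weighted sum of per-dimension scores. The slides do not state how the dimensions are combined, so the weights, the dimension keys, the `ex:Dish` term, and the assumption that every score lies in [0, 1] are all hypothetical.

```python
# Hypothetical weights, one per ranking dimension; they sum to 1.
WEIGHTS = {
    "vocabulary": 0.3,     # vocabulary ranking (backlinks, inactivity)
    "authors": 0.2,        # vocabulary authors profile
    "lod_usage": 0.3,      # term usage in LOD
    "similar_pages": 0.2,  # terms chosen for similar webpages
}


def combined_score(scores, weights=WEIGHTS):
    """Weighted sum of dimension scores; a missing dimension counts as 0."""
    return sum(weight * scores.get(dim, 0.0) for dim, weight in weights.items())


def rank(candidates):
    """Order candidate terms by combined score, best first."""
    return sorted(candidates, key=lambda t: combined_score(candidates[t]), reverse=True)


candidates = {
    "ex:Dish": {"vocabulary": 0.2, "lod_usage": 0.1},
    "schema:Recipe": {"vocabulary": 0.9, "authors": 0.8, "lod_usage": 0.9, "similar_pages": 0.7},
}
ordered = rank(candidates)
```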

Vocabulary authors

How can we address the low ranking scores of new vocabularies?

Promote newly created vocabularies by authors who have previously provided vocabularies of a desired quality level (reflected in ranking).
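The promotion idea can be sketched as a score adjustment: a new vocabulary inherits the track record of its best-ranked author when that record reaches a quality threshold. The threshold value, the use of the mean, and the function name `promoted_score` are assumptions made for illustration only.

```python
from statistics import mean


def promoted_score(own_score, author_histories, threshold=0.7):
    """Return the score of a new vocabulary after author-based promotion.

    author_histories maps each author to the ranking scores of their
    earlier vocabularies. If the best author average reaches `threshold`
    (hypothetical value), the vocabulary is lifted to that average.
    """
    averages = [mean(scores) for scores in author_histories.values() if scores]
    best = max(averages, default=0.0)
    if best >= threshold:
        return max(own_score, best)
    return own_score


established = promoted_score(0.1, {"author-a": [0.75, 1.0]})
unproven = promoted_score(0.1, {"author-b": [0.25, 0.5]})
newcomer = promoted_score(0.1, {})
```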

[Figure: vocabularies and their authors.]

Static recommendations

• Refer to images, videos, and audio objects.

• Important for how search engines interpret the webpage.

• Static mappings to some well-defined schema.org terms.
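A static mapping of this kind can be as small as a lookup table from HTML media tags to schema.org types. The slides do not list the actual mappings, so the table below (`img`, `video`, and `audio` mapped to `schema:ImageObject`, `schema:VideoObject`, and `schema:AudioObject`) is an assumed example using existing schema.org types.

```python
from html.parser import HTMLParser

# Assumed static mappings from HTML media tags to schema.org types.
STATIC_MAPPINGS = {
    "img": "schema:ImageObject",
    "video": "schema:VideoObject",
    "audio": "schema:AudioObject",
}


class MediaCollector(HTMLParser):
    """Collect the mapped term for every media tag, once per term."""

    def __init__(self):
        super().__init__()
        self.terms = []

    def handle_starttag(self, tag, attrs):
        term = STATIC_MAPPINGS.get(tag)
        if term and term not in self.terms:
            self.terms.append(term)


def static_recommendations(html):
    collector = MediaCollector()
    collector.feed(html)
    return collector.terms


page = '<p>Hi</p><img src="a.png"><video src="v.mp4"></video><img src="b.png">'
media_terms = static_recommendations(page)
```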

Implementation

• As a Web service.

• Modular, i.e. any part can be substituted.

• Input: URL or Keywords.

• Output: Set of vocabulary terms described using the vSearch vocabulary.
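Since the service accepts either a URL or keywords, its entry point has to tell the two apart. A minimal sketch of that step, assuming only http and https URLs count as webpages; the function `classify_input` is hypothetical and not part of the described service.

```python
from urllib.parse import urlparse


def classify_input(value):
    """Return ("url", url) for an http(s) URL, else ("keywords", [words])."""
    text = value.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("url", text)
    return ("keywords", text.split())


as_url = classify_input("https://example.org/recipes/pizza")
as_keywords = classify_input("pizza recipe")
```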

Describing search results

The vSearch vocabulary

Example

Comparison with survey

Approach: recall 71% (missed: cooking time, cooking method); precision 100% (no irrelevant terms).
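Precision and recall against the survey answers can be computed as set overlaps. The slide names only the two missed terms; the reference set of seven terms below, with five placeholder names, is an assumption chosen because 5 of 7 matches the reported 71%.

```python
def precision_recall(recommended, reference):
    """Return (precision, recall) of recommended terms against a reference set."""
    recommended, reference = set(recommended), set(reference)
    hits = recommended & reference
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(reference) if reference else 0.0
    return precision, recall


# Placeholder terms; only the two missed ones are named on the slide.
found = ["term 1", "term 2", "term 3", "term 4", "term 5"]
reference = found + ["cooking time", "cooking method"]
precision, recall = precision_recall(found, reference)
```

With these inputs, precision is 1.0 because nothing irrelevant was recommended, and recall is 5/7 because two reference terms were missed.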

Outlook: What’s next?

Approach accomplishments

• Provide vocabulary terms recommendations for a given webpage.

• Simplify the discovery process.

• Address the cold start problem for new vocabularies in the search.

• Educate the users around the semantic annotations topic.

• Provide a general vocabulary that can be used to describe search results.

What is next?

• For the presented approach: Recommendation of actions given the entities that have been proposed by the approach.

• For the Web: Utilise the structured and semantically annotated data to assist our daily lives. Interpret the annotated websites as APIs.

Thank you!

istavrak.com

@istavrak

References

I. Stavrakantonakis, A. Fensel, and D. Fensel. Linked Open Vocabulary ranking and terms discovery. In Proceedings of SEMANTiCS 2016.

I. Stavrakantonakis, A. Fensel, and D. Fensel. Towards a vocabulary terms discovery assistant. In Proceedings of the SEMANTiCS 2016 Posters & Demos.

G. A. Atemezing and R. Troncy. Information content based ranking metric for Linked Open Vocabularies. In Proceedings of the 10th International Conference on Semantic Systems, 2014.

P.-Y. Vandenbussche, G. A. Atemezing, M. Poveda-Villalón, and B. Vatant. Linked Open Vocabularies (LOV): a gateway to reusable semantic vocabularies on the web. Semantic Web, 2015.

J. Schaible, T. Gottron, and A. Scherp. TermPicker: Enabling the reuse of vocabulary terms by exploiting data from the Linked Open Data cloud. In European Semantic Web Conference (ESWC), 2016.

Photo credits for the sections’ photos: https://unsplash.com/photos/7m2gkYUDfFE https://unsplash.com/photos/DJ_kOgH5u0o https://unsplash.com/photos/o4-YyGi5JBc https://unsplash.com/photos/a8YV2C3yBMk https://unsplash.com/photos/s9XMNEm-M9c