ViSTA-TV Work Package 6: External Data Service for Metadata Enrichment & Novel TV Recommendations

Post on 29-Nov-2014


DESCRIPTION

http://vista-tv.eu/ ViSTA-TV project: Video Stream Analytics for Viewers in the TV Industry http://sirup.wmprojects.nl/

TRANSCRIPT

Video Stream Analytics for Viewers and the TV Industry

WP6: External Data Service


Objectives

WP6 Objectives


•  O.6.1 External data service design
   •  Analysis of candidate sources
   •  Analysis of data extracted

•  O.6.2 External data service employed
   •  Enrich the EPG data
   •  Enrich feature extraction data
   •  Discover links between programs for novel recommendations

•  O.6.3 Publish data to the Linked Open Data cloud

The external data service aims to support the recommendation process by improving the connectivity between TV programs, which the standard EPG metadata does not surface.

ViSTA-TV External Data Service


[Diagram: load → enrich → publish]

Synopsis concepts:
•  "World War II"
•  "Television Program"
•  "Green Cross Code"
•  "Tom Stoppard"
•  "David Prowse"

po:long_synopsis:
"In this episode, Larry meets two veterans who each lost a limb in World War 2 to ask how differently we treat today's injured soldiers. Plus a look back at the iconic Green Cross Code films. With Stuart Hall and Miriam Stoppard"

po:credit and candidate DBpedia resources:
•  "Larry Lamb"
   http://dbpedia.org/resource/Larry_Lamb_(newspaper_editor)
   http://dbpedia.org/resource/Larry_Lamb_(actor)
•  "Miriam Stoppard"
   http://dbpedia.org/resource/Miriam_stoppard
•  "Stuart Hall"
   http://dbpedia.org/resource/Stuart_Hall_(boxer)
   http://dbpedia.org/resource/Stuart_Hall_(presenter)
   http://dbpedia.org/resource/Stuart_Hall_(cultural_theorist)
   http://dbpedia.org/resource/Stuart_Hall_(musician)

[Diagram: enrichment workflow. Components: EPG, DWH, Language Detection, Concept tagging. EPG fields: Title, Synopsis, Credits. DBpedia lookups: DBpedia:<LABEL> (rdfs:label, dc:subject), DBpedia:<concept>.]
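Each credited name above has several candidate DBpedia resources. The slides do not describe how candidates are looked up, so the following is only a minimal sketch, assuming candidates are grouped by a case-insensitive label match after stripping the parenthetical disambiguator. The function names `label_of` and `group_candidates` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import unquote

DBPEDIA = "http://dbpedia.org/resource/"


def label_of(uri):
    """Turn a DBpedia resource URI into a bare, lower-cased label."""
    name = unquote(uri[len(DBPEDIA):]).replace("_", " ")
    # Drop a trailing disambiguator such as " (actor)".
    if name.endswith(")") and " (" in name:
        name = name[: name.rindex(" (")]
    return name.lower()


def group_candidates(credits, uris):
    """Map each credited name to the URIs whose label matches it (assumed rule)."""
    wanted = {credit.lower(): credit for credit in credits}
    grouped = defaultdict(list)
    for uri in uris:
        credit = wanted.get(label_of(uri))
        if credit is not None:
            grouped[credit].append(uri)
    return dict(grouped)


candidates = group_candidates(
    ["Larry Lamb", "Miriam Stoppard", "Stuart Hall"],
    [
        DBPEDIA + "Larry_Lamb_(newspaper_editor)",
        DBPEDIA + "Larry_Lamb_(actor)",
        DBPEDIA + "Miriam_stoppard",
        DBPEDIA + "Stuart_Hall_(boxer)",
        DBPEDIA + "Stuart_Hall_(presenter)",
        DBPEDIA + "Stuart_Hall_(cultural_theorist)",
        DBPEDIA + "Stuart_Hall_(musician)",
    ],
)
for credit, uris in candidates.items():
    print(credit, len(uris))
```

Grouping only narrows the search: choosing between, say, the four "Stuart Hall" resources still needs context from the synopsis or other metadata.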

Zattoo Data Service: RDF

Example program description:
•  po:pid "9966901"
•  dc:title "Die allerbeste Sebastian Winkler Show"
•  zattoo:episode_title "mit Motsi Mabuse, Lady Bitch Ray und Sarah Brendel"
•  po:long_synopsis "(Premiere in Einsfestival )"
•  po:masterbrand, po:category, po:episode, rdf:type (values not shown)
•  po:credit: po:role "guest", po:alias "Sarah Brendel"
•  po:credit: po:role "guest", po:alias "Motsi Mabuse"
•  po:credit: po:role "guest", po:alias "Lady Bitch Ray"

Namespaces:
po="http://purl.org/ontology/po/"
zattoo="http://zattoo.com/"
dc="http://purl.org/dc/elements/1.1/"
rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
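To make the graph above concrete, here is a rough sketch that writes the same program description as N-Triples using plain Python. The prefixes and property names come from the slide. The subject URI, the use of blank nodes for credits, and the function names are assumptions for illustration only.

```python
# Prefixes as listed on the slide.
PREFIXES = {
    "po": "http://purl.org/ontology/po/",
    "zattoo": "http://zattoo.com/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(curie):
    """Expand a prefixed name such as 'po:pid' into a full IRI term."""
    prefix, local = curie.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def program_triples(subject, pid, title, episode_title, guests):
    """Return N-Triples lines for one program and its guest credits."""
    s = f"<{subject}>"
    lines = [
        f"{s} {expand('po:pid')} {literal(pid)} .",
        f"{s} {expand('dc:title')} {literal(title)} .",
        f"{s} {expand('zattoo:episode_title')} {literal(episode_title)} .",
    ]
    for i, name in enumerate(guests):
        credit = f"_:credit{i}"  # blank node per credit (assumption)
        lines.append(f"{s} {expand('po:credit')} {credit} .")
        lines.append(f"{credit} {expand('po:role')} {literal('guest')} .")
        lines.append(f"{credit} {expand('po:alias')} {literal(name)} .")
    return lines


triples = program_triples(
    "http://zattoo.com/program/9966901",  # hypothetical subject URI
    "9966901",
    "Die allerbeste Sebastian Winkler Show",
    "mit Motsi Mabuse, Lady Bitch Ray und Sarah Brendel",
    ["Sarah Brendel", "Motsi Mabuse", "Lady Bitch Ray"],
)
print("\n".join(triples))
```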


Enrichments Service


http://eculture2.cs.vu.nl:4000/browse/list_graphs


LOD Linking Service


WP5


Recommendations

LOD for recommendations


•  LOD datasets provide additional information which can be used to provide novel TV recommendations

•  The challenge is to identify which links are most useful for the recommendation process.

•  We have started analyzing the datasets to identify features that help select the right links.


Current & Future Work


1.  Continuously adding new sources

2.  Continuous improvement of EPG enrichment quality
    •  complementary services
    •  crowdsourcing

3.  Defining LOD-based notion of serendipity

4.  Further studies on the LOD patterns and their suitability for recommendations

5.  Applying approach in other domains, e.g. books


1. Adding new sources


Dataset         Objects     Triples     Links to ...
DBpedia         3.77 mil    400 mil     27.2 mil
Freebase        23 mil      337 mil     3.9 mil
BBC             -           60 mil      43,237
BBC music       -           20 mil      23,000
NYT             10,467      345,889     23,400
MusicBrainz     -           178 mil     855,754
Flickr          1.95 mil    5.61 mil    3,400,000
LinkedMDB       503,242     6 mil       162,756
GeoNames        8 mil       94 mil      0
LinkedGeoData   1 bil       20 bil      53,204

2. Data cleaning

Following the grandeur of Baroque, Rococo art is often dismissed as frivolous and unserious, but Waldemar Januszczak disagrees. […] The first episode is about travel in the 18th century and how it impacted greatly on some of the finest art ever made. The world was getting smaller and took on new influences shown in the glorious Bavarian pilgrimage architecture, Canaletto's romantic Venice and the blossoming of exotic designs and tastes all over Europe. The Rococo was art expressing itself in new, exciting ways.

enrichment

•  Type mis-classification: “Canaletto” ontology:Location
•  URI mis-annotation: “Rococo” dbpedia:Rococo_(band)

✓  Integration of the results of different text annotators
✓  Validation through crowdsourcing tasks

Collaboration with: Silvia Giannini

2. Data cleaning

extractor    label       DBpedia ontology class                 DBpedia URI
(unnamed)    Canaletto   ontology:Location                      dbpedia:Canaletto
TextRazor    Canaletto   dbpedia-owl:[Artist, Agent, Person]    dbpedia:Canaletto
(unnamed)    Canaletto   dbpedia-owl:[Artist, Agent, Person]    dbpedia:Canaletto
(unnamed)    Canaletto   dbpedia-owl:[Artist, Agent, Person]    dbpedia:Canaletto

•  Label •  NERD ontology class •  sameAs link

•  Label •  DBpedia ontology class •  DBpedia URI

•  Label •  DBpedia category •  Wikipedia page

•  Label •  DBpedia ontology class •  DBpedia URI

Type & URI alignment

Voting system: <Canaletto, dbpedia-owl:[Artist, Agent, Person], dbpedia:Canaletto> 3/4

Validate:
•  Relevance of labels
•  Types of relevant labels

results

integration

Aggregated enrichment (based on majority vote)

Automatic integration of text annotators for enrichment

Analysis of collected data for:
•  Voting system validation (also URIs)
•  Parameter tuning (e.g., complementarity handling)

Program synopsis

What if:
•  there is a tie?
•  the majority of annotators are wrong?
•  more granular alignment ontologies are adopted to avoid lack of type (or type owl:Thing)?
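The majority vote over annotator outputs can be sketched as follows. The slides only state that the aggregated enrichment is based on a majority vote and show a 3/4 result for Canaletto. The strict-majority rule, the data layout, and the function name `majority_vote` are assumptions, not the project's implementation.

```python
from collections import Counter


def majority_vote(annotations):
    """Pick the (types, uri) pair proposed by a strict majority of annotators.

    `annotations` holds one hashable (types, uri) tuple per annotator.
    Returns (winner, votes, total), or None when no pair has more than
    half of the votes. Ties therefore also yield None and would need a
    separate fallback, e.g. crowdsourced validation.
    """
    if not annotations:
        return None
    winner, votes = Counter(annotations).most_common(1)[0]
    if votes * 2 <= len(annotations):
        return None
    return winner, votes, len(annotations)


# The Canaletto example: one annotator types the label as a Location,
# three agree on Artist/Agent/Person.
artist = (("Artist", "Agent", "Person"), "dbpedia:Canaletto")
location = (("Location",), "dbpedia:Canaletto")
result = majority_vote([location, artist, artist, artist])
print(result)
```

Returning None rather than an arbitrary winner keeps the tie case visible. It does not help when the majority of annotators are wrong, which is where the validation step comes in.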


LOD & Serendipity

3. LOD-based Serendipity


Collaboration with:

LOD-based Serendipity


Diversity

4. LOD-based Patterns for Diversity


LOD-based method for increasing diversity in recommendations

•  extracts all the patterns from an RDF dataset → clusters generated & measured for diversity

•  fed into two statistical models
   •  to determine which semantic patterns can extract subsets of Linked Data to improve diversity in recommendations

•  data characterization step to choose model

•  diversity measures, e.g. entropy & semantic similarity

•  IMDB & DBpedia: noisiness, size & sparsity of LOD
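Entropy is named above as one diversity measure. As a rough illustration, the sketch below computes the Shannon entropy of the labels in a cluster. Which attribute is measured (genre is used here) and how it feeds into the statistical models are not stated in the slides, so both are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the label distribution in one cluster."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical clusters of programs, described by genre.
uniform_cluster = ["drama", "drama", "drama", "drama"]
mixed_cluster = ["drama", "news", "sport", "comedy"]

print(shannon_entropy(uniform_cluster))
print(shannon_entropy(mixed_cluster))
```

A cluster whose items all share one label scores zero. The score grows as the labels spread more evenly over more values.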


Applied to ‘Books’ Domain

References

•  Valentina Maccatrozzo, Lora Aroyo and Willem Robert van Hage, Crowdsourced Evaluation of Semantic Patterns for Recommendations, User Modeling, Adaptation, and Personalization, Rome, Italy, July 10-14, 2013.

•  Valentina Maccatrozzo, Davide Ceolin and Lora Aroyo, LOD Enrichment of TV Programs, in W3C Italy Event: Linked Open Data: where are we?, Rome, Italy, February 20-21, 2014.

•  Valentina Maccatrozzo, Davide Ceolin, Lora Aroyo and Paul Groth, Semantic Pattern-based Recommender, Extended Semantic Web Conference (ESWC2014), Heraclion, Greece, May 25-29, 2014.

•  Ceolin, Davide, Moreau, Luc, O'Hara, Kieron, Fokkink, Wan, Van Hage, Willem Robert, Maccatrozzo, Valentina, Sackley, Alistair, Schreiber, Guus and Shadbolt, Nigel (2014) Two procedures for analyzing the reliability of open government data. Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'2014), Montpellier, FR, 15 Jul 2014.
