ViSTA-TV Workpackage 6: External Data Service for Metadata Enrichment & Novel TV Recommendations

Video Stream Analytics for Viewers and the TV Industry WP6: External Data Service

Upload: lora-aroyo

Post on 29-Nov-2014


DESCRIPTION

http://vista-tv.eu/ ViSTA-TV project: Video Stream Analytics for Viewers in the TV Industry http://sirup.wmprojects.nl/

TRANSCRIPT

Page 1: ViSTA-TV Workpackage 6: External Data Service for Metadata Enrichment & Novel TV Recommendations

Video Stream Analytics for Viewers and the TV Industry

WP6: External Data Service

Page 2:

Objectives

Page 3:

WP6 Objectives

•  O.6.1
   •  External data service design
   •  Analysis of candidate sources
   •  Analysis of data extracted

•  O.6.2
   •  External data service employed
   •  Enrich the EPG data
   •  Enrich feature extraction data
   •  Discover links between programs for novel recommendations

•  O.6.3
   •  Publish data to the Linked Open Data cloud

The external data service aims to support the recommendation process by improving the connectivity between TV programs, which the standard EPG metadata alone does not surface.

Page 4:

ViSTA-TV External Data Service

[Figure: external data service pipeline. Recoverable stage labels: load, enrich, publish, load]

Page 5:

External Data Service

[Figure: enrichment of an EPG entry. Recoverable labels: EPG, DWH, Language Detection, Concept tagging, Title, Synopsis, Credits, DBpedia:<concept>, DBpedia:<LABEL>, LABEL, rdfs:label, dc:subject]

po:long_synopsis

"In this episode, Larry meets two veterans who each lost a limb in World War 2 to ask how differently we treat today's injured soldiers. Plus a look back at the iconic Green Cross Code films. With Stuart Hall and Miriam Stoppard"

synopsis concepts

•  "World War II"
•  "Television Program"
•  "Green Cross Code"
•  "Tom Stoppard"
•  "David Prowse"

po:credit

•  "Larry Lamb"
•  "Miriam Stoppard"
•  "Stuart Hall"

DBpedia resources shown for the credits

•  "http://dbpedia.org/resource/Larry_Lamb_(newspaper_editor)"
•  "http://dbpedia.org/resource/Larry_Lamb_(actor)"
•  "http://dbpedia.org/resource/Miriam_stoppard"
•  "http://dbpedia.org/resource/Stuart_Hall_(boxer)"
•  "http://dbpedia.org/resource/Stuart_Hall_(presenter)"
•  "http://dbpedia.org/resource/Stuart_Hall_(cultural_theorist)"
•  "http://dbpedia.org/resource/Stuart_Hall_(musician)"

Page 6:

Zattoo Data Service: RDF

[Figure: RDF graph of one Zattoo program. Recoverable properties and values follow.]

•  rdf:type po:episode
•  po:pid "9966901"
•  dc:title "Die allerbeste Sebastian Winkler Show"
•  zattoo:episode_title "mit Motsi Mabuse, Lady Bitch Ray und Sarah Brendel"
•  po:long_synopsis "(Premiere in Einsfestival )"
•  po:masterbrand
•  po:category
•  po:credit
   •  po:role "guest"
   •  po:alias "Sarah Brendel"
•  po:credit
   •  po:role "guest"
   •  po:alias "Motsi Mabuse"
•  po:credit
   •  po:role "guest"
   •  po:alias "Lady Bitch Ray"

Namespaces

po = "http://purl.org/ontology/po/"
zattoo = "http://zattoo.com/"
dc = "http://purl.org/dc/elements/1.1/"
rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
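The program description on this slide can be written out as RDF triples. The following is a minimal sketch in plain Python (no RDF library) that serializes the values shown above as N-Triples. The subject IRI and the use of blank nodes for the credits are assumptions made for illustration, since the slide does not show node identifiers. po:masterbrand and po:category are omitted because their objects are not recoverable from the slide.

```python
# Namespace prefixes as listed on the slide.
PREFIXES = {
    "po": "http://purl.org/ontology/po/",
    "zattoo": "http://zattoo.com/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def iri(curie):
    """Expand a prefixed name such as 'po:pid' into an N-Triples IRI term."""
    prefix, local = curie.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def literal(text):
    """Format a plain string as an N-Triples literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def program_triples(subject, pid, title, episode_title, synopsis, guests):
    """Return (subject, predicate, object) term triples for one program."""
    triples = [
        (subject, iri("rdf:type"), iri("po:episode")),
        (subject, iri("po:pid"), literal(pid)),
        (subject, iri("dc:title"), literal(title)),
        (subject, iri("zattoo:episode_title"), literal(episode_title)),
        (subject, iri("po:long_synopsis"), literal(synopsis)),
    ]
    for index, name in enumerate(guests, start=1):
        credit = f"_:credit{index}"  # one blank node per credit (assumption)
        triples.append((subject, iri("po:credit"), credit))
        triples.append((credit, iri("po:role"), literal("guest")))
        triples.append((credit, iri("po:alias"), literal(name)))
    return triples


def to_ntriples(triples):
    """Serialize term triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical subject IRI; the slide does not show the program's identifier.
SUBJECT = "<http://zattoo.com/program/9966901>"

triples = program_triples(
    SUBJECT,
    pid="9966901",
    title="Die allerbeste Sebastian Winkler Show",
    episode_title="mit Motsi Mabuse, Lady Bitch Ray und Sarah Brendel",
    synopsis="(Premiere in Einsfestival )",
    guests=["Sarah Brendel", "Motsi Mabuse", "Lady Bitch Ray"],
)
print(to_ntriples(triples))
```

Each credit becomes its own node so that the role and the alias stay attached to the same person, which mirrors the three po:credit branches in the graph.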

Page 8:

Enrichments Service

Page 9:

http://eculture2.cs.vu.nl:4000/browse/list_graphs

Page 15:

LOD Linking Service

WP5

Page 16:

Recommendations

Page 17:

LOD for recommendations

•  LOD datasets provide additional information that can be used to produce novel TV recommendations

•  The challenge is to identify the links that are most useful for the recommendation process

•  We have started analyzing the datasets to identify features that help select the right links

Page 18:

Current & Future Work

Page 19:

Current & Future Work

1.  Continuously adding new sources

2.  Continuous improvement of EPG enrichment quality
    •  complementary services
    •  crowdsourcing

3.  Defining a LOD-based notion of serendipity

4.  Further studies on the LOD patterns and their suitability for recommendations

5.  Applying the approach in other domains, e.g. books

Page 20:

1. Adding new sources

Dataset | Objects | Triples | Links to ...
DBpedia | 3.77 mil | 400 mil | 27.2 mil
Freebase | 23 mil | 337 mil | 3.9 mil
BBC | | 60 mil | 43,237
BBC music | | 20 mil | 23,000
NYT | 10,467 | 345,889 | 23,400
MusicBrainz | | 178 mil | 855,754
Flickr | 1.95 mil | 5.61 mil | 3,400,000
LinkedMDB | 503,242 | 6 mil | 162,756
GeoNames | 8 mil | 94 mil | 0
LinkedGeoData | 1 bil | 20 bil | 53,204

Page 21:

2. Data cleaning

Following the grandeur of Baroque, Rococo art is often dismissed as frivolous and unserious, but Waldemar Januszczak disagrees. […] The first episode is about travel in the 18th century and how it impacted greatly on some of the finest art ever made. The world was getting smaller and took on new influences shown in the glorious Bavarian pilgrimage architecture, Canaletto's romantic Venice and the blossoming of exotic designs and tastes all over Europe. The Rococo was art expressing itself in new, exciting ways.

enrichment

•  “Canaletto” ontology:Location (type mis-classification)
•  “Rococo” dbpedia:Rococo_(band) (URI mis-annotation)

•  Integration of the results of different text annotators
•  Validation through crowdsourcing tasks

Collaboration with: Silvia Giannini

Page 22:

2. Data cleaning

extractor | label | DBpedia ontology class | DBpedia URI
(name not recovered) | Canaletto | ontology:Location | dbpedia:Canaletto
TextRazor | Canaletto | dbpedia-owl:[Artist, Agent, Person] | dbpedia:Canaletto
(name not recovered) | Canaletto | dbpedia-owl:[Artist, Agent, Person] | dbpedia:Canaletto
(name not recovered) | Canaletto | dbpedia-owl:[Artist, Agent, Person] | dbpedia:Canaletto

•  Label, NERD ontology class, sameAs link
•  Label, DBpedia ontology class, DBpedia URI
•  Label, DBpedia category, Wikipedia page
•  Label, DBpedia ontology class, DBpedia URI

Type & URI alignment

Voting system: <Canaletto, dbpedia-owl:[Artist, Agent, Person], dbpedia:Canaletto> 3/4
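The voting step can be illustrated with a short sketch. It assumes each annotator's output has already been aligned to a (label, types, URI) tuple, treats the most frequent tuple as the winner, and leaves ties unresolved. Whether the project required a strict majority rather than a plurality is not stated on the slide, so that rule, the function name, and the tuple layout are all assumptions.

```python
from collections import Counter


def majority_vote(annotations):
    """Aggregate the aligned outputs of several annotators for one label.

    annotations: one (label, types, uri) tuple per annotator, types as a tuple.
    Returns (winner, votes, total); winner is None for empty input or a tie.
    """
    total = len(annotations)
    if total == 0:
        return None, 0, 0
    ranked = Counter(annotations).most_common()
    winner, votes = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == votes:
        return None, votes, total  # tie: no single most frequent annotation
    return winner, votes, total


# The Canaletto example from the slide: three annotators agree, one differs.
ARTIST = ("Canaletto", ("Artist", "Agent", "Person"), "dbpedia:Canaletto")
LOCATION = ("Canaletto", ("Location",), "dbpedia:Canaletto")

winner, votes, total = majority_vote([LOCATION, ARTIST, ARTIST, ARTIST])
print(winner, f"{votes}/{total}")
```

Returning None on a tie keeps the unresolved cases visible, so they can be routed to another step such as crowdsourced validation instead of being decided arbitrarily.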

Page 23:

[Figure: enrichment and validation workflow. Recoverable labels: Program synopsis; Automatic integration of text annotators for enrichment; Aggregated enrichment (based on majority vote); integration; results]

Validate:
•  Label relevance
•  Types of relevant labels

Analysis of collected data for:
•  Voting system validation (also URIs)
•  Parameter tuning (e.g., complementarity handling)

What if:
•  there is a tie?
•  the majority of annotators are wrong?
•  more granular alignment ontologies are adopted to avoid lack of type (or, type owl:Thing)?

Page 24:

LOD & Serendipity

Page 25:

3. LOD-based Serendipity

Collaboration with:

Page 26:

LOD-based Serendipity

Page 27:

Diversity

Page 28:

4. LOD-based Patterns for Diversity

LOD-based method for increasing diversity in recommendations

•  extracts all the patterns from an RDF dataset → clusters generated & measured for diversity

•  fed into two statistical models
   •  to determine which semantic patterns can extract subsets of Linked Data to improve diversity in recommendations

•  data characterization step to choose model

•  diversity measures, e.g. entropy & semantic similarity

•  IMDB & DBpedia: noisiness, size & sparsity of LOD
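As an illustration of the entropy measure mentioned above, the sketch below computes the Shannon entropy of the category distribution within a cluster of recommended items. The genre labels and cluster contents are invented for the example, and the slide does not say which attribute the entropy was computed over, so this is one plausible reading and not the project's implementation.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy (in bits) of the category distribution in items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical clusters of recommended programs, described by genre.
single = ["drama", "drama", "drama", "drama"]
skewed = ["drama", "drama", "drama", "comedy"]
mixed = ["drama", "comedy", "news", "sport"]

for name, cluster in [("single", single), ("skewed", skewed), ("mixed", mixed)]:
    print(name, round(shannon_entropy(cluster), 3))
```

A cluster with a single genre has zero entropy, and entropy grows as the genres become more evenly spread, which is why it can serve as a simple diversity score for a set of recommendations.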

Page 29:

Applied to ‘Books’ Domain

Page 30:

References

•  Valentina Maccatrozzo, Lora Aroyo and Willem Robert van Hage, Crowdsourced Evaluation of Semantic Patterns for Recommendations, User Modeling, Adaptation, and Personalization, Rome, Italy, July 10-14, 2013.

•  Valentina Maccatrozzo, Davide Ceolin and Lora Aroyo, LOD Enrichment of TV Programs, in W3C Italy Event: Linked Open Data: where are we?, Rome, Italy, February 20-21, 2014.

•  Valentina Maccatrozzo, Davide Ceolin, Lora Aroyo and Paul Groth, Semantic Pattern-based Recommender, Extended Semantic Web Conference (ESWC2014), Heraclion, Greece, May 25-29, 2014.

•  Davide Ceolin, Luc Moreau, Kieron O'Hara, Wan Fokkink, Willem Robert van Hage, Valentina Maccatrozzo, Alistair Sackley, Guus Schreiber and Nigel Shadbolt, Two procedures for analyzing the reliability of open government data, Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'2014), Montpellier, France, July 15, 2014.