notube: metadata enrichment
Post on 01-Nov-2014
WP4: TV Data Text Enrichment
Pavel Mihaylov (OT) and partners
Contents
Ontotext and its role in the project
WP4: text, audio and video
Goals and achievements
Demo
Conclusions
26-27 March 2012 NoTube 3rd review
• Semantic technology developer est. in 2000
– Staff: 65 employees and multiple contractors
• Global leader in semantic technologies
– Semantic Databases: high performance RDF DBMS, scalable reasoning
– Semantic Search: text-mining (IE), Information Retrieval (IR)
– Web Mining: focused crawling, screen scraping, data fusion
• Role in NoTube
– WP4 leader
– Semantic Enrichment
– Experience from multiple European projects
WP4: Content Enrichment
Content
• Text: EPGs, programme descriptions
• Audio
• Video
Enrichment
• Adding metadata
• Content about content
Goal: Text enrichment
Semantic annotation component
Recognising items of interest in text
Assigning links to Linked Open Data
• Analyses short or free-text segments
• Extends them with further world knowledge
Goal: Text enrichment (2)
Live at the Apollo 2/6 Not Going Out star Lee Mack presents sets from American comic Rich Hall and Scotland’s very own Danny Bhoy.
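The two steps named earlier (recognising items of interest, then assigning Linked Open Data links) can be sketched on the sample description above. This is a minimal dictionary-lookup illustration, not Lupedia's actual implementation: the gazetteer, the class labels, the DBpedia URIs and the function name annotate are all assumptions made for the example.

```python
import re

# Hypothetical mini-gazetteer: surface form -> (class, LOD URI).
# The entries and URIs are illustrative, not Lupedia's actual dictionary.
GAZETTEER = {
    "Lee Mack": ("Person", "http://dbpedia.org/resource/Lee_Mack"),
    "Rich Hall": ("Person", "http://dbpedia.org/resource/Rich_Hall"),
    "Danny Bhoy": ("Person", "http://dbpedia.org/resource/Danny_Bhoy"),
    "Scotland": ("Place", "http://dbpedia.org/resource/Scotland"),
}


def annotate(text, gazetteer=GAZETTEER):
    """Return (start, end, surface, class, uri) tuples sorted by offset."""
    annotations = []
    # Longer labels are tried first so they win over shorter overlapping ones.
    for surface in sorted(gazetteer, key=len, reverse=True):
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            overlaps = any(
                match.start() < end and start < match.end()
                for start, end, *_ in annotations
            )
            if not overlaps:
                cls, uri = gazetteer[surface]
                annotations.append(
                    (match.start(), match.end(), surface, cls, uri)
                )
    return sorted(annotations)


TEXT = (
    "Live at the Apollo 2/6 Not Going Out star Lee Mack presents sets from "
    "American comic Rich Hall and Scotland's very own Danny Bhoy."
)

for start, end, surface, cls, uri in annotate(TEXT):
    print(f"{start:3d}-{end:3d} {surface:<10} {cls:<6} {uri}")
```

A real service would also need disambiguation and language handling; the sketch only shows how a recognised span ends up paired with a class and a URI.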
Goal: Multilingual
TV world
English
German
Italian
Dutch
Arabic
Bulgarian
French
Korean
Turkish
Goal: Graph enrichment
Build upon basic enrichment
• Entities extracted from text
Exploit relations in Semantic Repository
• Follow a chain of LOD predicates
Enrich the basic enrichment
• A richer set of entities
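Following a chain of LOD predicates from an already extracted entity can be sketched as below. The triples, the ex: identifiers and the function names follow_chain and graph_enrich are hypothetical; a real deployment would query the semantic repository rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); identifiers are made up.
TRIPLES = [
    ("ex:FilmA", "ex:director", "ex:DirectorX"),
    ("ex:FilmA", "ex:starring", "ex:ActorY"),
    ("ex:DirectorX", "ex:birthPlace", "ex:CityZ"),
    ("ex:ActorY", "ex:birthPlace", "ex:CityW"),
]


def follow_chain(triples, start, predicates):
    """Return the entities reached from start by following predicates in order."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)

    frontier = {start}
    for predicate in predicates:
        # .get avoids inserting empty entries into the index on a miss.
        frontier = {
            obj
            for node in frontier
            for obj in index.get((node, predicate), ())
        }
    return frontier


def graph_enrich(triples, seeds, chains):
    """Extend the seed entities with everything reachable via any chain."""
    enriched = set(seeds)
    for seed in seeds:
        for chain in chains:
            enriched |= follow_chain(triples, seed, chain)
    return enriched


print(follow_chain(TRIPLES, "ex:FilmA", ["ex:director", "ex:birthPlace"]))
```

Here the basic enrichment supplies the seeds and each chain is one configured predicate path, so the result is the richer entity set the slide describes.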
Goal: Graph enrichment (2)
Goal: Graph enrichment (3)
Classes to enrich
• Film
• TelevisionShow
• Work
• Band/MusicalArtist
• Actor
• Place
Film enrichment
• Film class
• At least one common indirect relation
TelevisionShow enrichment
• TelevisionShow class
• At least two common indirect relations
Work enrichment
• Work except Film and TelevisionShow
• At least one common indirect relation
Band/MusicalArtist enrichment
• Band and MusicalArtist
• At least one direct relation
Actor enrichment
• Actor class
• Starring relation from at least two common Works
Place enrichment
• Place class
• At least one direct relation
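The per-class thresholds above can be written down as a small rule table. This is a sketch under stated assumptions: the Evidence fields, the RULES mapping and the accept function are hypothetical names, and the reading of "common indirect relation" as a count of shared intermediate links between a seed entity and a candidate is an interpretation of the slides, not something they spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Counts linking a seed entity to a candidate (hypothetical structure)."""
    direct: int = 0                 # direct relations between the two
    common_indirect: int = 0        # shared indirect relations
    common_starring_works: int = 0  # Works in which both appear as starring


# One acceptance rule per candidate class, taken from the slide thresholds.
RULES = {
    "Film": lambda e: e.common_indirect >= 1,
    "TelevisionShow": lambda e: e.common_indirect >= 2,
    "Work": lambda e: e.common_indirect >= 1,  # Work other than the two above
    "Band": lambda e: e.direct >= 1,
    "MusicalArtist": lambda e: e.direct >= 1,
    "Actor": lambda e: e.common_starring_works >= 2,
    "Place": lambda e: e.direct >= 1,
}


def accept(candidate_class, evidence):
    """Return True if a candidate of this class passes its threshold."""
    rule = RULES.get(candidate_class)
    # Classes without a rule are never added by graph enrichment.
    return rule is not None and rule(evidence)


print(accept("TelevisionShow", Evidence(common_indirect=1)))  # below threshold
print(accept("Actor", Evidence(common_starring_works=2)))     # meets threshold
```

Keeping the thresholds in one table makes the stricter TelevisionShow and Actor rules easy to compare with the others.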
Lupedia
Text enrichment service
• Input: plain text, e.g. programme descriptions
• Output: Linked Open Data enrichment
– XML, JSON, RDFa
• Features:
– Multilingual
– Graph enrichment
– Multiple vocabularies
– Configurable
– Fast
Lupedia over time
Better service
Multilingualism
New matching options and filters
Heuristics
Predicate, heuristics and class weights
Disambiguation
Most specific class in output
Multiple vocabularies
Selectable vocabulary
Graph enrichment
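The "most specific class in output" item can be illustrated as follows: when an entity carries several classes, any class that is an ancestor of another one is dropped. The hierarchy fragment and the names PARENT, ancestors and most_specific are assumptions for the example; the slides do not describe how Lupedia implements this step.

```python
# Hypothetical fragment of a class hierarchy: child -> parent.
PARENT = {
    "Film": "Work",
    "TelevisionShow": "Work",
    "Actor": "Artist",
    "MusicalArtist": "Artist",
    "Artist": "Person",
}


def ancestors(cls, parent=PARENT):
    """Return the set of strict ancestors of cls."""
    seen = set()
    while cls in parent:
        cls = parent[cls]
        if cls in seen:  # guard against accidental cycles
            break
        seen.add(cls)
    return seen


def most_specific(classes, parent=PARENT):
    """Drop every class that is an ancestor of another class in the input."""
    classes = set(classes)
    covered = set()
    for cls in classes:
        covered |= ancestors(cls, parent)
    return sorted(classes - covered)


print(most_specific(["Person", "Artist", "Actor"]))
print(most_specific(["Actor", "MusicalArtist", "Person"]))
```

Two unrelated leaf classes both survive, so an entity typed as Actor and MusicalArtist keeps both while the shared Person ancestor is removed.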
Evaluation summary
Lupedia compared to OpenCalais and AlchemyAPI
• Only two other similar services
• Much better coverage than either of them
• Comparable precision
Lupedia is a unique service
• Custom vocabularies & filters
• Tuned to TV domain
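For readers unfamiliar with the two measures, the sketch below shows one common way to compute them against a hand-annotated gold standard. It assumes that "coverage" corresponds to recall, which the slides do not state, and the annotation sets are invented toy data, not evaluation results from the project.

```python
def precision_and_coverage(predicted, gold):
    """Return (precision, coverage) for two collections of annotations.

    precision: share of predicted annotations that are in the gold standard
    coverage:  share of gold annotations that were predicted (assumed = recall)
    """
    predicted, gold = set(predicted), set(gold)
    correct = predicted & gold
    precision = len(correct) / len(predicted) if predicted else 0.0
    coverage = len(correct) / len(gold) if gold else 0.0
    return precision, coverage


# Toy data: annotations as (surface form, URI) pairs; all values are made up.
gold = {("a", "ex:A"), ("b", "ex:B"), ("c", "ex:C"),
        ("d", "ex:D"), ("e", "ex:E"), ("f", "ex:F")}
predicted = {("a", "ex:A"), ("b", "ex:B"), ("c", "ex:C"), ("x", "ex:X")}

precision, coverage = precision_and_coverage(predicted, gold)
print(f"precision={precision:.2f} coverage={coverage:.2f}")
```

Two services can have comparable precision while differing widely in coverage, which is the distinction the comparison above draws.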
Links to other WPs
WP1
• Entity URIs point to WP1 models
WP3
• Lupedia in NLP based profiling and enrichment
WP5
• Lupedia in SmartLink and Watch’n’Buy
WP6
• Integration, enrichment in demo apps
WP7
• 7a news enrichment
• 7c programme description enrichment
Lupedia demo
http://lupedia.ontotext.com/
Emerging competition
                 | Lupedia | Yahoo | WikiMachine | EntityPedia
LOD output       | DBpedia & LinkedMDB | | DBpedia | ?
Multilingual     | ar, bg, nl, en, fr, de, it, ko, tr | en, zh | en, pt, it | ?
Confidence       | yes | yes | yes | ?
Graph enrichment | yes | yes* | no | ?
Remark           | Tuned to TV domain, one of the pioneers | No direct access to LOD, graph enrichment too abstract | Too generic, precision seems lower | Not yet released
Lessons & Impact
Lessons learnt:
• Emerging similar services clearly show the need for such services
• Coverage and language support are important
Lupedia recognised as one of the major players and included in NERD:
• Aggregating named entity services and comparing their performance
• http://nerd.eurecom.fr
Various partners willing to use Lupedia in other projects
Life after NoTube
Will be kept alive as a demo service
Closed source
Possibly an OpenCalais-like service in future
Questions?