HyperTED - Searching and browsing through fragments of TED Talks
DESCRIPTION
A web application for browsing and recommending Media Fragments of TED Talks based on entities extracted from the subtitles. This is a short presentation of my semester internship at EURECOM.
TRANSCRIPT
![Page 1: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/1.jpg)
Searching and browsing through fragments of TED Talks
MARIELLA SABATINO
25/09/2014 1
![Page 2: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/2.jpg)
TED Talks
TED is a global set of conferences, held throughout North America, Europe and Asia. TED Talks address a wide range of topics within the research and practice of science and culture. The speakers are given a maximum of 18 minutes to present their ideas in the most innovative and engaging way they can, often through storytelling.
![Page 3: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/3.jpg)
Problem
Users are overwhelmed with audiovisual content
Users browse fast, looking for topics of interest
Which fragments are potentially relevant, without having to watch the entire video?
It is very difficult to find interesting documents
![Page 4: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/4.jpg)
Research questions
HOW TO:
1. detect segments of interest in a video?
2. recommend related media fragments within the same video collection?
3. design a web application that provides a rich environment for exploring a video collection?
![Page 5: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/5.jpg)
HyperTED
Browsing and recommendation of Media Fragments of TED Talks based on entities extracted from the subtitles
Integration of the Media Fragments concept and the subtitles enrichment performed by NERD on a Node.js server
![Page 6: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/6.jpg)
Research question 1
HOW TO: detect segments of interest in a video?
![Page 7: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/7.jpg)
What is a NER task?
Named Entity Recognition (NER) aims to locate and classify elements of a textual document into pre-defined categories such as:
• Names of people;
• Names of organizations;
• Places;
• Temporal and numerical expressions.
The elements are called entities, and the categories are drawn from ontologies.
![Page 8: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/8.jpg)
For example…
“This is Nikita, a security guard from one of the bars in St. Petersburg.”
After NER:
• PERSON: Nikita
• FUNCTION: security guard
• LOCATION: St. Petersburg
Category: the type assigned to an entity in the NER task.
Disambiguation: a Natural Language Processing (NLP) task that links an entity to a URL in a knowledge base, e.g. http://dbpedia.org/resource/Saint_Petersburg.
Example taken from the transcript of https://www.ted.com/talks/2089
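To make the two steps concrete (assigning a category, then attaching a knowledge-base URL), here is a minimal sketch in Python. It is a toy lookup table, not how NERD or any of the extractors work; the `Entity` class, the `GAZETTEER` table and the `tag` function are hypothetical names, and the mapping assumes that the three labels on the slide attach to "Nikita", "security guard" and "St. Petersburg" respectively.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    text: str            # surface form found in the sentence
    category: str        # NER type, e.g. PERSON
    start: int           # character offset where the mention starts
    end: int             # character offset just past the mention
    uri: Optional[str] = None  # disambiguation URL, when one is known


# Hypothetical hand-written gazetteer (assumption): a real extractor
# learns or looks up these associations instead of hard-coding them.
GAZETTEER = {
    "Nikita": ("PERSON", None),
    "security guard": ("FUNCTION", None),
    "St. Petersburg": ("LOCATION", "http://dbpedia.org/resource/Saint_Petersburg"),
}


def tag(sentence: str) -> List[Entity]:
    """Return the gazetteer entries found in the sentence, in reading order."""
    entities = []
    for surface, (category, uri) in GAZETTEER.items():
        start = sentence.find(surface)
        if start != -1:
            entities.append(Entity(surface, category, start, start + len(surface), uri))
    return sorted(entities, key=lambda e: e.start)


sentence = "This is Nikita, a security guard from one of the bars in St. Petersburg."
entities = tag(sentence)
for e in entities:
    print(e.category, repr(e.text), e.uri)
```

Only the location carries a URL here because it is the only disambiguation example given on the slide.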
![Page 9: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/9.jpg)
NER extractors
Web Tools that use NER algorithms.
Open APIs for research use.
![Page 10: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/10.jpg)
NERD
Compares the performance of NER tools available on the web.
Unifies the results of NER extractors into a common output.
http://nerd.eurecom.fr/
![Page 11: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/11.jpg)
NER extractors evaluation
DOCUMENTS ANALYZED: 5 short TED Talks
NUMBER OF EVALUATORS: 1
STEPS OF EVALUATION:
• Selection of the meaningful concepts in the subtitles;
• Run of each extractor;
• Comparison of the results.
PRECISION: the fraction of retrieved documents that are relevant.
RECALL: the fraction of relevant documents that are retrieved.
F-MEASURE: a single accuracy score that combines Precision and Recall (their harmonic mean).
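The three measures can be written out as a short Python sketch. It assumes the F-measure is the usual balanced F1 score (the harmonic mean of Precision and Recall); the sets `gold` and `extracted` are made-up illustration data, not the concepts or results of the actual evaluation.

```python
def precision(true_positives: int, retrieved: int) -> float:
    """Fraction of retrieved items that are relevant."""
    return true_positives / retrieved if retrieved else 0.0


def recall(true_positives: int, relevant: int) -> float:
    """Fraction of relevant items that are retrieved."""
    return true_positives / relevant if relevant else 0.0


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (balanced F1, an assumption)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical data: concepts selected by the evaluator vs. extractor output.
gold = {"Nikita", "security guard", "bar", "St. Petersburg"}
extracted = {"Nikita", "St. Petersburg", "Russia"}

tp = len(gold & extracted)        # items that are both relevant and retrieved
p = precision(tp, len(extracted))
r = recall(tp, len(gold))
f = f_measure(p, r)
print(round(p, 3), round(r, 3), round(f, 3))
```

The zero guards cover the case of an extractor that returns nothing relevant, where all three measures are 0.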
![Page 12: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/12.jpg)
NER extractors evaluation

| EXTRACTOR | PRECISION | RECALL | F-MEASURE |
|---|---|---|---|
| AlchemyAPI | 0.15 | 0.03 | 0.05147488928 |
| DataTXT | 0.21 | 0.36 | 0.2652521588 |
| DBpedia Spotlight | 0.14 | 0.37 | 0.1994140988 |
| Lupedia | 0.18 | 0.02 | 0.04389924763 |
| OpenCalais | 0.27 | 0.09 | 0.1347540544 |
| Saplo | 0.00 | 0.00 | 0 |
| TextRazor | 0.17 | 0.40 | 0.2416065311 |
| THD | 0.12 | 0.05 | 0.07485426603 |
| Wikimeta | 0.13 | 0.08 | 0.09514781377 |
| Yahoo! Content Analysis | 0.52 | 0.13 | 0.202927267 |
| Zemanta | 0.44 | 0.18 | 0.2511994999 |
| Combined | 0.11 | 0.54 | 0.1859774587 |
![Page 13: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/13.jpg)
Media Fragments
A Media Fragment is a part of a multimedia object.
Temporal Fragments: sections along the time dimension of the media resource, with a start and an end point.
http://www.w3.org/TR/media-frags/
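In the W3C Media Fragments URI syntax, a temporal fragment is addressed by appending `#t=start,end` to the media URL, with times in seconds (Normal Play Time). The Python sketch below builds and parses such URIs; it handles only plain seconds with an optional `npt:` prefix, and the function names and the `example.org` URL are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urldefrag


def _seconds(value: float) -> str:
    """Format seconds without trailing zeros, e.g. 30.0 -> '30'."""
    return format(value, ".3f").rstrip("0").rstrip(".")


def temporal_fragment(url: str, start: float, end: float) -> str:
    """Return url with a temporal fragment covering [start, end) in seconds."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    base, _ = urldefrag(url)  # drop any fragment already present
    return f"{base}#t={_seconds(start)},{_seconds(end)}"


def parse_temporal_fragment(uri: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return (start, end) in seconds, or None when there is no t= dimension."""
    _, fragment = urldefrag(uri)
    values = parse_qs(fragment).get("t")
    if not values:
        return None
    value = values[0]
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start_text, _, end_text = value.partition(",")
    start = float(start_text) if start_text else 0.0   # a missing start means 0
    end = float(end_text) if end_text else None        # a missing end means "to the end"
    return start, end


uri = temporal_fragment("http://example.org/talk.mp4", 12.5, 30)
print(uri)
print(parse_temporal_fragment(uri))
```

Clock-style values such as `0:02:00` are also allowed by the specification but are left out of this sketch.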
![Page 14: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/14.jpg)
MF creation: chapters
TED Talks have paragraphs: a human-made subdivision of subtitles.
![Page 15: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/15.jpg)
MF creation: hot spots
Extraction of topics from TextRazor and entities from NERD
Clustering of consecutive chapters that talk about similar topics
Filtering of those fragments based on annotation relevance
The Hot Spots are those fragments whose relative relevance falls in the first quarter of the final score distribution.
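The clustering and filtering steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's algorithm: topic similarity is measured with Jaccard overlap and a 0.5 threshold, a fragment's score is the sum of its chapters' relevance, and "first quarter" is read as the top-scoring quarter of fragments. The `Fragment` class and all sample values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Fragment:
    start: float       # seconds
    end: float         # seconds
    topics: Set[str]   # topics extracted for this span
    relevance: float   # summed annotation relevance (assumed scoring)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two topic sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_consecutive(chapters: List[Fragment], threshold: float = 0.5) -> List[Fragment]:
    """Merge each chapter into the previous fragment when their topics overlap enough."""
    fragments: List[Fragment] = []
    for ch in chapters:
        last = fragments[-1] if fragments else None
        if last is not None and jaccard(last.topics, ch.topics) >= threshold:
            fragments[-1] = Fragment(last.start, ch.end,
                                     last.topics | ch.topics,
                                     last.relevance + ch.relevance)
        else:
            fragments.append(Fragment(ch.start, ch.end, set(ch.topics), ch.relevance))
    return fragments


def hot_spots(fragments: List[Fragment]) -> List[Fragment]:
    """Keep the top-scoring quarter of fragments, returned in timeline order."""
    if not fragments:
        return []
    keep = math.ceil(len(fragments) / 4)
    ranked = sorted(fragments, key=lambda f: f.relevance, reverse=True)
    return sorted(ranked[:keep], key=lambda f: f.start)


# Hypothetical chapters of one talk.
chapters = [
    Fragment(0, 60, {"security", "russia"}, 0.4),
    Fragment(60, 130, {"security", "russia", "photography"}, 0.7),
    Fragment(130, 200, {"education"}, 0.3),
    Fragment(200, 260, {"climate"}, 0.2),
    Fragment(260, 300, {"climate", "energy"}, 0.5),
]
fragments = cluster_consecutive(chapters)
for f in hot_spots(fragments):
    print(f.start, f.end, sorted(f.topics))
```

With these sample values the five chapters collapse into three fragments, and the opening fragment is the one kept as a hot spot.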
![Page 16: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/16.jpg)
Research question 2
HOW TO: recommend related media fragments within the same video collection?
![Page 17: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/17.jpg)
Search Engine indexing
A search engine is a system able to access information that was previously stored and indexed.
Search engine indexing is the process of collecting, parsing and storing data to make searches faster.
We use it for indexing the annotations in our database
![Page 18: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/18.jpg)
Annotation based index
WHY ANNOTATIONS?
Because they “contain” the meaning of the talk
Because they contain some very useful attributes:
• timing references (startNPT and endNPT);
• uuid;
• relevance references.
WHICH ANNOTATIONS? Entities and Topics
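As an illustration of what one indexed annotation might look like, here is a small Python sketch. Only `startNPT`, `endNPT`, `uuid` and a relevance value come from the slide; the `label` and `kind` fields, the class name and the sample values are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Annotation:
    uuid: str         # identifier carried by the annotation (the slide does not say what it identifies)
    label: str        # hypothetical field: the entity or topic text
    kind: str         # hypothetical field: "entity" or "topic"
    startNPT: float   # start of the annotated span, in seconds
    endNPT: float     # end of the annotated span, in seconds
    relevance: float  # relevance score attached to the annotation


def to_index_document(annotation: Annotation) -> dict:
    """Turn an annotation into a plain dict ready to be serialized as JSON."""
    if annotation.endNPT <= annotation.startNPT:
        raise ValueError("endNPT must be greater than startNPT")
    return asdict(annotation)


# Made-up sample values, for illustration only.
annotation = Annotation(
    uuid="00000000-0000-0000-0000-000000000000",
    label="Saint Petersburg",
    kind="entity",
    startNPT=12.5,
    endNPT=30.0,
    relevance=0.8,
)
document = to_index_document(annotation)
print(json.dumps(document, indent=2))
```

Keeping the timing attributes in the indexed document means a search hit can be turned directly into a temporal Media Fragment.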
![Page 19: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/19.jpg)
ElasticSearch
ElasticSearch is an open-source search engine.
It uses Apache Lucene™ for indexing.
It aims to make full text search easy by hiding the complexities of Lucene behind a simple RESTful API.
![Page 20: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/20.jpg)
ElasticSearch
HOW TO MAKE A QUERY
ElasticSearch provides a full Query DSL based on JSON to define queries. In general, there are basic queries such as term or prefix.
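A minimal Python sketch of the two basic queries mentioned above, built as JSON request bodies. The field name `label` and the search values are hypothetical; the resulting body would typically be sent to the `_search` endpoint of an index.

```python
import json


def term_query(field: str, value: str) -> dict:
    """Match documents whose field contains exactly this indexed term."""
    return {"query": {"term": {field: value}}}


def prefix_query(field: str, prefix: str) -> dict:
    """Match documents whose field contains a term starting with this prefix."""
    return {"query": {"prefix": {field: prefix}}}


# Hypothetical field name and values, for illustration only.
exact = term_query("label", "security")
starts_with = prefix_query("label", "sec")

print(json.dumps(exact))
print(json.dumps(starts_with))
```

A term query is not analyzed, so the value must match the indexed term as stored, while a prefix query is a convenient basis for search-as-you-type behaviour.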
![Page 21: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/21.jpg)
Recommendation
• Interlinking through chapters and topic
• Interlinking to OpenCourseWare and Open University
![Page 22: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/22.jpg)
Research question 3
HOW TO: design a web application that provides a rich environment for exploring a video collection?
![Page 23: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/23.jpg)
Architecture
![Page 24: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/24.jpg)
DEMO
http://linkedtv.eurecom.fr/mediafragmentplayer
![Page 25: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/25.jpg)
Conclusions
• Evaluation of NER tools in the context of TED Talks
• HotSpot detection based on topics and entities
• Recommendation algorithm, hyperlinks between fragments of TED Talks + external education resources
• Nice and responsive UI
![Page 26: HyperTED - Searching and browsing through fragments of TED Talks](https://reader034.vdocuments.mx/reader034/viewer/2022051609/547e60a3b4af9fbe158b5799/html5/thumbnails/26.jpg)
Publications
HyperTED is one of the apps submitted to the LinkedUp Challenge - http://linkedup-challenge.org/
José Luis Redondo García, Mariella Sabatino, Pasquale Lisena and Raphaël Troncy. Detecting Hot Spots in Web Videos. In International Semantic Web Conference (ISWC’14), Demo