semantic media mining seminar - kickoff
Semantic Media Mining
Bachelor Seminar - WS 2015/16
Dr. Harald Sack / Jörg Waitelonis / Magnus Knuth / Tamara Bobic / Dinesh Reddy / Tabea Tietz
Hasso-Plattner-Institut für Softwaresystemtechnik
Semantic Media Mining
1. Tutors
2. Semantic Media Mining
3. Seminar Challenges
4. Administrative Issues
Seminar Semantic Media Mining, Dr. Harald Sack et. al. Hasso-Plattner-Institute, University of Potsdam, WS 2015/16
Dr. Harald Sack
● Head of Research Group “Semantic Technologies”
● Senior Researcher at Hasso Plattner Institute (HPI)
○ Research Topics
■ Semantic Web Technologies
■ Ontological Engineering
■ Information Retrieval
■ Multimedia Analysis & Retrieval
■ Knowledge Mining
■ Data/Information Visualization
○ Research Projects
My apologies for not attending the first seminar lectures!
I’m representing HPI and our Semantic Web Technology Research at the following conferences:
Dipl. Inform. Jörg Waitelonis
● Computer Science Univ. of Jena, 2006
● 2006-2008 Start-up Activities (osotis, yovisto)
● Developer for Multimedia Portal ETH-Zürich, CH
● Since 2009 at HPI
● Research: Semantic Web, Linked Data, Multimedia-Retrieval,
Semantic Search Technologies
Dipl.-Inf. Magnus Knuth
● Studied Information Science @ Uni Leipzig
● 2007-2010: Research Assistant @ Institute for Medical Informatics,
Statistics and Epidemiology Leipzig (imise)
● since 2010: PhD student @ HPI
○ Semantic Web, Linked Data Cleansing, Linked Data Change
Management, Knowledge Management, Read-Write-Web
Tamara Bobic, M.Sc.
● B.Sc. in Computer Science @ Belgrade, Serbia
● M.Sc. “Life Science Informatics” @ Uni Bonn
● PhD student @ HPI (since June, 2014)
● Research interests:
Semantic Web, Fact Ranking, Recommender Systems,
Knowledge Engineering
Dinesh Reddy, M.Sc.
● Studied Life Science Informatics @ Uni Bonn until March 2014
● 2012-2014 Research Assistant @ Fraunhofer Institute for Algorithms
and Scientific Computing, St. Augustin
● since May, 2014 PhD student @ HPI
○ Semantic Web Technologies, Knowledge Engineering, Linked
Data, Temporal Mining
Tabea Tietz, B.A.
● 2014: B.A. Economics and Social Science @ Potsdam University
● since 2014: M.A. studies @ Potsdam University
● 2010 - 2015: Student coworker @ HPI
● 2014 - 2015: Scholarship @ MIZ-Babelsberg
● since 2015: Scientific coworker @ HPI
● Interests: Semantic Web, Linked Data, DBpedia, Visualization
Semantic Technologies and Multimedia Research Group
● http://semex.hpi.uni-potsdam.de/semex/
● http://semex.hpi.uni-potsdam.de/mggui-dev2/#search
● http://blog.yovisto.com/the-hero-of-mushroom-kingdom-turns-27-super-mario/
● http://commons.dbpedia.org/
● http://dbpedia.org/
● http://de.dbpedia.org/
● Semantic Technologies Research Group Blog: http://s16a.org/
● http://linkeddata.org/
Semantic Media Mining
1. Tutors
2. Semantic Media Mining
3. Seminar Challenges
4. Administrative Issues
The Semantic Web
● Extension of the WWW with formal Knowledge Representations
(Ontologies)
● Information in natural language is explicitly annotated with
semantic Metadata
● Semantic Metadata encode the Meaning (Semantics) of the
information content and can be read and correctly interpreted
(=understood) by machines
● Semantics
○ OWL, RDFS, SKOS, ...
● Model = RDF
● Syntax
○ N3, Turtle, XML
○ RDFa, JSON-LD
● Web Platform
○ URI/IRI, HTTP
○ UNICODE, AUTH
Example: http://dbpedia.org/resource/Neil_Armstrong
● structured data
● semantic data
RDF - Resource Description Framework
RDF Triple: Subject - Property - Object
● Subject: http://dbpedia.org/resource/Neil_Armstrong (Neil Armstrong)
● Property: http://www.w3.org/1999/02/22-rdf-syntax-ns#type (rdf:type)
● Object: http://dbpedia.org/ontology/Astronaut (Astronaut)
URIs for unique identification
● Entities: http://dbpedia.org/resource/Neil_Armstrong -- is a -- http://dbpedia.org/ontology/Astronaut
● Classes: http://dbpedia.org/ontology/Astronaut -- is a subclass of -- http://dbpedia.org/ontology/Person
Classes vs. Entities
● Classes
○ http://dbpedia.org/ontology/Person -- has birthdate -- xsd:date
○ http://dbpedia.org/ontology/Person -- has birthplace -- http://dbpedia.org/ontology/City
● Entities
○ http://dbpedia.org/resource/Neil_Armstrong -- is a -- http://dbpedia.org/ontology/Person
○ http://dbpedia.org/resource/Neil_Armstrong -- has birthdate -- "1930-08-05"
○ http://dbpedia.org/resource/Neil_Armstrong -- has birthplace -- http://dbpedia.org/resource/Wapakoneta,_Ohio
○ http://dbpedia.org/resource/Wapakoneta,_Ohio -- is a -- http://dbpedia.org/ontology/City
RDF – Resource Description Framework
● Triple
○ Subject
○ Property
○ Object
RDF Statement = Subject (URI) + Property (URI) + Object (URI or Literal)
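The statement structure above, together with the earlier "is a" / "is a subclass of" example, can be sketched in a few lines of plain Python. This is only an illustration, not seminar code: the `Triple` type and the `types_of` helper are hypothetical names, and mapping "is a subclass of" to `rdfs:subClassOf` is an assumption that the slides do not spell out.

```python
from typing import NamedTuple, Set

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumption: "is a subclass of" on the slide corresponds to rdfs:subClassOf.
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"


class Triple(NamedTuple):
    subject: str                  # always a URI
    prop: str                     # always a URI
    obj: str                      # a URI or a literal value
    obj_is_literal: bool = False  # True when obj is a literal such as a date


def types_of(entity: str, triples: Set[Triple]) -> Set[str]:
    """Return the asserted classes of an entity plus all their superclasses."""
    found = {t.obj for t in triples
             if t.subject == entity and t.prop == RDF_TYPE}
    frontier = list(found)
    while frontier:
        cls = frontier.pop()
        for t in triples:
            if (t.subject == cls and t.prop == RDFS_SUBCLASS
                    and t.obj not in found):
                found.add(t.obj)
                frontier.append(t.obj)
    return found


graph = {
    Triple(DBR + "Neil_Armstrong", RDF_TYPE, DBO + "Astronaut"),
    Triple(DBO + "Astronaut", RDFS_SUBCLASS, DBO + "Person"),
    Triple(DBR + "Neil_Armstrong", DBO + "birthDate", "1930-08-05", True),
}

armstrong_types = types_of(DBR + "Neil_Armstrong", graph)
```

Here `armstrong_types` contains both the Astronaut and the Person class, because the entity inherits the superclass through the class hierarchy.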
RDF – Turtle Serialization (1)
<http://dbpedia.org/resource/Neil_Armstrong>              # S
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>     # P
    <http://dbpedia.org/ontology/Astronaut> .             # O (Resource)
<http://dbpedia.org/resource/Neil_Armstrong>              # S
    <http://dbpedia.org/ontology/birthDate>               # P
    "1930-08-05" .                                        # O (Literal)
RDF – Turtle Serialization (2)
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbo: <http://dbpedia.org/ontology/> .
dbr:Neil_Armstrong rdf:type dbo:Astronaut .
dbr:Neil_Armstrong dbo:birthDate "1930-08-05" .
The same two statements, sharing one subject via ";":
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbo: <http://dbpedia.org/ontology/> .
dbr:Neil_Armstrong rdf:type dbo:Astronaut ;
    dbo:birthDate "1930-08-05" .
http://www.w3.org/TR/rdf11-primer/
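To make the role of the @prefix declarations concrete, the following sketch expands prefixed names back into full URIs. It is a simplified illustration, not a Turtle parser: `expand` and `expand_statement` are hypothetical helpers, and only plain prefixed names, `<...>` URIs and double-quoted literals are handled.

```python
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbo": "http://dbpedia.org/ontology/",
}


def expand(term: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as dbr:Neil_Armstrong to <full URI>.

    Quoted literals and terms already written as <...> are returned unchanged.
    """
    if term.startswith('"') or term.startswith("<"):
        return term
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {term!r}")
    # Expansion is plain string concatenation of namespace and local name.
    return f"<{prefixes[prefix]}{local}>"


def expand_statement(subject: str, prop: str, obj: str) -> str:
    """Render one statement with all prefixed names written out in full."""
    return " ".join(expand(t) for t in (subject, prop, obj)) + " ."


line1 = expand_statement("dbr:Neil_Armstrong", "rdf:type", "dbo:Astronaut")
line2 = expand_statement("dbr:Neil_Armstrong", "dbo:birthDate", '"1930-08-05"')
```

Because expansion is plain concatenation, a namespace written without its trailing slash (http://dbpedia.org/resource) would produce http://dbpedia.org/resourceNeil_Armstrong, which is a different URI.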
Semantic Media Mining
1. Tutors
2. Semantic Media Mining
3. Seminar Challenges
4. Administrative Issues
Seminar Challenges
1. Sound2Triple - Convert audio features to RDF
2. DBpedia - Mining Implicit Knowledge
3. Temporal Information Extraction
4. Important knowledge coverage in the LOD cloud
1. Sound2Triple - Convert audio features to RDF
● detect different kinds of audible events:
○ silence
○ speech
○ music
○ changes in dynamics
○ changes in speed
○ etc.
● use existing tools, e.g. http://essentia.upf.edu/, http://www.praat.org/, http://clam-project.org/
● create an RDF annotation using the Open Annotation Model
● visualize the annotation, if possible in real time (Web Audio API: http://www.w3.org/TR/webaudio/)
○ inspired by http://ianreah.com/2013/02/28/Real-time-analysis-of-streaming-audio-data-with-Web-Audio-API.html
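As a rough idea of what detecting an audible event can mean, here is a minimal sketch that finds silent segments by thresholding frame energy. It deliberately avoids the tools listed above and uses only the Python standard library; the frame size, the threshold and the function names are assumptions chosen for illustration, and a real solution would more likely build on one of the listed tools.

```python
import math
from typing import List, Tuple


def frame_rms(samples: List[float], frame_size: int) -> List[float]:
    """Root-mean-square energy of consecutive non-overlapping frames."""
    rms = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return rms


def silent_segments(samples: List[float], sample_rate: int,
                    frame_size: int = 1000,
                    threshold: float = 0.01) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans whose frame RMS stays below threshold."""
    segments = []
    seg_start = None
    energies = frame_rms(samples, frame_size)
    for i, energy in enumerate(energies):
        if energy < threshold and seg_start is None:
            seg_start = i                      # a silent span begins
        elif energy >= threshold and seg_start is not None:
            segments.append((seg_start * frame_size / sample_rate,
                             i * frame_size / sample_rate))
            seg_start = None                   # the silent span ended
    if seg_start is not None:                  # silence lasts until the end
        segments.append((seg_start * frame_size / sample_rate,
                         len(energies) * frame_size / sample_rate))
    return segments


# Synthetic test signal at 8 kHz: 1 s of a 440 Hz tone, 1 s of silence, 1 s of tone.
RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]
signal = tone + [0.0] * RATE + tone
segments = silent_segments(signal, RATE)
```

The same framing loop could be reused for other event types, for example by comparing the energy of neighbouring frames to spot changes in dynamics.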
Open Annotation Model
Example event: increasing volume
Media Fragment Identifier: http://example.org/audio.mp3#t=23,55
RDF Triples:
<http://example.org/A-1> oac:hasTarget <http://example.org/audio.mp3#t=23,55> .
<http://example.org/A-1> oac:hasBody <http://example.org/event1> .
<http://example.org/event1> cnt:chars "increasing volume" .
<http://example.org/event1> cnt:characterEncoding "utf-8" .
<http://example.org/event1> rdf:type cnt:ContentAsText .
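A detected event could be turned into triples of this shape with a small helper like the one below. This is a sketch only: `annotation_triples` is a hypothetical function, it reuses the property names from the example as-is, and a complete Turtle document would additionally need @prefix declarations for oac:, cnt: and rdf:, which the slide does not show.

```python
from typing import List


def annotation_triples(annotation_uri: str, body_uri: str, media_url: str,
                       start_sec: int, end_sec: int, label: str) -> List[str]:
    """Build the five annotation triples for one audio event."""
    # Media fragment: "#t=start,end" selects a time span of the media file.
    target = f"{media_url}#t={start_sec},{end_sec}"
    # Escape backslashes and double quotes so the label is a valid literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f"<{annotation_uri}> oac:hasTarget <{target}> .",
        f"<{annotation_uri}> oac:hasBody <{body_uri}> .",
        f'<{body_uri}> cnt:chars "{escaped}" .',
        f'<{body_uri}> cnt:characterEncoding "utf-8" .',
        f"<{body_uri}> rdf:type cnt:ContentAsText .",
    ]


lines = annotation_triples(
    "http://example.org/A-1",
    "http://example.org/event1",
    "http://example.org/audio.mp3",
    23, 55, "increasing volume",
)
```

Called with the values from the example, the helper reproduces the five statements listed for the "increasing volume" event.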
Visualization: e.g. consume the triple stream and visualize events on a timeline
event 1, event 2, event 3, event 4, ...
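One possible way to get from annotation targets to a timeline is to read the time span out of each media fragment and place the events proportionally on a fixed-width text axis. The sketch below assumes targets of the form `...#t=start,end` with times in seconds; `parse_fragment` and `timeline` are hypothetical helpers, and the second event span is made up for the example.

```python
import re
from typing import List, Tuple

# Matches a trailing temporal media fragment such as "#t=23,55".
FRAGMENT = re.compile(r"#t=(\d+(?:\.\d+)?),(\d+(?:\.\d+)?)$")


def parse_fragment(target_uri: str) -> Tuple[float, float]:
    """Extract (start_sec, end_sec) from a target URI."""
    match = FRAGMENT.search(target_uri)
    if match is None:
        raise ValueError(f"no temporal fragment in {target_uri!r}")
    return float(match.group(1)), float(match.group(2))


def timeline(events: List[Tuple[str, str]], width: int = 40) -> List[str]:
    """Render (label, target_uri) pairs as text rows, sorted by start time."""
    spans = sorted(parse_fragment(uri) + (label,) for label, uri in events)
    total = max(end for _, end, _ in spans)
    rows = []
    for start, end, label in spans:
        left = int(start / total * width)
        right = max(left + 1, int(end / total * width))
        bar = " " * left + "#" * (right - left) + " " * (width - right)
        rows.append(bar + " " + label)
    return rows


rows = timeline([
    ("event 2", "http://example.org/audio.mp3#t=60,80"),  # made-up span
    ("event 1", "http://example.org/audio.mp3#t=23,55"),
])
```

Printing `rows` line by line gives one bar per event; a browser-based version could draw the same spans while the audio plays.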
2. DBpedia - Mining Implicit Knowledge from DBpedia Categories
● Wikipedia categories of Wilhelm Conrad Röntgen: https://en.wikipedia.org/wiki/Wilhelm_R%C3%B6ntgen
● DBpedia resource: http://dbpedia.org/resource/Wilhelm_R%C3%B6ntgen
● Triples extracted from this article section in article-categories_en.ttl:
<http://dbpedia.org/resource/Wilhelm_Röntgen> <http://purl.org/dc/terms/subject>
    <http://dbpedia.org/resource/Category:1845_births> ,
    <http://dbpedia.org/resource/Category:1923_deaths> ,
    <http://dbpedia.org/resource/Category:Wilhelm_Röntgen> ,
    <http://dbpedia.org/resource/Category:People_from_Remscheid> ,
    <http://dbpedia.org/resource/Category:ETH_Zurich_alumni> ,
    <http://dbpedia.org/resource/Category:Experimental_physicists> ...
2. DBpedia - Mining Implicit Knowledge from DBpedia Categories
● DBpedia category memberships contain implicit information about a resource, e.g. Wilhelm Conrad Röntgen:
category                        fact
1845_births                     dbo:birthDate "1845-03-27"
People_from_Remscheid           dbo:birthPlace dbp:Remscheid
University_of_Würzburg_faculty  dbo:workInstitutions dbp:University_of_Würzburg (implicit / missing)
Nobel_laureates_in_Physics      dbo:award dbp:Nobel_Prize_in_Physics (implicit / missing)
● Task:
○ Learn relationships from DBpedia properties and categories
○ Extract implicit facts for category members
○ Find inconsistencies, e.g. 1845_births ⇔ dbo:birthDate "1890-12-24"
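The three tasks can be illustrated on toy data. The sketch below is one simple, assumed approach (not the expected solution): a fact becomes a rule for a category when at least half of the category's members state it, rules are then applied to members that lack the fact, and birth-year categories are compared with dbo:birthDate. The ex:Laureate_A and ex:Laureate_B resources and all function names are hypothetical, and the data is not an actual DBpedia extract.

```python
import re
from collections import Counter, defaultdict

BIRTH_CATEGORY = re.compile(r"^(\d{4})_births$")


def learn_category_facts(categories, facts, min_support=0.5):
    """Map each category to the facts shared by >= min_support of its members."""
    members = defaultdict(set)
    for resource, cats in categories.items():
        for cat in cats:
            members[cat].add(resource)
    rules = {}
    for cat, resources in members.items():
        counts = Counter(f for r in resources for f in facts.get(r, ()))
        rules[cat] = {f for f, n in counts.items()
                      if n / len(resources) >= min_support}
    return rules


def implicit_facts(resource, categories, facts, rules):
    """Facts suggested by a resource's categories that are not yet explicit."""
    suggested = set()
    for cat in categories.get(resource, ()):
        suggested |= rules.get(cat, set())
    return suggested - facts.get(resource, set())


def birth_year_conflicts(resource, categories, facts):
    """Return (category_year, birth_date) pairs that contradict each other."""
    conflicts = []
    for cat in sorted(categories.get(resource, ())):
        match = BIRTH_CATEGORY.match(cat)
        if not match:
            continue
        for prop, obj in sorted(facts.get(resource, ())):
            if prop == "dbo:birthDate" and not obj.startswith(match.group(1)):
                conflicts.append((match.group(1), obj))
    return conflicts


# Toy data; the wrong birth date mirrors the inconsistency example above.
categories = {
    "dbr:Wilhelm_Röntgen": {"1845_births", "Nobel_laureates_in_Physics"},
    "ex:Laureate_A": {"Nobel_laureates_in_Physics"},
    "ex:Laureate_B": {"Nobel_laureates_in_Physics"},
}
facts = {
    "dbr:Wilhelm_Röntgen": {("dbo:birthDate", "1890-12-24")},
    "ex:Laureate_A": {("dbo:award", "dbp:Nobel_Prize_in_Physics")},
    "ex:Laureate_B": {("dbo:award", "dbp:Nobel_Prize_in_Physics")},
}

rules = learn_category_facts(categories, facts)
missing = implicit_facts("dbr:Wilhelm_Röntgen", categories, facts, rules)
conflicts = birth_year_conflicts("dbr:Wilhelm_Röntgen", categories, facts)
```

On this data, `missing` proposes the dbo:award fact for Röntgen and `conflicts` reports the clash between the 1845_births category and the stated birth date. The support threshold is the main design choice: a low value proposes more facts but also more wrong ones.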
3. Temporal information extraction
Extraction of timelines or, more generally, temporal sequences from text.
Tasks
● Temporal information extraction from Wikipedia articles
● Infer new temporal knowledge from existing temporal information
● Classification of extracted events
References
● Stanford Temporal Tagger: SUTime
● Java tools - DateParser
● Temporal Mining - Ground truth Dataset
● E. Kuzey, G. Weikum: Extraction of temporal facts and events from Wikipedia. TempWeb 2012
Timeline of Mark Zuckerberg
For example, we have the following text:
Jennifer Hosten was born on 12 March 1990. She won the title Miss New York in 2010. A year later she also won the title Miss America. Jennifer is a native of Wichita, Kansas, and a 2011 graduate of Wichita High School East. Her father is Mark Wagner and her mother is Krista Wagner.
● Temporal information extraction from Wikipedia articles
○ 1990-03-12, Jennifer Hosten was born on 12 March 1990.
○ 2010, She won the title Miss New York in 2010.
○ 2011, Jennifer is a native of Wichita, Kansas, and a 2011 graduate of Wichita High School East.
● Infer new temporal knowledge from existing temporal information
○ 2011, A year later she also won the title Miss America
● Classification of extracted events
○ we can classify events as life, career, education events, etc.
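The first two steps of this example can be approximated with regular expressions. The sketch below is a deliberately simple stand-in for a real temporal tagger such as SUTime: it only recognises "day Month year" dates, bare four-digit years and the single relative phrase "a year later", and all names in it are hypothetical.

```python
import re

MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}
FULL_DATE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")
RELATIVE = re.compile(r"\ba year later\b", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter: break after a full stop followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]


def extract_timeline(text):
    """Return (timestamp, sentence) pairs in text order."""
    events = []
    last_year = None  # year of the most recent dated sentence
    for sentence in split_sentences(text):
        full = FULL_DATE.search(sentence)
        year = YEAR.search(sentence)
        if full:
            day, month, yr = full.groups()
            stamp = f"{yr}-{MONTHS[month]:02d}-{int(day):02d}"
            last_year = int(yr)
        elif year:
            stamp = year.group(1)
            last_year = int(stamp)
        elif RELATIVE.search(sentence) and last_year is not None:
            # Inference step: resolve the relative phrase against the last year.
            last_year += 1
            stamp = str(last_year)
        else:
            continue  # sentence without temporal information
        events.append((stamp, sentence))
    return events


TEXT = ("Jennifer Hosten was born on 12 March 1990. She won the title Miss "
        "New York in 2010. A year later she also won the title Miss America. "
        "Jennifer is a native of Wichita, Kansas, and a 2011 graduate of "
        "Wichita High School East. Her father is Mark Wagner and her mother "
        "is Krista Wagner.")
events = extract_timeline(TEXT)
stamps = [stamp for stamp, _ in events]
```

The last sentence carries no temporal expression and is skipped. Classifying the extracted events (life, career, education) would be a separate step on top of this output.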
4. Important knowledge coverage in the LOD cloud
● DBpedia - a large-scale knowledge base extracted from Wikipedia
○ Most interlinked dataset in the Linked Open Data (LOD) graph
○ English version of DBpedia 2015 describes 5.9 million entities
with 737 million facts in the form of RDF triples
○ How much knowledge is actually there?
● Find explicit semantic relations:
○ Dirk Nowitzki -- team -- Dallas Mavericks
○ Amsterdam -- country -- Netherlands
○ Garry Kasparov -- ??? -- Chess
○ Dalai Lama -- ??? -- Buddhism
(sample with entities is provided)
Tasks:
● Evaluate coverage of important facts in DBpedia (sample). Facts can be found:
○ fully (Dirk Nowitzki -- team -- Dallas Mavericks),
○ partially (Garry Kasparov -- dc:subject -- Chess grandmasters),
○ indirectly (Cristiano Ronaldo -- birthPlace -- Funchal -- country -- Portugal)
● Complement missing information with interlinked datasets (Yago, CIA World Factbook)
○ e.g. from Wikidata: Garry Kasparov -- sport -- Chess
● Propose new RDF triples for facts that were not fully found
○ statistically derive connecting properties (e.g. Dalai Lama -- religion -- Buddhism)
● Compare coverage of LOD to traditional knowledge bases
References:
● DBpedia - A Crystallization Point for the Web of Data
● https://open.hpi.de/courses/semanticweb2014 (Week 2, Week 3)
● http://www.w3.org/wiki/LinkedData
● http://lod-cloud.net/
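The distinction between fully, partially and indirectly covered facts can be sketched over a handful of triples. This is an assumed operationalisation for illustration only: "partially" is approximated by a substring match on the object label, "indirectly" by a two-hop path, and the `coverage` function is a hypothetical name. The triples are the examples from the task description, not a DBpedia query result.

```python
from typing import List, Tuple

TripleT = Tuple[str, str, str]


def coverage(subject: str, target: str, triples: List[TripleT]) -> str:
    """Classify how the relation between subject and target is covered."""
    direct = [(p, o) for s, p, o in triples if s == subject]
    # Fully: some property links the subject directly to the target.
    if any(o == target for _, o in direct):
        return "fully"
    # Partially (assumption): an object only mentions the target in its label.
    if any(target.lower() in o.lower() for _, o in direct):
        return "partially"
    # Indirectly: the target is reachable through one intermediate resource.
    for _, mid in direct:
        if any(s == mid and o == target for s, _, o in triples):
            return "indirectly"
    return "missing"


triples = [
    ("Dirk Nowitzki", "team", "Dallas Mavericks"),
    ("Garry Kasparov", "dc:subject", "Chess grandmasters"),
    ("Cristiano Ronaldo", "birthPlace", "Funchal"),
    ("Funchal", "country", "Portugal"),
]

results = {
    pair: coverage(pair[0], pair[1], triples)
    for pair in [
        ("Dirk Nowitzki", "Dallas Mavericks"),
        ("Garry Kasparov", "Chess"),
        ("Cristiano Ronaldo", "Portugal"),
        ("Dalai Lama", "Buddhism"),
    ]
}
```

Pairs classified as "missing" or only "partially" covered are the candidates for the later tasks, i.e. looking them up in interlinked datasets or proposing a new connecting property.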
Semantic Media Mining
1. Tutors
2. Semantic Media Mining
3. Seminar Challenges
4. Administrative Issues
Administrative Issues
● Weekly hours: 4
○ plenary sessions and individual team meetings (about 30 min. each)
● ECTS: 6
● Grading:
○ Implementation of a research application
○ Presentation of achieved results
■ Midterm presentation, final (poster-)presentation, team meetings
○ Written final report of achieved results (= seminar paper)
■ about 20 pages per group
■ we provide an introduction to scientific writing and a template
● Teams of 3 (max. 4) students work on a common problem
● Schedule (possible changes tba):
○ 22.10.2015: Formation of student teams,
technical introduction
○ 29.10.2015: First team meetings
○ 10.12.2015: Midterm presentations (plenary session)
○ 04.02.2016: Final presentations (plenary session)
○ 31.03.2016: Deadline for seminar reports
Bibliography:
○ H. Sack: Linked Data Technologien - Ein Überblick, in T. Pellegrini, H. Sack, S. Auer (eds.), Linked Enterprise Data, Springer Vieweg, Heidelberg, 2014, pp. 21-62.
○ Jens Lehmann et al.: DBpedia - A Crystallization Point for the Web of Data, in Journal of Web Semantics 7(3):154-165 (2009).
○ OpenHPI: Knowledge Engineering with Semantic Web Technologies
○ Alexander Lerch: An Introduction to Audio Content Analysis: Applications in Signal Processing and Music Informatics, ISBN: 978-1-1182-6682-3
Blog with seminar material:
○ http://smm2016.blogspot.de/
Homework:
● Find your groups of 3 (max 4) students and
sign up:
○ http://bit.ly/smm2016_doodle
● If possible, email us your group’s first and
second favorite seminar topic
Next session: 22.10.2015 - Technical Introduction