Open Education Challenge 2014: Exploiting Linked Data in Educational Applications
Exploiting (Linked) Web Data
in Educational Applications
Stefan Dietze
L3S Research Center
http://purl.org/dietze
@stefandietze
- Open Education Challenge, Berlin, 2014 -
28/10/14 1 Stefan Dietze
Introduction
Research areas
Web & data science, information retrieval, semantic web & Linked Data, data & knowledge integration
Application domains: education/TEL, Web archiving, …
Linked Data for education
Data sharing: TED, Open Courseware, mEducator, LinkedUp, LAK, …
Tutorials & workshops (e.g. „Linked Learning“ series)
LinkedUniversities.org and LinkedEducation.org
W3C Linked Open Education community group
Some projects
http://www.l3s.de/
See also: http://purl.org/dietze
Exploiting Open Data for Education? (in a nutshell)
Social Media
(Open) Educational Resources
World Wide Web
Distance Universities
MOOCs
Linked Open Data
How Open is Open Data?
Open Data (as in “open licensing”)
Open licensing (ODL, CC etc)
Yet: variety of approaches
APIs/feeds: SOAP, REST, etc
Diverse schemas & vocabularies
(lack of) controlled vocabularies
Reuse & interoperability?
Linked Data (technology) (as in “interoperability”)
De facto standard for Open Data on the Web
W3C standards:
Common HTTP interface: SPARQL
Common representation: RDF
Dereferenceable URIs
Shared/linked vocabularies
Linked Open Data
5-star scheme by Sir Tim Berners-Lee
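A minimal sketch of the "common HTTP interface" point: under the SPARQL Protocol a query is an ordinary HTTP request carrying a `query` parameter. The endpoint (DBpedia's public one) and the example query are assumptions chosen for illustration, not taken from the slides, and the request is only built, not sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# Assumption: DBpedia's public endpoint; any SPARQL endpoint is addressed the same way.
ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: list a few resources typed as schema.org cities.
QUERY = "SELECT ?s WHERE { ?s a <http://schema.org/City> } LIMIT 5"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute the query; whether a given endpoint answers is a separate matter, as the availability figures later in the deck show.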
Semantic Web
Example: Google Knowledge Graph (DBpedia, Freebase, Yago etc)
W3C standards (RDF & SPARQL) for knowledge representation and querying
URIs to identify/link data
“A little semantics goes a long way” (J. Hendler [1])
[Figure: example RDF graph. Nodes: dbp:United_States, http://dbpedia.org/resource/Cambridge_MA, dbp:W3C, schema:City, dbp:MIT, ru.dbp:Кембридж_(Массачусетс). Edge labels: country, cityOf, typeOf, sameAs, headquarterOf]
[1] Hendler, J., The Dark Side of the Semantic Web, IEEE Intelligent Systems, Jan/Feb 2007
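The graph on this slide can be read as a set of subject-predicate-object triples whose nodes are URIs. The sketch below stores such triples as plain tuples and answers simple pattern queries. The node and edge names come from the slide, but which edge connects which pair of nodes is an assumed reading of the figure, and the helper names `expand` and `match` are hypothetical.

```python
# Namespace prefixes for the abbreviated names used on the slide.
PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "ru.dbp": "http://ru.dbpedia.org/resource/",
    "schema": "http://schema.org/",
}

# Assumption: this wiring of nodes and edges is a guess at the original figure.
TRIPLES = [
    ("dbp:Cambridge_MA", "typeOf", "schema:City"),
    ("dbp:Cambridge_MA", "cityOf", "dbp:United_States"),
    ("dbp:Cambridge_MA", "sameAs", "ru.dbp:Кембридж_(Массачусетс)"),
    ("dbp:Cambridge_MA", "headquarterOf", "dbp:MIT"),
    ("dbp:W3C", "country", "dbp:United_States"),
]


def expand(name: str) -> str:
    """Expand a prefixed name such as dbp:MIT into a full URI."""
    prefix, _, local = name.partition(":")
    return PREFIXES[prefix] + local


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


print(expand("dbp:United_States"))
for triple in match(o="dbp:United_States"):
    print(triple)
```

The `sameAs` edge is what lets an application hop from the English to the Russian description of the same city, which is the "linking" part of Linked Data.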
HTTP accessibility: persistent URIs, SPARQL
FOAF
Gene Ontology
BIBO
Geo Ontology
DBpedia Ontology
Dublin Core
BBC Programmes
Connected graph of open Web data (500+ datasets and 100 billion triples)
Persistent, dereferenceable URIs & content negotiation, shared/linked vocabularies
SPARQL to query via HTTP
Other „incarnations“:
Google Knowledge Graph
Facebook Open Graph
http://schema.org
http://dbpedia.org/resource/Cambridge_MA
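Content negotiation means that the same URI can return HTML for a browser and RDF for a program, depending on the `Accept` header. A small sketch, assuming the resource URI shown on the slide and three standard RDF media types; the function name `negotiate` is hypothetical and the request is constructed without being sent.

```python
from urllib.request import Request

# Standard media types for common RDF serialisations.
RDF_MEDIA_TYPES = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "jsonld": "application/ld+json",
}


def negotiate(uri: str, fmt: str = "turtle") -> Request:
    """Build a request that asks the server for an RDF view of the URI."""
    return Request(uri, headers={"Accept": RDF_MEDIA_TYPES[fmt]})


req = negotiate("http://dbpedia.org/resource/Cambridge_MA")
print(req.full_url, "->", req.get_header("Accept"))
```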
LD is not just for your data: Schema.org for discovery of content/websites
LD to ensure discoverability of content/Websites (e.g. schema.org/microdata/RDFa)
Annotating HTML documents about (educational) material with schema.org (e.g. LRMI, Learning Resource Metadata Initiative)
Adopted by major sites (YouTube, LinkedIn etc.) & tool support (Drupal, WordPress)
http://schema.org
© Ramanathan V. Guha, Google, SemTech2014
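To make the annotation idea concrete, here is a sketch that renders schema.org microdata for a learning resource, using two properties that LRMI contributed to schema.org (`learningResourceType` and `educationalUse`). The resource URL and the property values are hypothetical, the title is borrowed from a later slide, and the function `lrmi_microdata` is an illustration rather than part of any tool named in the deck.

```python
from html import escape


def lrmi_microdata(name: str, url: str, resource_type: str, educational_use: str) -> str:
    """Render a schema.org CreativeWork with two LRMI properties as microdata."""
    return (
        '<div itemscope itemtype="http://schema.org/CreativeWork">\n'
        f'  <a itemprop="url" href="{escape(url, quote=True)}">'
        f'<span itemprop="name">{escape(name)}</span></a>\n'
        f'  <meta itemprop="learningResourceType" content="{escape(resource_type, quote=True)}">\n'
        f'  <meta itemprop="educationalUse" content="{escape(educational_use, quote=True)}">\n'
        '</div>'
    )


snippet = lrmi_microdata(
    name="ECG Patient case 1001 chest and limb leads",
    url="http://example.org/resources/ecg-1001",  # hypothetical URL
    resource_type="case study",                   # hypothetical value
    educational_use="self-study",                 # hypothetical value
)
print(snippet)
```

Crawlers that understand schema.org can then pick up the resource type and intended use directly from the page markup.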
Other learning-relevant data & resources
Publications & literature
(Social) media resource metadata
Domain-specific knowledge: Bioportal, Europeana, Geonames, …
Cross-domain factual knowledge: DBpedia, Freebase, …
LD as body of knowledge for education
http://linkededucation.org
http://linkeduniversities.org
Educational datasets and vocabularies
University Linked Data: The Open University UK, http://data.open.ac.uk, Southampton University, http://education.data.gov.uk, …
Open Educational Resources metadata: mEducator, Open Learn, Open Courseware, …
Schemas: Learning Resource Metadata Initiative (LRMI), mEducator Educational Resources schema, BIBO, AIISO, …
LD as background knowledge for educational apps?
http://metamorphosis.med.duth.gr/
Title: ECG Patient case 1001 chest and limb leads
Title: ECG Patient case 1001 chest and limb leads
„ECG“ disambiguation on Wikipedia: 9 meanings
LD as background knowledge for educational apps?
dbpedia.org/resource/Electrocardiography
1. Understanding data: contextual disambiguation through NLP tools
2. Enrichment with factual knowledge
dbpedia:Электрокардиография
category:Cardiac_procedures
dbpedia:Willem_Einthoven
3. Interlinking with related resources
bbc:ProgrammeXY
slideshare:SlidesetXY
yovisto:VideolectureXY
Title: ECG Patient case 1001 chest and limb leads
Understanding, enriching, linking data
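The three steps on this slide can be sketched as a toy pipeline. Real systems use NLP tools for step 1; here a simple word-overlap score stands in for them, which is a deliberate simplification. The alternative meanings of "ECG" and all context words are hand-written assumptions; the enrichment facts and the placeholder links are the ones shown on the slide.

```python
import re

# Step 1 input. Assumption: illustrative candidates with hand-written context words.
CANDIDATES = {
    "dbpedia:Electrocardiography": {"heart", "cardiac", "patient", "leads", "chest", "limb"},
    "dbpedia:Electrochemical_grinding": {"machining", "abrasive", "metal", "wheel"},
    "dbpedia:East_Coast_Greenway": {"trail", "cycling", "route", "coast"},
}

# Step 2: factual knowledge attached to the chosen entity (values from the slide).
FACTS = {
    "dbpedia:Electrocardiography": [
        "dbpedia:Электрокардиография",
        "category:Cardiac_procedures",
        "dbpedia:Willem_Einthoven",
    ],
}

# Step 3: related resources (placeholder identifiers from the slide).
LINKS = {
    "dbpedia:Electrocardiography": [
        "bbc:ProgrammeXY",
        "slideshare:SlidesetXY",
        "yovisto:VideolectureXY",
    ],
}


def disambiguate(title: str) -> str:
    """Pick the candidate whose context words overlap most with the title."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return max(CANDIDATES, key=lambda uri: len(CANDIDATES[uri] & words))


def annotate(title: str) -> dict:
    """Run disambiguation, enrichment and interlinking for one title."""
    entity = disambiguate(title)
    return {
        "title": title,
        "entity": entity,
        "facts": FACTS.get(entity, []),
        "links": LINKS.get(entity, []),
    }


TITLE = "ECG Patient case 1001 chest and limb leads"
print(annotate(TITLE))
```

The surrounding words ("patient", "chest", "limb", "leads") are what tip the choice towards the medical meaning, which is the point of contextual disambiguation.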
„Success models“: data & applications
Supporting innovative tools & applications
Evaluation methods
LinkedUp – Linking Web Data for Education
Technology transfer & community-building
Involving educators, developers, computer scientists, data engineers…
http://www.linkedup-challenge.org/
Data curation & profiling
Collecting & exposing open data for education
Profiling of Web Data
http://data.linkededucation.org
EC-funded project aimed at advancing take-up of open data and related technologies
http://www.linkedup-project.eu/events
http://www.linkedup-project.eu/
Community-building and collaboration
Joint work on tangible outcomes (datasets, applications, …)
Associated Partners
Initiatives
EC Projects
Collected & curated datasets of educational relevance
Beyond collecting: published over 50 datasets as LD together with the most important content providers, e.g. TED, OCW, SoLAR
LinkedUp catalog: most comprehensive collection of LD/Open Data for education
RDF dataset metadata
Federated queries across datasets using type mappings
Publishing & curating educational data
http://data.linkededucation.org/linkedup/catalog/
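One way to picture "federated queries across datasets using type mappings": a shared type is mapped to the class each dataset uses locally, and one query per dataset is generated from that mapping. This is a client-side sketch under stated assumptions, not the catalog's actual mechanism; the dataset names, class URIs and the function `queries_for` are all hypothetical.

```python
from typing import Dict

# Hypothetical mapping from a shared type to dataset-specific classes.
TYPE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "Course": {
        "dataset-a": "http://example.org/a/vocab#Module",
        "dataset-b": "http://example.org/b/schema/Course",
    },
}


def queries_for(common_type: str, limit: int = 10) -> Dict[str, str]:
    """Generate one SPARQL query per dataset for a shared type."""
    return {
        dataset: f"SELECT ?s WHERE {{ ?s a <{cls}> }} LIMIT {limit}"
        for dataset, cls in TYPE_MAPPINGS[common_type].items()
    }


for dataset, query in queries_for("Course").items():
    print(dataset, "->", query)
```

A consumer asks for "Course" once and does not need to know each provider's vocabulary; the mapping absorbs the schema differences.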
http://data-observatory.org/lod-explorer
Supporting developers and data consumers
Devtalk blog: developer resource & community to aid developers
Webinars and tutorials
http://data.linkededucation.org/linkedup/devtalk/
Topic-based annotation and discovery of data
Data exploration & visualisation features
LinkedUp events, training & technology transfer
Bringing stakeholders together
Data Providers & Data Scientists
Developers
Community-building through events & communication channels/social media (cross-disciplinary, industry & academia)
Exploitation of project outcomes across communities: technology transfer
(Co-)organised approx. 20 events (tutorials, workshops, booths etc)
More than 30 invited talks/lectures
….
Users (Learners, Tutors, Teachers)
Veni: May – September 2013; Vidi: October 2013 – May 2014; Vici: May 2014 – October 2014
Series of Open Data Competitions to promote applications which exploit Linked Open Data
http://www.linkedup-challenge.org/
LinkedUp Challenge
[Bar chart: submissions and shortlisted entries per competition (Veni, Vidi, Vici). Submissions: 23, 14, 13; shortlist: 8, 9, 10]
LinkedUp Challenge results
50 submissions of which 27 were shortlisted and supported (through travel grants, participation in events and rewards)
13 Veni, Vidi, Vici winners (grants: 1000 – 3000 €)
Authors from 23 distinct, mostly European countries
LinkedUp submissions & shortlist
Top-10 authors' origins
Country Authors
Croatia 4
Greece 4
Belgium 5
Italy 7
Germany 11
Spain 13
France 14
Netherlands 15
United States 15
United Kingdom 21
Issues (1/3) - open data is messier than we think
SPARQL endpoint availability over time [Buil-Aranda et al 2013]
Accessibility of datasets?
Less than 50% of all SPARQL endpoints are actually responsive at any given point in time [Buil-Aranda2013]
“THE” SPARQL protocol? No, but many variants & subsets
Data “quality”?
…data accuracy (eg DBpedia)? [Paulheim2013]
…vocabulary reuse/links? [D’AquinWebSci13]
…schema compliance (RDFS, schemas) [HoganJWS2012]
[Buil-Aranda2013] Buil-Aranda, C., Hogan, A., Umbrich, J., Vandenbussche, P.-Y., SPARQL Web-Querying Infrastructure: Ready for Action?, International Semantic Web Conference 2013 (ISWC2013).
[D’AquinWebSci13] D’Aquin, M., Adamou, A., Dietze, S., Assessing the Educational Linked Data Landscape, ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
[Paulheim2013] Paulheim, H., Bizer, C., Type Inference on Noisy RDF Data, The Semantic Web – ISWC 2013, Lecture Notes in Computer Science, Volume 8218, 2013, pp. 510-525.
[HoganJWS2012] Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., Decker, S., An empirical survey of Linked Data conformance, Journal of Web Semantics 14, 2012.
Issues (2/3) – accepting inconsistency
Yuan, W., Demidova, E., Dietze, S., Zhu, X., Analyzing Relative Incompleteness of Movie Descriptions in the Web of Data: A Case Study, International Semantic Web Conference 2014 (ISWC2014)
Issues (3/3) – licensing/legal aspects
T&C of established datasets (length in words and pages)
Dataset Words Pages
DBpedia 7163 16
Flickr 10367 23
ConceptNet 7163 16
World Bank 7056 16
Nature 7024 16
LinkedIn 6104 14
Google+ 5740 13
Tumblr 5362 12
Twitter 4247 9
Facebook 4179 9
Mashing up data: legal and licensing related issues under-estimated
What license do you get when mashing up:
Nature (CC0) + DBpedia (CC-ShareAlike) + FAO (Proprietary non-commercial) => ?
Attribution: copyright violation from missing (86%) or incorrect attribution (14%) information
Terms & conditions: complexity and conflicts when merging data from different sources
Potential non-compliance from evolution of (a) LOD applications and (b) underlying datasets (and their licenses)
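The mash-up question on this slide (Nature + DBpedia + FAO) can be made concrete with a toy model. The sketch reduces each source's terms to three boolean flags and flags pairs where a share-alike source that allows commercial use meets a non-commercial one. The flags are a strong simplification assumed for illustration (FAO's attribution flag in particular is a guess), and the conflict rule is one plausible reading rather than a legal assessment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Terms:
    attribution: bool
    share_alike: bool
    non_commercial: bool


# Assumption: simplified flags per source; real terms must be read in full.
SOURCES = {
    "Nature": Terms(attribution=False, share_alike=False, non_commercial=False),  # CC0
    "DBpedia": Terms(attribution=True, share_alike=True, non_commercial=False),   # CC-ShareAlike
    "FAO": Terms(attribution=True, share_alike=False, non_commercial=True),       # proprietary, non-commercial
}


def combined_obligations(names: List[str]) -> Terms:
    """Union of obligations: the mash-up inherits every restriction."""
    terms = [SOURCES[n] for n in names]
    return Terms(
        attribution=any(t.attribution for t in terms),
        share_alike=any(t.share_alike for t in terms),
        non_commercial=any(t.non_commercial for t in terms),
    )


def conflicts(names: List[str]) -> List[Tuple[str, str]]:
    """Pairs where share-alike (commercial use allowed) meets non-commercial."""
    found = []
    for a in names:
        for b in names:
            ta, tb = SOURCES[a], SOURCES[b]
            if ta.share_alike and not ta.non_commercial and tb.non_commercial:
                found.append((a, b))
    return found


mashup = ["Nature", "DBpedia", "FAO"]
print(combined_obligations(mashup))
print(conflicts(mashup))
```

Under this model the union of obligations cannot be satisfied by a single license, which is one way to read the "=> ?" on the slide.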
Get involved!
http://www.w3.org/community/opened
http://data.linkededucation.org/linkedup/catalog/
http://data.linkededucation.org/linkedup/devtalk/