Linked Data for Federation of OER Data & Repositories
DESCRIPTION
An overview of alternatives and opportunities for using Linked Data principles and datasets for federated access to distributed OER repositories. The talk was given at the ARIADNE/GLOBE convening (http://ariadne-eu.org/content/open-federations-2013-open-knowledge-sharing-education) at LAK 2013, Leuven, Belgium, on 8 April 2013.
TRANSCRIPT
Motivation Data on the Web
Some eye-catching opener illustrating growth and/or diversity of web data
Linked Data for Open Educational Data Sharing and Repository Federation
Stefan Dietze (L3S Research Center, DE, @stefandietze, http://purl.org/dietze)
02/04/13 Stefan Dietze
De-facto standard for sharing data on the Web
Vision: well connected graph of open Web data
W3C standards (RDF, SPARQL) to expose data
Persistent URIs to interlink datasets
Linked Data
Domain                  Number of datasets  Triples          %        (Out-)Links   %
Media                   25                  1,841,852,061    5.82 %   50,440,705    10.01 %
Geographic              31                  6,145,532,484    19.43 %  35,812,328    7.11 %
Government              49                  13,315,009,400   42.09 %  19,343,519    3.84 %
Publications            87                  2,950,720,693    9.33 %   139,925,218   27.76 %
Cross-domain            41                  4,184,635,715    13.23 %  63,183,065    12.54 %
Life sciences           41                  3,036,336,004    9.60 %   191,844,090   38.06 %
User-generated content  20                  134,127,413      0.42 %   3,449,143     0.68 %
Total                   295                 31,634,213,770            503,998,829
Source: http://lod-cloud.net/state, September 2011
Media Ontology
FOAF
Gene Ontology
FMA Ontology
BIBO
Geo Ontology
DBpedia Ontology
Dublin Core
rNews
Option 1: LD for integration of heterogeneous APIs & data Use case: biomedical education in
=> http://metamorphosis.med.duth.gr/ Metamorphosis+ Tailored (L)CMS plugins
=> http://www.meducator3.net/
Data/services integration & retrieval/search APIs
? Educational Web Resources
Data/services integration & retrieval/search APIs Linked Educational Resources
http://linkededucation.org/meducator
Approach: 1) On the fly queries via “SmartLink” (Linked Data registry execution engine for open APIs)
2) Data lifting from heterogeneous repositories using “SmartLink” API and lifting specifications
3) Data enrichment (via DBpedia, Freebase, BioPortal) & clustering, e.g. to identify correlated resources
Goal: improvement of distributed (non-LD) data with public LOD vocabularies; tighter interlinking to provide coherent graph of educational data (across disparate stores)
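Step 2 of the approach (data lifting) can be illustrated with a minimal Python sketch. It assumes a hypothetical JSON record returned by a repository API and a hypothetical lifting specification that maps source fields to Dublin Core properties. The field names, the `example.org` namespace, and the `lift` function are illustrative assumptions and not the SmartLink API.

```python
import json

# Hypothetical lifting specification: source field -> RDF property URI.
LIFTING_SPEC = {
    "name": "http://purl.org/dc/terms/title",
    "summary": "http://purl.org/dc/terms/description",
    "lang": "http://purl.org/dc/terms/language",
}

BASE_URI = "http://example.org/resource/"  # placeholder namespace, not from the talk


def escape_literal(value):
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def lift(record, spec=LIFTING_SPEC, id_field="id"):
    """Turn one JSON record into a list of N-Triples lines."""
    subject = f"<{BASE_URI}{record[id_field]}>"
    triples = []
    for field, prop in spec.items():
        # Fields missing from the record are skipped rather than invented.
        if field in record and record[field] is not None:
            literal = escape_literal(str(record[field]))
            triples.append(f'{subject} <{prop}> "{literal}" .')
    return triples


# Invented sample record, as a repository's JSON API might return it.
raw = '{"id": "r1", "name": "Virtual patient: \\"HPV\\" case", "lang": "en"}'
ntriples = lift(json.loads(raw))
for line in ntriples:
    print(line)
```

One lifting specification per API is what makes this approach costly to maintain: every new repository needs its own mapping table before its records can join the graph.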
http://purl.org/smartlink
Schemas: OAI-DC, LOM, …
Formats: XML, JSON
Interfaces: OAI-PMH, REST, SOAP
Option 1: LD for integration of heterogeneous APIs & data
Educational Web Resources
db:Viral Infections
db:Human Papilloma Virus
db:Life Sciences
<led:Resource-OpenLearn-2139393292>
<led:title>…viral…disease…</led:title>
…
</led:Resource-OpenLearn-2139393292>
<led:Resource-BBC-519215>
<led:title>…virus…</led:title>
…
</led:Resource-BBC-519215>
Option 1: LD for integration of heterogeneous APIs & data LD vocabularies for disambiguation & clustering
<led:Resource-mEducator-2139393292>
<led:title>Virtual patient 1002, infections & HPV</led:title>
…
</led:Resource-mEducator-2139393292>
db:Disease
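The disambiguation and clustering idea on this slide can be sketched as follows: resources from different repositories that are annotated with the same DBpedia entity end up in one cluster. The resource identifiers are taken from the slide, but the annotation sets are invented for illustration (the actual links in the diagram are not recoverable from the transcript), and `cluster_by_entity` and `related` are hypothetical names.

```python
from collections import defaultdict

# Illustrative annotations (resource -> DBpedia entities); assumed, not from the talk.
annotations = {
    "led:Resource-OpenLearn-2139393292": {"db:Viral_Infections", "db:Disease"},
    "led:Resource-BBC-519215": {"db:Viral_Infections", "db:Life_Sciences"},
    "led:Resource-mEducator-2139393292": {"db:Human_Papilloma_Virus", "db:Disease"},
}


def cluster_by_entity(annotations):
    """Invert the mapping: entity -> sorted list of resources annotated with it."""
    clusters = defaultdict(set)
    for resource, entities in annotations.items():
        for entity in entities:
            clusters[entity].add(resource)
    return {entity: sorted(resources) for entity, resources in clusters.items()}


def related(resource, annotations):
    """Resources sharing at least one entity with `resource`, with the overlap."""
    own = annotations[resource]
    return {
        other: sorted(own & entities)
        for other, entities in annotations.items()
        if other != resource and own & entities
    }


clusters = cluster_by_entity(annotations)
print(clusters["db:Disease"])
print(related("led:Resource-BBC-519215", annotations))
```

Because the entity URIs are shared across repositories, titles as different as "…viral…disease…" and "…virus…" can be correlated without any string matching.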
Option 1: LD for integration of heterogeneous APIs & data Some issues/challenges
On-the-fly data integration, but issues wrt:
Annotation and description overhead: data lifting requires well-defined lifting specs for each API
Performance: distributed queries (multiple HTTP requests), on-the-fly data lifting and processing
Scalability: query performance decreases as the number of repositories and/or the amount of data grows
Educational Web Resources
<dc:title> <akt:has-title> ?
OER
Publication
VideoLecture
LinkedUniversities
educational videos
Step 1 – Alignment of types/properties
12/03/13 Mathieu d'Aquin, Stefan Dietze
Option 2: large-scale data harvesting and LD-ification Linked Data for automated cross-platform integration
6 million distinct (but linked) resources
97 million RDF triples
21.6 GB of data
Schema: http://data.linkededucation.org/ns/linked-education.rdf
SPARQL: http://data.linkededucation.org/request/linked-learning/sparql
LD and non-LD data
Step 2 – Linking of resources
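The two steps on this slide can be sketched in a few lines: Step 1 maps source-specific properties such as `dc:title` and `akt:has-title` onto one common property, and Step 2 links resources whose aligned titles match after normalisation. The sample records, the choice of `led:title` as alignment target, and the title-matching heuristic are assumptions for illustration, not the method used to build the dataset.

```python
import re
from itertools import combinations

# Step 1: hypothetical alignment table, source property -> common property.
PROPERTY_ALIGNMENT = {
    "dc:title": "led:title",
    "akt:has-title": "led:title",
}


def align(record):
    """Rename aligned properties; keep unmapped properties unchanged."""
    return {PROPERTY_ALIGNMENT.get(prop, prop): value for prop, value in record.items()}


def normalise(title):
    """Lower-case and collapse punctuation/whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def link(records):
    """Step 2: emit (id_a, id_b) pairs whose normalised titles are equal."""
    aligned = {rid: align(rec) for rid, rec in records.items()}
    links = []
    for (a, rec_a), (b, rec_b) in combinations(sorted(aligned.items()), 2):
        title_a, title_b = rec_a.get("led:title"), rec_b.get("led:title")
        if title_a and title_b and normalise(title_a) == normalise(title_b):
            links.append((a, b))
    return links


# Invented sample records from sources that use different title properties.
records = {
    "video:1": {"dc:title": "Introduction to Linked Data"},
    "pub:7": {"akt:has-title": "Introduction to linked data."},
    "oer:3": {"dc:title": "Organic Chemistry Basics"},
}
links = link(records)
print(links)
```

The pairwise comparison is quadratic in the number of resources, so at the scale quoted on the slide (millions of resources) a real pipeline would need blocking or indexing rather than this brute-force loop.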
Larger scale data processing, but issues wrt:
Scalability and performance of data storage (potential solutions: distributed RDF storage, MapReduce, etc.)
Poor query performance (on large-scale datasets)
Redundant data maintenance => periodic data imports
Maintenance of different identifiers (in case of non-LD sources: URIs vs internal IDs)
“LinkedUp/Linked Education cloud” as (expanded) subset of LOD cloud: CKAN – “The DataHub” (http://datahub.io) for data collection in dedicated group “linked-education”
Public RDF vocabulary of datasets (“Linked Education Catalog”) (classification of datasets according to, e.g., represented types, disciplines, data quality)
Additional integration datasets: dataset links and coreferences => providing a unified view on educational data => Linked Education Graph
Infrastructure, unified (SPARQL) endpoint & APIs for distributed/federated querying
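The federated querying item can be sketched as follows. The snippet only builds request URLs following the SPARQL 1.1 Protocol (the query travels in the `query` parameter of a GET request) and merges result rows; the HTTP call itself is injected as a function so the example runs offline. The first endpoint URL appears elsewhere in this talk, the second is a placeholder, and `federate` and `fake_fetch` are hypothetical names rather than part of the LinkedUp infrastructure.

```python
from urllib.parse import urlencode

ENDPOINTS = [
    "http://data.linkededucation.org/request/linked-learning/sparql",  # from the talk
    "http://example.org/sparql",  # placeholder second endpoint
]

QUERY = "SELECT ?s ?title WHERE { ?s <http://purl.org/dc/terms/title> ?title } LIMIT 10"


def request_url(endpoint, query):
    """Build a SPARQL Protocol GET URL for `query` against `endpoint`."""
    return f"{endpoint}?{urlencode({'query': query})}"


def federate(endpoints, query, fetch):
    """Send `query` to every endpoint via `fetch(url)` and merge the rows.

    `fetch` must return a list of dicts (one per result row). Duplicate rows
    are dropped; an endpoint that raises OSError is skipped so that one
    unreachable repository does not abort the whole federated query.
    """
    seen, merged = set(), []
    for endpoint in endpoints:
        try:
            rows = fetch(request_url(endpoint, query))
        except OSError:
            continue
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


def fake_fetch(url):
    """Stand-in for an HTTP client; returns canned rows per endpoint."""
    if url.startswith("http://example.org/"):
        return [{"s": "ex:a", "title": "A"}, {"s": "ex:b", "title": "B"}]
    return [{"s": "ex:a", "title": "A"}]


results = federate(ENDPOINTS, QUERY, fake_fetch)
print(results)
```

Deduplicating on exact rows only works when the endpoints already share identifiers, which is why the catalog's additional link and coreference datasets matter for a unified view.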
Option 3: dataset cataloging and query federation LinkedUp approach [ http://linkedup-project.eu ]
Educational Datasets
LinkedUp LinkedUp Dataset Catalog Data Interlinking & Correlation
Linked Education Cloud & Catalog
http://datahub.io/group/linked-education
http://data.linkededucation.org/linkedup/catalog/
Option 3: dataset cataloging and query federation Sparse knowledge / metadata about datasets
http://datahub.io/dataset/lak-dataset
Resource Types?
Topics & disciplines?
Quality & availability?
http://datahub.io/group/linked-education
Option 3: dataset cataloging and query federation Co-occurrence of (mapped) types
Option 3: dataset cataloging and query federation Dataset graph (according to type co-occurrence)
Approach
Enriching sample resources from each dataset with DBpedia entities/categories
Linking resources to LOD entities & categories via
Option 3: dataset cataloging and query federation Detection of topics and dataset similarities
Top-ranked categories/topics in Linked Education Catalog & their frequency
DBpedia Category               Total
Management                     180
Academia                       151
Social_sciences                131
Philosophy_of_science          125
Design                         120
Sociology_index                117
Systems_science                117
Anthropology                   116
Universities_and_colleges      116
Economics                      114
Scientific_method              111
Cognitive_science              110
Systems                        107
Sociological_terms             104
Neuropsychological_assessment  100
Concepts_in_metaphysics        96
Developmental_psychology       93
Political_philosophy           89
Cybernetics                    88
Education                      87
Philosophy_of_education        86
Arts                           77
Critical_thinking              73
Biology                        71
Political_science_terms        71
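A ranking like the one listed here amounts to counting how often each DBpedia category is attached to the enriched sample resources. A minimal sketch, assuming the enrichment step has already produced a list of categories per resource; the sample data is invented and does not reproduce the numbers from the talk, and `rank_categories` is a hypothetical name.

```python
from collections import Counter

# Invented enrichment output: sample resource -> DBpedia categories.
enriched = {
    "res1": ["Management", "Economics"],
    "res2": ["Management", "Academia", "Education"],
    "res3": ["Academia", "Management"],
}


def rank_categories(enriched, top=None):
    """Count each category once per resource and return (category, count) pairs."""
    counts = Counter()
    for categories in enriched.values():
        counts.update(set(categories))
    # Sort by descending count, then by name, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top] if top is not None else ranked


ranking = rank_categories(enriched, top=2)
print(ranking)
```

Comparing such per-dataset category profiles is one plausible way to estimate dataset similarity, although the transcript does not say which similarity measure was used.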
Summary and outlook
Summary
Different ways of using LD for federation of OER repositories
Linked Education data catalog (http://linkedup-project.eu, http://data.linkededucation.org/linkedup/catalog/): Linked Data-based catalog of open educational datasets (gradual addition of metadata about, e.g., types and topics)
On the way: exposing non-LD educational data according to LD principles (e.g. LAK dataset)
Future work
Data interlinking: complementary dataset of links between datasets and actual data/resources
Query federation and dedicated APIs
Exploitation in innovative educational scenarios and applications => LinkedUp Challenge (http://linkedup-challenge.org)
40,000 EUR prize budget
Large network of organisations in LD & TEL
Dedicated data and support
Series of affiliated events at major conferences (WWW2013, ESWC2013, OKCON, LAK2013…)
LAK Challenge / LA & Linked Data Tutorial in a nutshell
Stefan Dietze
http://www.solaresearch.org/events/lak/lak-data-challenge/
http://linkedu.eu/event/lak2013-linkeddata-tutorial/
Thank you!
Contact http://purl.org/dietze | @stefandietze
See also (general)
http://linkedup-project.eu
http://linkedup-challenge.org
http://linkededucation.org
http://linkeduniversities.org
See also (data)
http://datahub.io/group/linked-education
http://data.linkededucation.org/linkedup/catalog/
http://www.solaresearch.org/resources/lak-dataset/
http://datahub.io/dataset/meducator