Towards the Integration of a Research Group Website into the Web of Data
Presentation done at CAEPIA 2011
Mikel Emaldi, David Bujan, Diego Lopez de Ipina
{m.emaldi, dbujan, dipina}@deusto.es
Deusto Institute of Technology - DeustoTech
November 2011
Motivation Our Solution Linked Data Extension Conclusions Future Work
Mikel Emaldi, David Bujan, Diego Lopez de Ipina DeustoTech - Internet
Towards the Integration of a Research Group Website into the Web of Data
Table of Contents
1 Motivation
2 Our Solution
  First Approach
  Solution Overview
  Data Extraction
  System Architecture
3 Linked Data Extension
4 Conclusions
5 Future Work
Motivation
The desire to offer our research group website's (http://www.morelab.deusto.es) data as Linked Data
Our website runs on the Joomla! CMS
The data is unstructured
We chose our publications section as a first attempt
Almost 100 publications
Possibility to link them to external datasets
We saw the opportunity to centralize the group's FOAF files
First Approach
A solution based on a Python web script (mod_python)
The core code of Joomla! had to be modified
This caused a major problem:
When a security update was installed, Joomla! would overwrite our custom code
Solution Overview
Joomla! Extension
A solution based on an Extension for Joomla!
Component
Plugin
It offers a feasible solution for analyzing published publications and generating the corresponding Linked Data
Data Extraction
Joomla! Content Example
TALISMAN+: Intelligent System for Follow-Up and Promotion of Personal Autonomy
David Ausín, Diego Lopez-de-Ipina, Jose Bravo, Miguel Angel Valero, Francisco Florez. TALISMAN+: Intelligent System for Follow-Up and Promotion of Personal Autonomy. III International Workshop on Ambient Assisted Living - IWAAL 2011. Malaga, Spain. June 2011.
The TALISMAN+ project, financed by the Spanish Ministry of Science and Innovation, aims to research and demonstrate innovative solutions transferable to society which offer services and products based on information and communication technologies in order to promote personal autonomy in prevention and monitoring scenarios. It will solve critical interoperability problems among systems and emerging technologies in a context where heterogeneity brings about accessibility barriers not yet overcome and demanded by the scientific, technological or social-health settings.
Download
Data Extraction
Overview
Data is extracted in three ways:
User-defined regular expression
DBLP SPARQL endpoint
Google Scholar search engine
Data Extraction
Regex I
The user defines a regular expression to parse the content
The user has to define the ontologies used and their prefixes in the admin control panel
The regex tags are easy to understand:
The ontology properties to be mapped are tagged between {}
Every delimiter (also the {}) is identified by a \
The term {dummy} can be used to ignore content
Data Extraction
Regex II
David Ausín, Diego Lopez-de-Ipina, Jose Bravo, Miguel Angel Valero, Francisco Florez. TALISMAN+: Intelligent System for Follow-Up and Promotion of Personal Autonomy. III International Workshop on Ambient Assisted Living - IWAAL 2011. Malaga, Spain. June 2011.
The TALISMAN+ project, financed by the Spanish Ministry of Science and Innovation, aims to research and demonstrate innovative solutions transferable to society which offer services and products based on information and communication technologies in order to promote personal autonomy in prevention and monitoring scenarios. It will solve critical interoperability problems among systems and emerging technologies in a context where heterogeneity brings about accessibility barriers not yet overcome and demanded by the scientific, technological or social-health settings.
Download
{dc:creator, sep(,)}\. \{dc:title}\.\{swrc:series}\. \{swrc:location}\.\{dc:date}\. \{bibo:abstract} \Download\$
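The template idea can be illustrated with a minimal Python sketch. It assumes a simplified syntax, not the extension's actual one: placeholders are written as {prefix:property}, everything between them is literal delimiter text, and the backslash escaping and the sep(,) option shown on the slide are left out. The function names and the shortened citation are hypothetical.

```python
import re

# Matches one {placeholder}; the name inside is an ontology property or "dummy".
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def template_to_regex(template):
    """Compile a simplified template into a regex plus a group-to-property map."""
    parts = []
    properties = {}
    position = 0
    for index, match in enumerate(PLACEHOLDER.finditer(template)):
        # Text between placeholders is a delimiter and must match literally.
        parts.append(re.escape(template[position:match.start()]))
        name = match.group(1).strip()
        if name == "dummy":
            parts.append(r".+?")  # consumed but not captured
        else:
            group = f"g{index}"
            properties[group] = name
            parts.append(rf"(?P<{group}>.+?)")
        position = match.end()
    parts.append(re.escape(template[position:]))
    pattern = re.compile("^" + "".join(parts) + "$", re.DOTALL)
    return pattern, properties


def extract(template, text):
    """Return {property: value} for text matching the template, else {}."""
    pattern, properties = template_to_regex(template)
    match = pattern.match(text)
    if match is None:
        return {}
    return {properties[g]: v.strip() for g, v in match.groupdict().items()}


# Shortened, hypothetical variant of the citation shown on the slide.
citation = (
    "David Ausin, Diego Lopez-de-Ipina. TALISMAN+: Intelligent System. "
    "IWAAL 2011. Malaga, Spain. June 2011."
)
fields = extract(
    "{dc:creator}. {dc:title}. {dummy}. {swrc:location}. {dc:date}.", citation
)
```

Each captured value is keyed by the ontology property named in its placeholder, while the {dummy} segment is matched and discarded.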
Data Extraction
DBLP I
Digital Bibliography & Library Project
> 1.3 million articles
SPARQL endpoint at:
http://dblp.l3s.de/d2r/sparql/
http://dblp.l3s.de/d2r/snorql/
Data Extraction
DBLP II
The DBLP SPARQL endpoint is used to search for data about publications:
SELECT DISTINCT ?uri ?p ?o WHERE { ?uri dc:title "title-of-article"^^<http://www.w3.org/2001/XMLSchema#string> }
The data is enriched with our own data and saved into the RDF store
We also link members' FOAF profiles to DBLP author data:
<http://www.morelab.deusto.es/resource/dipina> owl:sameAs <http://dblp.l3s.de/d2r/resource/authors/Diego Lopez-de-Ipina> ;
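As a rough illustration of the lookup, the following Python sketch builds the title query and the corresponding HTTP GET URL without sending it. Several parts are assumptions rather than the extension's code: the PREFIX declaration (Dublin Core elements 1.1), the extra "?uri ?p ?o" pattern added so that ?p and ?o are actually bound, and the function names. The "query" parameter follows the SPARQL protocol.

```python
from urllib.parse import urlencode

DBLP_ENDPOINT = "http://dblp.l3s.de/d2r/sparql/"  # endpoint listed on the slides
DC = "http://purl.org/dc/elements/1.1/"  # assumed namespace for dc:title
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def escape_literal(value):
    """Escape backslashes and double quotes for use inside a SPARQL string."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_title_query(title):
    """Build a query returning every property of the publication with this title."""
    return (
        f"PREFIX dc: <{DC}>\n"
        "SELECT DISTINCT ?uri ?p ?o WHERE {\n"
        f'  ?uri dc:title "{escape_literal(title)}"^^<{XSD_STRING}> .\n'
        "  ?uri ?p ?o .\n"  # assumption: binds ?p and ?o for the matched URI
        "}"
    )


def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of a SPARQL protocol GET."""
    return endpoint + "?" + urlencode({"query": query})


query = build_title_query("title-of-article")
url = build_request_url(DBLP_ENDPOINT, query)
```

Escaping the title before embedding it keeps quotes inside a publication title from breaking the query.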
Data Extraction
Google Scholar I
A simple way to broadly search for scholarly literature
http://scholar.google.com
It exports data in different formats:
BibTeX
EndNote
RefMan
RefWorks
WenXiangWang
Data Extraction
Google Scholar II
The data from GS is extracted via BibTeX scraping
An HTTP request using a specific cookie to retrieve BibTeX data
BibTeX data is retrieved
Mapping from BibTeX data to RDF
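The last step, mapping BibTeX to RDF, might look like the Python sketch below. It is only an illustration under stated assumptions: the field-to-property table is hypothetical (the slides do not give the real mapping), the parser handles only simple one-line "name={value}" fields, and the output is plain N-Triples text rather than whatever the extension writes to its store.

```python
import re

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from BibTeX fields to Dublin Core properties.
FIELD_TO_PROPERTY = {
    "title": DC + "title",
    "author": DC + "creator",
    "year": DC + "date",
}

# One field per line: name = {value} or name = "value", optional trailing comma.
FIELD = re.compile(r'^\s*(\w+)\s*=\s*[{"](.*?)[}"]\s*,?\s*$', re.MULTILINE)


def parse_bibtex_fields(entry):
    """Return the simple one-line fields of a BibTeX entry as a dict."""
    return {name.lower(): value.strip() for name, value in FIELD.findall(entry)}


def literal(value):
    """Quote a value as an N-Triples plain literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def bibtex_to_ntriples(entry, subject_uri):
    """Map the known fields of one BibTeX entry to N-Triples statements."""
    fields = parse_bibtex_fields(entry)
    triples = []
    for name, prop in FIELD_TO_PROPERTY.items():
        if name not in fields:
            continue
        # BibTeX separates several authors with the word "and".
        if name == "author":
            values = re.split(r"\s+and\s+", fields[name])
        else:
            values = [fields[name]]
        for value in values:
            triples.append(f"<{subject_uri}> <{prop}> {literal(value)} .")
    return triples


entry = """@inproceedings{sample2011,
  title={A Sample Title},
  author={Doe, Jane and Roe, Richard},
  year={2011}
}"""
triples = bibtex_to_ntriples(entry, "http://example.org/resource/a-sample-title")
```

Splitting the author field yields one dc:creator statement per author, which is what later makes per-author linking possible.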
Data Extraction
FOAF
Every member of our group has their own FOAF file:
http://www.morelab.deusto.es/resource/member-alias
Every publication is linked to its authors' URIs:
<http://www.morelab.deusto.es/resource/imhotep-an-approach-to-user-and-device-conscious-mobile-applications> dc:creator <http://www.morelab.deusto.es/resource/dipina>
This is done automatically by looking for the authors' nicknames
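One way to picture the automatic linking is the Python sketch below. The slides do not describe the matching logic, so everything here is an assumption: the alias table and its name variants are hypothetical (only the "dipina" resource appears in the slides), names are compared after a simple normalization, and the triples are emitted as N-Triples text.

```python
import re

RESOURCE_BASE = "http://www.morelab.deusto.es/resource/"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical alias table: member alias -> name variants seen in citations.
MEMBER_NAMES = {
    "dipina": ["Diego Lopez-de-Ipina", "Diego Lopez de Ipina"],
    "dbujan": ["David Bujan"],
}


def normalize(name):
    """Lowercase a name and collapse punctuation and spacing differences."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


# Reverse index from normalized name variant to member alias.
LOOKUP = {
    normalize(variant): alias
    for alias, variants in MEMBER_NAMES.items()
    for variant in variants
}


def member_uri(author_name):
    """Return the member's resource URI, or None for external authors."""
    alias = LOOKUP.get(normalize(author_name))
    return RESOURCE_BASE + alias if alias else None


def creator_triples(publication_slug, author_names):
    """Emit one dc:creator triple per author who is a group member."""
    subject = RESOURCE_BASE + publication_slug
    triples = []
    for name in author_names:
        uri = member_uri(name)
        if uri is not None:
            triples.append(f"<{subject}> <{DC_CREATOR}> <{uri}> .")
    return triples


links = creator_triples(
    "imhotep-an-approach-to-user-and-device-conscious-mobile-applications",
    ["Jane Doe", "Diego Lopez-de-Ipina"],
)
```

Authors who are not in the alias table are simply skipped, so external co-authors do not get a group resource URI.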
Data Extraction
Flowchart
System Architecture
Overview
System Architecture
Joseki + SDB
Joseki
A SPARQL server for Jena
Storage into RDF files and relational databases
It allows SPARQL Updates
It is private to our system
SDB
A component of Jena
It provides:
Scalable storage
Query of RDF datasets using conventional SQL databases
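Since Joseki accepts SPARQL Updates, storing extracted statements could be sketched in Python as below. This only builds the text of an INSERT DATA request. How the extension actually talks to Joseki is not described in the slides, the private endpoint URL is not given, and the function name is hypothetical.

```python
def build_insert_data(triples):
    """Wrap N-Triples-style statements in a SPARQL Update INSERT DATA request."""
    body = "\n".join("  " + triple for triple in triples)
    return "INSERT DATA {\n" + body + "\n}"


update = build_insert_data([
    "<http://www.morelab.deusto.es/resource/dipina> "
    "<http://www.w3.org/2002/07/owl#sameAs> "
    "<http://example.org/external/author> ."  # placeholder target URI
])
```

The resulting string would then be sent to the private update endpoint, which is what keeps write access separate from the public query interface.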
System Architecture
Pubby
Pubby adds Linked Data interfaces to SPARQL endpoints
It allows content negotiation among these formats:
HTML
RDF/XML
N3
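From the client side, content negotiation means asking for the same resource URI with a different Accept header. The Python sketch below prepares such requests without sending them. The media types are assumptions based on common usage (text/html, application/rdf+xml, text/n3), and the helper name is hypothetical. Which types the deployed Pubby instance honors is not stated in the slides.

```python
from urllib.request import Request

# Assumed media types for the three formats listed on the slide.
MEDIA_TYPES = {
    "html": "text/html",
    "rdfxml": "application/rdf+xml",
    "n3": "text/n3",
}


def negotiated_request(resource_uri, fmt):
    """Prepare a GET request asking for one representation of a resource."""
    return Request(resource_uri, headers={"Accept": MEDIA_TYPES[fmt]})


request = negotiated_request("http://www.morelab.deusto.es/resource/dipina", "rdfxml")
```

Passing the prepared request to urllib.request.urlopen would perform the actual call. A browser sending text/html would receive the human-readable page for the same URI.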
System Architecture
Snorql
An AJAXy front-end for exploring RDF SPARQL endpoints
More usable than Joseki
It is MoreLab’s public SPARQL endpoint
Admin Overview
Dataset Creation:
Ontology Prefix Definition:
Regex Definition:
User Overview
Conclusions
This solution integrates our data into the Web of Data easily
Provides a reusable solution
Opens the door to more extendable solutions
Future Work
Link our datasets with more external datasets
DBpedia
GeoNames
RDF and SPARQL search form
Externalize linked data sources
Building the Extension modularly