Towards the Integration of a Research Group Website into the Web of Data
Mikel Emaldi, David Buján, Diego López de Ipiña {m.emaldi, dbujan, dipina}@deusto.es
Deusto Institute of Technology - DeustoTech
November 2011

Upload: mikel-emaldi-manrique

Post on 09-Jul-2015


DESCRIPTION

Presentation given at CAEPIA 2011

TRANSCRIPT

Page 1: Towards the Integration of Research Group Website into the Web of Data

Towards the Integration of a Research Group Website into the Web of Data

Mikel Emaldi, David Buján, Diego López de Ipiña {m.emaldi, dbujan, dipina}@deusto.es

Deusto Institute of Technology - DeustoTech

November 2011


Motivation Our Solution Linked Data Extension Conclusions Future Work

1 Motivation

2 Our Solution: First Approach, Solution Overview, Data Extraction, System Architecture

3 Linked Data Extension

4 Conclusions

5 Future Work

Mikel Emaldi, David Bujan, Diego Lopez de Ipina DeustoTech - Internet

Towards the Integration of a Research Group Website into the Web of Data


1 Motivation


Motivation

The desire to offer our research group website's (http://www.morelab.deusto.es) data as Linked Data

Our website is built on the Joomla! CMS

The data is unstructured

We chose our publications section as a first attempt

Almost 100 publications
Possibility to link them to external datasets
We saw the opportunity to centralize the group's FOAF files


2 Our Solution


First Approach

A solution based on a Python web script (mod_python)

The Joomla! core code had to be modified

This caused a major problem:

Whenever a security update was installed, Joomla! overwrote our custom code


Solution Overview

Joomla! Extension

A solution based on an Extension for Joomla!


Component


Plugin


It offers a feasible way to analyze the publications on the website and generate the corresponding Linked Data


Data Extraction

Joomla! Content Example

TALISMAN+: Intelligent System for Follow-Up and Promotion of Personal Autonomy
David Ausín, Diego Lopez-de-Ipina, Jose Bravo, Miguel Angel Valero, Francisco Florez. TALISMAN+: Intelligent System for Follow-Up and Promotion of Personal Autonomy. III International Workshop on Ambient Assisted Living - IWAAL 2011. Malaga, Spain. June 2011.

The TALISMAN+ project, financed by the Spanish Ministry of Science and Innovation, aims to research and demonstrate innovative solutions transferable to society which offer services and products based on information and communication technologies in order to promote personal autonomy in prevention and monitoring scenarios. It will solve critical interoperability problems among systems and emerging technologies in a context where heterogeneity brings about accessibility barriers not yet overcome and demanded by the scientific, technological or social-health settings.

Download


Overview

Data is extracted in three ways:

User-defined regular expressions
DBLP SPARQL endpoint
Google Scholar search engine


Regex I

The user defines a regular expression to parse the content

The user must define the ontologies used, and their prefixes, in the admin control panel

The regex tags are easy to understand:

The ontology properties to be mapped are enclosed in {}
Every delimiter (including the {}) is marked with a \
The term {dummy} can be used to ignore content
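A minimal sketch, in Python for illustration only (Joomla! extensions are written in PHP), of how the prefixed property names in a template, such as dc:title, could be expanded using the prefixes declared in the admin control panel. The PREFIXES table and the expand_curie function are hypothetical; the namespace URIs are the standard ones for Dublin Core elements, SWRC and BIBO.

```python
# Hypothetical prefix table, standing in for the one an admin declares
# in the extension's control panel.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "swrc": "http://swrc.ontoware.org/ontology#",
    "bibo": "http://purl.org/ontology/bibo/",
}


def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as 'dc:title' into a full property URI."""
    prefix, sep, local = curie.strip().partition(":")
    if not sep or not local.strip():
        raise ValueError(f"not a prefixed name: {curie!r}")
    try:
        return prefixes[prefix.strip()] + local.strip()
    except KeyError:
        # The template used a prefix the admin never declared.
        raise ValueError(f"undeclared prefix {prefix!r} in {curie!r}") from None


print(expand_curie("dc:title"))     # http://purl.org/dc/elements/1.1/title
print(expand_curie("swrc:series"))  # http://swrc.ontoware.org/ontology#series
```

Rejecting undeclared prefixes early means a typo in a template is reported when the template is saved rather than producing triples with broken property URIs.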


Regex II

David Ausín, Diego Lopez-de-Ipina, Jose Bravo, Miguel Angel Valero, Francisco Florez. TALISMAN+: Intelligent System for Follow-Up and Promotion of Personal Autonomy. III International Workshop on Ambient Assisted Living - IWAAL 2011. Malaga, Spain. June 2011.

The TALISMAN+ project, financed by the Spanish Ministry of Science and Innovation, aims to research and demonstrate innovative solutions transferable to society which offer services and products based on information and communication technologies in order to promote personal autonomy in prevention and monitoring scenarios. It will solve critical interoperability problems among systems and emerging technologies in a context where heterogeneity brings about accessibility barriers not yet overcome and demanded by the scientific, technological or social-health settings.

Download

{dc:creator, sep(,)}\. \{dc:title}\. \{swrc:series}\. \{swrc:location}\. \{dc:date}\. \{bibo:abstract} \Download\$
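The template on this slide is essentially a regular expression in disguise. Below is a minimal Python sketch of one way to compile it, under our reading of the tag rules on the previous slide: each {prefix:property} becomes a lazy capture group, sep(x) splits the captured value on x, {dummy} is captured and discarded, backslashes only mark delimiters, whitespace in a delimiter matches any whitespace, and a trailing $ anchors the match. The names compile_template and extract are hypothetical, not the extension's code, and the sample content is the citation above with the abstract cut to its first sentence.

```python
import re

# Fields look like {dc:title}, {dc:creator, sep(,)} or {dummy}.
FIELD_RE = re.compile(r"(\{[^}]*\})")
SPEC_RE = re.compile(r"\s*(dummy|[\w-]+\s*:\s*[\w-]+)\s*(?:,\s*sep\((.*)\))?\s*")


def literal_to_regex(literal: str) -> str:
    """Turn the delimiter text between two fields into regex source."""
    text = literal.replace("\\", "")      # backslashes only mark delimiters
    anchored = text.endswith("$")         # a trailing $ anchors at the end
    if anchored:
        text = text[:-1]
    # Whitespace in a delimiter matches any whitespace run in the page,
    # since line breaks in rendered article text are unpredictable.
    parts = [r"\s+" if p.isspace() else re.escape(p)
             for p in re.split(r"(\s+)", text) if p]
    return "".join(parts) + ("$" if anchored else "")


def compile_template(template: str):
    """Compile a template into a regex plus (group, property, separator) specs."""
    pattern, fields = [], []
    for i, token in enumerate(FIELD_RE.split(template)):
        if i % 2 == 0:                    # even tokens: delimiter text
            pattern.append(literal_to_regex(token))
            continue
        spec = SPEC_RE.fullmatch(token[1:-1])
        if spec is None:
            raise ValueError(f"unrecognised field tag: {token}")
        group = f"f{len(fields)}"
        fields.append((group, spec.group(1).replace(" ", ""), spec.group(2)))
        pattern.append(f"(?P<{group}>.+?)")  # lazy: stop at the next delimiter
    return re.compile("".join(pattern), re.DOTALL), fields


def extract(template: str, content: str) -> list[tuple[str, str]]:
    """Apply a template to article text and return (property, value) pairs."""
    regex, fields = compile_template(template)
    match = regex.search(content)
    if match is None:
        return []
    pairs = []
    for group, prop, sep in fields:
        if prop == "dummy":               # matched but deliberately ignored
            continue
        raw = match.group(group)
        values = raw.split(sep) if sep else [raw]
        pairs.extend((prop, v.strip()) for v in values if v.strip())
    return pairs


TEMPLATE = (r"{dc:creator, sep(,)}\. \{dc:title}\. \{swrc:series}\. "
            r"\{swrc:location}\. \{dc:date}\. \{bibo:abstract} \Download\$")
CONTENT = (
    "David Ausín, Diego Lopez-de-Ipina, Jose Bravo, Miguel Angel Valero, "
    "Francisco Florez. TALISMAN+: Intelligent System for Follow-Up and "
    "Promotion of Personal Autonomy. III International Workshop on Ambient "
    "Assisted Living - IWAAL 2011. Malaga, Spain. June 2011.\n\n"
    "The TALISMAN+ project, financed by the Spanish Ministry of Science and "
    "Innovation, aims to research and demonstrate innovative solutions "
    "transferable to society which offer services and products based on "
    "information and communication technologies in order to promote "
    "personal autonomy in prevention and monitoring scenarios.\n\n"
    "Download"
)

for prop, value in extract(TEMPLATE, CONTENT):
    print(f"{prop:15} {value[:60]}")
```

Lazy groups work here because each field is followed by a delimiter that does not occur inside the earlier fields; a title containing ". " would need a more specific delimiter in the template.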


DBLP I

Digital Bibliography & Library Project

More than 1.3 million articles

SPARQL endpoint at:

http://dblp.l3s.de/d2r/sparql/

http://dblp.l3s.de/d2r/snorql/


DBLP II

The DBLP SPARQL endpoint is used to search for data about publications:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT ?uri ?p ?o WHERE {
  ?uri dc:title "title-of-article"^^<http://www.w3.org/2001/XMLSchema#string> .
  ?uri ?p ?o .
}

The data is enriched with our own data and saved into the RDF store

We also link members' FOAF profiles to DBLP author data:
<http://www.morelab.deusto.es/resource/dipina> owl:sameAs <http://dblp.l3s.de/d2r/resource/authors/Diego_Lopez-de-Ipina> .
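The title lookup can be sketched in Python with the standard library alone. The query is a version of the one on this slide that also binds ?p and ?o to every property of the matched publication, with the title escaped as a SPARQL string literal; the request follows the SPARQL protocol (a `query` parameter on a GET, JSON results asked for via the Accept header). The helper names are hypothetical, dc: is assumed to be the Dublin Core elements namespace, and the 2011 DBLP endpoint may no longer be online, so the sketch only builds the request.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as given on the DBLP I slide; it may no longer be online.
ENDPOINT = "http://dblp.l3s.de/d2r/sparql/"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def sparql_string(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def title_query(title: str) -> str:
    """Every (property, value) of the publications with exactly this title."""
    return (
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "SELECT DISTINCT ?uri ?p ?o WHERE {\n"
        f"  ?uri dc:title {sparql_string(title)}^^<{XSD_STRING}> .\n"
        "  ?uri ?p ?o .\n"
        "}"
    )


def build_request(title: str) -> Request:
    """GET request in the style of the SPARQL protocol, asking for JSON results."""
    url = ENDPOINT + "?" + urlencode({"query": title_query(title)})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request("TALISMAN+: Intelligent System for Follow-Up and "
                        "Promotion of Personal Autonomy")
print(title_query("TALISMAN+"))
# urllib.request.urlopen(request) would send it and return the JSON results.
```

Escaping the title matters because publication titles can contain quotes, which would otherwise end the literal early and break the query.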


Google Scholar I

A simple way to broadly search for scholarly literature

http://scholar.google.com

It exports data in different formats:

BibTeX
EndNote
RefMan
RefWorks
WenXiangWang


Google Scholar II

Data from Google Scholar (GS) is extracted by scraping its BibTeX output


An HTTP request is sent with a specific cookie to retrieve the BibTeX data


BibTeX data is retrieved


Mapping from BibTeX data to RDF
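The last step, mapping BibTeX to RDF, might look like the following Python sketch. It handles only one flat entry with braced or numeric values, keeps LaTeX accent commands as they are, and uses a hypothetical field-to-property table (Dublin Core and SWRC). Publication URIs are minted as title slugs under http://www.morelab.deusto.es/resource/, a pattern inferred from the example publication URI on the FOAF slide rather than confirmed by the authors. The sample entry is built by hand from the TALISMAN+ citation, not taken from an actual Google Scholar export.

```python
import re

BASE = "http://www.morelab.deusto.es/resource/"
DC = "http://purl.org/dc/elements/1.1/"
SWRC = "http://swrc.ontoware.org/ontology#"

# Hypothetical mapping from BibTeX fields to RDF properties.
FIELD_MAP = {
    "title": DC + "title",
    "author": DC + "creator",
    "year": DC + "date",
    "booktitle": SWRC + "booktitle",
    "pages": SWRC + "pages",
}

ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# A braced value (allowing one level of nested braces) or a bare number.
FIELD_RE = re.compile(r"(\w+)\s*=\s*(?:\{((?:[^{}]|\{[^{}]*\})*)\}|(\d+))")


def parse_bibtex(entry: str) -> dict[str, str]:
    """Parse one flat BibTeX entry into a dict keyed by lower-cased field name."""
    head = ENTRY_RE.search(entry)
    if head is None:
        raise ValueError("no BibTeX entry found")
    fields = {"ENTRYTYPE": head.group(1).lower(), "ID": head.group(2)}
    for name, braced, number in FIELD_RE.findall(entry, head.end()):
        fields[name.lower()] = braced or number
    return fields


def slug(text: str) -> str:
    """Lower-case, hyphen-separated slug used to mint a publication URI."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def literal(text: str) -> str:
    """N-Triples string literal with quotes, backslashes and newlines escaped."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def bibtex_to_ntriples(entry: str) -> list[str]:
    """Map one BibTeX entry to N-Triples statements about the publication."""
    fields = parse_bibtex(entry)
    subject = f"<{BASE}{slug(fields['title'])}>"
    triples = []
    for name, prop in FIELD_MAP.items():
        if name not in fields:
            continue
        # BibTeX separates multiple authors with " and ".
        values = fields[name].split(" and ") if name == "author" else [fields[name]]
        for value in values:
            value = " ".join(value.split())   # collapse line breaks and spaces
            triples.append(f"{subject} <{prop}> {literal(value)} .")
    return triples


ENTRY = """@inproceedings{ausin2011talisman,
  title={TALISMAN+: Intelligent System for Follow-Up and Promotion of Personal Autonomy},
  author={Ausín, David and Lopez-de-Ipina, Diego and Bravo, Jose and Valero, Miguel Angel and Florez, Francisco},
  booktitle={III International Workshop on Ambient Assisted Living - IWAAL 2011},
  year={2011}
}"""

for triple in bibtex_to_ntriples(ENTRY):
    print(triple)
```

A production version would rather use a full BibTeX parser, since real exports can contain quoted values, string macros and deeper brace nesting.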


FOAF

Every member of our group has their own FOAF file:

http://www.morelab.deusto.es/resource/member-alias

Every publication is linked to its authors' URIs:
<http://www.morelab.deusto.es/resource/imhotep-an-approach-to-user-and-device-conscious-mobile-applications> dc:creator <http://www.morelab.deusto.es/resource/dipina> .

This is done automatically by looking up the authors' nicknames
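Nickname matching could work roughly as in the Python sketch below: each member's FOAF alias is associated with the name variants under which that member appears in citations, names are compared ignoring case, accents and hyphens, and only matching authors produce dc:creator links. The MEMBERS table, its name variants and the example publication URI are hypothetical, and dc: is assumed to be the Dublin Core elements namespace.

```python
import unicodedata

BASE = "http://www.morelab.deusto.es/resource/"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical member table: FOAF alias -> name variants ("nicknames")
# under which that member appears in citations.
MEMBERS = {
    "dipina": ["Diego Lopez-de-Ipina", "Diego López de Ipiña", "D. Lopez-de-Ipina"],
}


def normalize(name: str) -> str:
    """Comparison key that ignores case, accents, hyphens and extra spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.replace("-", " ").lower().split())


NICKNAMES = {normalize(v): alias for alias, variants in MEMBERS.items() for v in variants}


def creator_triples(publication_uri: str, authors: list[str]) -> list[str]:
    """dc:creator links from a publication to the FOAF URIs of group members."""
    triples = []
    for author in authors:
        alias = NICKNAMES.get(normalize(author))
        if alias is None:              # not a group member: no FOAF URI to link
            continue
        triples.append(f"<{publication_uri}> <{DC_CREATOR}> <{BASE}{alias}> .")
    return triples


pub = BASE + "talisman-intelligent-system-for-follow-up-and-promotion-of-personal-autonomy"
for triple in creator_triples(pub, ["David Ausín", "Diego Lopez-de-Ipina", "Jose Bravo"]):
    print(triple)
```

Normalizing both sides once, into a dictionary, keeps each lookup constant-time however many variants a member has.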


Flowchart


System Architecture

Overview


Joseki + SDB

Joseki

A SPARQL server for Jena
Storage in RDF files and relational databases
It supports SPARQL Update
It is private to our system

SDB

A component of Jena
It provides:

Scalable storage
Querying of RDF datasets using conventional SQL databases
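Because the Joseki endpoint accepts SPARQL updates, the extension can push the triples it generates with an INSERT DATA operation. A minimal Python sketch that only builds such a request is shown below; the endpoint URL and named graph are hypothetical, and sending the update as an application/sparql-update POST body assumes a SPARQL 1.1 Update implementation, which the Joseki release used in 2011 may not have offered in exactly this form. The example triple is the dc:creator link from the FOAF slide.

```python
from urllib.request import Request

# Hypothetical private update endpoint and target graph.
UPDATE_ENDPOINT = "http://localhost:2020/update"
GRAPH = "http://www.morelab.deusto.es/graph/publications"


def insert_data(ntriples: list[str], graph: str = GRAPH) -> str:
    """Wrap N-Triples statements in a SPARQL 1.1 INSERT DATA operation."""
    body = "\n".join("    " + t for t in ntriples)
    return f"INSERT DATA {{\n  GRAPH <{graph}> {{\n{body}\n  }}\n}}"


def update_request(update: str) -> Request:
    """POST the update as the request body (SPARQL 1.1 Protocol style)."""
    return Request(
        UPDATE_ENDPOINT,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update; charset=utf-8"},
        method="POST",
    )


triples = [
    "<http://www.morelab.deusto.es/resource/imhotep-an-approach-to-user-and-"
    "device-conscious-mobile-applications> "
    "<http://purl.org/dc/elements/1.1/creator> "
    "<http://www.morelab.deusto.es/resource/dipina> ."
]
print(insert_data(triples))
# urllib.request.urlopen(update_request(insert_data(triples))) would send it.
```

Keeping the update endpoint private, as the slide notes, means only the extension can write, while the public interfaces stay read-only.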


Pubby

Pubby adds Linked Data interfaces to SPARQL endpoints
It allows content negotiation among these formats:

HTML
RDF/XML
N3
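Content negotiation means the same resource URI yields HTML for browsers and RDF for Linked Data clients, depending on the HTTP Accept header. The Python sketch below illustrates the selection step only; it is not Pubby's code (Pubby is a Java web application). It assumes text/n3 as the N3 media type and gives HTML priority on ties and when no Accept header is sent.

```python
from typing import Optional

# Formats listed on the slide, with text/n3 assumed as the N3 media type.
SUPPORTED = ["text/html", "application/rdf+xml", "text/n3"]


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0          # treat a malformed q-value as "not acceptable"
        ranges.append((parts[0].lower(), q))
    return ranges


def negotiate(header: Optional[str], supported: list[str] = SUPPORTED) -> Optional[str]:
    """Pick the supported type the client prefers; None means 406 Not Acceptable."""
    if not header:
        return supported[0]              # no Accept header: serve HTML
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for media_type in supported:
        # The most specific matching range decides the q-value (RFC 7231).
        rank = {media_type: 2, media_type.split("/")[0] + "/*": 1, "*/*": 0}
        q, specificity = 0.0, -1
        for media_range, range_q in ranges:
            spec = rank.get(media_range, -1)
            if spec > specificity:
                specificity, q = spec, range_q
        if q > best_q:                   # strict '>' keeps server order on ties
            best, best_q = media_type, q
    return best


print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))  # text/html
print(negotiate("application/rdf+xml"))  # application/rdf+xml
```

Listing HTML first in SUPPORTED is the design choice that makes browsers sending */* get a readable page, while RDF clients that name their format explicitly still get RDF.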


Snorql

An AJAXy front-end for exploring RDF SPARQL endpoints

More usable than Joseki

It serves as the public interface to MoreLab's SPARQL endpoint


3 Linked Data Extension


Admin Overview

Dataset Creation:


Ontology Prefix Definition:

Regex Definition:


User Overview


4 Conclusions


Conclusions

This solution easily integrates our data into the Web of Data

Provides a reusable solution

Opens the door to more extensible solutions


5 Future Work


Future Work

Link our datasets to more external datasets:

DBpedia
GeoNames

RDF and SPARQL search form

Externalize linked data sources

Build the extension modularly
