Post on 01-May-2020

Methodological Guidelines for Publishing Linked Data

Boris Villazón-Terrazas bvillazon@isoco.com

@boricles Slides available at: http://www.slideshare.net/boricles/

Acknowledgements: OEG

Main References

Wood, David (Ed.), Linking Government Data, 2011
Methodological Guidelines for Publishing Government Linked Data
Boris Villazón-Terrazas, Luis M. Vilches, Oscar Corcho, Asunción Gómez-Pérez

Best Practices for Publishing Linked Data
W3C Editor’s Draft – Government Linked Data Working Group
Bernadette Hyland, Boris Villazón-Terrazas, Ghislain Atemezing
https://dvcs.w3.org/hg/gld/raw-file/default/bp/index.html

Cookbook for Open Government Linked Data
W3C Editor’s Draft – Government Linked Data Working Group
Bernadette Hyland, Boris Villazón-Terrazas
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

ToC

» Introduction

» Guidelines for Publishing Linked Data

» Use Cases

Publishing Linked Data

A process that involves a large number of steps, design decisions, and technologies.

Iterative and Incremental Linked Data Life Cycle

Specification

§  Identification and analysis of the data sources

§  URI design

§  Definition of the license

Specification: Identification and analysis of the data sources

After we have identified and selected the government data sources:

§  Search for and compile all the available data and documentation about those resources

§  Identify the schema of those resources, including conceptual components and their relationships

§  Identify the items in the domain, i.e., the things whose properties and relations are described in the data sources

Specification: URI Design

§  Use meaningful URIs, instead of opaque URIs, when possible

§  Separate TBox (ontology model) from ABox (instances) URIs.

-  Base URI: http://data.gov.bo/, http://health.data.gov.bo/

-  TBox URIs: http://data.gov.bo/ontology/{class|property}

-  ABox URIs: http://data.gov.bo/resource/, e.g., http://data.gov.bo/resource/province/Tiraque
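The URI patterns above can be sketched as two small helper functions. This is only an illustration: the function names `tbox_uri` and `abox_uri` are hypothetical, and the percent-encoding of names is an assumption that the slides do not discuss.

```python
from urllib.parse import quote

BASE_URI = "http://data.gov.bo/"  # base URI taken from the slide


def tbox_uri(term):
    """Return the URI of a class or property in the ontology model (TBox)."""
    return f"{BASE_URI}ontology/{quote(term)}"


def abox_uri(resource_type, name):
    """Return a meaningful (non-opaque) URI for an instance (ABox)."""
    return f"{BASE_URI}resource/{quote(resource_type)}/{quote(name)}"


print(tbox_uri("Province"))
print(abox_uri("province", "Tiraque"))
```

Keeping the two patterns in separate functions mirrors the separation of TBox and ABox URIs recommended on the slide.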

Modelling: Reuse available vocabularies

§  Search for suitable vocabularies, e.g., in Linked Open Vocabularies

§  Are there suitable vocabularies?

-  Yes: build the vocabulary by reusing available vocabularies

-  No: continue with the reuse of non-ontological resources

Modelling: Reuse available non-ontological resources

§  Search for suitable non-ontological resources in highly reliable Web sites, domain-related sites, and government catalogs

§  Are there suitable resources?

-  Yes: build the vocabulary by transforming available resources

-  No: build the vocabulary from scratch

* Boris Villazón-Terrazas, A Method for Reusing and Re-Engineering Non-Ontological Resources for Building Ontologies. IOS Press 2012

Generation

§  Transformation

§  Data cleansing / curation

§  Linking

Generation: Transformation

§  Take the data sources selected in the specification activity and transform them to RDF according to the vocabulary created in the modelling activity

§  Some tools

-  CSV and spreadsheets: RDF extension of Google Refine, XLWrap, RDF123, NOR2O

-  RDB: D2R Server, ODEMapster, W3C RDB2RDF WG – R2RML

-  XML: GRDDL, ReDeFer

http://www.w3.org/wiki/ConverterToRdf
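To make the transformation step concrete, here is a minimal sketch that turns CSV rows into N-Triples using only the Python standard library. It does not reproduce any of the tools listed above. The column names, the sample values, and the `Province` class and `population` property are invented for illustration.

```python
import csv
import io
from urllib.parse import quote

RESOURCE = "http://data.gov.bo/resource/"  # ABox base, as in the URI design slide
ONTOLOGY = "http://data.gov.bo/ontology/"  # TBox base, as in the URI design slide
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Invented sample data; the column names and the values are assumptions.
SAMPLE_CSV = "province,population\nTiraque,100\nCercado,200\n"


def csv_to_ntriples(text):
    """Turn each CSV row into two N-Triples lines: a type and a population."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{RESOURCE}province/{quote(row['province'])}>"
        population = int(row["population"])  # fails loudly on dirty data
        triples.append(f"{subject} <{RDF_TYPE}> <{ONTOLOGY}Province> .")
        triples.append(
            f'{subject} <{ONTOLOGY}population> "{population}"^^<{XSD_INTEGER}> .'
        )
    return triples


for line in csv_to_ntriples(SAMPLE_CSV):
    print(line)
```

Converting the population to `int` before serializing is a small example of the data cleansing that usually accompanies transformation.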

Generation: Transformation – RDB2RDF

§  A majority of dynamic Web content is backed by relational databases (RDB), and so are many enterprise systems.

§  W3C RDB2RDF Working Group

-  R2RML: RDB to RDF Mapping Language - http://www.w3.org/TR/r2rml/

-  Direct Mapping - http://www.w3.org/TR/rdb-direct-mapping/

-  R2RML and Direct Mapping Test Cases - http://www.w3.org/2001/sw/rdb2rdf/test-cases/

-  RDB2RDF Implementation Report - http://www.w3.org/TR/rdb2rdf-test-cases/
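The idea behind a table-to-RDF mapping can be sketched with an in-memory SQLite database. The sketch below is loosely modelled on the IRI shapes of the W3C Direct Mapping (one node per row, one predicate per column). It is not a conforming implementation: it ignores datatypes, foreign keys, and literal escaping. The base IRI, the table, and the row are hypothetical.

```python
import sqlite3

BASE = "http://example.com/db/"  # hypothetical base IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map(conn, table, key):
    """Emit one type triple per row and one literal triple per non-null column."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{key}={record[key]}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}{table}> .")
        for column, value in record.items():
            if value is not None:
                # Simplification: values are written as plain, unescaped literals.
                triples.append(f'{subject} <{BASE}{table}#{column}> "{value}" .')
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Province (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Province VALUES (1, 'Tiraque')")

for triple in direct_map(conn, "Province", "id"):
    print(triple)
```

R2RML goes further than this by letting the publisher choose the target vocabulary and URI patterns instead of deriving them from table and column names.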

[Figure: transformation description and transformation engine]

Generation: Transformation – Spreadsheets to RDF

[Figure: NOR2O example; labels: Industry Production Index, Province, Year]

Generation: Transformation – Geospatial to RDF

§  geometry2rdf: tool for generating RDF from geospatial information

§  The geometry could be available in GML or WKT

https://github.com/boricles/geometry2rdf
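As a rough illustration of what generating RDF from a geometry involves, the sketch below parses a WKT `POINT` and emits two triples with the W3C Basic Geo vocabulary. It does not reproduce geometry2rdf or its output model. The subject URI reuses the example from the URI design slide, the coordinates are invented, and the x-then-y (longitude, latitude) reading of the WKT point is an assumption.

```python
import re

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # W3C Basic Geo vocabulary

# Matches e.g. "POINT(-65.72 -17.42)"; only 2D points are handled.
POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_ntriples(subject_uri, wkt):
    """Convert a WKT POINT into geo:long and geo:lat triples."""
    match = POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D WKT POINT: {wkt!r}")
    lon, lat = match.groups()  # assumed order: x = longitude, y = latitude
    return [
        f'<{subject_uri}> <{GEO}long> "{lon}" .',
        f'<{subject_uri}> <{GEO}lat> "{lat}" .',
    ]


# Invented coordinates, for illustration only.
for triple in wkt_point_to_ntriples(
    "http://data.gov.bo/resource/province/Tiraque", "POINT(-65.72 -17.42)"
):
    print(triple)
```

LineStrings and polygons need a richer geometry model than a latitude and longitude pair, which is where a dedicated tool becomes useful.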

Generation: Transformation – MARC21 to RDF

§  MARiMbA: a MARC mappings and RDF generator

[Figure: MARiMbA workflow; labels: Classification, Annotation, Relationships, Mapping templates, Domain experts]

Generation: Linking

§  Identify suitable data sets as linking targets

-  Look for similar datasets in http://thedatahub.org/

§  Discover relationships between data items

-  Silk Framework: http://www4.wiwiss.fu-berlin.de/bizer/silk/

-  LIMES: http://aksw.org/Projects/limes

§  Validate the relationships discovered

-  Look for our resources in tools like sig.ma
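The discovery step can be illustrated with a naive label comparison. This sketch stands in for the idea only. It is not how Silk or LIMES work internally; they support far richer linkage rules. The target URIs, the labels, and the 0.9 threshold are assumptions.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def discover_links(source, target, threshold=0.9):
    """Return (source_uri, target_uri, score) for label pairs above the threshold."""
    links = []
    for source_uri, source_label in source.items():
        for target_uri, target_label in target.items():
            score = SequenceMatcher(
                None, source_label.lower(), target_label.lower()
            ).ratio()
            if score >= threshold:
                links.append((source_uri, target_uri, score))
    return links


# Hypothetical datasets: URI -> label.
source = {"http://data.gov.bo/resource/province/Tiraque": "Tiraque"}
target = {
    "http://example.org/place/tiraque": "Tiraque",
    "http://example.org/place/tarija": "Tarija",
}

for source_uri, target_uri, score in discover_links(source, target):
    # Candidate links still need validation before publication.
    print(f"<{source_uri}> <{OWL_SAME_AS}> <{target_uri}> .")
```

The comparison is quadratic in the number of resources, so it only suits small datasets. The score is returned with each link so that borderline candidates can be reviewed during validation.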

Publication

§  Dataset publication

§  Metadata publication

§  Dataset discovery

Publication: Dataset publication

§  Tools for the RDF store, SPARQL endpoint, and Linked Data frontend

-  Virtuoso Universal Server, Jena, Sesame, 4Store, YARS, OWLIM, Talis Platform, Fuseki, Pubby, Linked Data API

§  Store the RDF data in different graphs

-  http://example.com/graph/ontology

-  http://example.com/graph/dataset

-  http://example.com/graph/links
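One way to keep the three graphs apart is to serialize the data as N-Quads, where the fourth term names the graph. The sketch below is a minimal illustration that only handles triples whose object is a URI. The helper name `to_nquad` and the sample triples are assumptions.

```python
GRAPHS = {
    "ontology": "http://example.com/graph/ontology",
    "dataset": "http://example.com/graph/dataset",
    "links": "http://example.com/graph/links",
}


def to_nquad(subject, predicate, obj, graph_key):
    """Serialize a URI-only triple as an N-Quads line in the chosen graph."""
    return f"<{subject}> <{predicate}> <{obj}> <{GRAPHS[graph_key]}> ."


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical triples: one vocabulary term, one instance, one link.
quads = [
    to_nquad("http://data.gov.bo/ontology/Province", RDF_TYPE, OWL_CLASS, "ontology"),
    to_nquad(
        "http://data.gov.bo/resource/province/Tiraque",
        RDF_TYPE,
        "http://data.gov.bo/ontology/Province",
        "dataset",
    ),
    to_nquad(
        "http://data.gov.bo/resource/province/Tiraque",
        OWL_SAME_AS,
        "http://example.org/place/tiraque",
        "links",
    ),
]

print("\n".join(quads))
```

With this separation, the links graph can be regenerated after a new linking run without touching the ontology or the dataset.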

Publication: Metadata publication

§  VoID allows expressing metadata about RDF datasets: http://www.w3.org/TR/void/

§  The PROV Ontology (PROV-O) allows expressing provenance information: http://www.w3.org/TR/prov-o/
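A VoID description is itself a small RDF document. The sketch below builds one in Turtle with plain string formatting. The dataset URI, title, endpoint, and triple count are hypothetical, and the title is assumed not to contain characters that need escaping in Turtle.

```python
def void_description(dataset_uri, title, sparql_endpoint, triple_count):
    """Build a minimal VoID description of a dataset in Turtle."""
    return "\n".join(
        [
            "@prefix void: <http://rdfs.org/ns/void#> .",
            "@prefix dcterms: <http://purl.org/dc/terms/> .",
            "",
            f"<{dataset_uri}> a void:Dataset ;",
            f'    dcterms:title "{title}" ;',
            f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
            f"    void:triples {int(triple_count)} .",
        ]
    )


# Hypothetical values, for illustration only.
print(
    void_description(
        "http://example.com/dataset/provinces",
        "Provinces dataset",
        "http://example.com/sparql",
        1000,
    )
)
```

In practice an RDF library would handle escaping and serialization. The string version is used here only to show which VoID terms are involved.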

Publication: Dataset discovery

§  Register the dataset in the CKAN registry, thedatahub.org: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

§  Generate sitemap files for your dataset by using sitemap4rdf: http://lab.linkeddata.deri.ie/2010/sitemap4rdf/

§  Submit the sitemap location to Google and Sindice
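For orientation, the sketch below writes a plain sitemap in the standard sitemaps.org format with the Python standard library. It does not reproduce the output of sitemap4rdf, which may add dataset-specific information. The listed URL is the example resource from the URI design slide.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemaps.org <urlset> document listing the given URLs."""
    ET.register_namespace("", SITEMAP_NS)  # serialize without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["http://data.gov.bo/resource/province/Tiraque"]))
```

The resulting file is what gets submitted to the search engines in the last step of the list above.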

Exploitation

Streaming resources

Exploitation: map4rdf

•  Faceted browser interface.

•  Geospatial visualization using Google Maps and OpenStreetMap.

•  Visualization of geometries (LineStrings, Polygons, etc.) when using the GeoLinkedData data model.

•  Visualization of statistical data using SCOVO / RDF Data Cube.

https://github.com/boricles/linked-data-visualization-tools

[Figure: map4rdf; labels: SPARQL, Triplestore]
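A visualization client of this kind ultimately reads its data from the triplestore through SPARQL. The sketch below builds a GET request URL as defined by the SPARQL Protocol, where the query travels in the `query` parameter. The endpoint URL is hypothetical, and the use of the W3C Basic Geo properties in the query is an assumption, not a description of what map4rdf actually sends.

```python
from urllib.parse import urlencode

# Assumed data model: resources carrying geo:lat and geo:long.
QUERY = """PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?resource ?lat ?long WHERE {
  ?resource geo:lat ?lat ; geo:long ?long .
} LIMIT 100"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL with the query percent-encoded."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = sparql_get_url("http://example.com/sparql", QUERY)  # hypothetical endpoint
print(url)
```

The URL can then be fetched with any HTTP client, typically with an `Accept` header that asks for SPARQL JSON or XML results.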

Use Cases

http://geo.linkeddata.es

1. Specification 2. Modelling 3. Generation 4. Publication & Exploitation

http://aemet.linkeddata.es/browser_en.html

1. Specification 2. Modelling 3. Generation 4. Publication & Exploitation

Python scripts

250 weather stations (pressure, humidity, etc.)

Data from the stations in CSV files on an FTP server

http://bne.linkeddata.es/graphvis/

http://datos.bne.es/

1. Specification 2. Modelling 3. Generation 4. Publication & Exploitation

MARC 21 XML records

[Figure: MARiMbA workflow; labels: Classification, Annotation, Relationships, Mapping templates, Domain experts]

http://webenemasuno.linkeddata.es

1. Specification 2. Modelling 3. Generation 4. Publication & Exploitation

Scenario in the context of tourism and travelling, where the content is aggregated from different platforms. Heterogeneous content (images, travel guides, posts, videos, news).
