GetLOD - Linked Open Data and
Spatial Data Infrastructures
W3C Linked Open Data LOD2014
Roma, 20-21 February 2014
Stefano Pezzi, Massimo Zotti, Giovanni Ciardi, Massimo Fustini
Agenda
• Context
• Geoportal & OpenData Portal
• SDI management
• Towards Linked Open geoData
• GetLOD: Open GeoData Solution
Context
• Local and interoperable Geo-Information (GI) is
crucial for an increasing number of added value
services provided by private companies on top of
“open government data”
• Local governments are currently playing an emerging
role, as they are authoritative sources of high-quality,
certified data for interlinking external information and
for smart cities applications
• In Europe, the main drivers for interoperable and open
data are the INSPIRE and Public Sector Information
Directives and Open Data strategies at various levels.
Context
• Geographical datasets are usually provided as “quick-and-dirty”,
simple, flat, predefined files, with heterogeneous data
models, semantics and content
• Four critical issues:
– Local data should be published on different infrastructures;
– SDI and LOD infrastructures are not interoperable;
– Two parallel workflows bring a risk of additional workload and
data quality problems;
– GI lacks persistent URIs, so information cannot easily be
linked at record level.
GetLOD: Open GeoData Solution
GetLOD is an open and reusable solution for publishing
geographic data on the Web as Linked Open Data,
according to the RDF/XML standard.
GetLOD thus ensures the Web publication of geospatial
data and their related metadata as open and linkable
data, starting from traditional cartographic web services.
Geoportal & OpenData Portal
• The Geoportal is an important part of the
Open Data policy of the Emilia-Romagna Region.
• Strongly integrated with the regional Open
Data portal, the Geoportal is a provider of
(geo)data for the portal dati.emilia-romagna.it
ER Geoportal
• The ER Geoportal supports the dissemination, distribution
and use of geospatial data, information and
geographical services by both the public and the
staff of local and national government.
• It complies with the latest regional, national (AGiD)
and international (INSPIRE, CEN, ISO, OGC)
interoperability standards.
[Screenshots: home page of the ER Geoportal; the data catalog; example of ISO metadata]
Regional SDI management
Moka is a suite for organizing the Geographic Information
System and developing applications that provide GIS
services to citizens, professionals and businesses.
Regione Emilia-Romagna organizes its SDI with
Moka Catalog and builds GIS applications with
Moka CMS (Content Management System).
SDI and LOD will interoperate
through Moka
In Regione Emilia-Romagna, SDI and LOD infrastructures
will interoperate through Moka.
• Moka Content Management System organizes the SDI
and builds GIS applications (web, desktop, apps for
smartphone).
• Moka Catalog organizes the whole SDI.
• Moka builds open data using “GetLOD” services.
SDI and OpenData will
interoperate through Moka
How Moka (CMS GIS) helps users create OpenData
through GetLOD
• In Moka Catalog the user selects the geodata to be published as Open Data
• Moka Catalog invokes GetLOD services to create Open Data
• Open Data are catalogued in the Moka repository
• From Moka, users can manage the update of Open Data
In Regione Emilia-Romagna Moka will:
Organize SDI
• GeoData
• RDBMS and tables
• Web services
• Metadata (RNDT, Metadata RER)
• OpenData
• Functions and applications
Create applications with data and OpenData
• Web Applications
• Desktop Applications
• Apps for smartphone
Create OpenData
• Uses GetLOD services
Catalog OpenData
Towards Linked Open geo-Data
Data, if isolated, have little value.
The value of data increases when different datasets, produced and published independently by different individuals, can be freely combined by third parties.
Generating datasets in RDF format (Linked Data) increases the value of the data by allowing connections among the datasets themselves and with external datasets!
Towards Linked Open geo-Data
• Free data is not enough!
To offer a really useful service to citizens,
institutions and companies, you need to aggregate
and process data and offer them as services.
• The creation of an "ontology network" of the Geoportal
data makes it possible to move from one conceptual dataset to
another.
• Ontologies are considered one of the pillars of the
Semantic Web: a number of on-going initiatives in EU
Member States and EU projects (such as InGeoCloudS,
GeoKnow and SmartOpenData) are creating RDF
vocabularies based on the INSPIRE data models.
[Figure: integration at the data level, with applications built on top of the explicit conceptual model]
Towards Linked Open geo-Data
The focus of GetLOD is on the governance of Linked
Open Data from authoritative sources: data about
addresses and buildings are derived from municipal registers
(e.g. building permits) provided by more than 200
municipalities and 9 provinces, and are gathered by the Region
in the DBTR (Regional Topographic DB).
GetLOD: Open GeoData Solution
Open and reusable solution
It is integrated with the Spatial Data Infrastructure
through the WFS and CSW standards defined by the Open
Geospatial Consortium (OGC).
It allows geographic open data to be published
both as RDF (Linked Open Data)
and, as a side effect, in other non-linkable
interchange formats (Shapefile and GML).
[Architecture diagram of the GetLOD solution. Labels: GI Data & Metadata; GeoRepository; OGC server (OGC WFS); MD server (OGC CSW); MD 19115; GI Middleware; GetLOD; Java; API; connectors; mapping file; LOD Back-end; RDF dump; TripleStore; MD catalog; Open Data catalog; cataloguing; LOD Front-end; www; Download; Triple server; Search; API]
GetLOD is essentially a batch RDFizer that extracts
data from OGC services and transforms them into
RDF/XML.
It is a Java application that can execute scheduled
transformation jobs.
A mapping file between GML elements and ontology
concepts controls the transformation.
The core transformer is based on Apache Velocity.
Data as well as metadata are transformed into RDF graphs.
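The mapping-driven transformation described above can be sketched roughly as follows. This is a minimal illustration in Python, not GetLOD's code: GetLOD itself is a Java application using Apache Velocity, and here `string.Template` merely stands in for the template engine. The mapping entries, the `ex:` ontology namespace, the sample GML and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape, quoteattr

GML_NS = "http://www.opengis.net/gml/3.2"

# Hypothetical mapping file content: GML element name -> ontology predicate.
MAPPING = {"name": "ex:name", "height": "ex:heightInMetres"}

# string.Template stands in here for the Apache Velocity template.
RESOURCE = Template(
    "  <rdf:Description rdf:about=$uri>\n"
    "    <rdf:type rdf:resource=$rdf_class/>\n"
    "$properties"
    "  </rdf:Description>\n"
)
HEADER = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
    '         xmlns:ex="http://example.org/ontology#">\n'
)

# Invented sample of what a WFS response might look like.
SAMPLE_GML = """<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs/2.0"
    xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:app="http://example.org/app">
  <wfs:member>
    <app:Building gml:id="b42">
      <app:name>Town Hall &amp; Library</app:name>
      <app:height>18.5</app:height>
      <app:internalCode>zz</app:internalCode>
    </app:Building>
  </wfs:member>
</wfs:FeatureCollection>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def rdfize(gml_text, feature_name, rdf_class, base_uri):
    """Turn every <feature_name> element of a GML document into an RDF resource."""
    resources = []
    for element in ET.fromstring(gml_text).iter():
        if local_name(element.tag) != feature_name:
            continue
        feature_id = element.get(f"{{{GML_NS}}}id")
        if feature_id is None:
            continue  # no persistent identifier, so no URI can be built
        properties = ""
        for child in element:
            # Only elements listed in the mapping are carried over to RDF.
            predicate = MAPPING.get(local_name(child.tag))
            if predicate and child.text:
                properties += f"    <{predicate}>{escape(child.text)}</{predicate}>\n"
        resources.append(RESOURCE.substitute(
            uri=quoteattr(base_uri + feature_id),
            rdf_class=quoteattr(rdf_class),
            properties=properties,
        ))
    return HEADER + "".join(resources) + "</rdf:RDF>\n"


print(rdfize(SAMPLE_GML, "Building",
             "http://example.org/ontology#Building",
             "http://example.org/id/building/"))
```

Elements that are not listed in the mapping (here `internalCode`) are simply left out of the RDF output.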
GetLOD has a plug-in architecture for the output
destination of the extracted data, so you can:
• Create a dump file;
• Transfer the file and index it on the ER custom open
data portal;
• Load the data into standard (CKAN) open data
portals using APIs;
• Load the data into a SPARQL endpoint;
• ...
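One way to picture such a plug-in architecture is a common interface that every output destination implements. The sketch below is in Python for brevity (GetLOD is a Java application) and all class and method names are hypothetical. `InMemoryPlugin` only stands in for a remote destination such as a portal or a SPARQL endpoint, whose real APIs are not shown here.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class OutputPlugin(Protocol):
    """Hypothetical contract that every output destination fulfils."""

    def publish(self, rdf_xml: str, dataset_id: str) -> str:
        """Deliver the RDF and return a string locating the result."""
        ...


class DumpFilePlugin:
    """Writes the RDF/XML to a dump file in a given directory."""

    def __init__(self, directory):
        self.directory = Path(directory)

    def publish(self, rdf_xml, dataset_id):
        self.directory.mkdir(parents=True, exist_ok=True)
        target = self.directory / f"{dataset_id}.rdf"
        target.write_text(rdf_xml, encoding="utf-8")
        return str(target)


class InMemoryPlugin:
    """Stand-in for a remote destination (portal API, SPARQL endpoint)."""

    def __init__(self):
        self.received = {}

    def publish(self, rdf_xml, dataset_id):
        self.received[dataset_id] = rdf_xml
        return f"memory:{dataset_id}"


def publish_everywhere(rdf_xml, dataset_id, plugins):
    """Send the same extracted data to every configured destination."""
    return [plugin.publish(rdf_xml, dataset_id) for plugin in plugins]


def demo():
    """Publish a tiny document to a temporary dump directory and to memory."""
    with tempfile.TemporaryDirectory() as tmp:
        memory = InMemoryPlugin()
        locations = publish_everywhere(
            "<rdf:RDF/>", "buildings", [DumpFilePlugin(tmp), memory]
        )
        dump_text = Path(locations[0]).read_text(encoding="utf-8")
    return locations, dump_text, memory.received
```

Adding a new destination then means adding one class with a `publish` method, without touching the extraction step.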
Ontologies used by GetLOD have been derived directly
from the conceptual model of the Topographic DB or,
better, from the dissemination model of the DBTR.
We did not start from scratch, asking ourselves “what is a
building?”. In this way, the mapping of concepts was
fairly direct.
Nevertheless, existing ontologies have been reused where
possible, especially for geometry.
Particular attention has been paid to the real
use of geometry in LOD data; we did some
reasoning and drew some conclusions…
Geometry
1. LOD data are used especially in mash-up apps, which
are likely to use common map APIs
WGS84 instead of the official regional SRS
Geometry
2. If XML is verbose, RDF is really prolix
In a LOD context, location is usually more important
than shape
No complex geometry, only simple, derived geometry:
centroids for buildings, bounding boxes for Administrative Units
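Deriving these simple geometries from a full polygon is straightforward. A minimal sketch, assuming a polygon is given as a list of (x, y) vertices of its outer ring and treating coordinates as planar (a reasonable approximation at building scale). The function names are illustrative, not part of GetLOD.

```python
def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def centroid(ring):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring if the caller did not
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon: zero area")
    return (cx / (3.0 * area2), cy / (3.0 * area2))


# A 4 x 2 rectangle used as a toy building footprint.
footprint = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(centroid(footprint), bounding_box(footprint))
```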
Geometry
3. OGC services are still there, but let’s use them only
when we need them.
Link to WFS GetFeature for the full geometry
If an app needs to draw the shape of a particular building, the RDF
carries the GetFeatureByID query as the value of a specific
ontology predicate.
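As an illustration of such a link, the sketch below builds a WFS 2.0 GetFeature request that uses the standard `GetFeatureById` stored query, and emits it as the object of a triple. The endpoint URL, the feature identifier and the `fullGeometry` predicate are hypothetical; the predicate actually used by the GetLOD ontology is not given in these slides.

```python
from urllib.parse import urlencode

# Hypothetical predicate; the real ontology term is not named in the slides.
FULL_GEOMETRY = "http://example.org/ontology#fullGeometry"


def get_feature_by_id_url(wfs_endpoint, feature_id):
    """Build a WFS 2.0 key-value request for a single feature."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "storedquery_id": "urn:ogc:def:query:OGC-WFS::GetFeatureById",
        "id": feature_id,
    }
    return f"{wfs_endpoint}?{urlencode(params)}"


def full_geometry_triple(subject_uri, wfs_url):
    """N-Triples statement linking a resource to its full geometry."""
    return f"<{subject_uri}> <{FULL_GEOMETRY}> <{wfs_url}> ."


url = get_feature_by_id_url("https://example.org/wfs", "building.42")
print(full_geometry_triple("http://example.org/id/building/b42", url))
```

An app that only needs the location can ignore this triple; one that needs the shape dereferences the URL and parses the returned GML.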
Geometry
4. Standards are important, but does anyone already
use them?
Use “OGC GeoSPARQL”, but also “WGS84”
Redundancy is not a problem in LOD
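A sketch of what this redundancy could look like, assuming that “WGS84” here refers to the W3C Basic Geo (WGS84 lat/long) vocabulary. The same point is written once as a GeoSPARQL WKT literal and once as plain latitude and longitude properties. The subject URI and the function name are made up for the example.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
WGS = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def point_triples(subject_uri, lon, lat):
    """Describe one WGS84 point twice, as N-Triples lines."""
    geometry_uri = subject_uri + "/geometry"
    lon_text, lat_text = f"{lon:.6f}", f"{lat:.6f}"
    # A WKT literal without an explicit CRS is read as longitude, latitude.
    wkt = f"POINT({lon_text} {lat_text})"
    return [
        f"<{subject_uri}> <{GEO}hasGeometry> <{geometry_uri}> .",
        f'<{geometry_uri}> <{GEO}asWKT> "{wkt}"^^<{GEO}wktLiteral> .',
        f'<{subject_uri}> <{WGS}lat> "{lat_text}"^^<{XSD}decimal> .',
        f'<{subject_uri}> <{WGS}long> "{lon_text}"^^<{XSD}decimal> .',
    ]


for line in point_triples("http://example.org/id/building/b42", 11.342616, 44.494887):
    print(line)
```

Clients that understand GeoSPARQL can run spatial queries on the WKT literal, while simpler mash-ups read the two coordinate properties directly.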
In order to extract spatial LOD from an SDI, some basic
principles must be adopted in the SDI data model;
fortunately the DBTR was already almost compliant:
1. Unique and persistent identifier for every geographic
object
2. Historical management and a well-defined object
life cycle
But some things could be better:
1. UUIDs are not URI-friendly
2. Code lists vs dictionaries
Not all geographical objects are noteworthy: it makes no
sense to convert a contour line or a land cover
polygon to LOD.
Only spatial objects that can be thought of as individuals,
which evolve over time (change and eventually die)
and can be referred to by other objects in the same or
other datasets, can be correctly converted to LOD.
A lifeless object does not really die; that is why you
should define its life cycle, that is, which events
terminate its individual identity.
Interlinking
The data that GetLOD extracts do not have interlinks for
the moment.
Interlinks are important, but since we are talking about
datasets coming from authoritative sources, interlinks
that lead to general datasets like GeoNames do not add
particular value.
Interlinks should also be created from other PA (public administration) datasets
towards these reference data.
LOD Life Cycle
1. Identification & dataset selection
2. Cleaning up
3. Analysis & modeling
4. Enrichment
5. External linking
6. Validation
7. Release
Source: “Linee Guida per l’Interoperabilità Semantica attraverso i Linked Open Data” (Agenzia per l'Italia Digitale)
GetLOD: a solution that implements the entire LOD Life Cycle
Generate OpenData
• ShapeFile
• GML
• XML metadata ISO 19115 (RNDT compliant)
• XML describing OpenData
Generate Linked OpenData
• RDF for Data
• RDF for Metadata ISO 19115 (RNDT compliant)
• XML describing Linked Open Data
GetLOD: Demo
GetLOD RDF Browser integrates:
• GeoER-API
• RDF: Administrative boundaries, Buildings, Road Toponyms, Civic Numbers
• Events from E-R Culture (http://dati.emilia-romagna.it/dato/item/37-37-eventi-e-r-cultura.html?goback=.gmp_3816349.gde_3816349_member_203780094)
• SPARQL query endpoint
GetLOD: Evolution in 2014
In 2014 we will focus on:
• Interoperability: in Regione Emilia-Romagna, SDI
and LOD infrastructures will interoperate through
Moka
• Interlinking: comparing entities from different
datasets available as LOD and calculating
similarities through textual, geographical and
temporal distance in order to match them
• Natural browsing: integrating the existing map
viewer with navigation and browsing of Linked
Open Data
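The interlinking goal above could be approached along these lines. This is only a sketch under stated assumptions: the similarity measures (string ratio, haversine distance, difference in days), the equal weighting and the cut-off values `max_km` and `max_days` are all illustrative choices, not the method planned for GetLOD.

```python
import math
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Entity:
    label: str
    lat: float
    lon: float
    when: date


def text_similarity(a, b):
    """Case-insensitive string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def distance_km(a, b):
    """Great-circle (haversine) distance between two entities, in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def similarity(a, b, max_km=1.0, max_days=30):
    """Average of textual, spatial and temporal closeness, each in [0, 1]."""
    text = text_similarity(a.label, b.label)
    space = max(0.0, 1.0 - distance_km(a, b) / max_km)
    time = max(0.0, 1.0 - abs((a.when - b.when).days) / max_days)
    return (text + space + time) / 3.0
```

Pairs whose score exceeds a chosen threshold would become candidate interlinks, to be validated before publication.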