Virtual Meeting 2
15 April 2016
ISA Programme Action 1.1
StatDCAT-AP
Opening, agenda, tour de table
Agenda
1. Opening, agenda, tour de table
2. Objectives of the meeting
3. Modelling approach
4. Implementation options
5. Examples of mappings
6. Next steps
Tour de table
Objectives of the meeting
Intended outcome
• Discuss, agree modelling approach
o Mapping from local implementations through SDMX-based intermediate format to DCAT-AP with extensions
• Discuss, agree possible implementation options
o SDMX Metadata Set, SDMX Data Set, SDMX XML Structure Message
• Discuss and comment on some examples
• Establish basis for further work
Modelling approach
Overview of approach
Local data (SDMX or other) → SDMX intermediate/XML → StatDCAT-AP/RDF
StatDCAT-AP from SDMX intermediate
• StatDCAT-AP, in line with DCAT-AP
o Export allows harvesting by general data portals
o Extension for specific aspects of statistical data
• SDMX intermediate
o Common layer across statistical systems
o Modelled in SDMX
o Aligning closely with StatDCAT-AP
o Use of common tools for export (XML to RDF) and validation
SDMX intermediate from local data
• SDMX intermediate format
o Common mapping target
o Using structure and terminology familiar to statistical data providers
• Local data
o No requirement to change local approach, only need to define extraction as an add-on
o Local SDMX implementations may share common tools for extraction
Discussion items
• Advantages, disadvantages of SDMX-based intermediary structure as common layer
• Opportunities for common tools, e.g. export, validation
Implementation options
Preamble
• SDMX Structural Metadata
o Can support (i.e. map to) most DCAT-AP classes
o Cannot alone support all DCAT-AP mandatory and recommended properties
o Cannot support many of the DCAT-AP optional properties
• SDMX has Additional Mechanisms to Support DCAT-AP Classes and Properties
o Annotations (structural metadata) representing DCAT Properties
o Metadata Structure with Attributes representing DCAT-AP Classes and Properties
o Data Structure with Attributes representing DCAT-AP Classes and Properties
• Organisations are free to choose whatever mechanism best suits their needs
Preamble
• StatDCAT-AP could recommend the use of an intermediary mechanism that would hold all DCAT classes and properties
o Using SDMX constructs
o Organisations could then output DCAT-AP metadata to these constructs which would then be converted to the DCAT-AP RDF format
o Conversion software components could then be developed using the SDMX common component architecture for use in any SDMX system
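The conversion step described above, from intermediary property values to the DCAT-AP RDF format, can be sketched as follows. This is a minimal illustration, assuming the intermediary content has already been read into simple (property, value) pairs; the function name `to_turtle`, the prefix table and the example dataset URI are hypothetical and not part of any SDMX or DCAT-AP tooling.

```python
# Hypothetical sketch: serialise property values read from the
# intermediary mechanism as a small Turtle document.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def to_turtle(subject_uri, rdf_class, properties):
    """Return a Turtle string describing one resource.

    properties is a list of (prefixed_property, value, is_uri) tuples.
    """
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in sorted(PREFIXES.items())]
    lines.append("")
    body = []
    for prop, value, is_uri in properties:
        if is_uri:
            obj = f"<{value}>"
        else:
            # Escape backslashes and double quotes inside plain literals.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        body.append(f"    {prop} {obj}")
    head = f"<{subject_uri}> a {rdf_class}"
    if body:
        lines.append(head + " ;")
        lines.append(" ;\n".join(body) + " .")
    else:
        lines.append(head + " .")
    return "\n".join(lines)


# Example values; the dataset URI is an assumption for illustration only.
print(to_turtle(
    "http://example.org/dataset/DF_HC58",
    "dcat:Dataset",
    [
        ("dct:title", "Census by Education and Occupation", False),
        ("dcat:keyword", "Population", False),
    ],
))
```

A production converter would more likely rely on an RDF library and add language tags and typed literals; the sketch only shows the shape of the export step.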
However
Choices for SDMX Component for Intermediary Mechanism
• SDMX-ML Structure Message – using SDMX Annotations for DCAT properties not supported by SDMX structural metadata
• Metadata Set – where the metadata attributes contain the DCAT property values
• Data Set - where the data attributes contain the DCAT property values
SDMX-ML Structure Message with Annotations
• Advantages
o Annotation is the syntax extensibility mechanism for SDMX structural metadata
o Is supported by SDMX Registry (Structural Metadata repository)
o Annotations are available for nearly all of the SDMX structural metadata components
o There can be multiple annotations for a specific component (e.g. a specific Dataflow (DCAT Dataset))
- Annotation Type would be StatDCAT-AP
- Annotation Title would be the DCAT Property
- Annotation Text or Annotation URL content would be the content of the DCAT property
- Annotation Text can be multi-lingual
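The annotation convention listed above (Type = StatDCAT-AP, Title = the DCAT property, Text or URL = the value) can be sketched as a small data model. This is an illustrative sketch only: the `Annotation` class is a simplified stand-in and not the SDMX-ML schema, and the function name and the licence URL are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

STATDCAT_TYPE = "StatDCAT-AP"  # Annotation Type, as named on the slide


@dataclass
class Annotation:
    """Simplified stand-in for an SDMX Annotation (not the SDMX-ML schema)."""
    type: str
    title: str                             # the DCAT property, e.g. "dcat:keyword"
    url: Optional[str] = None              # used when the value is a URI
    text: Optional[Dict[str, str]] = None  # language code -> text value


Statement = Tuple[str, str, str, Optional[str]]


def annotations_to_statements(component_id: str,
                              annotations: List[Annotation]) -> List[Statement]:
    """Turn StatDCAT-AP annotations into (subject, property, value, lang) tuples."""
    statements: List[Statement] = []
    for a in annotations:
        if a.type != STATDCAT_TYPE:
            continue  # annotations of other types carry no DCAT property
        if a.url is not None:
            statements.append((component_id, a.title, a.url, None))
        for lang, value in (a.text or {}).items():
            statements.append((component_id, a.title, value, lang))
    return statements


# Several annotations on one Dataflow, one of them multi-lingual.
dataflow_annotations = [
    Annotation(STATDCAT_TYPE, "dcat:keyword",
               text={"en": "Population", "fr": "Population"}),
    Annotation(STATDCAT_TYPE, "dct:license", url="http://example.org/licence"),
    Annotation("NOTE", "internal remark", text={"en": "not a DCAT property"}),
]
for statement in annotations_to_statements("ESTAT:DF_HC58", dataflow_annotations):
    print(statement)
```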
SDMX-ML Structure Message with Annotations
• Disadvantages
o Annotations cannot be
- coded (representation is restricted to text and URL)
- hierarchical (but there is a mechanism to achieve this)
- validated by SDMX validators (e.g. that the Title is valid); a specific validator would need to be developed
- given mandatory or optional status (all Annotations are optional)
o Could create unnecessary “noise” when exchanging structural metadata with other organisations if this is the source of the metadata in an SDMX Registry-compliant metadata source
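The "specific validator" mentioned above could look roughly like the following sketch. It assumes annotations have been read into plain dictionaries; the allowed titles are taken from the Dataset properties shown later in this deck, while the required set is a hypothetical local policy, since SDMX itself treats all Annotations as optional.

```python
# Hypothetical validator for StatDCAT-AP annotations on a Dataflow.
ALLOWED_TITLES = {
    "dcat:contactPoint",
    "dcat:distribution",
    "dcat:keyword",
    "dct:publisher",
    "dcat:theme",
}
REQUIRED_TITLES = {"dcat:keyword"}  # assumption: chosen by the organisation


def validate_annotations(annotations):
    """Return a list of error messages; an empty list means valid.

    Each annotation is a dict with 'type', 'title' and 'text' or 'url'.
    """
    errors = []
    seen = set()
    for index, annotation in enumerate(annotations):
        if annotation.get("type") != "StatDCAT-AP":
            continue  # other annotation types are out of scope
        title = annotation.get("title")
        if title not in ALLOWED_TITLES:
            errors.append(f"annotation {index}: unknown DCAT property {title!r}")
            continue
        if not annotation.get("text") and not annotation.get("url"):
            errors.append(f"annotation {index}: {title} has neither text nor URL")
            continue
        seen.add(title)
    for missing in sorted(REQUIRED_TITLES - seen):
        errors.append(f"missing required property {missing}")
    return errors


for message in validate_annotations([
    {"type": "StatDCAT-AP", "title": "dcat:keyword", "text": "Population"},
    {"type": "StatDCAT-AP", "title": "dcat:keywrd", "text": "France"},
    {"type": "StatDCAT-AP", "title": "dcat:contactPoint"},
]):
    print(message)
```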
Reference Metadata – how does it work
In the SDMX model, data or structural objects can have additional metadata added. This reference metadata
• is linked to the object by a simple “reference” to the object
• can be stored and exchanged without being embedded in the data or metadata message
A Metadata Set contains
• a collection of metadata, held as Metadata Attributes, e.g. Keyword=Population, Keyword=France, Documentation=HTTP://ec.europa.ec/….
• a specific object reference, such as an identified Dataflow
Allowed Content of Reference Metadata is defined in a Metadata Structure Definition
Reference Metadata
• Advantages
o Attributes can be
- assigned any type of representation (e.g. coded, text, HTML, boolean etc.)
- hierarchical
- validated
- given a usage status of mandatory or optional
o Attribute set can reference any object that can be identified (e.g. Dataflow, Provision Agreement, Category Scheme)
o Is separate from the structural metadata so does not affect the structural metadata components
o If present, a Metadata Attribute can be “presentational”, just giving structure to child attributes
• Disadvantages
o Not always well understood by SDMX users (may result in some reluctance to use this mechanism)
o Not widely used
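The hierarchical and "presentational" attributes mentioned above can be illustrated with a small sketch. It assumes a metadata report has been read into nested dictionaries; the attribute ids `ContactPoint` and `Name` and the contact value are hypothetical, and the structure is not the SDMX-ML Metadata Set format.

```python
def flatten_attributes(attributes, parent_path=()):
    """Yield (path, value) for every attribute that carries a value.

    Each attribute is a dict with 'id', an optional 'value' and optional
    'children'. An attribute without a value is treated as presentational:
    it only gives structure to its child attributes.
    """
    for attribute in attributes:
        path = parent_path + (attribute["id"],)
        if attribute.get("value") is not None:
            yield path, attribute["value"]
        yield from flatten_attributes(attribute.get("children", []), path)


# Attributes of a report attached to one Dataflow.
report_attributes = [
    {"id": "Keyword", "value": "Population"},
    {"id": "Keyword", "value": "France"},        # attributes may repeat
    {"id": "ContactPoint", "children": [         # presentational parent
        {"id": "Name", "value": "Example contact"},
    ]},
]
for path, value in flatten_attributes(report_attributes):
    print("/".join(path), "=", value)
```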
Data set
A Data Set can hold a collection of metadata associated with an object
• The Data Set is linked to the object by dimension values that identify the object, e.g. Agency, Object Type, Object Id
• The object reference is specific, such as an identified Dataflow
• The metadata are held as Data Attributes, e.g. Keyword=Population, Documentation=HTTP://ec.europa.ec/….
Allowed Content of a Dataset is defined in a Data Structure Definition
Data set
• Advantages
o Well understood component in SDMX
o Attributes can be
- coded
- validated
- assigned a Usage status (mandatory or conditional)
- given a variety of representations, e.g. coded, text, integer etc.
o Set of attributes can reference any object that can be identified by a multi-dimensional key (e.g. Agency/Object Type/Object Id for a Dataflow, Provision Agreement, Category Scheme)
o Is separate from the structural metadata so does not affect the structural metadata components
• Disadvantages
o Attributes cannot be hierarchical
o Attributes cannot be repetitive
o HTML content is not supported explicitly (i.e. the representation assigned to the Attribute in the structure does not include the option for HTML, but nevertheless the content could be HTML – e.g. if “text” is the assigned representation)
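The Data Set option above can be pictured as a table keyed by the multi-dimensional key (Agency, Object Type, Object Id), with one value per attribute. The following is a minimal sketch under that assumption; the class name `MetadataDataSet` and its methods are hypothetical and do not correspond to an SDMX library API.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (Agency, Object Type, Object Id)


class MetadataDataSet:
    """Toy model of a Data Set whose attributes carry DCAT property values."""

    def __init__(self, allowed_attributes: Iterable[str]):
        # Stands in for the attributes declared in a Data Structure Definition.
        self.allowed = set(allowed_attributes)
        self.rows: Dict[Key, Dict[str, str]] = {}

    def set_attribute(self, key: Key, attribute: str, value: str) -> None:
        if attribute not in self.allowed:
            raise ValueError(f"{attribute} is not defined in the structure")
        row = self.rows.setdefault(key, {})
        if attribute in row:
            # Mirrors the listed disadvantage: attributes cannot be repetitive.
            raise ValueError(f"{attribute} is already set for {key}")
        row[attribute] = value


data_set = MetadataDataSet(["Keyword", "Documentation"])
dataflow_key = ("ESTAT", "Dataflow", "DF_HC58")
data_set.set_attribute(dataflow_key, "Keyword", "Population")
print(data_set.rows[dataflow_key])
```

Because a second `Keyword` for the same key is rejected, multi-valued properties such as keywords would need a workaround, for example an extra dimension or a delimited value; which one to choose is left open here.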
Intermediary Mechanism
[Diagram: within the Data Publisher Organisation, an SDMX Data Writer draws on the SDMX Structural Metadata Repository and Other Metadata Sources to produce an Intermediary File or Data Stream, which an SDMX Data Reader converts to DCAT-AP]
Choices for the Intermediary File or Data Stream
• SDMX-ML Structure
• SDMX Metadata Set
• SDMX Data Set
These components can be developed in Java and .NET and integrated into SDMX systems or used in SDMX conversion tools
Discussion items
• Advantages, disadvantages of Structural Metadata Message, versus Metadata Set, versus Data Set
• Operational challenges
• Tools
Mapping SDMX to DCAT
Example
Mapping to DCAT
• Many SDMX Systems have Structural Metadata that define the data in a data repository
• This Structural Metadata may be held in an SDMX Registry but could be curated elsewhere
• The Example Mapping is made in the context of the SDMX Information Model
High Level Mapping DCAT Classes to SDMX Classes
[Diagram: DCAT classes (DCAT Catalogue, DCAT Category Scheme, DCAT Dataset, DCAT Distribution, DCAT Agent) mapped to SDMX classes (Category Scheme, Category, Data Flow, Provision Agreement, Registered Data Source, Data Provider, Data Provider Scheme, Agency)]
SDMX Structural Metadata - Example
[Diagram: the high-level mapping of DCAT classes to SDMX classes, populated with example values: ESTAT:DF_HC58, ESTAT:Cens01_neisco, MDR Themes, ESTAT, Census by Education and Occupation, http://localhost:8080/FusionRegistry/ws/rest/, Population and Society]
Mapping SDMX to DCAT
Example use of different SDMX artefacts for the Intermediary Representation
Mapping
• Shows examples of classes and mandatory and recommended properties for
o Catalogue
o Category Scheme
o Dataset
o Distribution
o Agent
SDMX-ML Structure
(using Annotations for additional metadata)
SDMX-ML: DCAT Catalogue
Mapping:
• Title is the DCAT property
• Type identifies the Annotation as a StatDCAT-AP Property of the Object (e.g. Category Scheme)
• URL is any URI
• Text is the value of the property (if not a URI). It can be multi-lingual
Property URI Range
homepage foaf:homepage foaf:Document
language dct:language dct:LinguisticSystem
licence dct:license dct:LicenseDocument
Property URI Range
dataset dcat:dataset dcat:Dataset
description dct:description rdfs:Literal
publisher dct:publisher foaf:Agent
title dct:title rdfs:Literal
SDMX-ML: DCAT Catalogue
Note that the URL can be any URI
Property URI Range
themes dcat:themeTaxonomy skos:ConceptScheme
Property URI Range
dataset dcat:dataset dcat:Dataset
SDMX-ML: DCAT Category Scheme
This maps directly to the SDMX Category Scheme. No additional metadata required
Property URI Range
title dct:title rdfs:Literal
Property URI Range
preferred label skos:prefLabel rdfs:Literal
SDMX-ML: DCAT Dataset
Property URI Range
dataset distribution dcat:distribution dcat:Distribution
keyword/ tag dcat:keyword rdfs:Literal
publisher dct:publisher foaf:Agent
theme/ category dcat:theme, subproperty of dct:subject skos:Concept
Property URI Range
description dct:description rdfs:Literal
title dct:title rdfs:Literal
SDMX-ML: DCAT Dataset
Property URI Range
contact point dcat:contactPoint vcard:Kind
SDMX-ML: DCAT Distribution
Property URI Range
access URL dcat:accessURL rdfs:Resource
Property URI Range
description dct:description rdfs:Literal
format dct:format dct:MediaTypeOrExtent
licence dct:license dct:LicenseDocument
SDMX Metadata Set
(using Metadata Attributes for all metadata)
Metadata Structure Definition: Attributes
SDMX: Metadata Report Catalogue
Property URI
homepage foaf:homepage
language dct:language
licence dct:license
Property URI
dataset dcat:dataset
description dct:description
publisher dct:publisher
title dct:title
Property URI
themes dcat:themeTaxonomy
SDMX: Metadata Report – Category Scheme
Property URI
title dct:title
Property URI
preferred label skos:prefLabel
SDMX: Metadata Report - Dataset
Property URI
dataset distribution dcat:distribution
keyword/ tag dcat:keyword
publisher dct:publisher
theme/ category dcat:theme, subproperty of dct:subject
Property URI
description dct:description
title dct:title
Property URI
contact point dcat:contactPoint
SDMX: Metadata Report - Distribution
Property URI
access URL dcat:accessURL
Property URI
description dct:description
format dct:format
licence dct:license
SDMX-ML Data Set
(Using Data Attributes for all metadata)
Data Structure Definition
(SDMX) Dataset - Catalogue
Property URI
homepage foaf:homepage
language dct:language
licence dct:license
Property URI
dataset dcat:dataset
description dct:description
publisher dct:publisher
title dct:title
Property URI
themes dcat:themeTaxonomy
(SDMX) Dataset – Category Scheme
Property URI
title dct:title
Property URI
preferred label skos:prefLabel
(SDMX) Dataset - Dataset
Property URI
dataset distribution dcat:distribution
keyword/ tag dcat:keyword
publisher dct:publisher
theme/ category dcat:theme, subproperty of dct:subject
Property URI
description dct:description
title dct:title
Property URI
contact point dcat:contactPoint
(SDMX) Dataset - Distribution
Property URI
access URL dcat:accessURL
Property URI
description dct:description
format dct:format
licence dct:license
Q & A
More issues? Comments, questions?
Next steps
Developing mapping and guidelines
• Mapping of intermediate format to StatDCAT-AP will take place on Google spreadsheet, open for participation and comment
• Provider-specific extraction guidelines towards intermediate format to be developed in parallel, by Eurostat and other interested parties – more can be added later
Future planning
• December 2015: invitations to stakeholders, set up collaboration infrastructure
• January 2016: collect requirements and suggestions
• 5 February 2016: Familiarisation Webinar
• February 2016: first draft based on initial analysis and issues raised
• 11 March 2016: first virtual WG meeting to discuss first draft
• 15 April 2016: second meeting; to discuss draft mapping and implementation options
• 6 May 2016: second draft available for review, incorporating comments and further development
• 13 May 2016: third meeting (face-to-face plus Adobe Connect) in Rome; to discuss mapping issues in practice
• End of May 2016: third draft, including full mapping proposal and usage of controlled vocabularies
• Early June 2016: fourth meeting (virtual, Doodle); to agree schedule for public review
• July and August 2016: public review period
• Early September 2016: fifth meeting (virtual); to discuss and resolve public comments received
• Mid-September 2016: publication of StatDCAT-AP version 1
Next meeting 13 May 2016 13:00-14:30 CEST
Find more information: https://joinup.ec.europa.eu/node/149828
Save the date!
Stay tuned at: https://joinup.ec.europa.eu/node/148436
Join the SEMIC group on LinkedIn
Follow @SEMICeu on Twitter
Join the SEMIC community on Joinup
Mapping SDMX to DCAT
DCAT Catalogue and Category (Topic) Scheme
Linking Catalogue to DCAT Datasets and Category (Topic) Scheme
• A Categorisation links a Category to an SDMX Object (such as a Dataflow (DCAT Dataset))
• Any one Category can link to many such objects (via multiple Categorisations)
• Any object can link to many Categories (via multiple Categorisations)
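The many-to-many linking described above can be sketched as a pair of lookup tables. This is an illustration only: the class name and methods are hypothetical, the second Dataflow identifier is invented for the example, and real Categorisations are maintainable SDMX artefacts rather than in-memory sets.

```python
from collections import defaultdict


class Categorisations:
    """Toy index of Categorisations linking Categories to SDMX objects."""

    def __init__(self):
        self._by_category = defaultdict(set)
        self._by_object = defaultdict(set)

    def add(self, category, sdmx_object):
        """Record one Categorisation (one Category linked to one object)."""
        self._by_category[category].add(sdmx_object)
        self._by_object[sdmx_object].add(category)

    def objects_for(self, category):
        return sorted(self._by_category.get(category, ()))

    def categories_for(self, sdmx_object):
        return sorted(self._by_object.get(sdmx_object, ()))


links = Categorisations()
links.add("Population and Society", "ESTAT:DF_HC58")
links.add("Population and Society", "ESTAT:DF_EXAMPLE")  # hypothetical Dataflow
links.add("Census", "ESTAT:DF_HC58")                     # hypothetical Category

print(links.objects_for("Population and Society"))
print(links.categories_for("ESTAT:DF_HC58"))
```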
SDMX Model for Categorising Objects
[Diagram: a Category Scheme contains Categories; a Categorisation links a Category to an SDMX Object]
Linking Catalogue to DCAT Datasets and Category (Topic) Scheme
Here
• One Category links to all of the Dataflows (Datasets) of the DCAT catalogue
• One category links to the Category Scheme of (DCAT) topics (can link to many of these if required)
[Diagram: the Category Scheme (Catalogue) has two Categories. Category (Dataflows) is linked by one Categorisation per Dataflow to each Dataflow; Category (Topic Schemes) is linked by a Categorisation to the Category Scheme (Topics)]
DCAT Catalogue in SDMX
Mandatory
Property URI Range
dataset dcat:dataset dcat:Dataset
description dct:description rdfs:Literal
publisher dct:publisher foaf:Agent
title dct:title rdfs:Literal
Recommended
Property URI Range
homepage foaf:homepage foaf:Document
language dct:language dct:LinguisticSystem
licence dct:license dct:LicenseDocument
release date dct:issued rdfs:Literal typed as xsd:date or xsd:dateTime
themes dcat:themeTaxonomy skos:ConceptScheme
update/ modification date dct:modified rdfs:Literal typed as xsd:date or xsd:dateTime
DCAT Catalogue SDMX Structural Metadata (XML)
Here, the Categorisation links the TOPIC_THEMES Category to the Category Schemes of topics
SDMX Category Scheme
SDMX Categorisation
SDMX Categorisation
Here the Categorisations link the DATASET Category to the Datasets
Supported by Organisation-Preferred Extensibility Mechanism
Recommended
Property URI Range
homepage foaf:homepage foaf:Document
language dct:language dct:LinguisticSystem
licence dct:license dct:LicenseDocument
release date dct:issued rdfs:Literal typed as xsd:date or xsd:dateTime
themes dcat:themeTaxonomy skos:ConceptScheme
update/ modification date dct:modified rdfs:Literal typed as xsd:date or xsd:dateTime
Mapping SDMX to DCAT Category Scheme
DCAT Category Scheme
Mandatory
Property URI Range
title dct:title rdfs:Literal
DCAT Category
Mandatory
Property URI Range
preferred label skos:prefLabel rdfs:Literal
SDMX Structural Metadata (XML)
Mapping SDMX to DCAT
DCAT Dataset
Mandatory
Property URI Range
description dct:description rdfs:Literal
title dct:title rdfs:Literal
Recommended
Property URI Range
contact point dcat:contactPoint vcard:Kind
dataset distribution dcat:distribution dcat:Distribution
keyword/ tag dcat:keyword rdfs:Literal
publisher dct:publisher foaf:Agent
theme/ category dcat:theme, subproperty of dct:subject skos:Concept
SDMX Structural Metadata (Registry GUI)
SDMX Agency
SDMX Dataflow
SDMX Category
SDMX Provision Agreement
Here is the link from the Category (topic) to the Dataflow (Dataset)
Supported by Organisation-Preferred Extensibility Mechanism
Recommended
Property URI Range
contact point dcat:contactPoint vcard:Kind
dataset distribution dcat:distribution dcat:Distribution
keyword/ tag dcat:keyword rdfs:Literal
publisher dct:publisher foaf:Agent
theme/ category dcat:theme, subproperty of dct:subject skos:Concept
Mapping SDMX to DCAT
DCAT Distribution
SDMX Model for Provision Agreement (DCAT Distribution)
[Diagram: the SDMX classes Data Flow, Data Provider, Data Provider Scheme, Provision Agreement and Registered Data Source, corresponding to the DCAT Distribution. Example values: Census by Education and Occupation, http://localhost:8080/FusionRegistry/ws/rest/, ESTAT]
DCAT Distribution
Mandatory
Property URI Range
access URL dcat:accessURL rdfs:Resource
This is the web service
Property URI Range
description dct:description rdfs:Literal
format dct:format dct:MediaTypeOrExtent
licence dct:license dct:LicenseDocument
SDMX Structural Metadata (XML)
Provision Agreement
Registered Data Source
Recommended
Supported by Organisation-Preferred Extensibility Mechanism
Recommended
Property URI Range
description dct:description rdfs:Literal
format dct:format dct:MediaTypeOrExtent
licence dct:license dct:LicenseDocument