

Virtual Meeting 2

15 April 2016

ISA Programme Action 1.1

StatDCAT-AP

Opening, agenda, tour de table

Agenda

1. Opening, agenda, tour de table

2. Objectives of the meeting

3. Modelling approach

4. Implementation options

5. Examples of mappings

6. Next steps

Tour de table

Objectives of the meeting

Intended outcome

• Discuss, agree modelling approach

o Mapping from local implementations through SDMX-based intermediate format to DCAT-AP with extensions

• Discuss, agree possible implementation options

o SDMX Metadata Set, SDMX Data Set, SDMX XML Structure Message

• Discuss and comment on some examples

• Establish basis for further work

Modelling approach

Overview of approach

[Diagram: Local data (SDMX or other) → SDMX intermediate/XML → StatDCAT-AP/RDF]

StatDCAT-AP from SDMX intermediate

• StatDCAT-AP, in line with DCAT-AP

o Export allows harvesting by general data portals

o Extension for specific aspects of statistical data

• SDMX intermediate

o Common layer across statistical systems

o Modelled in SDMX

o Aligning closely with StatDCAT-AP

o Use of common tools for export (XML to RDF) and validation

SDMX intermediate from local data

• SDMX intermediate format

o Common mapping target

o Using structure and terminology familiar to statistical data providers

• Local data

o No requirement to change local approach, only need to define extraction as an add-on

o Local SDMX implementations may share common tools for extraction

Discussion items

• Advantages, disadvantages of SDMX-based intermediary structure as common layer

• Opportunities for common tools, e.g. export, validation

Implementation options

Preamble

• SDMX Structural Metadata

o Can support (i.e. map to) most DCAT-AP classes

o Cannot alone support all DCAT-AP mandatory and recommended properties

o Cannot support many of the DCAT-AP optional properties

• SDMX has Additional Mechanisms to Support DCAT-AP Classes and Properties

o Annotations (structural metadata) representing DCAT Properties

o Metadata Structure with Attributes representing DCAT-AP Classes and Properties

o Data Structure with Attributes representing DCAT-AP Classes and Properties

• Organisations are free to choose whatever mechanism best suits their needs

Preamble

• StatDCAT-AP could recommend the use of an intermediary mechanism that would hold all DCAT classes and properties

o Using SDMX constructs

o Organisations could then output DCAT-AP metadata to these constructs which would then be converted to the DCAT-AP RDF format

o Conversion software components could then be developed using the SDMX common component architecture for use in any SDMX system

However

Choices for SDMX Component for Intermediary Mechanism

• SDMX-ML Structure Message – using SDMX Annotations for DCAT properties not supported by SDMX structural metadata

• Metadata Set – where the metadata attributes contain the DCAT property values

• Data Set - where the data attributes contain the DCAT property values

SDMX-ML Structure Message with Annotations

• Advantages

o Annotation is the syntax extensibility mechanism for SDMX structural metadata

o Is supported by SDMX Registry (Structural Metadata repository)

o Annotations are available for nearly all of the SDMX structural metadata components

o Can be multiple annotations for a specific component (e.g. a specific Dataflow (DCAT Dataset))

- Annotation Type would be StatDCAT-AP

- Annotation Title would be the DCAT Property

- Annotation Text or Annotation URL content would be the content of the DCAT property

- Annotation Text can be multi-lingual
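
The annotation convention listed above can be sketched in Python as follows. This is a minimal illustration and not part of any SDMX library: the `Annotation` class, the `annotation_to_triples` helper and the example Dataflow URI are hypothetical, and only the prefix URIs are the published namespace URIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Published namespace URIs for the prefixes used on the slides.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


@dataclass
class Annotation:
    """Hypothetical in-memory form of one SDMX Annotation."""
    type: str                      # "StatDCAT-AP" marks a DCAT property
    title: str                     # the DCAT property, e.g. "dcat:keyword"
    text: Dict[str, str] = field(default_factory=dict)  # language code -> text
    url: Optional[str] = None      # used when the value is a URI


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'dct:language' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def annotation_to_triples(subject: str, ann: Annotation) -> List[str]:
    """Return N-Triples lines for one annotation; other annotation types are skipped."""
    if ann.type != "StatDCAT-AP":
        return []
    predicate = expand(ann.title)
    lines = []
    if ann.url is not None:
        lines.append(f"<{subject}> <{predicate}> <{ann.url}> .")
    for lang, text in sorted(ann.text.items()):
        escaped = (text.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n"))
        lines.append(f'<{subject}> <{predicate}> "{escaped}"@{lang} .')
    return lines


# The Dataflow URI below is made up for illustration.
DATAFLOW = "http://example.org/dataflow/DF_HC58"
keyword = Annotation("StatDCAT-AP", "dcat:keyword",
                     text={"en": "Population", "fr": "Population"})
triples = annotation_to_triples(DATAFLOW, keyword)
```

Each language version of the Annotation Text becomes one language-tagged literal, while an Annotation URL becomes a URI object.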

SDMX-ML Structure Message with Annotations

• Disadvantages

o Annotations cannot be

- coded (representation is restricted to text and URL)

- hierarchical (but there is a mechanism to achieve this)

- validated by SDMX validators (e.g. that the Title is valid); a specific validator would need to be developed

- given mandatory or optional status (all Annotations are optional)

o Could create unnecessary “noise” when exchanging structural metadata with other organisations if this is the source of the metadata in an SDMX Registry-compliant metadata source
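
A specific validator of the kind mentioned above might look roughly like the sketch below. The property names are taken from the Catalogue slides; the `validate_annotations` function and the choice of which properties count as required are assumptions made only for illustration.

```python
from typing import Dict, Iterable, List, Set

# Property names as they appear on the Catalogue slides.
ALLOWED: Dict[str, Set[str]] = {
    "Catalogue": {
        "foaf:homepage", "dct:language", "dct:license",
        "dct:issued", "dct:modified", "dcat:themeTaxonomy",
    },
}

# Annotations have no usage status in SDMX, so any "required" rule has to
# live in the validator. The choice below is hypothetical.
REQUIRED: Dict[str, Set[str]] = {
    "Catalogue": {"dct:language"},
}


def validate_annotations(dcat_class: str, titles: Iterable[str]) -> List[str]:
    """Return the problems found in the StatDCAT-AP annotation titles of one object."""
    seen = list(titles)
    problems = []
    for title in seen:
        if title not in ALLOWED[dcat_class]:
            problems.append(f"unknown property: {title}")
    for title in sorted(REQUIRED[dcat_class] - set(seen)):
        problems.append(f"missing required property: {title}")
    return problems


ok = validate_annotations("Catalogue", ["dct:language", "foaf:homepage"])
bad = validate_annotations("Catalogue", ["dct:langauge"])
```

Here `ok` is an empty list, while `bad` reports both the misspelt title and the required property that is consequently missing.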

Reference Metadata – how does it work

• In the SDMX model, data or structural objects can have additional metadata added

• The metadata is linked to the object by a simple “reference” to the object

• It can be stored and exchanged without being embedded in the data or metadata message

• A Metadata Set contains a collection of metadata: a specific object reference (such as an identified Dataflow) and Metadata Attributes, e.g. Keyword=Population, Keyword=France, Documentation=HTTP://ec.europa.ec/….

• Allowed Content of Reference Metadata is defined in a Metadata Structure Definition

• Advantages

o Attributes can be

- assigned any type of representation (e.g. coded, text, HTML, boolean etc.)

- hierarchical

- validated

- usage status can be mandatory or optional

o Attribute set can reference any object that can be identified (e.g. Dataflow, Provision Agreement, Category Scheme)

o Is separate from the structural metadata so does not affect the structural metadata components

o If present, a Metadata Attribute can be “presentational”, just giving structure to child attributes

• Disadvantages

o Not always well understood by SDMX users (may result in some reluctance to use this mechanism)

o Not widely used
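
To make the hierarchical and “presentational” attribute idea concrete, here is a small Python sketch. The `MetadataAttribute` and `MetadataReport` classes, the `Contact`/`Email` attribute names and the target identifier format are hypothetical; only the Keyword values come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MetadataAttribute:
    """Hypothetical model of a reported Metadata Attribute."""
    concept: str
    value: Optional[str] = None    # None means presentational: no value of its own
    children: List["MetadataAttribute"] = field(default_factory=list)


@dataclass
class MetadataReport:
    target: str                    # reference to the described object, e.g. a Dataflow
    attributes: List[MetadataAttribute]


def flatten(report: MetadataReport) -> List[Tuple[str, str, str]]:
    """Walk the attribute tree and return (target, path, value) for every attribute with a value."""
    rows: List[Tuple[str, str, str]] = []

    def walk(attr: MetadataAttribute, prefix: str) -> None:
        path = f"{prefix}/{attr.concept}" if prefix else attr.concept
        if attr.value is not None:
            rows.append((report.target, path, attr.value))
        for child in attr.children:
            walk(child, path)

    for attr in report.attributes:
        walk(attr, "")
    return rows


report = MetadataReport(
    target="Dataflow=ESTAT:DF_HC58",              # identifier format is illustrative only
    attributes=[
        MetadataAttribute("Keyword", "Population"),
        MetadataAttribute("Keyword", "France"),   # the same attribute reported twice
        MetadataAttribute("Contact", children=[   # presentational parent
            MetadataAttribute("Email", "stats@example.org"),
        ]),
    ],
)
rows = flatten(report)
```

The presentational `Contact` attribute contributes only to the path of its child, which is how a hierarchy can be carried without the parent holding a value.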

Data set – how does it work

• A Data Set contains a collection of metadata associated with a data or structural object

• The metadata is linked to the object by dimension values that identify the object, e.g. Agency, Object Type, Object Id

• The dimension values give a specific object reference, such as an identified Dataflow

• The metadata is held in Data Attributes, e.g. Keyword=Population, Documentation=HTTP://ec.europa.ec/….

• Allowed Content of a Dataset is defined in a Data Structure Definition

Data set

• Advantages

o Well understood component in SDMX

o Attributes can be

- coded

- validated

- assigned a Usage status (mandatory or conditional)

- given a variety of representations, e.g. coded, text, integer etc.

o Set of attributes can reference any object that can be identified by a multi-dimensional key (e.g. Agency/Object Type/Object Id for a Dataflow, Provision Agreement, Category Scheme)

o Is separate from the structural metadata so does not affect the structural metadata components

• Disadvantages

o Attributes cannot be hierarchical

o Cannot be repetitive

o HTML content is not supported explicitly (i.e. the representation assigned to the Attribute in the structure does not include the option for HTML, but nevertheless the content could be HTML – e.g. if “text” is the assigned representation)
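
The key-based referencing and the "cannot be repetitive" limitation can be illustrated with a short Python sketch. The `AttributeDataSet` class is hypothetical and only mimics the behaviour described above: one value per attribute for each Agency/Object Type/Object Id key.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, str]   # (Agency, Object Type, Object Id)


class AttributeDataSet:
    """Hypothetical store mimicking a Data Set: one value per attribute per key."""

    def __init__(self) -> None:
        self._rows: Dict[Key, Dict[str, str]] = {}

    def put(self, key: Key, attribute: str, value: str) -> None:
        row = self._rows.setdefault(key, {})
        if attribute in row:
            # Data attributes cannot repeat, so a second value is rejected.
            raise ValueError(f"{attribute} already set for {key}")
        row[attribute] = value

    def get(self, key: Key) -> Dict[str, str]:
        return dict(self._rows.get(key, {}))


ds = AttributeDataSet()
key = ("ESTAT", "Dataflow", "DF_HC58")
ds.put(key, "Keyword", "Population")
try:
    ds.put(key, "Keyword", "France")   # a second keyword for the same object
    repeated_ok = True
except ValueError:
    repeated_ok = False
```

A property such as keyword, which can take several values, therefore needs some workaround under this option, whereas the Metadata Set example can simply report the attribute twice.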

Intermediary Mechanism

[Diagram: within the Data Publisher Organisation, the SDMX Structural Metadata Repository and Other Metadata Sources feed an SDMX Data Writer, which produces the Intermediary File or Data Stream (SDMX); an SDMX Data Reader converts this to DCAT-AP]

Choices for the Intermediary File or Data Stream

• SDMX-ML Structure

• SDMX Metadata Set

• SDMX Data Set

These components can be developed in Java and .NET and integrated into SDMX systems or used in SDMX conversion tools
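
One way to picture the reader side of this mechanism is a single entry point that accepts any of the three intermediary choices and returns the same neutral records for later conversion to DCAT-AP. The Python sketch below assumes simplified in-memory payloads; the payload shapes, function names and the object reference format are all hypothetical and do not reflect the actual SDMX-ML message formats.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str, str]   # (object reference, DCAT property, value)


def read_structure(payload: dict) -> List[Record]:
    """Structure Message option: properties carried as StatDCAT-AP annotations."""
    return [
        (payload["object"], ann["title"], ann["text"])
        for ann in payload["annotations"]
        if ann["type"] == "StatDCAT-AP"
    ]


def read_metadata_set(payload: dict) -> List[Record]:
    """Metadata Set option: properties carried as metadata attributes."""
    return [(payload["target"], name, value) for name, value in payload["attributes"]]


def read_data_set(payload: dict) -> List[Record]:
    """Data Set option: one value per data attribute, keyed by dimension values."""
    ref = "/".join(payload["key"])
    return [(ref, name, value) for name, value in payload["attributes"].items()]


READERS: Dict[str, Callable[[dict], List[Record]]] = {
    "structure": read_structure,
    "metadata_set": read_metadata_set,
    "data_set": read_data_set,
}


def read_intermediary(choice: str, payload: dict) -> List[Record]:
    """Dispatch to the reader for the chosen intermediary representation."""
    return READERS[choice](payload)


REF = "ESTAT/Dataflow/DF_HC58"
from_structure = read_intermediary("structure", {
    "object": REF,
    "annotations": [{"type": "StatDCAT-AP", "title": "dcat:keyword", "text": "Population"}],
})
from_metadata_set = read_intermediary("metadata_set", {
    "target": REF,
    "attributes": [("dcat:keyword", "Population")],
})
from_data_set = read_intermediary("data_set", {
    "key": ("ESTAT", "Dataflow", "DF_HC58"),
    "attributes": {"dcat:keyword": "Population"},
})
```

All three calls yield the same record, so whatever produces the DCAT-AP RDF downstream does not need to know which option an organisation chose.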

Discussion items

• Advantages, disadvantages of Structural Metadata Message, versus Metadata Set, versus Data Set

• Operational challenges

• Tools

Mapping SDMX to DCAT

Example

Mapping to DCAT

• Many SDMX Systems have Structural Metadata that defines the data in a data repository

• This Structural Metadata may be held in an SDMX Registry but could be curated elsewhere

• The Example Mapping is made in the context of the SDMX Information Model

High Level Mapping DCAT Classes to SDMX Classes

[Diagram: the SDMX classes Data Flow, Data Provider, Data Provider Scheme, Provision Agreement, Registered Data Source, Category Scheme, Category and Agency, mapped to the DCAT classes Dataset, Distribution, Category Scheme, Catalogue and Agent]
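
A lookup table is one simple way to hold such a class mapping in code. The Python sketch below is an assumption-laden reading of the deck, not a definitive mapping: the pairings follow the slide labels (Dataflow as DCAT Dataset, Provision Agreement as DCAT Distribution, Category Scheme used both for the Catalogue and for topics), the `role` argument is a hypothetical device to tell the two Category Scheme uses apart, and the class names on the right are the standard DCAT, SKOS and FOAF class names.

```python
from typing import Dict, Optional, Tuple

# (SDMX class, role) -> DCAT-AP class. The role distinguishes the two ways
# a Category Scheme is used in the deck; it is not an SDMX concept.
SDMX_TO_DCAT: Dict[Tuple[str, Optional[str]], str] = {
    ("Category Scheme", "catalogue"): "dcat:Catalog",
    ("Category Scheme", "topics"): "skos:ConceptScheme",
    ("Category", "topics"): "skos:Concept",
    ("Dataflow", None): "dcat:Dataset",
    ("Provision Agreement", None): "dcat:Distribution",
    ("Agency", None): "foaf:Agent",
}


def dcat_class(sdmx_class: str, role: Optional[str] = None) -> Optional[str]:
    """Return the DCAT-AP class for an SDMX class, or None if no mapping is recorded."""
    return SDMX_TO_DCAT.get((sdmx_class, role))


dataset_class = dcat_class("Dataflow")
catalogue_class = dcat_class("Category Scheme", "catalogue")
unmapped = dcat_class("Data Provider Scheme")
```

Classes without an entry return `None`, which leaves room for the working group to decide how they should be treated.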

SDMX Structural Metadata - Example

[Diagram: the same SDMX-to-DCAT class mapping, annotated with example values: ESTAT: DF_HC58; ESTAT:Cens01_neisco; MDR Themes; ESTAT; Census by Education and Occupation; http://localhost:8080/FusionRegistry/ws/rest/; Population and Society]

Mapping SDMX to DCAT

Example use of different SDMX artefacts for the Intermediary Representation


Mapping

• Shows examples of classes and mandatory and recommended properties for

o Catalogue

o Category Scheme

o Dataset

o Distribution

o Agent

SDMX-ML Structure

(using Annotations for additional metadata)


SDMX-ML: DCAT Catalogue

Mapping:

• Title is the DCAT property

• Type identifies the Annotation as a StatDCAT-AP Property of the Object (e.g. Category Scheme)

• URL is any URI

• Text is the value of the property (if not a URI). It can be multi-lingual

Property     URI              Range
homepage     foaf:homepage    foaf:Document
language     dct:language     dct:LinguisticSystem
licence      dct:license      dct:LicenseDocument

Property     URI              Range
dataset      dcat:dataset     dcat:Dataset
description  dct:description  rdfs:Literal
publisher    dct:publisher    foaf:Agent
title        dct:title        rdfs:Literal

SDMX-ML: DCAT Catalogue

Note that the URL can be any URI

Property  URI                 Range
themes    dcat:themeTaxonomy  skos:ConceptScheme

Property  URI           Range
dataset   dcat:dataset  dcat:Dataset

SDMX-ML: DCAT Category Scheme

This maps directly to the SDMX Category Scheme. No additional metadata required

Property  URI        Range
title     dct:title  rdfs:Literal

Property         URI             Range
preferred label  skos:prefLabel  rdfs:Literal

SDMX-ML: DCAT Dataset

Property              URI                                     Range
dataset distribution  dcat:distribution                       dcat:Distribution
keyword/ tag          dcat:keyword                            rdfs:Literal
publisher             dct:publisher                           foaf:Agent
theme/ category       dcat:theme, subproperty of dct:subject  skos:Concept

Property     URI              Range
description  dct:description  rdfs:Literal
title        dct:title        rdfs:Literal

SDMX-ML: DCAT Dataset

Property URI Range

contact point dcat:contactPoint vcard:Kind

SDMX-ML: DCAT Distribution

Property    URI             Range
access URL  dcat:accessURL  rdfs:Resource

Property     URI              Range
description  dct:description  rdfs:Literal
format       dct:format       dct:MediaTypeOrExtent
licence      dct:license      dct:LicenseDocument

SDMX Metadata Set

(using Metadata Attributes for all metadata)


Metadata Structure Definition: Attributes

SDMX: Metadata Report Catalogue

Property  URI
homepage  foaf:homepage
language  dct:language
licence   dct:license

Property     URI
dataset      dcat:dataset
description  dct:description
publisher    dct:publisher
title        dct:title

Property  URI
themes    dcat:themeTaxonomy

SDMX: Metadata Report – Category Scheme

Property  URI
title     dct:title

Property         URI
preferred label  skos:prefLabel

SDMX: Metadata Report - Dataset

Property              URI
dataset distribution  dcat:distribution
keyword/ tag          dcat:keyword
publisher             dct:publisher
theme/ category       dcat:theme, subproperty of dct:subject

Property     URI
description  dct:description
title        dct:title

Property       URI
contact point  dcat:contactPoint

SDMX: Metadata Report - Distribution

Property    URI
access URL  dcat:accessURL

Property     URI
description  dct:description
format       dct:format
licence      dct:license

SDMX-ML Data Set

(Using Data Attributes for all metadata)


Data Structure Definition

(SDMX) Dataset - Catalogue

Property  URI
homepage  foaf:homepage
language  dct:language
licence   dct:license

Property     URI
dataset      dcat:dataset
description  dct:description
publisher    dct:publisher
title        dct:title

Property  URI
themes    dcat:themeTaxonomy

(SDMX) Dataset – Category Scheme

Property  URI
title     dct:title

Property         URI
preferred label  skos:prefLabel

(SDMX) Dataset - Dataset

Property              URI
dataset distribution  dcat:distribution
keyword/ tag          dcat:keyword
publisher             dct:publisher
theme/ category       dcat:theme, subproperty of dct:subject

Property     URI
description  dct:description
title        dct:title

Property       URI
contact point  dcat:contactPoint

(SDMX) Dataset - Distribution

Property    URI
access URL  dcat:accessURL

Property     URI
description  dct:description
format       dct:format
licence      dct:license

Q & A

More issues? Comments, questions?

Next steps

Developing mapping and guidelines

• Mapping of the intermediate format to StatDCAT-AP will take place in a Google spreadsheet, open for participation and comment

• Provider-specific extraction guidelines towards intermediate format to be developed in parallel, by Eurostat and other interested parties – more can be added later

Future planning

• December 2015: invitations to stakeholders, set up collaboration infrastructure

• January 2016: collect requirements and suggestions

• 5 February 2016: Familiarisation Webinar

• February 2016: first draft based on initial analysis and issues raised

• 11 March 2016: first virtual WG meeting to discuss first draft

• 15 April 2016: second meeting; to discuss draft mapping and implementation options

• 6 May 2016: second draft available for review, incorporating comments and further development

• 13 May 2016: third meeting (face-to-face plus Adobe Connect) in Rome; to discuss mapping issues in practice

• End of May 2016: third draft, including full mapping proposal and usage of controlled vocabularies

• Early June 2016: fourth meeting (virtual, Doodle); to agree schedule for public review

• July and August 2016: public review period

• Early September 2016: fifth meeting (virtual); to discuss and resolve public comments received

• Mid-September 2016: publication of StatDCAT-AP version 1

Next meeting 13 May 2016 13:00-14:30 CEST

Find more information: https://joinup.ec.europa.eu/node/149828

Save the date!

Stay tuned at: https://joinup.ec.europa.eu/node/148436

Mapping SDMX to DCAT

DCAT Catalogue and Category (Topic) Scheme

Linking Catalogue to DCAT Datasets and Category (Topic) Scheme

• A Categorisation links a Category to an SDMX Object (such as a Dataflow (DCAT Dataset))

• Any one Category can link to many such objects (via multiple Categorisations)

• Any object can link to many Categories (via multiple Categorisations)
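
The many-to-many nature of Categorisations can be sketched in Python as a list of (Category, object) pairs indexed in both directions. The identifiers below are invented for illustration and `index_categorisations` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Categorisation = Tuple[str, str]   # (Category id, SDMX object id)


def index_categorisations(
    links: Iterable[Categorisation],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Build lookups from Category to objects and from object to Categories."""
    by_category: Dict[str, List[str]] = defaultdict(list)
    by_object: Dict[str, List[str]] = defaultdict(list)
    for category, obj in links:
        by_category[category].append(obj)
        by_object[obj].append(category)
    return dict(by_category), dict(by_object)


# Invented identifiers: two Categories and two Dataflows.
links = [
    ("POP_SOCIETY", "Dataflow:DF_HC58"),
    ("POP_SOCIETY", "Dataflow:DF_OTHER"),
    ("CENSUS", "Dataflow:DF_HC58"),
]
by_category, by_object = index_categorisations(links)
```

One Category reaches two Dataflows and one Dataflow sits under two Categories, each link being a separate Categorisation.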

SDMX Model for Categorising Objects

[Diagram: a Categorisation links a Category, which belongs to a Category Scheme, to an SDMX Object]

Linking Catalogue to DCAT Datasets and Category (Topic) Scheme

Here:

• One Category links to all of the Dataflows (Datasets) of the DCAT catalogue

• One Category links to the Category Scheme of (DCAT) topics (can link to many of these if required)

[Diagram: a Category Scheme (Catalogue) contains a Category (Dataflows), linked by Categorisations to each Dataflow, and a Category (Topic Schemes), linked by a Categorisation to the Category Scheme (Topics)]

DCAT Catalogue in SDMX

Mandatory

Property     URI              Range
dataset      dcat:dataset     dcat:Dataset
description  dct:description  rdfs:Literal
publisher    dct:publisher    foaf:Agent
title        dct:title        rdfs:Literal

Recommended

Property                   URI                 Range
homepage                   foaf:homepage       foaf:Document
language                   dct:language        dct:LinguisticSystem
licence                    dct:license         dct:LicenseDocument
release date               dct:issued          rdfs:Literal typed as xsd:date or xsd:dateTime
themes                     dcat:themeTaxonomy  skos:ConceptScheme
update/ modification date  dct:modified        rdfs:Literal typed as xsd:date or xsd:dateTime

DCAT Catalogue: SDMX Structural Metadata (XML)

[Screenshot: SDMX-ML XML showing the SDMX Category Scheme and the SDMX Categorisations]

• Here, the Categorisation links the TOPIC_THEMES Category to the Category Schemes of topics

• Here, the Categorisations link the DATASET Category to the Datasets

Supported by Organisation-Preferred Extensibility Mechanism

Recommended

Property                   URI                 Range
homepage                   foaf:homepage       foaf:Document
language                   dct:language        dct:LinguisticSystem
licence                    dct:license         dct:LicenseDocument
release date               dct:issued          rdfs:Literal typed as xsd:date or xsd:dateTime
themes                     dcat:themeTaxonomy  skos:ConceptScheme
update/ modification date  dct:modified        rdfs:Literal typed as xsd:date or xsd:dateTime

Mapping SDMX to DCAT Category Scheme

DCAT Category Scheme

Mandatory

Property  URI        Range
title     dct:title  rdfs:Literal

DCAT Category

Mandatory

Property         URI             Range
preferred label  skos:prefLabel  rdfs:Literal

[Screenshot: SDMX Structural Metadata (XML)]

Mapping SDMX to DCAT

DCAT Dataset

DCAT Dataset

Mandatory

Property     URI              Range
description  dct:description  rdfs:Literal
title        dct:title        rdfs:Literal

Recommended

Property              URI                                     Range
contact point         dcat:contactPoint                       vcard:Kind
dataset distribution  dcat:distribution                       dcat:Distribution
keyword/ tag          dcat:keyword                            rdfs:Literal
publisher             dct:publisher                           foaf:Agent
theme/ category       dcat:theme, subproperty of dct:subject  skos:Concept

SDMX Structural Metadata (Registry GUI)

[Screenshot: Registry GUI showing the SDMX Agency, SDMX Dataflow, SDMX Category and SDMX Provision Agreement, with the link from the Category (topic) to the Dataflow (Dataset) highlighted]

Supported by Organisation-Preferred Extensibility Mechanism

Recommended

Property              URI                                     Range
contact point         dcat:contactPoint                       vcard:Kind
dataset distribution  dcat:distribution                       dcat:Distribution
keyword/ tag          dcat:keyword                            rdfs:Literal
publisher             dct:publisher                           foaf:Agent
theme/ category       dcat:theme, subproperty of dct:subject  skos:Concept

Mapping SDMX to DCAT

DCAT Distribution

SDMX Model for Provision Agreement (DCAT Distribution)

[Diagram: the SDMX classes Data Flow, Data Provider, Data Provider Scheme, Provision Agreement and Registered Data Source, with the Provision Agreement corresponding to the DCAT Distribution. Example values shown: ESTAT; Census by Education and Occupation; http://localhost:8080/FusionRegistry/ws/rest/]

DCAT Distribution

Mandatory

Property    URI             Range
access URL  dcat:accessURL  rdfs:Resource

Recommended

Property     URI              Range
description  dct:description  rdfs:Literal
format       dct:format       dct:MediaTypeOrExtent
licence      dct:license      dct:LicenseDocument

SDMX Structural Metadata (XML)

[Screenshot: SDMX-ML XML for the Provision Agreement and the Registered Data Source, with the callout "This is the web service"]

Supported by Organisation-Preferred Extensibility Mechanism

Recommended

Property     URI              Range
description  dct:description  rdfs:Literal
format       dct:format       dct:MediaTypeOrExtent
licence      dct:license      dct:LicenseDocument
