Eurostat
Standardisation within the ESS: SDMX present and future
Luxembourg, October 2015
Marco Pellegrino
Eurostat, Statistical Office of the European Union
Outline
• Evolution of SDMX
• Standards integration: examples
• Opportunities and challenges: all good standards change
SDMX provides
• A model to describe statistical data and metadata
• A standard for automated machine-to-machine communication
• A technology supporting standardised IT tools
• A common language for statistics:
  - Statisticians agree to use a common description for data and metadata
  - The data exchange process is then driven by this common description
  - Data descriptions are made available to everybody who wants to understand and reuse the data
Why do we need a model?
• To define and describe statistical processes in a coherent way
• To standardize process terminology
• To compare and benchmark processes within and between organisations
• To identify synergies between processes
• To inform decisions on systems architectures and organisation of resources
The SDMX components
• Describe statistics in a standard way: objects and their relationships, e.g. Data Structure Definition (DSD), Concepts, Code Lists
• Central management and standard access: SDMX Registry, SDMX Web Services
• Cross-Domain Concepts, Cross-Domain Code Lists, Statistical Domains, Metadata Common Vocabulary
• Push: the provider generates a file and sends it to the receiver
• Pull: the provider exposes the data through a web service; the receiver downloads it regularly
• Hub: a special case of pull, in which the receiver downloads the data on end-user request
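The DSD is what makes push, pull and hub exchange machine-actionable: both sides derive series keys from the same ordered dimensions and code lists. The following is a minimal Python sketch of that idea, not an SDMX library; the DSD, dataflow and code identifiers (DSD_DEMO, DF_DEMO, IND_DEMO) and the example.org endpoint are hypothetical. The query URL follows, as we understand it, the SDMX 2.1 RESTful data query layout `{entry point}/data/{flowRef}/{key}`, which is how a receiver would pull data.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

@dataclass(frozen=True)
class Dimension:
    """One dimension of a simplified DSD, bound to its code list."""
    concept: str            # concept identifier, e.g. "FREQ"
    codes: FrozenSet[str]   # allowed codes (the code list)

@dataclass(frozen=True)
class DataStructureDefinition:
    id: str
    dimensions: Tuple[Dimension, ...]  # the order defines the series key layout

    def series_key(self, **values: str) -> str:
        """Build a dot-separated series key, checking each value against its code list."""
        parts = []
        for dim in self.dimensions:
            if dim.concept not in values:
                raise KeyError(f"missing value for dimension {dim.concept}")
            code = values[dim.concept]
            if code not in dim.codes:
                raise ValueError(f"{code!r} is not in the code list of {dim.concept}")
            parts.append(code)
        return ".".join(parts)

def data_query_url(base_url: str, flow_ref: str, key: str,
                   start_period: Optional[str] = None) -> str:
    """Pull scenario: a data query in the style of the SDMX 2.1 REST API."""
    url = f"{base_url.rstrip('/')}/data/{flow_ref}/{key}"
    if start_period is not None:
        url += f"?startPeriod={start_period}"
    return url

# Hypothetical DSD, dataflow and endpoint, for illustration only.
dsd = DataStructureDefinition("DSD_DEMO", (
    Dimension("FREQ", frozenset({"A", "Q"})),
    Dimension("GEO", frozenset({"BE", "LU"})),
    Dimension("INDICATOR", frozenset({"IND_DEMO"})),
))
key = dsd.series_key(FREQ="A", GEO="LU", INDICATOR="IND_DEMO")
url = data_query_url("https://example.org/sdmx/rest", "DF_DEMO", key, start_period="2010")
```

Because the key is derived from the shared DSD, a value outside a code list (say `FREQ="M"`) is rejected before any file is pushed or any query is sent.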
Broadening the scope of SDMX
• The same information is needed for exchange between the different steps of a statistical production process.
• Using SDMX throughout the process, in combination with a metadata registry (central storage of definitions, classifications, etc.), makes it more efficient and more coherent to implement changes, e.g. in definitions.
• Metadata-driven systems
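The benefit of a central registry is that a definition is changed once and every production step that resolves it by reference picks up the change. The toy Python class below illustrates that idea under stated assumptions; it is not the SDMX Registry interface, and the agency, code list identifiers and version numbers are hypothetical.

```python
from typing import Dict, Tuple

class MetadataRegistry:
    """Toy central store of code lists keyed by (agency, id, version).
    A hypothetical sketch of the concept, not the SDMX Registry interface."""

    def __init__(self) -> None:
        self._codelists: Dict[Tuple[str, str, str], Dict[str, str]] = {}

    def register(self, agency: str, cl_id: str, version: str, codes: Dict[str, str]) -> None:
        self._codelists[(agency, cl_id, version)] = dict(codes)

    def latest(self, agency: str, cl_id: str) -> Tuple[str, Dict[str, str]]:
        versions = [v for (a, i, v) in self._codelists if (a, i) == (agency, cl_id)]
        if not versions:
            raise KeyError(f"{agency}:{cl_id} is not registered")
        # Compare versions numerically, so that "1.10" sorts after "1.9".
        best = max(versions, key=lambda v: tuple(int(p) for p in v.split(".")))
        return best, self._codelists[(agency, cl_id, best)]

def label_for(registry: MetadataRegistry, agency: str, cl_id: str, code: str) -> str:
    """A production step that resolves definitions by reference instead of keeping a private copy."""
    _, codes = registry.latest(agency, cl_id)
    return codes[code]

registry = MetadataRegistry()
registry.register("AGENCY_DEMO", "CL_DEMO", "1.0", {"A": "Annual"})
# The definition is changed once, centrally, by registering a new version...
registry.register("AGENCY_DEMO", "CL_DEMO", "1.1", {"A": "Annual", "Q": "Quarterly"})
# ...and every step that resolves it by reference sees the change.
quarterly = label_for(registry, "AGENCY_DEMO", "CL_DEMO", "Q")
```

Always resolving to the latest version is a simplification; a real process may pin specific versions.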
Broadening the scope of SDMX
Standard metadata layer for the description and use of data and metadata throughout the process
GSBPM (Generic Statistical Business Process Model) and SDMX: towards a more complete picture
SDMX and standards integration
• SDMX promotes incremental progress towards a model for sharing data and metadata, supporting the production of comparable and accurate statistics.
• The increasing use of SDMX:
  a) improves the quality of the statistical process
  b) enables simplified exchange and dissemination processes, improving timeliness and accessibility
• Statistical integration goes hand-in-hand with technical integration and standardisation.
Building bridges… not walls
Building bridges
SDMX and Linked Open Data
• Based on RDF (Resource Description Framework), a family of W3C specifications allowing machine-actionable, semantically rich linking of things found on the Web.
• The main RDF vocabulary for statistical data is the Data Cube Vocabulary, a simplified version of the SDMX model covering data structures.
https://open-data.europa.eu/en/linked-data
Diagram: SDMX Data Structure Definition and RDF Data Cube Vocabulary (http://www.w3.org/TR/2014/REC-vocab-data-cube-20140116); an SDMX Data Set is structured by its DSD, which defines its dimensionality
The RDF Data Cube Vocabulary
W3C Recommendation, 16 January 2014
This version: http://www.w3.org/TR/2014/REC-vocab-data-cube-20140116/
The 5-star deployment scheme for Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open license.
★★ Make it available as structured data (e.g., Excel instead of an image scan of a table).
★★★ Use non-proprietary formats (e.g., CSV instead of Excel).
★★★★ Use URIs to denote things, so that people can point at your stuff.
★★★★★ Link your data to other data to provide context.
The Data Cube Vocabulary
The Data Cube Vocabulary is a W3C Recommendation and has gained some momentum
Data producers using SDMX can also publish in the Data Cube Vocabulary (DCV)
As with any other RDF publication, applications processing the RDF must understand the DCV data model to make sense of the data
Likewise, applications that want to process any additional information attached to the DCV triples need to understand the model of that attached data
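To make the mapping concrete, the Python sketch below turns one SDMX-style observation into Data Cube triples in N-Triples syntax. The `qb:` namespace and the SDMX-RDF `sdmx-dimension:`/`sdmx-measure:` namespaces are the ones used in the W3C Recommendation; the example.org base IRI, the IRI layout, the data set id, the value, and the choice of an `xsd:gYear` literal for the period are hypothetical assumptions. A complete publication would also describe the `qb:DataSet` and its `qb:DataStructureDefinition`.

```python
from typing import List

QB = "http://purl.org/linked-data/cube#"
SDMX_DIMENSION = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"

def iri(value: str) -> str:
    return f"<{value}>"

def typed_literal(value: str, datatype: str) -> str:
    return f'"{value}"^^<{XSD}{datatype}>'

def observation_to_ntriples(base: str, dataset_id: str, geo: str,
                            year: str, value: str) -> List[str]:
    """Map one observation (area, year, value) to qb:Observation triples.
    The IRI layout under `base` is a hypothetical convention."""
    obs = iri(f"{base}obs/{dataset_id}/{geo}/{year}")
    return [
        f"{obs} {iri(RDF_TYPE)} {iri(QB + 'Observation')} .",
        f"{obs} {iri(QB + 'dataSet')} {iri(base + 'dataset/' + dataset_id)} .",
        f"{obs} {iri(SDMX_DIMENSION + 'refArea')} {iri(base + 'code/geo/' + geo)} .",
        f"{obs} {iri(SDMX_DIMENSION + 'refPeriod')} {typed_literal(year, 'gYear')} .",
        f"{obs} {iri(SDMX_MEASURE + 'obsValue')} {typed_literal(value, 'decimal')} .",
    ]

# Illustrative values only.
lines = observation_to_ntriples("http://example.org/", "ds_demo", "LU", "2014", "42.0")
```

A consumer that only knows RDF sees generic triples; it needs the DCV model to recognise `refArea` and `refPeriod` as dimensions and `obsValue` as the measure.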
The SDMX Perspective
If you are using SDMX today (GESMES or XML), what does this mean?
Most Data Cube implementations today come from organisations that do not use SDMX-ML
Statistical organisations show increasing interest in RDF, and need to be able to offer Data Cube as an alternative query and delivery format for data sourced from existing SDMX-based systems
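SDMX and RDF: Scenario
The Statistical Dissemination System produces an RDF file in one of two ways:
• Either it exports an SDMX-ML file, which an SDMX-ML to RDF Transformer converts to RDF
• Or, using the SDMX Component Architecture, a DataCube Writer plugged into the SDMXWriter Interface writes the RDF directly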
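The second route treats RDF as one more output behind a common writer interface. The Python sketch below illustrates that design under stated assumptions: `DataWriter`, `CsvWriter`, `DataCubeWriter` and `disseminate` are hypothetical names, not the API of the SDMX Component Architecture, and the `prop/obsValue` property IRI and the sample observations are invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Tuple

Observation = Tuple[str, str, str]  # (series key, time period, value)

class DataWriter(ABC):
    """Hypothetical writer interface: the dissemination system streams
    observations to whichever writer is plugged in."""

    @abstractmethod
    def write(self, key: str, period: str, value: str) -> None: ...

    @abstractmethod
    def output(self) -> str: ...

class CsvWriter(DataWriter):
    def __init__(self) -> None:
        self.rows: List[str] = ["KEY,TIME_PERIOD,OBS_VALUE"]

    def write(self, key: str, period: str, value: str) -> None:
        self.rows.append(f"{key},{period},{value}")

    def output(self) -> str:
        return "\n".join(self.rows)

class DataCubeWriter(DataWriter):
    """RDF as 'just another data writer': same input, N-Triples output."""
    QB = "http://purl.org/linked-data/cube#"
    RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

    def __init__(self, base: str) -> None:
        self.base = base
        self.lines: List[str] = []

    def write(self, key: str, period: str, value: str) -> None:
        obs = f"<{self.base}obs/{key}/{period}>"
        self.lines.append(f"{obs} <{self.RDF_TYPE}> <{self.QB}Observation> .")
        # Hypothetical measure property under the publisher's own namespace.
        self.lines.append(f'{obs} <{self.base}prop/obsValue> "{value}" .')

    def output(self) -> str:
        return "\n".join(self.lines)

def disseminate(observations: Iterable[Observation], writer: DataWriter) -> str:
    for key, period, value in observations:
        writer.write(key, period, value)
    return writer.output()

data = [("A.LU.IND_DEMO", "2014", "1.0"), ("A.BE.IND_DEMO", "2014", "2.0")]
csv_text = disseminate(data, CsvWriter())
rdf_text = disseminate(data, DataCubeWriter("http://example.org/"))
```

The dissemination loop does not change when a new format is added; only a new writer class is needed, which is why the up-front cost is in tool development and ongoing maintenance stays low.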
Scenario: publish RDF triples as flat files
• Publish to a server exposed to the Web
• Package the files in a meaningful way using named graphs:
  - Data by data set
  - Structures (all in one file, or codelists and concepts in one file and DSDs in another)
Considerations
• The files need to be kept up to date (either republished as a replacement or as an incremental update)
• Simple approach, but not easily queryable (discovery and linking tools typically work with SPARQL endpoints)
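Packaging by named graph can be as simple as grouping quads by their graph IRI and writing one N-Quads document per graph. A minimal Python sketch follows; the graph IRIs under example.org and the rule that names each file after the last path segment of its graph IRI are hypothetical conventions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# subject, predicate, object (already serialised as N-Triples terms) and graph IRI
Quad = Tuple[str, str, str, str]

def package_by_graph(quads: Iterable[Quad]) -> Dict[str, str]:
    """Group quads into one N-Quads document per named graph. The file name is
    taken from the last path segment of the graph IRI (a hypothetical convention)."""
    files: Dict[str, List[str]] = defaultdict(list)
    for s, p, o, graph in quads:
        name = graph.rstrip("/").rsplit("/", 1)[-1] + ".nq"
        files[name].append(f"{s} {p} {o} <{graph}> .")
    return {name: "\n".join(lines) + "\n" for name, lines in files.items()}

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
quads = [
    # data: one graph per data set
    ("<http://example.org/obs/1>", RDF_TYPE, f"<{QB}Observation>",
     "http://example.org/graph/ds_demo"),
    # structures: DSDs, code lists and concepts in a separate graph
    ("<http://example.org/dsd/DSD_DEMO>", RDF_TYPE, f"<{QB}DataStructureDefinition>",
     "http://example.org/graph/structures"),
]
files = package_by_graph(quads)
```

Republishing a data set then means replacing a single file, which keeps the "replace or incremental update" choice per data set.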
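SDMX and RDF: Scenario
A Triple Store (DataCube) is exposed through an RDF Service that answers SPARQL queries. The Statistical Dissemination System populates it in one of two ways:
• Either via an SDMX-ML file, converted by an SDMX-ML File to RDF Transformer
• Or via a DataCube Writer plugged into the SDMXWriter Interface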
Scenario: populate a SPARQL endpoint
• Deploy the RDF triples in a "triple store":
  - A dedicated database system that natively understands SPARQL queries
  - Supported by many RDF tools, some of which accept a variety of RDF serialisations (RDF/XML, Turtle, N-Triples)
  - Data could be updated at the level of the dataflow
Considerations
• Good support for linking (the reason for LOD)
• Good support for cross-dataflow queries:
  - Data with some common dimensions
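A cross-dataflow query joins observations from different dataflows on a shared dimension. The Python sketch below is a toy stand-in for a SPARQL engine: it evaluates a basic graph pattern (terms starting with `?` are variables) over an in-memory list of triples. The prefixed names (`obs:`, `ds:`, `dim:`, `geo:`) are plain strings, and the two dataflows and their values are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

def match(triples: List[Triple], patterns: List[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Evaluate a basic graph pattern; yields one variable binding per solution."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind an unbound variable, or check it against an existing binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(triples, rest, candidate)

# Two hypothetical dataflows sharing the refArea dimension (illustrative values).
triples = [
    ("obs:a1", "qb:dataSet", "ds:flowA"),
    ("obs:a1", "dim:refArea", "geo:LU"),
    ("obs:a1", "measure:obsValue", "10"),
    ("obs:b1", "qb:dataSet", "ds:flowB"),
    ("obs:b1", "dim:refArea", "geo:LU"),
    ("obs:b1", "measure:obsValue", "20"),
    ("obs:b2", "qb:dataSet", "ds:flowB"),
    ("obs:b2", "dim:refArea", "geo:BE"),
    ("obs:b2", "measure:obsValue", "30"),
]
# Roughly: SELECT ?area ?a ?b WHERE { ?x qb:dataSet ds:flowA ; dim:refArea ?area ;
#   measure:obsValue ?a . ?y qb:dataSet ds:flowB ; dim:refArea ?area ; measure:obsValue ?b }
query = [
    ("?x", "qb:dataSet", "ds:flowA"),
    ("?x", "dim:refArea", "?area"),
    ("?x", "measure:obsValue", "?a"),
    ("?y", "qb:dataSet", "ds:flowB"),
    ("?y", "dim:refArea", "?area"),
    ("?y", "measure:obsValue", "?b"),
]
rows = [(b["?area"], b["?a"], b["?b"]) for b in match(triples, query)]
```

Only the area present in both dataflows survives the join; a real triple store does the same with indexes instead of nested scans.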
Considerations
• If RDF is treated as a completely separate syntax, the burden of data management is doubled
• If it is treated as a delivery format (just another data writer), it is relatively easy to implement:
  - Up-front cost for tool development
  - Low ongoing maintenance
  - The benefits of RDF-based technology are realised in a cost-effective manner
Data validation
"Technical" validation: covered by SDMX today
- Format check (SDMX-ML)
- Codes exist (SDMX DSD)
- Codes used correctly (Dataflow and Constraint)
"Statistical domain" validation: not yet covered by SDMX (VTL)
- Value checks
- Time series
- Revisions
- Validation expressions
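The two levels of validation can be illustrated side by side. The Python sketch below assumes format checking against the SDMX-ML schemas has already been done; the code lists, the dataflow constraint and the non-negativity rule are hypothetical examples, not real Eurostat structures.

```python
from typing import Dict, List, Set

# Hypothetical DSD code lists: the codes that exist for each dimension.
DSD_CODES: Dict[str, Set[str]] = {"FREQ": {"A", "Q", "M"}, "GEO": {"BE", "LU", "NL"}}
# Hypothetical dataflow constraint: the subset of codes this dataflow may use.
CONSTRAINT: Dict[str, Set[str]] = {"FREQ": {"A"}, "GEO": {"BE", "LU", "NL"}}

def technical_errors(obs: dict) -> List[str]:
    """'Technical' checks: codes exist (DSD) and are used correctly (constraint)."""
    errors = []
    for dim, codes in DSD_CODES.items():
        code = obs.get(dim)
        if code not in codes:
            errors.append(f"{dim}: {code!r} does not exist in the code list")
        elif code not in CONSTRAINT[dim]:
            errors.append(f"{dim}: {code!r} is not allowed for this dataflow")
    return errors

def domain_errors(obs: dict) -> List[str]:
    """A 'statistical domain' value check; no negative values is an illustrative rule."""
    value = obs.get("OBS_VALUE")
    if value is None:
        return ["OBS_VALUE missing"]
    return ["OBS_VALUE is negative"] if value < 0 else []
```

The technical checks need only the structural metadata; the domain check encodes subject-matter knowledge, which is the part a language such as VTL is meant to express in a standard way.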
VTL: Validation and Transformation Language
• Standard language for defining validation and transformation rules:
  - Validation (now)
  - Transformation (partially now, to be enriched at a later stage)
• Main goals:
  - Define and preserve validation and transformation rules
  - Exchange and share rules
  - Apply rules in industrialised processes
  - Apply to several standards (e.g. SDMX, DDI, GSIM) thanks to a generic information model
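The goals of preserving, exchanging and applying rules all follow from treating rules as data rather than as code buried in a system. The Python sketch below illustrates that idea only: the rule format is hypothetical and is NOT VTL syntax, and the codes `AA`, `BB` and `TOT` are invented.

```python
import json
from typing import Dict, List

# Hypothetical rule format (NOT VTL syntax): each rule is plain data, so it can be
# preserved, exchanged as JSON and applied by a generic engine.
RULES = [
    {"id": "R1", "type": "non_negative"},
    {"id": "R2", "type": "sum", "total": "TOT", "parts": ["AA", "BB"]},
]

def apply_rules(rules: List[dict], values: Dict[str, float]) -> List[str]:
    """Apply rules to one period of data (code -> value); return the ids of failed rules."""
    failed = []
    for rule in rules:
        if rule["type"] == "non_negative":
            ok = all(v >= 0 for v in values.values())
        elif rule["type"] == "sum":
            total = values.get(rule["total"])
            parts = sum(values.get(code, 0.0) for code in rule["parts"])
            ok = total is not None and abs(total - parts) <= 1e-9
        else:
            raise ValueError(f"unknown rule type: {rule['type']}")
        if not ok:
            failed.append(rule["id"])
    return failed

# Exchange: serialise the rules, send them, load them elsewhere.
received = json.loads(json.dumps(RULES))
```

Usage: `apply_rules(received, {"AA": 1.5, "BB": 2.5, "TOT": 4.0})` returns an empty list, while changing `TOT` to 5.0 makes rule `R2` fail.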
DDI: The Data Documentation Initiative
• DDI is split into two branches:
  - DDI-Codebook (DDI-C): a lightweight version of the standard, intended primarily to document simple survey data.
  - DDI-Lifecycle (DDI-L or DDI 3+): designed to document and manage data across the entire life cycle, from conceptualisation to data publication, analysis and beyond. DDI-L is currently being evaluated in several statistical organisations around the world.
• The DDI Lifecycle standard provides a data model for describing surveys in great detail using XML.
  - This can support many parts of the survey management process, particularly for household surveys, e.g. exchange between question banks and data collection applications, generation of collection instruments, …
DDI: The DDI data lifecycle model
SDMX and DDI
SDMX can provide:
• Metadata describing the structure of dimensional data
• Stand-alone metadata sets ("reference metadata")
• Formats for dimensional data
• A model of data reporting and dissemination
• Standard registry interfaces, providing a catalogue of resources
• Guidelines for deploying standard web services
• A way of describing statistical processes
DDI Lifecycle can provide a very detailed set of metadata, covering:
• Surveys and processing of microdata
• Structure of data files, including hierarchical files and complex relationships
• Archiving of data files and their metadata
• Tabulation and processing of data into tables
• Link between microdata variables and resulting aggregates
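The last point, linking microdata variables to the aggregates derived from them, is where the two standards meet. The Python sketch below tabulates hypothetical microdata (of the kind DDI documents) into dimensional observations (of the kind SDMX exchanges) and records the source variable and operation on each aggregate. The record layout and the `derived_from` field are illustrative assumptions, not part of either standard.

```python
from collections import Counter
from typing import Dict, List

def tabulate(records: List[dict], by: str, variable: str) -> List[Dict]:
    """Count records where `variable` is true, per value of `by`, keeping a reference
    to the source microdata variable on each aggregate. Zero-count groups are omitted."""
    counts = Counter(r[by] for r in records if r[variable])
    return [
        {"dimensions": {by.upper(): group}, "OBS_VALUE": n,
         "derived_from": {"variable": variable, "operation": "count"}}
        for group, n in sorted(counts.items())
    ]

# Hypothetical microdata: one record per respondent.
records = [
    {"region": "R1", "employed": True},
    {"region": "R1", "employed": True},
    {"region": "R1", "employed": False},
    {"region": "R2", "employed": True},
]
table = tabulate(records, by="region", variable="employed")
```

Keeping the `derived_from` reference on each aggregate is what allows a user of the dimensional data to trace a figure back to the documented survey variable.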
SDMX and DDI: similarities and differences
• Both standards use a similar model for identifiable, versionable and maintainable artefacts
• Both standards use “schemes”, as packages for lists of items, and XML “schemas”
• Both standards are designed to support reuse
• DDI has much more detailed metadata at the level of the study domain, and provides more complete descriptions of the processing of data
• SDMX provides more architectural components to support registration, reporting/collecting and exchange, and has a solid information model
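The shared notion of identifiable, versionable and maintainable artefacts can be sketched as an immutable record with an agency, an id and a version. The Python example below builds identifiers following the SDMX 2.1 URN pattern as we understand it (`urn:sdmx:org.sdmx.infomodel.{package}.{class}={agency}:{id}({version})`); the agency and code list identifiers are hypothetical, and DDI uses its own identification scheme.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class MaintainableArtefact:
    """Identifiable (id), versionable (version) and maintainable (agency) artefact."""
    package: str   # information-model package, e.g. "codelist"
    cls: str       # artefact class, e.g. "Codelist"
    agency: str    # maintenance agency
    id: str
    version: str

    @property
    def urn(self) -> str:
        # SDMX 2.1-style URN built from the three identifying parts.
        return (f"urn:sdmx:org.sdmx.infomodel.{self.package}.{self.cls}="
                f"{self.agency}:{self.id}({self.version})")

    def new_version(self, version: str) -> "MaintainableArtefact":
        # A published version is never edited in place: a change yields a new
        # version, and the old one stays resolvable for reuse.
        return replace(self, version=version)

codelist = MaintainableArtefact("codelist", "Codelist", "AGENCY_DEMO", "CL_DEMO", "1.0")
revised = codelist.new_version("1.1")
```

Because the identifier combines agency, id and version, other artefacts can reference a code list unambiguously, which is what makes reuse across domains (and across standards) practical.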
Other relevant standards
• Conceptual model: GSIM (Generic Statistical Information Model)
• Implementation standards: SDMX, DDI, geospatial standards
Opportunities and challenges
• SDMX interacts well with other standards (GSIM, DDI, RDF Linked Open Data, JSON), and this "complementarity" opens up new perspectives for innovation in statistical processes.
• Common data validation and processing procedures are required (from structural validation to content validation).
• Better metadata-driven statistical production systems, using standards throughout the process in combination with a metadata registry.
• Better maintenance and development of SDMX (e.g. support for use cases, new functions, more formats) drawing on the wealth of its Information Model.
All good standards change
• Version 1.0 (September 2004): GESMES/TS
• Version 2.0 (November 2005): SDMX-EDI, SDMX-ML, SDMX Registry
• Version 2.1 (April 2011)
• Too much change may discourage adoption
• But… not giving users the functionality they want would also discourage adoption
Where do we want SDMX to be, in 2020?
“Would you tell me, please, which way I ought to go from here?”
“That depends a good deal on where you want to get to,” said the Cat.
“I don’t much care where–” said Alice.
“Then it doesn’t matter which way you go,” said the Cat.
“–so long as I get SOMEWHERE,” Alice added as an explanation.
“Oh, you’re sure to do that,” said the Cat, “if you only walk long enough.”
(Alice’s Adventures in Wonderland, Chapter 6)
Where are we?
• Dramatic changes in the environment of official statistics producers (e.g. data deluge)
• Modernization of statistical information systems seen as a question of survival for the official statistics sector
• Standardization viewed as a key enabler for modernization
• Standards-based industrialization of statistical production
SDMX 2020
Main challenges for the years to come:
• Strengthening implementation
• Facilitating data consumption
• Supporting statistical process innovation
• Enhancing communication
• Investing in training and capacity-building
Action Plan
SWG/TWG's work plan
Thanks for your attention!
SDMX present and future
« If you are not sure where you are going, you will finish someplace else »