Standards landscape for micro and aggregated data
How the standards-based industrialization of statistical production
fits into the picture (SDMX, DDI, GSBPM, GSIM,…)
11-13 September 2013 SDMX Global Conference 2013
OECD, Paris
Marco Pellegrino, Eurostat
Outline
1. Where are we?
2. The landscape: a portfolio of used standards
3. SDMX and DDI: a set of use cases
4. Conclusions and way forward
Where are we?
Dramatic changes in the environment of official statistics producers
(e.g. data deluge)
Modernization of statistical information systems seen as a question
of survival for the sector of official statistics
Standardization viewed as a key enabler for modernization
“Standards-based” industrialization of statistical production
Standardization
• Without a standardized concept of statistical production,
we will not see:
– Economies of scale across statistical institutes internationally
(shared solutions)
– Good vendor support for the industry
– Harmonization of statistical data (leading to more comparable
data)
– Reusable, interoperable data for users
• Some major standards have emerged:
– Statistical Data and Metadata Exchange (SDMX)
– Data Documentation Initiative (DDI)
– RDF, Linked Open Data (LOD)
A portfolio of standards
SDMX: Preferred standard for exchange and sharing of data and metadata in the global
statistical community (UNSC, 2008); widely used in the ESS for aggregated data
DDI (Data Documentation Initiative): Standard for the documentation of data, initially
focused on archiving micro-data in the area of social sciences; widely used in national
data archives; extended to support the full life-cycle of data
RDF: W3C standard for web-based discovery, dissemination, and linking; an
alternative to XML. RDF vocabularies based on SDMX (data cube), DDI, and the
Neuchatel classification model
JSON: Web-developer-friendly alternative to XML. JSON version of SDMX
XBRL: Standard for reporting accounting information and banking supervision data
Characterizing the Standards: SDMX
• Describes the structure of aggregate/dimensional data
(“structural metadata”)
• Provides formats for the dimensional data
• Provides a model of data reporting and dissemination
• Provides a way of describing and formatting stand-alone
metadata sets (“reference metadata”)
• Provides standard registry interfaces, providing a
catalogue of resources
• Provides guidelines for deploying standard web services
for SDMX resources
• Provides a way of describing statistical processes
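To make the idea of "structural metadata" concrete, the sketch below models a toy data structure: an ordered list of dimensions, each restricted to a list of allowed codes, against which observation keys are checked. It is only an illustration. The class `DataStructure`, its methods, the dimension names, the codes, and the value are all hypothetical and are not taken from the SDMX specification or from any SDMX tool.

```python
from dataclasses import dataclass


@dataclass
class DataStructure:
    """Hypothetical, much simplified stand-in for structural metadata."""
    dimensions: tuple  # ordered dimension names
    codelists: dict    # dimension name -> set of allowed codes

    def validate_key(self, key: dict) -> None:
        """Raise ValueError if the key does not fit the structure."""
        if set(key) != set(self.dimensions):
            raise ValueError(f"expected dimensions {self.dimensions}, got {sorted(key)}")
        for dim, code in key.items():
            if code not in self.codelists[dim]:
                raise ValueError(f"code {code!r} is not allowed for dimension {dim}")

    def series_key(self, key: dict) -> str:
        """Validate the key, then join its codes in dimension order."""
        self.validate_key(key)
        return ".".join(key[dim] for dim in self.dimensions)


# Illustrative structure with three dimensions (names and codes are assumptions).
dsd = DataStructure(
    dimensions=("FREQ", "REF_AREA", "INDICATOR"),
    codelists={
        "FREQ": {"A", "M"},
        "REF_AREA": {"FR", "DE"},
        "INDICATOR": {"POP", "GDP"},
    },
)

# An aggregate observation is a value addressed by a dimension key and a period.
observations = {}
key = {"FREQ": "A", "REF_AREA": "FR", "INDICATOR": "POP"}
observations[(dsd.series_key(key), "2012")] = 100.0  # made-up value
```

The point of the sketch is that the structure is described once and separately from the data, so any party exchanging data can check keys against the same definition.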
Characterizing the Standards: DDI
DDI Lifecycle can provide a very detailed set of metadata
covering:
– The study or series of studies
– Many aspects of data collection, including surveys and
processing of microdata
– The structure of data files, including hierarchical files and
those with complex relationships
– The lifecycle events and archiving of data files and their
metadata
– The tabulation and processing of data into tables (Ncubes)
• Allows for a link between the microdata variables and
the resulting aggregates
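The link between microdata variables and the resulting aggregates can be illustrated with a small sketch: records are tabulated into a table of counts, and a lineage record notes which variables define the table's dimensions. The records, the function `tabulate`, and the lineage fields are all made up for illustration; this is not DDI syntax.

```python
from collections import Counter

# Made-up microdata: one record per respondent.
microdata = [
    {"id": 1, "region": "North", "sex": "F"},
    {"id": 2, "region": "North", "sex": "M"},
    {"id": 3, "region": "South", "sex": "F"},
    {"id": 4, "region": "North", "sex": "F"},
]


def tabulate(records, variables):
    """Count records per combination of the given variables.

    Returns the cell counts plus a small lineage record naming the
    microdata variables that define the dimensions of the table.
    """
    cells = Counter(tuple(record[v] for v in variables) for record in records)
    lineage = {"source_variables": list(variables), "measure": "count of records"}
    return dict(cells), lineage


cube, lineage = tabulate(microdata, ["region", "sex"])
```

Keeping the lineage next to the table is what lets a user trace an aggregate cell back to the variables it was derived from.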
Characterizing the Standards: RDF
• Allows for after-the-fact linking of any type of resource on the Web
– Data can be linked without either publisher knowing of the other’s
existence
– Linking press releases and speeches with relevant data from a statistical organization
• Based on “triples” of subject, predicate, object enabling data to be linked
• Powerful query language (SPARQL) for distributed searches on the web
• Very popular with “open data” and “open government” initiatives
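The triple idea can be sketched without any RDF library: two sets of statements published independently are merged, and a simple pattern match finds the press release that refers to a dataset. This is a toy in-memory illustration, not SPARQL and not a real RDF store. All identifiers (the `ex:` names) and the helper `match` are hypothetical.

```python
# Each statement is a (subject, predicate, object) triple.
# The two sets stand for two publishers that do not know about each other.
stats_office = {
    ("ex:dataset/unemployment", "ex:title", "Unemployment rate"),
    ("ex:dataset/unemployment", "ex:publisher", "ex:org/stats-office"),
}
press_office = {
    ("ex:press/2013-09", "ex:refersTo", "ex:dataset/unemployment"),
    ("ex:press/2013-09", "ex:title", "Labour market press release"),
}

# Linking after the fact: merging is just a union of the statements.
graph = stats_office | press_office


def match(graph, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which resources refer to the unemployment dataset?
linked = match(graph, p="ex:refersTo", o="ex:dataset/unemployment")
```

Because both sources use the same identifier for the dataset, no prior agreement between them is needed beyond that shared name.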
Characterizing the Standards: XBRL
• XML-based standard using linked taxonomies of various
types
• No formal model
– Communities standardise the taxonomies to support their needs
– Mapping to other models requires an understanding of the
implied model of the community
• Good tools are required to hide this complexity
SDMX and DDI together
• People have been discussing the use of SDMX and DDI
together for some time (many technical similarities)
• Now, we are at the stage where implementations are
being investigated and prototyped
• This is done in the context of the Generic Statistical
Business Process Model (GSBPM)
– Idea of “industrialized” statistical production
– Strong emphasis on process management
[Figure: diagram with elements labelled DDI and SDMX]
SDMX-DDI dialogue
Launched in 2010 with 3 goals:
1. To avoid duplication of effort and thus avoid confusion about which
standards should be used for specific types of applications
2. To provide reassurance to the user communities of DDI and SDMX
that the end-to-end statistical process can be managed, and that
standards bodies are considering the needs of users
3. To provide specific technical guidance about the use cases and
implementation of the standards for specific purposes
Endorsed by DDI Alliance and SDMX Sponsors / Secretariat
Analysis of use cases for SDMX and DDI
Set of use cases where the two standards are compared:
1. Survey data collection
2. Administrative and register data
3. Combined use of DDI and SDMX
4. Micro-data access and on-demand tabulation of micro-data
5. Metadata and quality reporting
SDMX experts (TWG) and national experts involved
ESS Cross-cutting Project on Information Models and Standards (IMS)
DDI and SDMX
• DDI offers a very rich model for the documentation of micro-data
• SDMX offers a very integrated exchange platform for statistical
outputs (IT architectures, tools, web services)
The combined use of both standards could allow a higher level of integration of the complete production process
But: The devil is in the detail!
Generic Statistical Information Model (GSIM)
[Figure: “Common Generic Industrialised Statistics” diagram. Labels: GSBPM, GSIM, Methods, Technology; Business Concepts, Information Concepts, Statistical HowTo, Production HowTo; an axis running from conceptual to practical; annotated with SDMX, DDI, RDF, ISO-11179, etc.]
Other relevant standards
Conceptual model: GSIM
Implementation standards: DDI, SDMX
Summary
• Standards are the key to enabling modernized statistical
production
• Standards at different levels are being used in an
increasingly coherent way
• GSBPM and GSIM provide conceptual models and
facilitate communication
• SDMX, DDI and other standards provide implementation
models which can be used in a coordinated way
• There are now more technologies than just GESMES and
XML: a coherent overall model is critical