Eurostat
4. SDMX: Main objects for data exchange
Raynald Palmieri, Eurostat, Unit B5: “Central data and metadata services”
SDMX Basics course, 27-29 October 2015
The SDMX Components
• Describe statistics in a standard way: objects and their relationships
  • Data Structure Definition (DSD), Concepts, Code Lists
• Central management and standard access: SDMX Registry, SDMX Web Services
• Cross Domain Concepts, Cross Domain Code Lists, Statistical Domains, Metadata Common Vocabulary
• Push: the provider generates a file and sends it to the receiver
• Pull: the provider opens a web service to the data; the receiver downloads regularly
• Hub: a special case of pull, in which the receiver downloads on end-user request
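In the pull model the receiver fetches data from the provider's web service. As a minimal sketch, assuming the provider exposes the SDMX 2.1 RESTful API, a data query URL combines a dataflow reference, a dot-separated series key (an empty position acts as a wildcard) and optional period parameters. The base URL and the dataflow ID TOUR_OCC are hypothetical, as is the four-dimension key layout.

```python
from urllib.parse import urlencode


def build_data_query(base_url: str, flow_ref: str, key: str = "all", **params: str) -> str:
    """Return an SDMX 2.1 RESTful data query URL: {base}/data/{flowRef}/{key}[?params].

    The key lists one code per dimension, separated by dots; an empty
    position acts as a wildcard, and "all" selects every series.
    """
    url = f"{base_url.rstrip('/')}/data/{flow_ref}/{key}"
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical endpoint and dataflow ID; the key assumes four dimensions
# (frequency, country, indicator, activity), with the last two left open.
print(build_data_query("https://example.org/sdmx/rest", "TOUR_OCC", "A.IT..",
                       startPeriod="2002", endPeriod="2007"))
```

A receiver running this on a schedule is the "downloads regularly" case; the hub case issues the same kind of query only when an end user asks for the data.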
Describing the data exchange
Who? What? When? Where? How?
Dataflows - classification
• Category: Tourism
• Sub-categories
• Statistical tables = dataflows
SDMX Implementation steps
• Dataflows
• Concepts & Code lists
• SDMX Data Structure Definition
• DSD sharing
SDMX Implementation steps
• Definition of flows: Dataflows (Table 1, Table 2, Table 3), each based on a Data Structure
• Definition of table structures: Data Structures
• Provision agreement: which Data Provider supplies which Dataflow?
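The relationships on this slide can be sketched as plain data classes: each dataflow (table) points to one data structure, and a provision agreement pairs a data provider with a dataflow. This is a simplified model for illustration, not the SDMX Information Model classes; all identifiers (DSD_TOURISM, TABLE_1 and so on, NSI_IT) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructure:
    id: str  # definition of a table structure (DSD)


@dataclass(frozen=True)
class Dataflow:
    id: str  # one statistical table
    structure: DataStructure  # every dataflow is based on one DSD


@dataclass(frozen=True)
class DataProvider:
    id: str


@dataclass(frozen=True)
class ProvisionAgreement:
    provider: DataProvider  # who reports...
    dataflow: Dataflow      # ...which table


def flows_for(provider: DataProvider, agreements: list[ProvisionAgreement]) -> list[str]:
    """Dataflow IDs that a provider has agreed to supply."""
    return [a.dataflow.id for a in agreements if a.provider == provider]


# Hypothetical identifiers: three tables sharing one DSD, one provider.
dsd = DataStructure("DSD_TOURISM")
flows = [Dataflow(f"TABLE_{i}", dsd) for i in (1, 2, 3)]
provider = DataProvider("NSI_IT")
agreements = [ProvisionAgreement(provider, flow) for flow in flows[:2]]

print(flows_for(provider, agreements))  # ['TABLE_1', 'TABLE_2']
```

Keeping the provider out of the dataflow itself is what lets several providers report the same table under separate agreements.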
Dataflows - classification
• Category: Tourism, with sub-categories Capacity and Occupancy
• Dataflows: Night_Spent, Arrival_of_residents, Occupancy_rate
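A category scheme is a tree whose leaves point to dataflows. A minimal sketch using the names from this slide; placing the three dataflows under Occupancy is an assumption, and the nested-dict encoding is for illustration only.

```python
# Category scheme from the slide. Dict values are sub-categories; None marks a
# dataflow. Placing the three dataflows under Occupancy is an assumption.
CATEGORIES = {
    "Tourism": {
        "Capacity": {},
        "Occupancy": {
            "Night_Spent": None,
            "Arrival_of_residents": None,
            "Occupancy_rate": None,
        },
    },
}


def dataflows_under(node: dict, path: tuple = ()):
    """Yield (category path, dataflow ID) for every dataflow below node."""
    for name, child in node.items():
        if child is None:
            yield path, name
        else:
            yield from dataflows_under(child, path + (name,))


for path, flow in dataflows_under(CATEGORIES):
    print("/".join(path), "->", flow)
```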
Concepts & Code Lists: Tourism Example
• What do we want to exchange?
  • Statistical tables
SDMX Implementation steps: preparation phase
• Dataflows
• Concepts & Code lists
• SDMX Data Structure Definition
• DSD sharing
Model of the statistical table
Number of touristic establishments in Italy, annual data
Time | Indicator | A100 Hotels and similar | B010 Tourist Campsites | B020 Holiday dwellings
2002 | A00 | 33411 | 2374 | 61479
2003 | A00 | 33480 | 2530 | 58526
2004 | A00 | 33518 | 2529 | 56586
2005 | A00 | 33527 | 2411 | 68385
2006 | A00 | 33768 | 2510 | 68376
2007 | A00 | 34058 | 2587 | 61810
Unit: Number
The value 2529 is described by: Tourism establishments, Italy, annual data, 2004, B010 Tourist Campsites.
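Each cell of the table is identified by one value per concept. A minimal sketch that keys the Italian figures by frequency, country, activity and time period; the indicator is left out of the key, and the tuple layout is an assumption chosen for illustration.

```python
# Values copied from the Italian table; activity codes in column order.
ACTIVITIES = ("A100", "B010", "B020")
ROWS = {
    "2002": (33411, 2374, 61479),
    "2003": (33480, 2530, 58526),
    "2004": (33518, 2529, 56586),
    "2005": (33527, 2411, 68385),
    "2006": (33768, 2510, 68376),
    "2007": (34058, 2587, 61810),
}


def to_observations(rows: dict, freq: str = "A", country: str = "IT") -> dict:
    """Key every cell by (FREQUENCY, COUNTRY, TOURISM_ACTIVITY, TIME)."""
    observations = {}
    for period, values in rows.items():
        for activity, value in zip(ACTIVITIES, values):
            observations[(freq, country, activity, period)] = value
    return observations


obs = to_observations(ROWS)
print(obs[("A", "IT", "B010", "2004")])  # 2529
```

Frequency and country are constant in this table, which is why they appear only in the table title; once they are part of the key, tables for other countries or frequencies fit the same model.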
Model of the statistical table: what do we need to do first?
• Sources
  • Existing data set tables, from websites or from applications
  • Data collection instruments: questionnaires, Excel spreadsheets
  • Handbooks, user guides
  • Database tables
  • Existing Data Structure Definitions from other organisations
  • Legislation/regulation
• Identify the concepts
  • A concept is a unit of knowledge created by a unique combination of characteristics (SDMX Information Model)
Identifying the concepts
Concepts found in the table: FREQUENCY, COUNTRY, TOURISM_INDICATOR, TOURISM_ACTIVITY, TIME, OBS_VALUE, UNIT, OBS_STATUS (observation flags such as E and P)
Concept identifier | Concept name | Format
FREQUENCY | Frequency | A1
COUNTRY | Country | A2
TOURISM_INDICATOR | Tourism Indicator | AN4
TOURISM_ACTIVITY | Tourism Activity | AN4
TIME | Time Period | N4
OBS_VALUE | Observation | N15
UNIT | Unit | AN2
OBS_STATUS | Observation status | A1
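The Format column gives each concept's textual representation. A minimal validator sketch, assuming A means letters only, AN letters or digits, N digits only, and that the number is the maximum length; under this reading an N15 observation value cannot carry decimals, so a real implementation would need a richer format description.

```python
import re

# Assumed reading of the format codes: A = letters, AN = letters or digits,
# N = digits, and the number is the maximum length.
FORMAT_RE = re.compile(r"(AN|A|N)(\d+)")
CHAR_CLASS = {"A": "[A-Za-z]", "AN": "[A-Za-z0-9]", "N": "[0-9]"}


def conforms(value: str, fmt: str) -> bool:
    """Check a single value against a format code such as 'A1' or 'AN4'."""
    match = FORMAT_RE.fullmatch(fmt)
    if match is None:
        raise ValueError(f"unknown format code: {fmt!r}")
    kind, max_len = match.group(1), int(match.group(2))
    pattern = f"{CHAR_CLASS[kind]}{{1,{max_len}}}"  # e.g. [A-Za-z]{1,2}
    return re.fullmatch(pattern, value) is not None


print(conforms("IT", "A2"), conforms("ITA", "A2"), conforms("A003", "AN4"))  # True False True
```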
Concept Scheme
ID: CS_TOURISM
Version: 1.0
Name (English): List of statistical concepts for Tourism tables
Name (French): Liste des concepts statistiques pour les tables Tourisme
Description (English): Concept list to be used for all Tourism tables
Description (French): Liste des concepts valable pour toutes les tables Tourisme
Concept identifier | Concept name | Format | Code list
FREQUENCY | Frequency | A1 | CL_FREQ
COUNTRY | Country | A2 | CL_AREA
TOURISM_INDICATOR | Tourism Indicator | AN4 | CL_TOUR_INDICATOR
TOURISM_ACTIVITY | Tourism Activity | AN4 | CL_TOUR_ACTIVITY
TIME | Time Period | N4 |
OBS_VALUE | Observation | N15 |
UNIT | Unit | AN2 | CL_UNIT
OBS_STATUS | Observation status | A1 | CL_OBS_STATUS
Identify/Define Code Lists
• Purpose of a code list:
  • Constrains the value domain of concepts when used in a structure such as a data structure definition
  • Defines a shortened, language-independent representation of the values
  • Gives semantic meaning to the values, possibly in multiple languages
• Agreeing on harmonised code lists is an important aspect of defining a data structure definition
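The three purposes above can be sketched with a small class using the CL_AREA codes from this example: the code IDs restrict which values a coded concept may take, and each ID maps to names per language. The class and method names are hypothetical, and the French name for IT is added for illustration; it is not part of the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CodeList:
    id: str
    agency: str
    version: str
    # code ID -> {language: name}; the ID itself is language independent
    codes: dict[str, dict[str, str]] = field(default_factory=dict)

    def is_valid(self, code_id: str) -> bool:
        """A coded concept may only take values that are IDs in the list."""
        return code_id in self.codes

    def name(self, code_id: str, lang: str = "en") -> str | None:
        """Look up the name of a code in one language (None if missing)."""
        return self.codes[code_id].get(lang)


CL_AREA = CodeList("CL_AREA", "ESTAT", "1.0", {
    "AT": {"en": "Austria"},
    "BE": {"en": "Belgium"},
    "DE": {"en": "Germany"},
    "ES": {"en": "Spain"},
    "FR": {"en": "France"},
    "IT": {"en": "Italy", "fr": "Italie"},  # French name added for illustration
    "PT": {"en": "Portugal"},
})

print(CL_AREA.is_valid("IT"), CL_AREA.is_valid("XX"))  # True False
print(CL_AREA.name("IT", "fr"))  # Italie
```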
Code lists
Concepts & Code Lists: Tourism Example
ID: CL_AREA
Version: 1.0
Maintenance Agency: ESTAT
Name (English): List of geographical ISO codes
Code ID | Name (English)
AT | Austria
BE | Belgium
DE | Germany
ES | Spain
FR | France
IT | Italy
PT | Portugal
Partial code lists can also be exchanged (v2.1). The content of a partial code list is specified in a Constraint.
A code list is a maintainable SDMX container. Each code list is uniquely identified by an ID, a maintenance agency and a version. Names can be provided in several languages.
SDMX Code List
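Because a code list is identified by agency, ID and version, that triple can be written as an SDMX URN. A minimal sketch assuming the SDMX 2.1 URN pattern for maintainable artefacts; the partial list is obtained by keeping only the codes a Constraint allows, and the allowed set {"DE", "FR", "IT"} is hypothetical.

```python
# SDMX 2.1 URN pattern for maintainable artefacts:
# urn:sdmx:org.sdmx.infomodel.<package>.<Class>=<agency>:<id>(<version>)
def maintainable_urn(package_class: str, agency: str, artefact_id: str, version: str) -> str:
    return f"urn:sdmx:org.sdmx.infomodel.{package_class}={agency}:{artefact_id}({version})"


def partial_code_list(codes: dict, allowed: set) -> dict:
    """Keep only the codes a Constraint allows, preserving the list order."""
    return {code_id: name for code_id, name in codes.items() if code_id in allowed}


CL_AREA = {"AT": "Austria", "BE": "Belgium", "DE": "Germany", "ES": "Spain",
           "FR": "France", "IT": "Italy", "PT": "Portugal"}

print(maintainable_urn("codelist.Codelist", "ESTAT", "CL_AREA", "1.0"))
# urn:sdmx:org.sdmx.infomodel.codelist.Codelist=ESTAT:CL_AREA(1.0)
print(partial_code_list(CL_AREA, {"DE", "FR", "IT"}))  # hypothetical constraint
# {'DE': 'Germany', 'FR': 'France', 'IT': 'Italy'}
```

The partial list does not get a new identity of its own here: it is the full CL_AREA seen through a constraint, which is the point of exchanging it that way.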
Exercise: Deriving a concept scheme from a table
Concept identifier | Concept name | Format | Code list
Deriving a concept scheme from a table: proposed solution
Concept identifier | Concept name | Format | Code list
FREQUENCY | Frequency | A1 | CL_FREQ
UNIT | Unit | AN2 | CL_UNIT
GEO | Country | A2 | CL_GEO
NACE_R2 | Economic Activity | AN11 | CL_NACE_R2
WASTE | Type of Waste | AN15 | CL_WASTE
HAZARD | Hazardous Waste | AN5 | CL_HAZARDOUS
OBS_VALUE | Observation | N15 |
TIME | Time Period | N4 |
Data Set Structure
• Computers need to know the structure of data in terms of:
  • Dimensionality
  • Additional metadata
  • Measures (observation)
  • Concepts
  • Valid content: code lists or non-coded formats (integer, date, text)
Concepts play roles in a Data Structure
• A data structure comprises:
  • Dimensions: concepts that identify the observation value
  • Attributes: concepts that add additional metadata about the observation value (as a value or the context of the value)
  • Measure: the concept that is the observation value
• Representation: any of these may be coded, text, date/time, number, etc.
DERIVING A DATA STRUCTURE FROM A TABLE
• Dimensions: FREQUENCY, COUNTRY, TOURISM_INDICATOR, TOURISM_ACTIVITY, TIME
• Attributes: UNIT, OBS_STATUS (e.g. E, P)
• Measures: OBS_VALUE
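Once roles are assigned, a flat record can be split into a series key (the non-time dimensions joined in DSD order), a time period, the measure and the attributes. A minimal sketch; the dimension order and the role of UNIT are assumptions, and the record is hypothetical (the indicator code A003 is borrowed from the DSD sharing slides).

```python
# Assumed role assignment and dimension order for the tourism DSD.
DIMENSIONS = ("FREQUENCY", "COUNTRY", "TOURISM_INDICATOR", "TOURISM_ACTIVITY")
TIME_DIMENSION = "TIME"
ATTRIBUTES = ("UNIT", "OBS_STATUS")
MEASURE = "OBS_VALUE"


def split_row(row: dict) -> tuple:
    """Split a flat record into (series key, time period, value, attributes)."""
    missing = [d for d in DIMENSIONS + (TIME_DIMENSION, MEASURE) if d not in row]
    if missing:
        raise KeyError(f"missing concepts: {missing}")
    series_key = ".".join(row[d] for d in DIMENSIONS)
    attributes = {a: row[a] for a in ATTRIBUTES if a in row}
    return series_key, row[TIME_DIMENSION], row[MEASURE], attributes


# Hypothetical record combining codes from several slides.
row = {"FREQUENCY": "A", "COUNTRY": "IT", "TOURISM_INDICATOR": "A003",
       "TOURISM_ACTIVITY": "B010", "TIME": "2004", "OBS_VALUE": "2529", "OBS_STATUS": "P"}
print(split_row(row))
# ('A.IT.A003.B010', '2004', '2529', {'OBS_STATUS': 'P'})
```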
DATA STRUCTURE DEFINITION
Data Structure Definition:
ID: TOURISM_A
Version: 1.0
Name (English): Structure of the Tourism table
Name (French): Structure de la table Tourisme
Description (English): Data Structure Definition for Tourism activity
Description (French): Définition de la structure de données pour l'activité touristique
DATA STRUCTURE DEFINITION - Summary
• The DSD references the Concept Scheme and the Code Lists
• The Concept Scheme references the Code Lists
DATA STRUCTURE DEFINITION - Design
Data Structure Wizard
• Java desktop application
• Graphical interface
• For DSD designers
• Maintenance of SDMX v2.0/2.1 data and metadata structures
• Web service to query/submit to SDMX registries
Publishing DSDs: SDMX Registry
• Graphical User Interface
• Web service
Exercise: Consult a DSD
Registry URL (for test purposes): https://webgate.acceptance.ec.europa.eu/sdmxregistry/
DSD: WASTE_GENER
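Besides the graphical interface, a registry can be queried through its web service. A minimal sketch of an SDMX 2.1 RESTful structure query for this DSD; the base URL is hypothetical (the REST entry point of the test registry is not given here) and the agency ESTAT is an assumption. With references=children the service is asked to return the DSD together with the concept schemes and code lists it uses.

```python
from urllib.parse import urlencode


def structure_query(base_url: str, resource: str, agency: str, resource_id: str,
                    version: str = "latest", **params: str) -> str:
    """Return an SDMX 2.1 RESTful structure query URL."""
    url = f"{base_url.rstrip('/')}/{resource}/{agency}/{resource_id}/{version}"
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical REST entry point; the agency ESTAT is also an assumption.
print(structure_query("https://example.org/sdmx/rest", "datastructure",
                      "ESTAT", "WASTE_GENER", references="children"))
```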
Exercise: Browse the different objects of the DSD
Code lists:
• CL_FREQ
• CL_GEO_EUCCEFTA
• CL_WASTE
• CL_HAZARD
• CL_NACE_R2_WASTE
Concept Scheme:
• CS_WASTE
DSD:
• WASTE_GENER
SDMX Implementation steps
• Dataflows
• Concepts & Code lists
• SDMX Data Structure Definition
• DSD sharing
DSD Sharing: Tourism Example
Table/Concept: FREQ, TOURISM_INDICATOR, TOURISM_ACTIVITY, DEST, DURATION, COUNTRY, PURPOSE, TIME, TINFO, UNIT, UNIT_MULTIPLIER, OBS_STATUS
tour_cap_nat: A x x x x x x x
tour_cap_bed: A x A003 x x x 1000 x
tour_dem_toq: Q x O x x x x x x
tour_dem_exq: Q O x x x x x x x
How to achieve DSD sharing? Use of Constraints
A Constraint can define one or both of:
• the codes in a code list that are applicable, e.g. FREQ: (A, M, W, Q) -> (A)
• the list of series keys that are applicable, e.g.:
  FREQ | COUNTRY | TOURISM_INDICATOR | TOURISM_ACTIVITY
  A | IT | A003 | B100
Constraints can be used to restrict a DSD when only a subset of its content is meaningful. Constraints are usually linked to dataflows or provision agreements.
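Both kinds of restriction from this slide can be sketched as plain set checks: allowed codes per dimension (FREQ reduced from A, M, W, Q to A) and an explicit set of allowed series keys. This sketch applies both at once; the encoding is illustrative and is not the SDMX Constraint schema.

```python
# The constraint examples from the slide, encoded as plain Python data.
FREQ_CODES = ["A", "M", "W", "Q"]            # full code list
CUBE_REGION = {"FREQ": {"A"}}                 # allowed codes per dimension
DIMENSIONS = ("FREQ", "COUNTRY", "TOURISM_INDICATOR", "TOURISM_ACTIVITY")
ALLOWED_KEYS = {("A", "IT", "A003", "B100")}  # allowed series keys


def allowed_codes(dimension: str, codes: list) -> list:
    """Codes of a dimension that remain usable under the cube region."""
    region = CUBE_REGION.get(dimension)
    return [c for c in codes if region is None or c in region]


def key_allowed(key: dict) -> bool:
    """A series key must satisfy the cube region and appear in the key set."""
    in_region = all(key.get(dim) in codes for dim, codes in CUBE_REGION.items())
    return in_region and tuple(key.get(d) for d in DIMENSIONS) in ALLOWED_KEYS


print(allowed_codes("FREQ", FREQ_CODES))  # ['A']
print(key_allowed({"FREQ": "A", "COUNTRY": "IT",
                   "TOURISM_INDICATOR": "A003", "TOURISM_ACTIVITY": "B100"}))  # True
```

Dimensions absent from the cube region (COUNTRY here) keep their full code list, which is how one DSD can serve several dataflows that each use only part of it.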
Constraints – Example
• DSD_TOUR_CAP_XS
• DSD_TOUR_DEM_XS
SDMX Dataset
• DSD: defines the structure
• Dataset: an XML file describing the table content according to the DSD
Syntaxes for SDMX datasets
• Based on a common Information Model
• SDMX-EDI (GESMES/TS)
  • EDIFACT syntax
  • Time-series oriented: one format for data sets
• SDMX-ML
  • XML syntax
  • Different formats for data sets
  • Easier validation (XML based)
SDMX-ML formats: conversions
Equivalent formats, based on the same Information Model (IM):
• Generic SDMX-ML
• Cross-sectional SDMX-ML
• Compact SDMX-ML
These can be expanded to other formats (e.g. CSV, GESMES).
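Because the formats share one information model, converting between them is a mechanical rewrite. A simplified, namespace-free sketch: in a generic-style series each key value is a child element naming its concept, while in a compact-style series each concept becomes an XML attribute. The element and attribute names only approximate SDMX-ML and are not schema-valid messages; the sample values come from the Italian table (2004, campsites).

```python
import xml.etree.ElementTree as ET

# Simplified generic-style series: key values are child elements.
GENERIC = """<Series>
  <SeriesKey>
    <Value concept="FREQ" value="A"/>
    <Value concept="COUNTRY" value="IT"/>
  </SeriesKey>
  <Obs><Time>2004</Time><ObsValue value="2529"/></Obs>
</Series>"""


def generic_to_compact(xml_text: str) -> ET.Element:
    """Rewrite a generic-style series as a compact-style one.

    Each concept becomes an XML attribute named after the concept.
    """
    source = ET.fromstring(xml_text)
    key = {v.get("concept"): v.get("value") for v in source.find("SeriesKey")}
    series = ET.Element("Series", key)
    for obs in source.findall("Obs"):
        ET.SubElement(series, "Obs",
                      TIME_PERIOD=obs.findtext("Time"),
                      OBS_VALUE=obs.find("ObsValue").get("value"))
    return series


print(ET.tostring(generic_to_compact(GENERIC), encoding="unicode"))
```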
SDMX data common header
Element | Example
id | TEST0000
test | true
truncated | false
name | FISH_AQ_TEST
prepared | 2010-01-30T09:30:47+01:00
senderid | ESTAT
sendername | Eurostat
sendercontactname | G. Smith
sendercontactdepartment | Statistics
sendercontactrole | Response
sendercontacttelephone | 0210 2222222
sendercontactfax | 0210 00010999
sendercontactx400 |
sendercontacturi | www.sdmx.org
sendercontactemail | [email protected]
receiverid | NSI_GB
receivername | CSO
receivercontactname | P. Mustermann
receivercontactdepartment | Statistics
receivercontactrole | Statistician
receivercontacttelephone | 02101234567
receivercontactfax | 02103810999
receivercontactx400 |
receivercontacturi | www.sdmx.org
receivercontactemail | [email protected]
datasetagency | ESTAT
datasetid | FISH_AQX
datasetaction | Append
extracted | 2010-01-30T09:30:47+01:00
reportingbegin | 2008-01-01T00:00:00
reportingend | 2008-12-31T00:00:00
source | DH
lang | en
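The date fields of the header follow the XML Schema dateTime layout, year-month-day, so a value such as 2010-30-01 would be rejected. A minimal sketch that parses the header dates and checks that the reporting period is ordered; it uses Python's datetime.fromisoformat as a stand-in for full XML Schema validation, and the function name is hypothetical.

```python
from datetime import datetime

# Date fields from the example header (year-month-day order).
HEADER = {
    "prepared": "2010-01-30T09:30:47+01:00",
    "extracted": "2010-01-30T09:30:47+01:00",
    "reportingbegin": "2008-01-01T00:00:00",
    "reportingend": "2008-12-31T00:00:00",
}


def check_header_dates(header: dict) -> dict:
    """Parse the date fields and check that the reporting period is ordered."""
    parsed = {name: datetime.fromisoformat(value) for name, value in header.items()}
    if parsed["reportingbegin"] > parsed["reportingend"]:
        raise ValueError("reportingend precedes reportingbegin")
    return parsed


print(check_header_dates(HEADER)["prepared"].month)  # 1
```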
SDMX 2.0 vs 2.1
SDMX-ML formats
Equivalent representations for reporting data sets
Version 2.0: 4 data messages, each with a distinct format (phased out):
• GenericData
• CrossSectionalData
• CompactData
• UtilityData
Version 2.1: 4 data messages based on two general formats:
• GenericData, GenericTimeSeriesData
• StructureSpecificData, StructureSpecificTimeSeriesData
Data Structure Definition (DSD)
• Support for non-time-series data structures: Measure Dimension
Version 2.0: the DSD contains Dimensions (and measure dimension), Attributes and Measures, built on Concepts and Code lists
Version 2.1: the DSD contains Dimensions, a Measure Dimension, Attributes and a Primary Measure, built on Concepts from a Concept Scheme and on Code lists; the concept role is an explicit element
Constraint
Version 2.0:
• The Constraint is embedded in the object it constrains (Dataflow, Provision agreement)
• Registry Constraint: the Constraint is only available for use in a Registry context
Version 2.1:
• The Constraint is independently maintained
• The same Constraint can be "used" to constrain multiple objects (Dataflow, Provision agreement, DSD)
Version 2.1: common and partial code lists
• A common Code List is shared by several DSDs
• Constraint 1 and Constraint 2 each define a partial code list for a DSD
Questions