Post on 14-Dec-2015
Slide 1
Eurostat Unit B3 – Statistical Information Technologies
SDMX Training for users, 29 November 2007
SDMX training session on basic principles, data structure definitions and data file implementation
29 November 2007
Slide 2
A - Introduction
Slide 3
Purpose of the training session
• Provide understanding of the basic SDMX principles (DSD and Dataset Implementation)
• Provide knowledge of the SDMX standard and its XML implementation
• Present ESTAT tools as case studies illustrating their scope and usage
Slide 4
Current practices
Current practices on data and metadata exchange:
– Legal framework (Commission Regulations, Council Regulations, etc.)
– Data and metadata files, questionnaires, quality reports, etc.
– Format (paper form, EDIFACT, XML, structured files, etc.)
– Media (email, file upload, web form, removable media, dial-up, etc.)
Slide 5
The need for a standard…
• Enhance electronic data and metadata exchange
• Enhance availability of statistical data and metadata information for the users
• Promote interoperability between different systems
• Improve the quality of transmitted data (Timeliness & Punctuality, Accessibility & Clarity, Accuracy, Comparability)
Slide 6
SDMX (Statistical Data and Metadata eXchange)
Initiative on the standardisation of the statistical data and metadata exchange process.
• 7 Sponsors (BIS, ECB, ESTAT, IMF, OECD, UN, WB)
• “Push” and “pull” mode
• Use of XML technologies to promote interoperability
• Basic principles: Data Structure Definitions (DSD) & Metadata Structure Definitions (MSD)
SDMX registries
Data on the WEB using SDMX
Slide 7
SDMX (cont.)
• Exchange and sharing of statistical information
– Statistical data
– Statistical metadata: structural metadata and reference metadata
• Emphasis on macro-data (aggregated statistics)
• Promotes a “data sharing” model
– low cost
– high quality of transmitted data
– interoperability between (otherwise) incompatible systems
Slide 8
B – SDMX Core Elements
Slide 9
EXAMPLE DATASET 1
Table 1. Deflated turnover index (on volume of sales) for retail trade for Greece (no adjustment). Reference period: January 2002 to March 2003. (monthly data - Base year: 2000)
Year | Month | Turnover index | Status | Confidentiality
2002 | January | 84.5 | actual | free
2002 | February | 85.6 | actual | free
2002 | March | 95.4 | actual | free
2002 | April | 106.2 | actual | free
2002 | May | 98.0 | actual | free
2002 | June | 95.3 | actual | free
2002 | July | 105.4 | actual | free
2002 | August | 107.1 | actual | free
2002 | September | 105.2 | actual | free
2002 | October | 109.4 | actual | free
2002 | November | 104.5 | actual | free
2002 | December | 111.9 | actual | free
2003 | January | 89.1 | provisional | free
2003 | February | 88.3 | provisional | free
2003 | March | 96.1 | provisional | free
Source: National Statistical Service of Greece
Data prepared to be transmitted to the European Commission (including EUROSTAT)
Slide 10
EXAMPLE DATASET 2
Demography Rapid Questionnaire_Table RQFI05V1. Data for Finland. Reference period: January to December 2005 (annual, provisional data - 1st revision).
Unit | Demographic Characteristic | Male | Female | Total
Number of persons | Statistical adjustment | 131 | 35 | 166
Number of persons | Deaths | 24057 | 23871 | 47928
Number of persons | Births | 29400 | 28345 | 57745
Number of persons | Net migration | 4799 | 4187 | 8986
Number of persons | Population on 01/01/2006 | 2572350 | 2683230 | 5255580
Number of persons | Population on 01/01/2005 | 2562077 | 2674534 | 5236611
Number of persons | Deaths under 1 year | | | 174
Number of persons | Births outside marriage | | | 23319
Number of persons | Immigrants | 10837 | 10581 | 21418
Number of persons | Emigrants | 6038 | 6331 | 12369
Number | Divorces | | | 13383
Number | Marriages | | | 29283
Rate | Total fertility rate | | | 1.8
Years | Life expectancy at birth | 82.3 | 75.5 | 78.3
Data prepared to be transmitted to the European Commission (including EUROSTAT)
Slide 11
SDMX Information Model
• The SDMX Information Model (SDMX-IM) is a conceptual model from which syntax-specific implementations are developed.
• The SDMX-IM provides for the structuring not only of data, but also of “reference” metadata.
• The model is constructed as a set of structures which assist in the understanding, re-use and maintenance of the model.
– Data Structure Definition and Metadata Structure Definition
– Dataflows, Datasets, Data Provisioning
– …
Slide 12
Structures in the SDMX-IM
Structure | Components
Concept Scheme | Concept
Code List | Code
Category Scheme | Category
Organisation Scheme | Organisation; Organisation Role (DataProvider, DataConsumer, MaintenanceAgency)
Data Structure Definition (DSD) | Dimensions; Attributes; Measures; Groups
Slide 13
Structures in the SDMX-IM (cont.)
Fundamental parts:
1. Structural metadata (DSD, concepts, code lists)
2. Observational data (organised set of numeric observations)
3. Reference metadata
Definitions:
• Data Structure Definition (DSD): set of structural metadata needed to understand the dataset structure
• Dataflow Definition: a description of the dataset which identifies, categorises and constrains the allowable content of the dataset
• Dataset:
– an organised collection of statistical data
– the ‘container’ of a Data Flow Definition for an instance of the data.
Slide 14
Structures in the SDMX-IM (cont.)
• Code lists – Codes: list of predefined values to be used within the DSD
– Code lists enumerate a set of values to be used in the representation of several structural components of SDMX.
• Concept Schemes – Concepts: a statistical characteristic used within a DSD
– Additional properties can be defined for concepts:
  • Provide Name/Description in various locales
  • Assign default representation (coded or uncoded)
  • Define semantic hierarchies of concepts
• Category Schemes – Categories: category schemes are made up of a hierarchy of categories (subject matter domains), which in SDMX may include any type of useful classification for the organisation of data and metadata
– A Dataflow may be linked to many Categories
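As a rough illustration of how a code list constrains coded values, the Python sketch below models a code list as a mapping from code to description and checks a reported value against it. The class name, the validate method and the frequency codes M and A are assumptions made for illustration; only the identifiers CL_FREQ and CL_OBS_STATUS and the status codes a and p appear in these slides. It is not an SDMX library API.

```python
from dataclasses import dataclass, field


@dataclass
class CodeList:
    """Hypothetical in-memory code list: an ID plus a code -> description map."""
    id: str
    codes: dict = field(default_factory=dict)

    def validate(self, value: str) -> bool:
        """Return True when the value is one of the predefined codes."""
        return value in self.codes


# Status codes follow the a/p legend used later in the deck.
cl_obs_status = CodeList("CL_OBS_STATUS", {"a": "actual", "p": "provisional"})
# Frequency codes are assumed placeholders, not taken from the slides.
cl_freq = CodeList("CL_FREQ", {"M": "monthly", "A": "annual"})
```

A value such as "p" passes the check for CL_OBS_STATUS, while any code outside the list is rejected before the data is exchanged.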
Slide 15
DSD components
• Dimension (e.g. frequency, reference area):
– Classificatory variable used for identification of subsets or single observations
– Definition of the key descriptor for reporting Datasets
• Attribute (e.g. title, observation status):
– Adds additional metadata about the observations
– Can be attached at four possible levels (Observation, Time Series / Cross-Sectional data, Group, Data Set)
• Measure (e.g. turnover index, # of births, # of deaths):
– Data (uncoded / unclassified) that can be reported (the observation value)
– Primary (Time Series) or Cross-Sectional (Cross-sectional data)
• Groups:
– Grouping of dimensions in order to attach group attributes (e.g. sibling group)
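The four component types can be pictured as a small data model. The Python sketch below is only one possible way to hold a DSD in memory: the class names, the DSD id EXAMPLE_DSD and the SIBLING group layout are hypothetical, while the concept IDs and the observation-level attachment of status and confidentiality are taken from other slides in this deck.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class AttachmentLevel(Enum):
    """The four attachment levels named on the slide."""
    OBSERVATION = "observation"
    SERIES = "series"
    GROUP = "group"
    DATASET = "dataset"


@dataclass
class Dimension:
    concept_id: str
    code_list: Optional[str] = None  # None means uncoded


@dataclass
class Attribute:
    concept_id: str
    level: AttachmentLevel
    code_list: Optional[str] = None


@dataclass
class DataStructureDefinition:
    id: str
    dimensions: list
    attributes: list
    primary_measure: str
    groups: dict = field(default_factory=dict)  # group name -> dimension IDs

    def key_descriptor(self):
        """Ordered dimension IDs that identify an observation."""
        return [d.concept_id for d in self.dimensions]

    def attributes_at(self, level):
        """Attribute IDs attached at the given level."""
        return [a.concept_id for a in self.attributes if a.level is level]


dsd = DataStructureDefinition(
    id="EXAMPLE_DSD",  # hypothetical identifier
    dimensions=[
        Dimension("FREQ", "CL_FREQ"),
        Dimension("REF_AREA", "CL_AREA_EE"),
        Dimension("TIME_PERIOD"),
    ],
    attributes=[
        Attribute("OBS_STATUS", AttachmentLevel.OBSERVATION, "CL_OBS_STATUS"),
        Attribute("OBS_CONF", AttachmentLevel.OBSERVATION, "CL_OBS_CONF"),
    ],
    primary_measure="OBS_VALUE",
    groups={"SIBLING": ["REF_AREA"]},  # assumed group layout
)
```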
Slide 16
Data Structure Definition
Examples:
– Time Series dataset
  • STS domain: Turnover Index for Retail Trade and repair DSD
– Cross-Sectional dataset
  • Demography domain: Rapid questionnaire DSD
Slide 17
STS Sample Dataset
Table 1. Deflated turnover index (on volume of sales) for retail trade for Greece (no adjustment). Reference period: January 2002 to March 2003. (monthly data - Base year: 2000)
Year | Month | Turnover index | Status | Confidentiality
2002 | January | 84.5 | actual | free
2002 | February | 85.6 | actual | free
2002 | March | 95.4 | actual | free
2002 | April | 106.2 | actual | free
2002 | May | 98.0 | actual | free
2002 | June | 95.3 | actual | free
2002 | July | 105.4 | actual | free
2002 | August | 107.1 | actual | free
2002 | September | 105.2 | actual | free
2002 | October | 109.4 | actual | free
2002 | November | 104.5 | actual | free
2002 | December | 111.9 | actual | free
2003 | January | 89.1 | provisional | free
2003 | February | 88.3 | provisional | free
2003 | March | 96.1 | provisional | free
Source: National Statistical Service of Greece
Data prepared to be transmitted to the European Commission (including EUROSTAT)
Callouts on the slide: Dimensions, Measure, Attributes, Dimensions
Slide 18
STS DSD components
Dataflow: STSRTD_TURN_M
Component | Concept | Concept ID | Code List | Value
Dimension | reference period | TIME_PERIOD | | Month/Year
Dimension | reporting country | REF_AREA | CL_AREA_EE | EL - Greece
Dimension | base year | STS_BASE_YEAR | CL_STS_BASE_YEAR | 2000
Dimension | type of index | STS_INDICATOR | CL_STS_INDICATOR | TOVV - Turnover deflated (volume of sales)
Dimension | activity | STS_ACTIVITY | CL_STS_ACTIVITY | Retail trade
Dimension | adjustment | ADJUSTMENT | CL_ADJUSTMENT | No (neither seasonally nor working day adjusted)
Dimension | frequency | FREQ | CL_FREQ | monthly data
Attribute | title | TITLE | | Title of the exchanged dataset
Attribute | status | OBS_STATUS | CL_OBS_STATUS | actual/provisional data
Attribute | confidentiality | OBS_CONF | CL_OBS_CONF | Free (free of publication data)
Attribute | decimals | DECIMALS | CL_DECIMALS | 1 - One
Measure | Turnover index | OBS_VALUE | | observations
Group | Time series | | | Set of ordered monthly data (01/02-12/02)
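To make the role of the dimensions concrete, the following Python sketch joins the non-time dimension values of the STS example into a single series key. The dot-separated key layout, the dimension ordering and the codes marked as placeholders are assumptions for illustration; only EL, 2000 and TOVV are values given on the slide.

```python
# Non-time dimensions of the STS example, in an assumed order.
SERIES_DIMENSIONS = [
    "REF_AREA", "STS_BASE_YEAR", "STS_INDICATOR",
    "STS_ACTIVITY", "ADJUSTMENT", "FREQ",
]

series_values = {
    "REF_AREA": "EL",          # from the slide
    "STS_BASE_YEAR": "2000",   # from the slide
    "STS_INDICATOR": "TOVV",   # from the slide
    "STS_ACTIVITY": "RETAIL",  # placeholder: real code not shown on the slide
    "ADJUSTMENT": "N",         # placeholder
    "FREQ": "M",               # placeholder
}


def series_key(values, order=SERIES_DIMENSIONS):
    """Join dimension values in a fixed order; every dimension is required."""
    missing = [d for d in order if d not in values]
    if missing:
        raise ValueError("missing dimension values: " + ", ".join(missing))
    return ".".join(values[d] for d in order)


# One observation of Table 1: time and measure plus observation-level attributes.
first_obs = {
    "TIME_PERIOD": "2002-01",
    "OBS_VALUE": 84.5,
    "OBS_STATUS": "actual",
    "OBS_CONF": "free",
}
sts_key = series_key(series_values)
```

Because every dimension must be present, an incomplete key is rejected, whereas attributes such as OBS_STATUS only qualify the observation and do not take part in its identification.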
Slide 19
Demography Sample Dataset
Demography Rapid Questionnaire_Table RQFI05V1. Data for Finland. Reference period: January to December 2005 (annual provisional data - 1st revision).
Unit | Demographic Characteristic | Male | Female | Total
Number of persons | Statistical adjustment | 131 | 35 | 166
Number of persons | Deaths | 24057 | 23871 | 47928
Number of persons | Births | 29400 | 28345 | 57745
Number of persons | Net migration | 4799 | 4187 | 8986
Number of persons | Population on 01/01/2006 | 2572350 | 2683230 | 5255580
Number of persons | Population on 01/01/2005 | 2562077 | 2674534 | 5236611
Number of persons | Deaths under 1 year | | | 174
Number of persons | Births outside marriage | | | 23319
Number of persons | Immigrants | 10837 | 10581 | 21418
Number of persons | Emigrants | 6038 | 6331 | 12369
Number | Divorces | | | 13383
Number | Marriages | | | 29283
Rate | Total fertility rate | | | 1.8
Years | Life expectancy at birth | 82.3 | 75.5 | 78.3
Data prepared to be transmitted to the European Commission (including EUROSTAT)
Callouts on the slide: Measures, Dimensions, Attributes
Slide 20
Demography DSD components
Dataflow: DEMOGRAPHY_RQ
Component | Concept | Concept ID | Code List | Values
Dimension | reference period | TIME_PERIOD | | 01-2005 to 12-2005
Dimension | reporting country | COUNTRY | CL_COUNTRY | Finland
Dimension | sex | SEX | CL_SEX | male / female
Dimension | demographic characteristic | DEMO | CL_DEMO | # of births, # of deaths etc.
Dimension | frequency | FREQ | CL_FREQ | annual data
Attribute | title | TITLE | | Title of the exchanged dataset
Attribute | status | OBS_STATUS | CL_OBS_STATUS | provisional data
Attribute | reference table | TAB_NUM | | RQFI05V1
Attribute | version | REV_NUM | | 1st revision
Measure | statistical adjustment | ADJT | | number of persons
Measure | deaths | DEATHST | | number of persons
Measure | births | LBIRTHST | | number of persons
Measure | net migration | NETMT | | number of persons
Measure | population on 01/01/06 | PJAN1T | | number of persons
Measure | population on 01/01/05 | PJANT | | number of persons
Measure | deaths under 1 year | DEATHUN1 | | number of persons
Measure | births outside marriage | LBIRTHOUT | | number of persons
Measure | immigrants | IMMIT | | number of persons
Measure | emigrants | EMIGT | | number of persons
Measure | divorces | DIV | | pure number
Measure | marriages | MAR | | pure number
Measure | total fertility rate | TFRNSI | | decimal index
Measure | life expectancy at birth | LEXPNSIT | | number of years
Group | Section | | | Set of annual demographic characteristics from FI (01/05-12/05)
Slide 21
Data Provisioning
• A Data Provider can provide data/metadata for many Dataflows using an agreed data structure.
• Dataflows may incorporate data coming from more than one Data Provider.
• A Provision Agreement specifies which data providers are supplying what data to which dataflows.
• The Dataflow may be linked to one or more Categories (subject matter domains) from different Category Schemes.
Slide 22
Identification, Versioning & Maintenance
• Identification: every structural element must have a semantic identifier (e.g. CL_UNIT)
• Versioning: a specific element may have different versions (updates of the element)
• Maintenance: some structures must be maintained by an organisation
– Unique identification: id + version + agency
  • id: CL_UNIT, version: 1.0, agency: ESTAT
  • id: CL_UNIT, version: 1.0, agency: ECB
• Internationalization: the use of multiple languages for describing any element
• SDMX-IM covers aggregate data and metadata in all domains (not domain-specific)
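The id + version + agency rule can be sketched as a composite key. In the Python example below, the ArtefactRef type, the register helper and the stored placeholder contents are hypothetical; the two CL_UNIT references are the ones shown on the slide.

```python
from typing import NamedTuple


class ArtefactRef(NamedTuple):
    """Composite identity of a maintained structure (hypothetical type)."""
    agency: str
    id: str
    version: str


registry = {}


def register(ref, artefact):
    """Store an artefact; the same agency/id/version may be registered only once."""
    if ref in registry:
        raise KeyError("already registered: %s/%s/%s" % ref)
    registry[ref] = artefact


# Same id and version, different maintenance agencies: two distinct artefacts.
register(ArtefactRef("ESTAT", "CL_UNIT", "1.0"), {"note": "placeholder ESTAT content"})
register(ArtefactRef("ECB", "CL_UNIT", "1.0"), {"note": "placeholder ECB content"})
```

Leaving the agency out of the key would make the second registration collide with the first, which is why all three parts are needed for unique identification.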
Slide 23
SDMX High level View
[Diagram: high-level view of the SDMX Information Model. Elements: Category Scheme, Category, Data or Metadata Structure Definition, Data or Metadata Flow, Data Provider, Provision Agreement, Data or Metadata Set, Registered Data or Metadata Set.]
Relationship labels in the diagram:
– comprises subject or reporting categories
– can have child categories
– uses specific data/metadata structure
– can be linked to categories in multiple category schemes
– conforms to business rules of the data/metadata flow
– can get data from multiple data providers
– can provide data or metadata for many data or metadata flows using agreed data or metadata structure
– is registered for
Slide 24
Tools Demonstration
Slide 25
SDMX Registry
• A repository for keeping
– Structural metadata (e.g. CodeLists, ConceptSchemes, DSDs)
– Provisioning information (e.g. Dataflows, Provision agreements)
• Repository is accessible via a Web Service accepting SDMX-ML messages
• Graphical User Interface (GUI) for user interaction over the Web
Slide 26
Data Structure Wizard
• DSW – “standalone” application (replacing AccessDB tool)
• Main functionalities– Manage data structures (create, modify, delete, query)
– Import/Export SDMX-ML structures (validate structure messages)
– Import/Export GESMES/TS structure files
– Create Data messages
– Query SDMX Registry
– Submit data structures to SDMX Registry
Slide 27
Example - DSD creation using the DSW
Slide 28
Example
• Dimensions
– Frequency (CL_FREQ)
– Reference Area (CL_AREA_EE)
– Time period
– Product (CL_PRODUCT)
• Attributes
– Compilation (uncoded, @group)
– Confidentiality (CL_OBS_CONF, @observation)
– Status (CL_OBS_STATUS, @observation)
– Availability (CL_AVAILABILITY, @series)
• Group
Slide 29
C – SDMX-ML Data sets
Slide 30
Syntaxes for SDMX data
• Based on a common Information Model
– SDMX-EDI (GESMES/TS)
  • EDIFACT syntax
  • Time series oriented
  • One format for Data Sets
– SDMX-ML
  • XML syntax
  • Four different formats for Data Sets
  • Easier validation (XML based)
• Tools enable us to use the desired format
Slide 31
SDMX-ML Data Messages
Equivalent representations for reporting Datasets:
– Generic message: one schema, not domain-specific
– Compact message: format for large-volume exchange of data, schema is specific to a DSD
– Utility message: format for advanced validation, schema is specific to a DSD
– Cross-Sectional message: format for non-time-series data, schema is specific to a DSD
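The difference between a DSD-independent and a DSD-specific layout can be sketched with Python's standard xml.etree.ElementTree module. The element and attribute names below are loosely modelled on the Generic and Compact layouts, with namespaces, headers and the enclosing data set omitted, so the output is not schema-valid SDMX-ML; the frequency code M is an assumed placeholder.

```python
import xml.etree.ElementTree as ET

series_key = {"FREQ": "M", "REF_AREA": "EL"}  # FREQ code is assumed
observations = [("2002-01", "84.5", "a"), ("2002-02", "85.6", "a")]


def generic_series(key, obs):
    """Generic-like layout: concepts appear as data, so one schema fits any DSD."""
    series = ET.Element("Series")
    key_el = ET.SubElement(series, "SeriesKey")
    for concept, value in key.items():
        ET.SubElement(key_el, "Value", concept=concept, value=value)
    for period, value, status in obs:
        obs_el = ET.SubElement(series, "Obs")
        ET.SubElement(obs_el, "Time").text = period
        ET.SubElement(obs_el, "ObsValue", value=value)
        attrs = ET.SubElement(obs_el, "Attributes")
        ET.SubElement(attrs, "Value", concept="OBS_STATUS", value=status)
    return series


def compact_series(key, obs):
    """Compact-like layout: concepts become XML attribute names tied to the DSD."""
    series = ET.Element("Series", dict(key))
    for period, value, status in obs:
        ET.SubElement(series, "Obs", TIME_PERIOD=period,
                      OBS_VALUE=value, OBS_STATUS=status)
    return series


generic_xml = ET.tostring(generic_series(series_key, observations), encoding="unicode")
compact_xml = ET.tostring(compact_series(series_key, observations), encoding="unicode")
```

Both strings carry the same observations. The compact form is shorter because the concept names move into the structure, which is also why its schema has to be generated per DSD.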
Slide 32
The SDMX-ML Time-Series format
• Used for representing time-series data
• Contains related metadata as defined in DSDs
• Three different (equivalent) representations available
– Generic message
– Compact message
– Utility message
Slide 33
Generic Dataset
Table 1. Deflated turnover index for retail trade and repair based on volume of sales for Greece (no adjustment). Reference period: January 2002 to March 2003. (Base year: 2000)
Time | Data | Status
2002-01 | 84.5 | a
2002-02 | 85.6 | a
2002-03 | 95.4 | a
2002-04 | 106.2 | a
2002-05 | 98 | a
2002-06 | 95.3 | a
2002-07 | 105.4 | a
2002-08 | 107.1 | a
2002-09 | 105.2 | a
2002-10 | 109.4 | a
2002-11 | 104.5 | a
2002-12 | 111.9 | a
2003-01 | 89.1 | p
2003-02 | 88.3 | p
2003-03 | 96.1 | p
a: actual, p: provisional
Slide 34
Compact Dataset
Table 1. Deflated turnover index for retail trade and repair based on volume of sales for Greece (no adjustment). Reference period: January 2002 to March 2003. (Base year: 2000)
Time | Data | Status
2002-01 | 84.5 | a
2002-02 | 85.6 | a
2002-03 | 95.4 | a
2002-04 | 106.2 | a
2002-05 | 98 | a
2002-06 | 95.3 | a
2002-07 | 105.4 | a
2002-08 | 107.1 | a
2002-09 | 105.2 | a
2002-10 | 109.4 | a
2002-11 | 104.5 | a
2002-12 | 111.9 | a
2003-01 | 89.1 | p
2003-02 | 88.3 | p
2003-03 | 96.1 | p
a: actual, p: provisional
Slide 35
Utility Dataset
Table 1. Deflated turnover index for retail trade and repair based on volume of sales for Greece (no adjustment). Reference period: January 2002 to March 2003. (Base year: 2000)
Time | Data | Status
2002-01 | 84.5 | a
2002-02 | 85.6 | a
2002-03 | 95.4 | a
2002-04 | 106.2 | a
2002-05 | 98 | a
2002-06 | 95.3 | a
2002-07 | 105.4 | a
2002-08 | 107.1 | a
2002-09 | 105.2 | a
2002-10 | 109.4 | a
2002-11 | 104.5 | a
2002-12 | 111.9 | a
2003-01 | 89.1 | p
2003-02 | 88.3 | p
2003-03 | 96.1 | p
a: actual, p: provisional
Slide 36
The SDMX-ML Cross-Sectional data format
• Used for representing non-time-series data
• Contains related metadata as defined in DSDs
• Two different representations available
– Generic message
– Cross-Sectional message
Slide 37
Cross-Sectional Dataset
Demography Rapid Questionnaire_Table RQFI05V1. Data for Finland. Reference period: January to December 2005 (revised annual provisional data).
Topic | Male | Female | Total
Statistical adjustment | 131 | 35 | 166
Deaths | 24057 | 23871 | 47928
Births | 29400 | 28345 | 57745
Net migration | 4799 | 4187 | 8986
Population on 01/01/2006 | 2572350 | 2683230 | 5255580
Population on 01/01/2005 | 2562077 | 2674534 | 5236611
Deaths under 1 year | | | 174
Births outside marriage | | | 23319
Immigrants | 10837 | 10581 | 21418
Emigrants | 6038 | 6331 | 12369
Divorces | | | 13383
Marriages | | | 29283
Total fertility rate | | | 1.8
Life expectancy at birth | 82.3 | 75.5 | 78.3
Slide 38
Conversions
• Equivalent formats
– Can convert from any SDMX-ML format to another
– Based on the same IM
– Exceptions:
  • If a Cross-Sectional DSD does NOT contain a time dimension
• Conversions:
– Between the SDMX-ML formats
– Can be expanded to other formats (e.g. CSV, GESMES)
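The exception for cross-sectional DSDs without a time dimension can be illustrated with a small Python sketch that regroups flat observation records into time series and refuses to do so when no time dimension exists. The function name, the record layout and the codes FI, M, F and DEATHS are hypothetical; the two death counts come from the demography example.

```python
from collections import defaultdict


def to_time_series(records, dimensions, time_dimension="TIME_PERIOD"):
    """Group observations by their non-time key; fail if the DSD has no time dimension."""
    if time_dimension not in dimensions:
        raise ValueError("no time dimension: cannot build time series")
    key_dims = [d for d in dimensions if d != time_dimension]
    series = defaultdict(dict)
    for rec in records:
        key = tuple(rec[d] for d in key_dims)
        series[key][rec[time_dimension]] = rec["OBS_VALUE"]
    return dict(series)


dims = ["COUNTRY", "SEX", "DEMO", "TIME_PERIOD"]
records = [
    # Codes are placeholders; values are the male and female deaths for 2005.
    {"COUNTRY": "FI", "SEX": "M", "DEMO": "DEATHS", "TIME_PERIOD": "2005", "OBS_VALUE": 24057},
    {"COUNTRY": "FI", "SEX": "F", "DEMO": "DEATHS", "TIME_PERIOD": "2005", "OBS_VALUE": 23871},
]
by_series = to_time_series(records, dims)
```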
Slide 39
D – Producing SDMX-ML Data sets
Slide 40
Reporting and Dissemination Guidelines
• Define and classify all the underlying concepts of a dataset
• Provide the specification of the DSD:
– Name & identifier
– List of statistical concepts
– List of metadata concepts
– List of code lists
• Provide the related Dataflows (e.g. STSRTD_TURN_M, DEMOGRAPHY_RQ)
• List the Mandatory attributes (e.g. reference area, frequency), and the Conditional ones
Slide 41
Message Implementation Guidelines (MIG)
• Comprises:
– DSD details (id, version, agencyID)
– Dimensions (concepts, representations, dimension types, e.g. frequency, entity, count, etc., attachment level)
– Measure (primary or cross-sectional)
– Attributes (concept, representation, assignment status (mandatory or conditional), attachment level, attribute type, attachment measure)
– Groups (subset of dimensions)
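The assignment status listed for attributes lends itself to a simple completeness check. The Python sketch below assumes a MIG-like specification held as a dictionary; which attributes are mandatory or conditional here is invented for illustration and is not taken from any actual MIG.

```python
# Hypothetical assignment statuses; a real MIG defines these per DSD.
attribute_specs = {
    "OBS_STATUS": "mandatory",
    "OBS_CONF": "conditional",
    "DECIMALS": "mandatory",
}


def missing_mandatory(specs, reported):
    """Return the mandatory attributes that are absent from a reported item."""
    return [name for name, status in specs.items()
            if status == "mandatory" and name not in reported]


complete = {"OBS_STATUS": "a", "DECIMALS": "1"}
incomplete = {"OBS_CONF": "free"}
```

Applied to the two sample items, the check reports nothing for the first and lists OBS_STATUS and DECIMALS for the second, while the absent conditional attribute is tolerated.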
Slide 42
Structure of a MIG document
1. DSD table
2. Dataflows table
3. Referenced concept schemes
4. Referenced Code Lists
5. Detailed explanation of the Generic SDMX-ML sample dataset
6. Detailed explanation of the Compact (or Cross-Sectional) SDMX-ML sample dataset
Slide 43
Example - Data Set creation using the DSW
Slide 44
SDMX Converter
• Main functionality
– Reading the input message
  • parsing of the message
  • populating the data model of the tool (based on the SDMX v2.0 information model)
– Writing the converted message
  • uses the data model to write the output message in the required target format
• Information retrieved from the Registry
– The data flow ID is used to retrieve the data flow definition from the Registry.
– The DSD reference is retrieved from the data flow definition and is used to acquire the DSD.
Slide 45
SDMX Converter (cont.)
• Tool utility:
– You may already have data in a format other than SDMX-ML (e.g. CSV, GESMES/TS)
  • CSV → Compact SDMX-ML
– You may want further validation of your data
  • Compact SDMX-ML → Utility SDMX-ML
• Conversions:
– From CSV to any type
– From SDMX-ML to any type
– From SDMX-EDI to any type
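The read, populate and write steps described for the converter can be sketched for the CSV to Compact case. This Python example is not the SDMX Converter itself: the CSV layout, the function names and the simplified Compact-like output (no namespaces, header or enclosing data set) are assumptions for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed CSV layout: one observation per row, columns named after concept IDs.
CSV_INPUT = """TIME_PERIOD,OBS_VALUE,OBS_STATUS
2002-01,84.5,a
2002-02,85.6,a
2003-01,89.1,p
"""


def read_csv(text):
    """Step 1: parse the input and populate a simple in-memory model."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def write_compact(model, series_attrs):
    """Step 2: write the model out in a Compact-like target layout."""
    series = ET.Element("Series", dict(series_attrs))
    for row in model:
        ET.SubElement(series, "Obs", row)
    return ET.tostring(series, encoding="unicode")


model = read_csv(CSV_INPUT)
converted_xml = write_compact(model, {"REF_AREA": "EL"})
```

Keeping the intermediate model separate from both the reader and the writer is what allows any supported input format to be paired with any supported output format.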
Slide 46
Conversion Example