
FAO CountrySTAT Model of "Statistical Data and Metadata Exchange (SDMX)"

A Cross-Country SDMX Implementation in "Union Économique et Monétaire Ouest Africaine (UEMOA)"

GTFS/INT/928/ITA Component: "Etude sur le système d’information agricole régional dans l’espace UEMOA"

FAO Statistics Division

Rome - Italy, 31 August 2006

Disclaimer: The designations employed and the presentation of material in this information product do not imply the expression of any opinion whatsoever on the part of the Food and Agriculture Organization of the United Nations concerning the legal or development status of any country, territory, city or area or of its authorities, or concerning the delineation of its frontiers or boundaries.

Contents

1 Introduction
2 Project Goals
3 Initial Scope
4 SDMX Project Information Flows
    4.1 CountrySTAT to National Production Server
    4.2 National Production Server to FAO SDMX Registry
    4.3 FAO SDMX Registry Notification to Regional Server
    4.4 Regional Server to RegionSTAT
5 Project Results
    5.1 Summary of Results
    5.2 Demonstration of Results
        5.2.1 Overview
        5.2.2 Data at the Regional Level
        5.2.3 Data at the Country Level
        5.2.4 New Data at the Regional Level
6 The CountrySTAT RegionSTAT SDMX Project and Demonstration of Feasibility

Figures

Figure 1: Member States of the UEMOA Region.
Figure 2: SDMX Workflow Overview between CountrySTAT, SDMX Registry, RegionSTAT
Figure 3: SDMX Workflow at National Server Level
Figure 4: SDMX Workflow between National Level and SDMX Registry
Figure 5: SDMX Workflow between SDMX Registry and RegionSTAT
Figure 6: SDMX Workflow at Regional Server Level
Figure 7: Query process to a RegionSTAT matrix – before country update
Figure 8: Table generated as query result from a RegionSTAT matrix – before country update
Figure 9: Query process to a CountrySTAT matrix
Figure 10: Table generated as query result from a CountrySTAT matrix
Figure 11: Graphical Control Panel for SDMX Publication – before country update
Figure 12: Graphical Control Panel for SDMX Publication – after country update
Figure 13: Query process to a RegionSTAT matrix – after country update
Figure 14: Table generated as query result from a RegionSTAT matrix – after country update

1 Introduction

This document provides a functional overview of the immediate-term project undertaken by FAO's CountrySTAT Programme [1], and of its first results. The project leverages SDMX [2] formats and registry technology to improve the accessibility and timeliness of data reporting from the country level. It will potentially involve as many as four countries using CountrySTAT today, as well as a regional organization using RegionSTAT.

Although this project facilitates exchange through only a small part of the total information chain, it will illustrate how similar approaches and tools could be used and integrated to make data collection easier at the FAO level from both regions and countries. The data involved in this project will be a set of straightforward tables, based on existing standard structures as collected and disseminated by FAO today [3].

[1] http://CountrySTAT.fao.org
[2] http://www.sdmx.org
[3] http://faostat.fao.org

2 Project Goals

This project has several concrete goals:

(1) To illustrate how SDMX support can be deployed within existing tools to facilitate data collection, with an emphasis on timeliness and quality, while not increasing the reporting burden at the country level. This means that a production prototype needs to be built, which, although limited in scope, provides a realistic example handling real data sets.

(2) To facilitate, through demonstration of the system, an understanding and discussion of issues related to SDMX deployment within the UN family of organizations and other institutions looking at adoption of SDMX [4].

(3) To develop the basic integration of SDMX into the CountrySTAT and RegionSTAT systems, as well as promoting the capability to handle SDMX-ML formats in the underlying PC-Axis engine [5].

3 Initial Scope

The initial scope of the project is to demonstrate how the implementation of the revised reporting in CountrySTAT can speed up the movement of data through the reporting chain in a number of the countries in the UEMOA Region (Union Économique et Monétaire Ouest Africaine [6]). This region comprises eight states, as illustrated in Figure 1.

Figure 1: Member States of the UEMOA Region.

Consequently, in the descriptions that follow, references to CountrySTAT mean the CountrySTAT implementation for one or more of the countries mentioned above, and references to RegionSTAT mean the Regional Statistical Information System for UEMOA. The present FAO SDMX Registry Project [7] is an example of the development and implementation of methods under a regional trust fund project (from May to December 2006) funded by Cooperazione Italiana [8].

[4] SDMX is the Statistical Data and Metadata Exchange initiative, sponsored by seven international and regional institutions: the BIS, ECB, Eurostat, IMF, OECD, UN, and the World Bank. It is an initiative which creates and promotes the use of open standards in the realm of statistical data and metadata. The focus of the initiative is on aggregate statistics, which are important to these organisations and their constituents in decision-making and policy setting, as well as for research purposes.
[5] http://www.pc-axis.scb.se
[6] http://www.uemoa.int
[7] FAO Project Nr. GTFS/INT/928/ITA: “Support to Regional Economic Organizations (REOs) for the implementation of their Regional Programmes for Food Security”

4 SDMX Project Information Flows

The overall project involves a flow of data between two end-point applications: CountrySTAT, which is the originating application for the reported data, and RegionSTAT, which is the application used by the receiving organization. Between these two end-point applications are national and regional publication servers, and facilitating the communications between these is an instance of an SDMX Registry, termed the "FAO SDMX Registry". The overall information flow can be seen in the diagram below. Note that the reader is assumed to have a high-level understanding of the current publishing functionality of the two tools.

[Diagram: Flow of FAO CountrySTAT-RegionSTAT Implementation. Components: CountrySTAT, National Publication Server(s), FAO SDMX Registry, Regional Publication Server, RegionSTAT; exchanges numbered (1) to (4).]

Figure 2: SDMX Workflow Overview between CountrySTAT, SDMX Registry, RegionSTAT

Between each of these are sets of exchanges and processes, which are illustrated in more detail below, as numbered from (1) to (4). It should be noted that the CountrySTAT and RegionSTAT Systems have client programs using the PX-Web publication engine, running on the Windows operating system. The .PX file format [9] is native to these systems. The Publication Servers are Windows Internet Information Servers which form an existing part of the CountrySTAT and RegionSTAT Systems; what is proposed here is an extension of their functionality. The FAO SDMX Registry is an implementation of the existing freeware tool from Metadata Technology Ltd [10]. More details are provided below.

For the purposes of this project the publication servers are hosted at Metadata Technology, Ltd. and the SDMX Registry is hosted at Open Data Foundation, to streamline the development process. While we refer to the National Publication Server and the Regional Publication Server here, these may in fact be different physical servers from the ones used to host non-SDMX publication processes for CountrySTAT and RegionSTAT today.

Note that each country involved in the project will have a separate instance of the National Publication Server as shown here. There will be only a single Regional Publication Server, because only one region is involved in the pilot project.

[8] The Directorate General for Development Cooperation of the Italian Ministry of Foreign Affairs (DGCS). http://www.esteri.it/eng/4_28_66.asp

4.1 CountrySTAT to National Production Server

The interactions between the CountrySTAT client and the National Publication Server are modeled on those of the existing publication process in CountrySTAT. Currently, when the .PX file is ready for publication, a button is clicked which triggers the upload of the .PX file, along with an .ASP page which contains configuration information about how the data are to be published on the website.

For the purposes of SDMX-based dissemination, a similar process is used. A button will be added to the CountrySTAT user interface which says “Publish SDMX,” and this will trigger an upload of the associated .PX file to the National Publication Server, along with an .ASP file containing the necessary configuration information. (It is possible that .ASP is more than what is needed to support this scenario, in which case a simpler type of configuration format may be substituted.)

[9] http://www.pc-axis.scb.se/document/ESYSPX2005.DOC
[10] http://www.metadatatechnology.com/tools.html

Figure 3: SDMX Workflow at National Server Level

It should be noted that the work within CountrySTAT to prepare the data will have to conform to a standard template, reflecting the agreed structural metadata for the selected tables. For this project, only the validation existing within CountrySTAT will be used on the client side. This is an area where improvements could be made in future, but these are outside the scope of this project. Validation of the data on the National Publication Server will be discussed in the next section.

4.2 National Production Server to FAO SDMX Registry

Once the .PX file and .ASP configuration information for SDMX Publication have been uploaded to the National Publication Server, a series of processes will take place.

[Diagram: the National Publication Server transforms the .PX file to SDMX-ML, validates the data file against the SDMX structural metadata, publishes the SDMX-ML file to the website, and registers the SDMX-ML file at the FAO SDMX Registry (exchange 2).]

Figure 4: SDMX Workflow between National Level and SDMX Registry

The first process is transformation from the .PX format to an SDMX-ML [11] format based on an agreed set of structural metadata for the tables being published. The SDMX-ML format for this project will be what is termed in SDMX the “Generic Data” format. The structural metadata will be expressed as an SDMX Key Family (or Data Structure Definition) which will be available to the processor. If transformation into SDMX-ML fails, then a notification will be sent to the user via e-mail, requesting that a corrected .PX file be submitted.

Once the .PX file has been transformed into SDMX-ML, it will be subject to a validation process which guarantees that it can be successfully registered at the FAO SDMX Registry. This is known in SDMX terminology as “Utility Validation”, and will leverage already-existing freeware tools, available as part of the package from Metadata Technology, Ltd., mentioned above. If this validation fails, then an e-mail will be sent to the user explaining the reason for failure and requesting that a corrected .PX file be submitted.

In future, the level of validation could be increased to provide more meaningful content checks on the data to be reported. For the purposes of this project, however, validation will be limited to making sure that all the data is valid according to the agreed structural metadata. If the data passes validation, then the processor will publish the SDMX-ML file for the data to a public location on the National Publication Server.

[11] http://www.sdmx.org/standards/Developer_1_0.aspx

For the purposes of this project, the Open Data Foundation [12] will host the FAO SDMX Registry.
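The structural validation step described in this section can be illustrated with a minimal sketch. It is not the project's implementation: the `DSD` dictionary is a hypothetical stand-in for an SDMX Data Structure Definition, and the dimension names, codes, and the `validate` and `publish_sdmx` functions are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a Data Structure Definition (Key Family):
# each dimension maps to the set of codes allowed by its code list.
DSD = {
    "COUNTRY": {"SN", "ML", "BF"},
    "ITEM": {"MAIZE", "RICE"},
    "ELEMENT": {"PRODUCTION", "AREA"},
}


@dataclass
class Observation:
    key: dict      # dimension name -> code
    value: float


def validate(observations, dsd):
    """Return a list of readable errors; an empty list means the data set is valid."""
    errors = []
    for i, obs in enumerate(observations):
        # Every dimension of the structure must be present in the key.
        missing = sorted(set(dsd) - set(obs.key))
        if missing:
            errors.append(f"observation {i}: missing dimension(s) {missing}")
        # Every code used must belong to the code list of its dimension.
        for dim, code in sorted(obs.key.items()):
            if dim not in dsd:
                errors.append(f"observation {i}: unknown dimension {dim!r}")
            elif code not in dsd[dim]:
                errors.append(f"observation {i}: code {code!r} not in code list for {dim}")
    return errors


def publish_sdmx(observations, dsd):
    """Publish only if validation passes; otherwise return the reasons for rejection."""
    errors = validate(observations, dsd)
    if errors:
        # In the workflow described above, these reasons would be e-mailed to the user.
        return ("rejected", errors)
    return ("published", [])
```

Returning the full list of errors, rather than stopping at the first one, matches the idea of an e-mail that explains the reason for failure so that a corrected file can be submitted in one pass.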

4.3 FAO SDMX Registry Notification to Regional Server

The registration of the published data set will trigger a notification to be sent to the Regional Publication Server. This notification is performed using an RSS (Really Simple Syndication) mechanism. There will be an SDMX Agent on the Regional Publication Server, which will poll the RSS feed and process the resulting XML file. When the agent identifies that new data sets have been registered, it will trigger a series of processes as described below.

[Diagram: the FAO SDMX Registry sends a notification of registration to the Agent on the Regional Publication Server (exchange 3). Agent functions: queries the registry for relevant information (e.g. new data sets registered and their URLs); retrieves the data set from the National Publication Server, where the SDMX-ML file is pulled from the website.]

Figure 5: SDMX Workflow between SDMX Registry and RegionSTAT

First, it will retrieve the published SDMX-ML data file from the National Publication Server of the registering country, and store this in a known location on the Regional Publication Server in its SDMX-ML form. These data files are persisted versions of the registered SDMX-ML files, which are updated every time any of the countries registers a new SDMX-ML file. They function as a cache of data used only to support the aggregation processes on the Regional Publication Server, and are visible only to the aggregation process described in the next step. Once collected, the aggregation and publication processes described in the next step will be triggered.

[12] http://www.opendatafoundation.org
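The notification step in this section, in which the agent reads an RSS feed and identifies newly registered data sets, might look roughly like the sketch below. The feed content, the `guid` values, and the URLs are invented for illustration; the actual feed layout of the FAO SDMX Registry is not described in this document.

```python
import xml.etree.ElementTree as ET

# Illustrative RSS 2.0 feed; titles, links and guids are hypothetical.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Data set registrations (illustrative)</title>
    <item>
      <title>Senegal crop production</title>
      <link>http://national.example/sdmx/senegal_crops.xml</link>
      <guid>reg-002</guid>
    </item>
    <item>
      <title>Mali crop production</title>
      <link>http://national.example/sdmx/mali_crops.xml</link>
      <guid>reg-001</guid>
    </item>
  </channel>
</rss>"""


def new_registrations(feed_xml, seen_guids):
    """Return (guid, url) pairs for feed items the agent has not processed yet."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        url = item.findtext("link")
        if guid is not None and guid not in seen_guids:
            fresh.append((guid, url))
    return fresh
```

An agent built this way would persist the set of processed identifiers between polls, then fetch each returned URL into its local cache before triggering aggregation.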

4.4 Regional Server to RegionSTAT

The first process in this step is the aggregation of all data files of a similar type within the region into a single SDMX-ML file, with a rough aggregate calculated where possible, and included in the SDMX-ML file. This will produce totals for the region. This SDMX-ML file is then subject to a transformation into .PX format. The resulting .PX file is copied to a known location on the Regional Publication Server, where it is accessible to the user, and can be addressed with a URL.

[Diagram: the Agent on the Regional Publication Server and RegionSTAT (exchange 4). Agent functions: multiple national files are aggregated (from SDMX-ML); the SDMX-ML file is transformed to .PX; the production process is run, with outputs published to a staging server; an e-mail notification, with links, is sent to the RegionSTAT operator, who works on the file and publishes.]

Figure 6: SDMX Workflow at Regional Server Level

Once the publication process to produce the draft website, based on the new data, is complete, an e-mail is sent to the RegionSTAT user, containing the URLs of the new .PX file and of the draft HTML produced. This allows the RegionSTAT user to check the numbers and make any needed corrections before the data is published to the public website.
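The aggregation step at the start of this section can be sketched as follows. This is an illustration under stated assumptions, not the project's code: national tables are modeled as plain dictionaries keyed by hypothetical `(item, year)` pairs, and a "rough aggregate" is taken to mean the sum of whatever values have been reported, with `None` where no country reported a value.

```python
def aggregate(national_tables):
    """Combine national tables into regional totals.

    national_tables: {country_code: {(item, year): value or None}}
    Returns {(item, year): total}, where total is None if no country
    reported a value for that key.
    """
    keys = sorted({key for table in national_tables.values() for key in table})
    regional = {}
    for key in keys:
        # Collect only the values actually reported for this key.
        reported = [
            table[key]
            for table in national_tables.values()
            if table.get(key) is not None
        ]
        regional[key] = sum(reported) if reported else None
    return regional


# Hypothetical figures for three countries.
tables = {
    "SN": {("RICE", 2005): 100.0, ("MAIZE", 2005): None},
    "ML": {("RICE", 2005): 50.0, ("MAIZE", 2005): None},
    "BF": {("RICE", 2005): 25.0},
}
regional_totals = aggregate(tables)
```

Summing partial data is one possible reading of "where possible"; a stricter policy would emit a total only when every country has reported, which changes a single condition in the loop.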


5 Project Results

5.1 Summary of Results

The FAO CountrySTAT SDMX pilot project demonstrated that reporting from country to region can be achieved automatically and with no additional burden on the reporting country. In addition, because the FAO SDMX Registry (known here as the “FAO Registry” or simply “Registry”) is at the heart of the upward reporting chain, any organization that is allowed access to the data published by the country can access the data in the same way as RegionSTAT does, again with no additional reporting burden at the country level.

The project used the FAO Registry as the core of the reporting system, and the extent to which the Registry was integrated into the workflow was probably, at the time of writing, breaking new ground in SDMX implementations. The Registry was not only used for data set registration and query; the Data Structure Definitions made available by the Registry were also used to support transformations between the native CountrySTAT files and SDMX-ML, and to support validation of the content of the transformed CountrySTAT file.

The FAO SDMX Registry was used to support the following functions.

SDMX Registry set up
- Store structural metadata in terms of the data structure definition for the agricultural production quantities and acreage, and corresponding code lists.
- Store provisioning metadata to support the data set registration process (i.e. which organization reports which data flow).

The SDMX Publish Application
- Transformation of the CountrySTAT file to SDMX-ML using the Data Structure Definition in the Registry; this transformation process can support any CountrySTAT file, provided the Concept names in the CountrySTAT file are the same as those in the Data Structure Definition.
- Transformation of codes between different coding schemes if required, using the structural metadata derived from the Registry.
- Validation of the data set contents using the structural metadata.
- Automated data set registration direct from the publishing application (this is a new feature built into the FAO Registry, conformant with the SDMX Registry architecture).

The RegionSTAT Agent
- Notification of data set registrations via RSS.
- Transformation of the SDMX-ML file to the RegionSTAT file format using the Data Structure Definition derived from the Registry (in particular, to supply the code descriptions for coded dimensions and attributes).
- Aggregating the data into a regional table for data sets conforming to the same Data Structure Definition.
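The code transformation listed among the functions above, moving from one coding scheme to another, can be pictured with a small sketch. The mapping, the dimension name `ITEM`, and the `recode` function are hypothetical; in the project the correspondence would be derived from structural metadata held in the Registry rather than hard-coded.

```python
# Hypothetical mapping from a national coding scheme to the agreed regional one.
NATIONAL_TO_REGIONAL = {"riz": "RICE", "mais": "MAIZE"}


def recode(records, dimension, mapping):
    """Return (recoded_records, unmapped_codes) without modifying the input."""
    recoded = []
    unmapped = set()
    for record in records:
        code = record[dimension]
        if code in mapping:
            # Copy the record, replacing only the code of the chosen dimension.
            recoded.append({**record, dimension: mapping[code]})
        else:
            unmapped.add(code)
    return recoded, sorted(unmapped)


records = [{"ITEM": "riz", "value": 10}, {"ITEM": "mil", "value": 5}]
recoded, unmapped = recode(records, "ITEM", NATIONAL_TO_REGIONAL)
```

Reporting unmapped codes separately, instead of silently dropping or passing them through, keeps the later validation against the Data Structure Definition meaningful.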


The project also broke new ground in providing a robust and flexible integration between the underlying system used by CountrySTAT, PC-Axis, and the SDMX standards. The transformations developed cover both transformation from PC-Axis format to SDMX-ML and from SDMX-ML to PC-Axis format. PC-Axis is used by many National Statistical Offices [13].

5.2 Demonstration of Results

5.2.1 Overview

The most typical workflow is for a country to amend data or to add new data. For the purpose of the demonstration, a complete new set of data from a country was added to the regional system. Although this is not a typical scenario, it gives the clearest visibility of the changes at the regional level, and it works in a way that is identical to the more usual scenario of publishing new data for existing data sets.

5.2.2 Data at the Regional Level

The screen shots below show the data held at the regional level for three countries in the UEMOA region, together with UEMOA totals.

Figure 7: Query process to a RegionSTAT matrix – before country update

Making some selections reveals the following data held at the regional level.

[13] http://www.pc-axis.scb.se/PXwebsites.asp

Figure 8: Table generated as query result from a RegionSTAT matrix – before country update

5.2.3 Data at the Country Level

A country publishes data to the web using a “Publish” function in the CountrySTAT system. This creates a website for its data. The screen shot below shows some data for Senegal.

Figure 9: Query process to a CountrySTAT matrix

Figure 10: Table generated as query result from a CountrySTAT matrix

For the country, publishing the data in a data sharing scenario, so that it is automatically harvested by RegionSTAT, is as simple as clicking a button, in a way identical to that used for publishing the data to the web as a website.

Figure 11: Graphical Control Panel for SDMX Publication – before country update

The “Publish SDMX” button invokes the transformation from the CountrySTAT format to the SDMX-ML format, validation of the file in terms of the Data Structure Definition (e.g. correct use of coding systems), and the registration of the data set in the FAO SDMX Registry. The system reports whether each of these operations succeeded. To keep the user aware of activity, and for general information, there are two status windows: one showing the relevant contents from the FAO Registry concerning all the data sets registered for the region; the other showing the RegionSTAT contents, together with the last update date. Users will see the contents of this second window change when the SDMX published data has reached RegionSTAT.

Figure 12: Graphical Control Panel for SDMX Publication – after country update

5.2.4 New Data at the Regional Level

The RegionSTAT agent is informed of new data set registrations by using an RSS (Really Simple Syndication) mechanism, which is a simple way of making a query on a web resource. Whilst this mechanism is not a part of the SDMX Registry standards, it is commonly used to provide simple interfaces to web services, and is sufficient for the demonstration project. If new data sets have been registered, these are retrieved from the web location and integrated into the RegionSTAT system, including the creation of regional aggregates. The updated RegionSTAT now shows the addition of the Senegal data.


Figure 13: Query process to a RegionSTAT matrix – after country update

Making the same selections as before reveals the new regional view of the data.

Figure 14: Table generated as query result from a RegionSTAT matrix – after country update


6 The CountrySTAT RegionSTAT SDMX Project and Demonstration of Feasibility

It should be noted that although this CountrySTAT RegionSTAT SDMX Project only covers exchanges between the reporting country and a regional data collector, the same basic implementation could be used to report data directly to the FAO from either the country level or the region, or even between ministries at the country level. The basic strategy here is to use existing tools that facilitate data reporting, in this case CountrySTAT and PC-Axis, to leverage the standards-based framework offered by SDMX Registries and XML formats. Because these are standard, other applications which understand the standards could also leverage the data and structural metadata provided as part of this project. This would, of course, depend on these applications and organizations being given permission to access the reported data.

Subject to the availability of additional resources, these areas could be explored further, so that CountrySTAT can contribute significantly to handling and publishing food and agriculture statistics by bringing together national statistical data producers under one framework. Policy makers and analysts, who are constantly in need of reliable statistical information and primary information, can go to a central place for their data requirements and read it in their national languages. Because CountrySTAT fully complies with international standards and classifications, national statistics can be fully integrated with statistics from other national and international databases like FAOSTAT.

FAO CountrySTAT Contacts

- FAO Statistics Division Director: Haluk Kasnakoglu +39.06570-53827 [email protected]

- FAO Statistics Division Senior Statistician: Naman Keita +39.06570-53827 [email protected]

- FAO CountrySTAT Programme Manager: Kafkas Caprazli +39.06570-54916 [email protected]

- FAO CountrySTAT Website http://www.fao.org/CountrySTAT

- FAO CountrySTAT Programme email: [email protected]


FAO Job Number: AH465/E

ftp://ftp.fao.org/docrep/fao/009/ah465e/ah465e00.pdf