
Page 1: Combining Metadata Standards: Approaches and Benefits

Combining Metadata Standards: Approaches and Benefits

Arofan Gregory, Open Data Foundation


Overview

• Recent events of interest
• The Standards: Comparison and Explanation
• Emerging Implementation Approaches
  – DDI and SDMX
  – SDMX and the Semantic Web Technologies
  – Classifications & Multiple Standards
• Ideas about Future Work


Recent Events of Interest

Note: Some of these events/implementations have been or will be described in detail in other papers; they are only mentioned here.

• Schloss Dagstuhl, Germany, November 2009 (DDI 3 Workshop)
  – SDMX 2.0 – DDI 3 field-level mapping work started
  – Topic: DDI and the Semantic Web?


Recent Events of Interest (2)

• Semantic Web and SDMX
  – The UK Office for National Statistics (ONS) hosted a 2-day meeting in February 2009 (produced a draft “SDMX-RDF”)
  – Banca d’Italia has a prototype project
  – New project launched at the University of Tilburg in the Netherlands (RDF expression of OECD SDMX data)
• The Australian Bureau of Statistics (ABS) has started looking at SDMX and DDI to support the data production lifecycle
  – Prototype implementations
  – Some other NSIs also very interested


Recent Events of Interest (3)

• Classifications and ISO/IEC 11179
  – Australia: Government agencies looking to exchange classifications with the ABS from an existing ISO/IEC 11179 system, using SDMX and DDI
  – Statistics Canada: Evaluation of the IMDB (an ISO/IEC 11179-based metadata repository) for use in coordination with the Canadian RDC Network (based on DDI 3)


What Does This Mean?

• Not a complete list of events/implementations, but…
• Indicates the interest we are seeing in the combined use of standards!
  – These are not just experiments!
  – Organizations are now looking at implementation in a serious way


Characterizing the Standards

• SDMX:
  – Data structures and formats
  – Reference metadata structures and formats
  – Web-services architecture based on registry services
  – Content-oriented guidelines
• ISO/IEC 11179:
  – Model for managing concepts and data elements
  – Metadata registries and lifecycle
• ISO 19115:
  – Standard metadata model for geographic information
  – Used by DDI as its geographic model
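The ISO/IEC 11179 bullet can be made more concrete. In the 11179 metamodel, a data element associates a data element concept (an object class plus a property) with a value domain. The sketch below models only that core pairing with Python dataclasses. The class and field names are simplifications chosen for illustration rather than the normative attribute names, and registry concerns such as registration status and versioning are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str  # e.g. "Person"
    prop: str          # e.g. "Sex" (simplified name, not the normative one)


@dataclass(frozen=True)
class ValueDomain:
    datatype: str
    # Permissible values for an enumerated domain; None means non-enumerated.
    permissible_values: Optional[Tuple[str, ...]] = None

    def is_valid(self, value: str) -> bool:
        # Non-enumerated domains accept any value in this simplified sketch.
        return self.permissible_values is None or value in self.permissible_values


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    domain: ValueDomain

    @property
    def name(self) -> str:
        return f"{self.concept.object_class}.{self.concept.prop}"


person_sex = DataElement(
    DataElementConcept("Person", "Sex"),
    ValueDomain("string", ("F", "M")),
)
print(person_sex.name, person_sex.domain.is_valid("F"), person_sex.domain.is_valid("X"))
```

Because the concept is kept separate from its value domain, the same concept can be reused with different representations.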


Characterizing the Standards (2)

• Dublin Core:
  – Citation metadata
  – Widely used in the Semantic Web
  – Used natively by DDI for citations
• Semantic Web / “Linked Data” / RDF:
  – See “Open Issues on the Semantic Web”
• DDI 3:
  – Will give more detail, as it is not as familiar to the METIS community…


Characterizing the Standards (3)

• DDI 1.*/2.* was a standard used by archives and data libraries
  – Based on a “codebook” model
  – Used by some NSIs, especially in the developing world, because of the IHSN Metadata Management Toolkit
  – Used by CESSDA, the European network of data archives
  – Used by many data archives in North America
• Documentation of a single “Study” (survey)
  – Designed to help researchers find and use microdata
• DDI 3 is more ambitious: it covers the capture and use of metadata throughout the entire data lifecycle


DDI 3 Lifecycle Model

Note: This closely resembles a high-level view of the METIS model!


Characterizing the Standards (4)

• DDI 3 provides machine-actionable metadata to support “metadata-driven” systems throughout the lifecycle
  – Focus is on upstream metadata capture and reuse
• Describes tabulation/aggregation of microdata
• Provides support for comparison across surveys, detailed geography, data processing, and register data
• Aggregate “NCube” model aligned with SDMX
• No architecture/web services support (yet)


An Observation…

• It is easy to say that two standards are “aligned”
  – Many of these standards were intentionally aligned as they were developed
• It is much more difficult to understand how to use them in combination effectively…


Approaches and Benefits

• SDMX and DDI
  – DDI microdata production/SDMX aggregate dissemination
  – Using SDMX data in DDI-based systems (combining aggregates and microdata)
  – Combined SDMX/DDI supporting the entire data lifecycle
  – DDI register data reported to an SDMX collection system
• SDMX and the Semantic Web
• Classifications and the Standards


[Diagram: Input data (Surveys, Registers) → Cleaning, editing, estimation, aggregation, etc. → Dissemination data → Website/Web Service; labels: DDI 3 Metadata; SDMX-ML Data, Metadata, Structure]
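As a rough illustration of the DDI production / SDMX dissemination split, the sketch below aggregates a few microdata records into SDMX-style observations keyed by a series key. Everything here is hypothetical. The variable names (`region`, `sex`), the dot-separated key convention and the `aggregate_to_series` helper are assumptions made for illustration and are not taken from DDI 3 or SDMX. In a real system the variables and code lists would be defined in DDI 3 metadata, and the output would be written as SDMX-ML.

```python
from collections import Counter

# Hypothetical microdata records; in practice the variables and their
# code lists would be documented in DDI 3 metadata.
MICRODATA = [
    {"region": "N", "sex": "F"},
    {"region": "N", "sex": "M"},
    {"region": "N", "sex": "F"},
    {"region": "S", "sex": "M"},
]

# Dimensions of the aggregate cube, in series-key order (an assumption).
DIMENSIONS = ("region", "sex")


def aggregate_to_series(records, dimensions, period):
    """Count records per combination of dimension values.

    Returns a dict mapping a dot-separated series key (e.g. "N.F")
    to a list of (time period, observation value) pairs.
    """
    counts = Counter(tuple(r[d] for d in dimensions) for r in records)
    return {
        ".".join(key): [(period, n)]
        for key, n in sorted(counts.items())
    }


series = aggregate_to_series(MICRODATA, DIMENSIONS, "2009")
for key, obs in series.items():
    print(key, obs)
```

Combinations that have no records (here `S.F`) simply produce no series. Whether to publish explicit zeros instead is a dissemination-policy choice.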


DDI – SDMX: Benefits

• The benefits of this approach are those of using each standard generally:
  – Supports a “metadata-driven” system for data production throughout the lifecycle (DDI)
  – Metadata-rich dissemination format, preferred by data collectors (SDMX)
  – Shared tools: SDMX registry services, Web Services for discovery and use of aggregates


SDMX – DDI: Integrating Aggregates and Microdata

• Scenario is common in some research
  – Economic data is often only available as aggregates
  – Challenge is to combine these aggregates with microdata


[Diagram: SDMX Web Service; SDMX-to-DDI 3 Transform; Data archive/repository holding Surveys (DDI 3) and Registers (DDI 3); Processing to produce integrated data and metadata (DDI 3)]


SDMX – DDI: Benefits

• Allows easy use of official statistics by researchers
  – Solves the problem of combining aggregates and microdata
• Note: This does not involve disaggregation of published data
  – Structural transformation only, to allow DDI 3 systems to process aggregates easily
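The point that the SDMX-to-DDI step is purely structural can be shown with a small sketch. It reads a simplified, namespace-free fragment modelled loosely on SDMX-ML 2.0 Generic data (a series key plus observations) and flattens it into one row per published observation. A DDI 3-based system could then describe those rows as an aggregate dataset. The XML shape, the `flatten_series` helper and the row layout are assumptions for illustration. The concept IDs (`FREQ`, `REF_AREA`, `TIME_PERIOD`, `OBS_VALUE`) follow common SDMX usage. Real SDMX-ML messages are namespaced and richer, and the DDI 3 side (variable and NCube descriptions) is not shown. No value is split or imputed, so nothing is disaggregated.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment loosely modelled on SDMX-ML 2.0
# Generic data; real messages carry namespaces and a header.
SDMX_FRAGMENT = """
<DataSet>
  <Series>
    <SeriesKey>
      <Value concept="FREQ" value="A"/>
      <Value concept="REF_AREA" value="DE"/>
    </SeriesKey>
    <Obs><Time>2008</Time><ObsValue value="3.1"/></Obs>
    <Obs><Time>2009</Time><ObsValue value="2.7"/></Obs>
  </Series>
</DataSet>
"""


def flatten_series(xml_text):
    """Turn each observation into one flat row.

    Dimension values from the series key become columns; the time
    period and observation value are copied unchanged.
    """
    root = ET.fromstring(xml_text.strip())
    rows = []
    for series in root.iter("Series"):
        key = {v.get("concept"): v.get("value")
               for v in series.find("SeriesKey").iter("Value")}
        for obs in series.iter("Obs"):
            row = dict(key)
            row["TIME_PERIOD"] = obs.findtext("Time")
            row["OBS_VALUE"] = float(obs.find("ObsValue").get("value"))
            rows.append(row)
    return rows


for row in flatten_series(SDMX_FRAGMENT):
    print(row)
```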


DDI + SDMX: The Data Lifecycle

• Uses a metadata model that can be expressed as either SDMX or DDI, as appropriate
• Provides support for process management
  – Uses many features of SDMX (process model, structure sets, reporting taxonomies, etc.)
• Uses the SDMX architecture/services model
  – Designed to allow incorporation of other standards


[Diagram: Process-management system (BPML); SDMX Registry; Data and metadata repositories/application databases, including an Input data store and a Dissemination data store (SDMX); Surveys (DDI 3); Registers (DDI 3); Web site/Print/Web Services (SDMX, DDI, etc.)]

All registry interactions use SDMX. Interactions between systems are DDI or SDMX Web Services, as appropriate.


SDMX + DDI: Benefits

• Leverages Web-Services technologies (registry, event triggers, etc.) for efficient automation, migration, and flexibility
• Choice of tools is broad
  – Use the “best” format for any given task
• All the benefits of the DDI – SDMX case
• Good support for process management as well as data management
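The registry-and-event-trigger pattern behind these benefits can be sketched in a few lines. A producing system registers a dataset against a data structure, and every system subscribed to that structure is notified, which lets downstream steps run automatically. This is a hypothetical in-memory toy. The `Registry` class and its methods are invented for illustration and do not reflect the actual SDMX registry interfaces, which are exposed as web services.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Registry:
    """Toy registry of structures, datasets and subscriptions (hypothetical)."""

    def __init__(self) -> None:
        self.structures: Dict[str, List[str]] = {}
        self.datasets: Dict[str, str] = {}  # dataset id -> structure id
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def register_structure(self, structure_id: str, dimensions: List[str]) -> None:
        self.structures[structure_id] = dimensions

    def subscribe(self, structure_id: str, callback: Callable[[str], None]) -> None:
        self._subscribers[structure_id].append(callback)

    def register_dataset(self, dataset_id: str, structure_id: str) -> None:
        # Reject datasets that point at an unknown structure.
        if structure_id not in self.structures:
            raise KeyError(f"unknown structure: {structure_id}")
        self.datasets[dataset_id] = structure_id
        # Event trigger: notify every subscriber to this structure.
        for callback in self._subscribers[structure_id]:
            callback(dataset_id)


received: List[str] = []
registry = Registry()
registry.register_structure("LABOUR_FORCE", ["FREQ", "REF_AREA", "SEX"])
registry.subscribe("LABOUR_FORCE", received.append)
registry.register_dataset("LFS_2009_Q4", "LABOUR_FORCE")
print(received)  # ['LFS_2009_Q4']
```

In an architecture like the one described here, the subscriber could be the process-management system, which would then start the next processing step.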


SDMX and the Semantic Web Technologies

• Potentially applies to other standards as well (DDI, ISO/IEC 11179, etc.)
• Note that Semantic Web technologies only apply to dissemination
  – Not designed to support data production
• Terms:
  – “Raw data” in an SW context does not mean “raw data” in the statistical sense
  – “Data” in an SW context means “anything that can be described using RDF”, not specifically numeric data


Assumptions

• Creation of a harmonized statistical model based on proven models/standards, but expressed as RDF (“ontology” or “vocabulary” in SW terms)

• Implementation of an “SDMX-RDF” in standard SDMX dissemination packages


[Diagram: Internal (production environment): SDMX-driven production system; Dissemination data store (SDMX); SDMX Web Service (SDMX-ML). External (dissemination to Web): “SDMX-RDF” Transform; Triplestore (SDMX-RDF); SPARQL queries; RDF]
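The kind of output the “SDMX-RDF” transform step produces can be illustrated by turning one observation into RDF triples in N-Triples syntax. The vocabulary is an assumption. The sketch borrows `qb:Observation` and `qb:dataSet` from the later W3C RDF Data Cube vocabulary (`http://purl.org/linked-data/cube#`), whose model is compatible with SDMX, and it invents `http://example.org/` URIs for the dataset, dimensions and codes. The draft SDMX-RDF mentioned here may use different terms.

```python
from typing import Dict, List

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/"  # hypothetical base URI


def observation_to_ntriples(dataset: str, key: Dict[str, str],
                            period: str, value: str) -> List[str]:
    """Express one observation as N-Triples lines (illustrative only)."""
    # Mint an observation URI from the dataset, series key and period.
    key_part = ".".join(key.values())
    obs = f"<{EX}obs/{dataset}/{key_part}/{period}>"
    lines = [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{QB}dataSet> <{EX}dataset/{dataset}> .",
    ]
    # One triple per series-key dimension, pointing at a code URI.
    for concept, code in key.items():
        lines.append(f"{obs} <{EX}dimension/{concept}> <{EX}code/{concept}/{code}> .")
    lines.append(f'{obs} <{EX}dimension/TIME_PERIOD> "{period}" .')
    lines.append(f'{obs} <{EX}measure/OBS_VALUE> "{value}"^^<{XSD_DECIMAL}> .')
    return lines


triples = observation_to_ntriples("GDP", {"FREQ": "A", "REF_AREA": "DE"}, "2009", "2.7")
print("\n".join(triples))
```

A single number becomes six statements here. This is one reason RDF output is so much larger than SDMX-ML.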


SDMX and the Semantic Web: Benefits

• Leverages the “Linked Data” phenomenon without requiring a deep understanding of RDF, etc.
• Uses existing standards/models and best practices to do the “heavy lifting” (data production)
• Puts a lot of reliable, quality data into the “Linked Data Web”
  – Helps address issues of provenance


Warning

• RDF is verbose!
• 4.5 MB of GESMES/TS = 45 MB of “compact” SDMX-ML XML = 420 MB of RDF triples
• This may encourage the on-demand production of RDF data from web services, rather than static files
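The figures above imply roughly a tenfold expansion at each step (45 / 4.5 = 10 and 420 / 45 ≈ 9.3), or about 93 times overall from GESMES/TS to RDF. One way to act on the last bullet is to generate triples lazily, only for the series a client requests, instead of materialising a full RDF dump. The sketch below does this with a Python generator. The in-memory `STORE`, the `triples_for_series` function and the `example.org` URIs are hypothetical stand-ins for a dissemination database and a web-service endpoint.

```python
from typing import Dict, Iterator, List, Tuple

EX = "http://example.org/"  # hypothetical base URI

# Hypothetical dissemination store: series key -> (period, value) pairs.
STORE: Dict[str, List[Tuple[str, str]]] = {
    "A.DE": [("2008", "3.1"), ("2009", "2.7")],
    "A.FR": [("2009", "1.9")],
}


def triples_for_series(series_key: str) -> Iterator[str]:
    """Yield N-Triples for one series, one observation at a time.

    Nothing is produced until the caller iterates, so RDF for series
    that nobody requests is never generated or stored.
    """
    for period, value in STORE.get(series_key, []):
        obs = f"<{EX}obs/{series_key}/{period}>"
        yield f"{obs} <{EX}series> <{EX}series/{series_key}> ."
        yield f'{obs} <{EX}period> "{period}" .'
        yield f'{obs} <{EX}value> "{value}" .'


# A web-service handler could stream this generator as its response body.
for line in triples_for_series("A.FR"):
    print(line)
```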


Standards and Classifications

• Some maintainers of standard classifications are looking at expressing them in useful formats (SDMX, DDI)
  – This is an easy thing to do
  – It is very useful: promotes re-use, comparability, etc.
  – Could apply to Semantic Web RDF expressions as well as XML-based standards
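For the Semantic Web case, one widely used way to publish classifications as RDF is the W3C SKOS vocabulary. The sketch below expresses a tiny two-level classification as SKOS triples in N-Triples syntax and refuses to emit a link to a parent code that does not exist. The codes, labels, `example.org` scheme URI and `classification_to_skos` helper are made up for illustration. An SDMX or DDI 3 code list would be a separate, XML-based expression of the same content.

```python
from typing import List, Optional, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEME = "http://example.org/classification/demo"  # hypothetical scheme URI

Item = Tuple[str, str, Optional[str]]  # (code, English label, parent code)

# Illustrative content only, not a real classification.
ITEMS: List[Item] = [
    ("A", "Section A", None),
    ("A01", "Division 01", "A"),
    ("A02", "Division 02", "A"),
]


def classification_to_skos(items: List[Item]) -> List[str]:
    """Express a code list as SKOS concepts in one concept scheme."""
    codes = {code for code, _, _ in items}
    lines = [f"<{SCHEME}> <{RDF_TYPE}> <{SKOS}ConceptScheme> ."]
    for code, label, parent in items:
        concept = f"<{SCHEME}/{code}>"
        lines += [
            f"{concept} <{RDF_TYPE}> <{SKOS}Concept> .",
            f"{concept} <{SKOS}inScheme> <{SCHEME}> .",
            f'{concept} <{SKOS}notation> "{code}" .',
            f'{concept} <{SKOS}prefLabel> "{label}"@en .',
        ]
        if parent is not None:
            # Refuse to emit a hierarchy link to a code that does not exist.
            if parent not in codes:
                raise ValueError(f"{code}: unknown parent {parent}")
            lines.append(f"{concept} <{SKOS}broader> <{SCHEME}/{parent}> .")
    return lines


for line in classification_to_skos(ITEMS):
    print(line)
```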


Ideas for Future Work

• Endorse SDMX – DDI mappings now being produced
• Develop an “SDMX-RDF” (?) or…
• Develop a harmonized statistical model for expression in RDF (based on DDI, SDMX, ISO/IEC 11179) (?)
  – Encourage tools developers to implement it in standard dissemination packages
• Publish standard classifications in standard formats


Summary

• Combined use of standards is becoming a reality

• Proactive engagement with the Semantic Web world could provide benefits to all concerned parties, as well as users