update and thoughts on directions for metadata work
TRANSCRIPT
Update and Thoughts on Directions for Metadata Work
Carol Hert
March 17, 2003
Our Metadata Activities
User study to understand metadata necessary for integration tasks (we’re finding needs for metadata not available in agencies)
Ongoing efforts to understand DDI and ISO11179 for deploying in end-user tools
Identification of host of other relevant standards (open archives, business XML, Z39.50, …)
Marked-up tables using DDI
Attempting to acquire particular metadata
Metadata Aspects for GovStat
Conceptual Tasks
  Determining elements and attributes to be used in wrapping data and contextual info (an XML DTD presumably)
  User study et al. to determine appropriate content
  “Thought” experiments with implementations related to elements, attributes, and their values
  Developing conceptual metadata model for SKN
Practical Tasks
  Finding the actual metadata content to be “wrapped” via the elements
  Finding data with metadata to port into tools
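To make the idea of "wrapping" data and contextual info in elements and attributes more concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. Every element name, attribute name, and value in it (record, variable, definition, value, source, the number 100.0) is a made-up placeholder for illustration; none of it comes from DDI, ISO11179, or any GovStat DTD.

```python
import xml.etree.ElementTree as ET


def wrap_value(value, variable, definition, source):
    """Wrap one data value together with its contextual metadata.

    All element and attribute names are hypothetical placeholders,
    not taken from DDI, ISO11179, or an actual GovStat DTD.
    """
    record = ET.Element("record", attrib={"source": source})
    var = ET.SubElement(record, "variable", attrib={"name": variable})
    ET.SubElement(var, "definition").text = definition
    ET.SubElement(record, "value").text = str(value)
    return record


# Placeholder inputs, invented purely for illustration.
record = wrap_value(
    value=100.0,
    variable="gasoline_price_index",
    definition="Placeholder definition text.",
    source="example-agency",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The conceptual task is deciding which elements and attributes belong in such a wrapper; the practical task is finding agency content to fill them.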
Today’s Presentation
Focus on the Conceptual Tasks
  Status report on potentially relevant standards and projects
  Considering the user tools and the public intermediary
Start strategizing on directions to pursue further
Concept. Task 1: Identifying Elements, Attributes, and Values
Current Contenders for Elements, Attributes (and some values)
  DDI (and its implementations)
  ISO11179 (and its implementations)
  Hybrids
    Corporate Metadata Repository (CMR) from Oracle
    Data cubes for Tables from NESSTAR, DDI
DDI
Data set is the basic element
Data archives perspective: designed primarily for people who archive data sets and those who will retrieve and reuse those datasets
Does capture information on variables, values, etc.
Still actively working on specifications for tables (see Ryssevik memo 3/6/2003)
DDI Issues
Doesn’t have good mechanism for relating surveys and instances of those surveys: each data set is considered as stand-alone
Hard to compare across variables and time-series
Elements for tables still in development and other data presentations (such as news releases, graphics) not well developed
Currently working backwards to a conceptual model for the metadata
DDI Implementations of Note
Counting California
Virtual Data Center (Harvard/MIT)
NESSTAR/FASTER
  Developed CRISTAL datacubes and FasterCubes
Minnesota Population Center
  Developed WendyCubes for data cubes
  WendyCubes and FasterCubes being merged
Data Ferrett (Census)
ISO11179
Written from the data producers’ perspective (Dan argues that it doesn’t take any perspective)
Able to relate survey instances, etc.
Isn’t capable of handling the full range of metadata we might need, nor can it handle data representations such as news releases, webpages, etc. (same problem with DDI)
ISO11179 Implementations
StatCanada
  Dan G. has reservations about this implementation and feels it doesn’t meet the standard (more as I understand the problem better)
Is CMR the answer?
CMR as a registry to describe data, data processes, and data quality, and which links to datasets and data
CMR incorporates all of ISO11179 and DDI, and in addition can support a variety of metadata types (such as those news releases)
CMR not open source, cost unknown (software cost and Oracle consultants)
Two good contacts for us
  Dan has gotten for BLS
  Sarah Nusser acquiring for Iowa State
Segue to Conceptual Task 2
My original goal was to determine what metadata elements would be necessary for a given end-user tool (e.g. the SIG) and determine which standard(s) could provide necessary functionality (enabling metadata to get from agencies to the user tools)
I started by looking at the SIG and also at DDI implementations to see what functionalities we could acquire
The Plot Thickens
Two new questions emerged from these activities
  What functions/information (data & metadata) would be necessary in SKN?
  What other standards efforts should be considered in creating the SKN?
The SKN Architecture
[Diagram: SKN architecture. Labeled components:]
  Agency with multiple metadata repositories
  Agency backend data and metadata (shown twice)
  Agency data with integrated metadata
  Firewall
  Distributed public intermediary: variable/concept level, XML-based, incorporating ISO11179 and DDI, providing Java-based statistical literacy tools to user interface
  Statistical Ontology
  Domain Ontologies
  Domain Experts
  End User Communities
  User Interfaces
  End users: interact with data from information/concept perspective, not just agency perspective
Standards, projects and their functions
[Diagram: standards and projects grouped by part of the SKN]
INTERNAL TO AGENCIES
  Agency data production
    CMR
    Proprietary metadata repositories
    Presentation formats (html, xml, pdf, etc.)
    Database formats (ACCESS, ALMIS)
    DDI Datacubes NESSTAR/Faster CRISTAL
    XML for Analysis
    Common Warehouse Metadata Model
    Statistical disclosure (SDC in Nesstar)
    StatCan ISO imp.
  Data archives
    DDI (and DDI for datacubes) NESSTAR/Faster CRISTAL
PUBLIC INTERMEDIARY
  Middleware (whatever that includes)
    NEOOM from Nesstar/Faster
    From Virtual Data Center (VDC): federated metadata harvesting, repository exchange and caching, federated authentication and authorization, naming
POSSIBLE SKN USER TOOLS/FUNCTIONS
  Searching: Z39.50
  Data analysis, bookmarking, downloading datasets (Nesstar)
  Cataloging, archiving functions (VDC)
  Online search, data conversion, exploration, data analysis (VDC)
  Glossary (The Neuchatel Group)
  Statistical Interactive Glossary (SIG, our project)
  Ontologies (ISI/Columbia for gas)
  Relation Browsers
  Online Help
TRANSFERS
  Z39.50 (used by VDC)
  Open Archives (VDC)
  DC, MARC, DDI metadata import and export (VDC)
  SOAP, HTTP, RDF (Nesstar)
  ASN.1
[Table columns: Information/Metadata Needed; Task(s) using this metadata; ISO11179 or DDI map; Comments; Source of metadata]

Information/Metadata Needed: Term name(s)
  Task(s) using this metadata: Search, presentation, and anchor for linking presentation
  ISO11179 or DDI map: ISO11179 data element name (if term is a variable or concept)
  Source of metadata: Agency content, GovStat ontology

Information/Metadata Needed: Definition
  Task(s) using this metadata: Provide content
  ISO11179 or DDI map: ISO11179 data element definition (if term is variable)
  Source of metadata: Agency documentation, statistics experts, statistics texts, GovStat ontology

Information/Metadata Needed: Examples, demonstrations, etc.
  Task(s) using this metadata: Provide content
  ISO11179 or DDI map: None within ISO11179 or DDI (though data elements in both might be usable)
  Comments: Values: audio, video, static text/graphic; under user control; links to more specific agency documentation
  Source of metadata: Agency content supplemented by designer

Information/Metadata Needed: Context specificity level of definition (e.g. statistic, table, agency)
  Task(s) using this metadata: Provides specific explanations
  ISO11179 or DDI map: None within ISO11179 or DDI
  Comments: Under user control; needs context information
  Source of metadata: User’s current webpage or table, GovStat ontology

Information/Metadata Needed: Format
  Task(s) using this metadata: Type of presentation
  ISO11179 or DDI map: None within ISO11179 or DDI
  Comments: Some terms may be better explained in some formats; user control
  Source of metadata: Current user interaction or preset preferences, computing capabilities, GovStat ontology
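One way to read the table above is as a record layout for a glossary entry, plus a note on which fields the standards already cover. The sketch below is only an illustration of that reading: the class name GlossaryEntry, the field names, and the STANDARD_MAP dictionary are hypothetical, and the mapping values simply restate the "ISO11179 or DDI map" column.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GlossaryEntry:
    """Hypothetical record for one glossary term (field names are assumptions)."""
    term_names: List[str]
    definition: str
    examples: List[str] = field(default_factory=list)  # audio, video, static text/graphic, links
    context_level: Optional[str] = None                # e.g. "statistic", "table", "agency"
    presentation_format: Optional[str] = None          # type of presentation


# Restates the "ISO11179 or DDI map" column; None means no mapping in either standard.
STANDARD_MAP = {
    "term_names": "ISO11179 data element name",
    "definition": "ISO11179 data element definition",
    "examples": None,
    "context_level": None,
    "presentation_format": None,
}


def fields_without_standard() -> List[str]:
    """Return the fields that neither ISO11179 nor DDI currently covers."""
    return [name for name, mapping in STANDARD_MAP.items() if mapping is None]


entry = GlossaryEntry(term_names=["index", "indices"], definition="Placeholder definition.")
print(fields_without_standard())
```

Three of the five fields come back with no mapping, which is the gap the table points to: examples, context specificity, and format would have to be defined outside ISO11179 and DDI.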
New Strategic Direction for Us?
Specification of metadata necessary throughout SKN?
  Will require specification of interactions among components of SKN
  And perhaps the specification of specific standards
An example of a possible interaction
User via interface “I want data on gasoline price indices in the state of MD”
Query transferred to intermediary.
Intermediary query agent has business rule requiring check of terms so forwards the term “indices” to the SIG
Example continued
SIG responds with 3 definitions of index (specificity of definition) and multiple display options
Intermediary business rule indicates to take most general and to use the term “index” in queries sent to agency data sources
Etc.
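The interaction in this example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a design for the SKN: the class names (Definition, Glossary, QueryAgent), the integer specificity scale (0 = most general), and the placeholder definition texts are all invented here, and the "check of terms" rule is reduced to looking up every word of the query.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Definition:
    term: str         # canonical form of the term
    specificity: int  # assumed scale: 0 = most general, larger = more specific
    text: str


class Glossary:
    """Stand-in for the SIG: maps a word to its available definitions."""

    def __init__(self, entries: Dict[str, List[Definition]]):
        self._entries = entries

    def lookup(self, word: str) -> List[Definition]:
        return self._entries.get(word.lower(), [])


class QueryAgent:
    """Stand-in for the intermediary query agent and its business rules."""

    def __init__(self, glossary: Glossary):
        self.glossary = glossary

    def rewrite(self, query: str) -> str:
        rewritten = []
        for word in query.split():
            definitions = self.glossary.lookup(word)  # rule 1: check terms with the glossary
            if definitions:
                # rule 2: take the most general definition and use its canonical term
                most_general = min(definitions, key=lambda d: d.specificity)
                rewritten.append(most_general.term)
            else:
                rewritten.append(word)
        return " ".join(rewritten)


sig = Glossary({
    "indices": [
        Definition("index", 0, "placeholder: most general definition"),
        Definition("index", 1, "placeholder: table-level definition"),
        Definition("index", 2, "placeholder: agency-level definition"),
    ]
})
agent = QueryAgent(sig)
agency_query = agent.rewrite("I want data on gasoline price indices in the state of MD")
print(agency_query)
```

Even this toy version needs an agreed query, an agreed response shape, and a rule for choosing among responses, which is the kind of specification the next slide raises.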
New Strategic Direction for Us?
Specification of functions (and related information) necessary throughout SKN?
  Will require specification of interactions among components of SKN (possible queries, acceptable responses, bindings among agents, etc.)
  And perhaps the specification of specific standards