The Earth System Grid (ESG)
METADATA SCHEMAS IN ESG
DOE SciDAC ESG Project Review
Argonne National Laboratory, Illinois
May 8-9, 2003
May 8, 2003 Earth System Grid 2
Introduction
• ESG initial focus is on climate model data, particularly PCM/CCSM data (netCDF format).
• Consequently, our work so far has concentrated on developing or evaluating metadata schemas suited to this kind of data, specifically:
  - “ESG schema” for expressing collection-level metadata
  - NcML schema for file-level metadata
  - THREDDS schema for data cataloguing and browsing
Part I
ESG Schema
ESG schema: history
• Purpose-built by ESG to meet the specific needs of the PCM/CCSM modeling community (through ESG liaison Gary Strand)
• Several other standards were evaluated before we developed our own; none was found to be completely satisfactory:
  - Dublin Core (not rich enough for scientific data)
  - ISO (too complex to be imposed on data providers)
  - CLRC and DIF (almost adequate, but not flexible enough to capture some details that are important to PCM/CCSM)
• Initial draft developed in conjunction with the UK eScience office; collaboration continues toward a common schema or interoperability
ESG schema: requirements
Information that needed to be captured in the metadata:
• Model run description (including run scenario and time period)
• Model configuration notes
  - Active/inactive components (atmosphere, ocean, ice)
  - Pointers to documentation of model components (usually on the web)
  - Input forcing datasets (which ozone dataset, sulfate dataset, etc.)
  - Site at which the model binary was built, perhaps even the compiler options that were used
  - Site where the model was run
  - Persons who carried out the model integration and submission
• Related model experiments - VERY IMPORTANT!
  - "Sibling" runs (for ensembles of runs)
  - "Parent" run (the run from which this particular experiment started)
  - "Child" runs (runs descended from this run)
ESG schema: requirements
• References to visualizations (MPGs and so on) using this model data
• References to published journal articles/papers/presentations that have used this experiment's data
• Miscellaneous notes
• Acknowledgment of funding agencies
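The "sibling", "parent" and "child" relations listed among the requirements can be sketched as a small in-memory graph. This is only an illustration under stated assumptions: the `Run` class, its field names and the run identifiers are invented here and are not taken from the ESG schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Run:
    """One model run; all names here are hypothetical, not ESG schema terms."""
    run_id: str
    parent_id: Optional[str] = None      # run this experiment was started from
    ensemble_id: Optional[str] = None    # runs sharing an ensemble are siblings


def children(runs: Dict[str, Run], run_id: str) -> List[str]:
    """Runs that were started from the given run."""
    return sorted(r.run_id for r in runs.values() if r.parent_id == run_id)


def siblings(runs: Dict[str, Run], run_id: str) -> List[str]:
    """Other members of the same ensemble as the given run."""
    ens = runs[run_id].ensemble_id
    if ens is None:
        return []
    return sorted(r.run_id for r in runs.values()
                  if r.ensemble_id == ens and r.run_id != run_id)


def ancestors(runs: Dict[str, Run], run_id: str) -> List[str]:
    """Chain of parent runs, nearest first; stops on unknown or repeated ids."""
    chain: List[str] = []
    seen = {run_id}
    parent = runs[run_id].parent_id
    while parent is not None and parent in runs and parent not in seen:
        chain.append(parent)
        seen.add(parent)
        parent = runs[parent].parent_id
    return chain


# Invented identifiers, for illustration only.
runs = {r.run_id: r for r in [
    Run("control"),
    Run("scenario-a1", parent_id="control", ensemble_id="scenario-a"),
    Run("scenario-a2", parent_id="control", ensemble_id="scenario-a"),
    Run("extension", parent_id="scenario-a1"),
]}
```

Storing only the parent and the ensemble on each run keeps the record small; child and sibling lists are then derived by lookup rather than maintained by hand.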
ESG schema: description
• Expresses collection-level metadata, i.e. logical metadata that describes a set of logically related data files (for example, a model run).
• Developed following an object model: we defined objects with properties, inheritance between objects, and relations between objects (see following slide)
• Although developed specifically for modeled data, it could be easily extended to express observational, experimental and analysis data.
• Metadata encoded in XML, conforming to an XML schema definition document (metadata syntax)
• XML metadata may be stored directly in a native XML database (Apache Xindice), or may be shredded and stored in a relational database (MySQL) within a set of purpose-defined tables.
• An API for reading and writing ESG metadata as XML, to and from a transparent database backend, is currently under development
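To make the "shredding" option concrete, here is a minimal sketch that splits one XML record into relational rows. It rests on several assumptions: the XML serialization shown is hypothetical (element names merely echo properties from the object model), the two table layouts are made up, and SQLite stands in for MySQL so that the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical record: the element and attribute names echo the object model,
# but this exact serialization is an assumption, not the ESG schema.
RECORD = """
<Simulation id="run-001">
  <name>Example control run</name>
  <description>Illustrative record only</description>
  <participant role="contact">A. Scientist</participant>
  <participant role="submitter">B. Modeler</participant>
</Simulation>
"""


def shred(xml_text: str, conn: sqlite3.Connection) -> None:
    """Split one XML record into rows of two made-up tables."""
    root = ET.fromstring(xml_text)
    conn.execute("CREATE TABLE IF NOT EXISTS activity "
                 "(id TEXT PRIMARY KEY, kind TEXT, name TEXT, description TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS participant "
                 "(activity_id TEXT, role TEXT, person TEXT)")
    # Single-valued properties become columns of the activity row.
    conn.execute("INSERT INTO activity VALUES (?, ?, ?, ?)",
                 (root.get("id"), root.tag,
                  root.findtext("name"), root.findtext("description")))
    # Repeatable properties become one row each in a child table.
    conn.executemany("INSERT INTO participant VALUES (?, ?, ?)",
                     [(root.get("id"), p.get("role"), p.text)
                      for p in root.findall("participant")])
    conn.commit()


conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
shred(RECORD, conn)
rows = conn.execute(
    "SELECT role, person FROM participant ORDER BY role").fetchall()
```

The split mirrors the cardinalities in the object model: properties with at most one value fit in a single row, while `[0,n]` properties need a separate table keyed by the record id.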
[Figure: ESG schema object model (class diagram). Legend: Class, AbstractClass, inheritance, association.]
Classes and their properties (cardinality in brackets):
• Object: [1] id
• Activity: [0,1] name; [0,1] description; [0,1] rights; [0,n] date (type=, encoding=); [0,n] note; [0,n] participant (role=); [0,n] reference (uri=)
• Investigation
• Project: [0,n] topic (type=); [0,1] funding
• Ensemble
• Campaign
• Simulation: [0,n] simulationInput (type=); [0,n] simulationHardware
• Observation
• Experiment
• Analysis
• Dataset: [0,1] type; [0,1] conventions; [0,n] date (type=, encoding=); [0,n] format (type=, uri=); [0,1] timeCoverage; [0,1] spaceCoverage
• Person: [0,1] firstName; [0,1] lastName; [0,1] contact
• Institution: [0,1] name; [0,1] type; [0,1] contact
• Service: [0,1] name; [0,1] description
• ParameterList
• Parameter: [1] name; [0,1] mapping (authority=)
Relations shown in the diagram: isA, isPartOf, generatedBy, isDerivedFrom, worksFor, participant (role=), serviceRef, activityRef, hasParameters, hasParameter
Part II
NcML
NetCDF Markup Language
NcML: description
• Developed as an ESG/Unidata collaboration
• XML language for expressing metadata associated with netCDF data (i.e. data following the netCDF model)
• Modular, extensible architecture: built as a set of schema modules, each fulfilling a specific functionality:
  - Core NcML schema: XML encoding of the file-level metadata associated with any netCDF file (i.e. the same information as contained in the netCDF header). Expressing the metadata in a standard encoding (XML) lets a large number of clients process it; the metadata can also be made available immediately even when the data is not (for example, when it is on remote storage).
  - Coordinate system extension: captures information about coordinates and coordinate systems (normally encoded through netCDF conventions such as COARDS or CF). This information can be used, for example, by high-level visualization and analysis clients.
  - Dataset extension (under development): allows data aggregation and subsetting, and the definition of derived or virtual data. Aggregation metadata is used to expose a dataset independently of how the data is actually stored (i.e. in which files).
  - Planned extension for openGIS-ISO interoperability
• NcML is automatically generated by parsing the input netCDF file(s)
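As a rough illustration of what the core schema encodes, the sketch below turns header information (dimensions, global attributes, variables) into an XML tree. Several things are assumptions: a plain dictionary stands in for a real netCDF header, which an actual generator would read with a netCDF library; the element names `netcdf`, `dimension`, `variable` and `attribute` follow later published NcML conventions and may differ from the schema version discussed here; the file name and values are invented.

```python
import xml.etree.ElementTree as ET

# Stand-in for a netCDF header; a real generator would read this from the file.
HEADER = {
    "dimensions": {"time": 12, "lat": 64, "lon": 128},
    "attributes": {"Conventions": "CF-1.0"},
    "variables": {
        "tas": {"type": "float", "shape": ("time", "lat", "lon"),
                "attributes": {"units": "K"}},
    },
}


def header_to_ncml(header: dict, location: str) -> ET.Element:
    """Encode header information as an NcML-style XML tree."""
    root = ET.Element("netcdf", {"location": location})
    for name, length in header["dimensions"].items():
        ET.SubElement(root, "dimension", {"name": name, "length": str(length)})
    for name, value in header["attributes"].items():
        ET.SubElement(root, "attribute", {"name": name, "value": str(value)})
    for name, var in header["variables"].items():
        elem = ET.SubElement(root, "variable", {
            "name": name, "type": var["type"], "shape": " ".join(var["shape"])})
        # Variable-level attributes are nested under their variable.
        for att_name, att_value in var["attributes"].items():
            ET.SubElement(elem, "attribute",
                          {"name": att_name, "value": str(att_value)})
    return root


ncml = header_to_ncml(HEADER, "example.nc")  # hypothetical file name
ncml_text = ET.tostring(ncml, encoding="unicode")
```

Because only the header is needed, a document like this can be produced and served even when the data itself sits on remote storage.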
NcML: schemas architecture
[Figure: NcML schema modules. NcML core (generic netCDF data); NcML Coordinate Systems (netCDF conventions for coordinates and coordinate systems); NcML dataset (aggregation, operations on data); openGIS-ISO.]
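The purpose of the dataset module, exposing one logical dataset regardless of which files hold the data, can be illustrated with a lookup that maps a position on an aggregated time axis to a file and a local index. This is a sketch under assumptions: the file names, step counts and the `locate` helper are hypothetical, and aggregation along the time dimension is just one plausible case, not the NcML design itself.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

# Hypothetical file list: (file name, number of time steps it holds).
FILES: List[Tuple[str, int]] = [
    ("run_0001.nc", 12),
    ("run_0002.nc", 12),
    ("run_0003.nc", 6),
]


def locate(files: List[Tuple[str, int]], t: int) -> Tuple[str, int]:
    """Map a time index in the aggregated dataset to (file, local index)."""
    ends = list(accumulate(n for _, n in files))  # cumulative step counts
    if not 0 <= t < ends[-1]:
        raise IndexError(f"time index {t} outside aggregated range")
    i = bisect_right(ends, t)          # first file whose end exceeds t
    start = ends[i - 1] if i else 0    # global index of that file's first step
    return files[i][0], t - start
```

A client asking for step 12 of the aggregated dataset never needs to know that it is the first step of the second file; that knowledge lives in the aggregation metadata.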
Part III
THREDDS
THREDDS
• Project led by Unidata in collaboration with many universities and research groups
• Aimed at developing a standard for hierarchical cataloguing of data and associated metadata
• Allows cross-browsing of catalogs and associated metadata, and federation of data holdings among multiple repositories
• ESG is currently evaluating THREDDS technology: we produced and published on the web THREDDS catalogs for 16 PCM runs
• Ultimately, ESG might decide to produce THREDDS catalogs for all of its data holdings, either as a separate process or by generating them from other metadata sources
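A hierarchical catalog of this kind can be pictured as nested dataset entries, one collection per run with its files beneath it. The sketch below is a simplified assumption, not the catalogs ESG published: the element and attribute names `catalog`, `dataset` and `urlPath` follow published THREDDS catalog conventions, but namespaces and service definitions are omitted, and the run and file names are invented.

```python
import xml.etree.ElementTree as ET


def build_catalog(name: str, runs: dict) -> ET.Element:
    """Nest file-level datasets under one collection-level dataset per run."""
    catalog = ET.Element("catalog", {"name": name})
    for run_name, files in runs.items():
        run_elem = ET.SubElement(catalog, "dataset", {"name": run_name})
        for file_name in files:
            ET.SubElement(run_elem, "dataset", {
                "name": file_name,
                "urlPath": f"{run_name}/{file_name}",
            })
    return catalog


# Invented run and file names, for illustration only.
catalog = build_catalog("Example holdings", {
    "run-a": ["jan.nc", "feb.nc"],
    "run-b": ["jan.nc"],
})
# Only leaf datasets carry a urlPath; collections are pure grouping nodes.
leaf_paths = [d.get("urlPath") for d in catalog.iter("dataset")
              if d.get("urlPath")]
```

Generating such a tree from existing collection-level metadata, rather than writing it by hand, corresponds to the second option mentioned above.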
Part IV
Conclusions
Future Development
• Schema conversion: automatic generation of metadata conforming to other standards from ESG collection-level metadata
  - DIF, for publishing to the GCMD discovery system (DIF can also be converted to ISO)
  - Dublin Core, for publishing to digital libraries
• Aggregation metadata:
  - Finalize the NcML dataset extension
  - Conversion of NcML aggregation metadata into:
    - CDML (for CDAT visualization)
    - LAS (for analysis of data through LAS)
• Ontologies for scientific schemas interoperability
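Schema conversion of the kind planned above amounts to mapping properties of a collection-level record onto the target standard's elements. The following sketch targets Dublin Core and makes several assumptions: the input serialization and the `FIELD_MAP` table are hypothetical, the choice of `contributor` for participants is only one possible mapping, and the `metadata` wrapper element is arbitrary. The namespace URI is the standard Dublin Core element set namespace.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Hypothetical mapping from collection-level property names to Dublin Core.
FIELD_MAP = {
    "name": "title",
    "description": "description",
    "rights": "rights",
}


def to_dublin_core(record: ET.Element) -> ET.Element:
    """Copy the mapped properties of one record into Dublin Core elements."""
    ET.register_namespace("dc", DC_NS)
    out = ET.Element("metadata")
    for source_tag, dc_tag in FIELD_MAP.items():
        for child in record.findall(source_tag):
            elem = ET.SubElement(out, f"{{{DC_NS}}}{dc_tag}")
            elem.text = child.text
    # Participants are flattened to contributors; their roles are dropped.
    for person in record.findall("participant"):
        elem = ET.SubElement(out, f"{{{DC_NS}}}contributor")
        elem.text = person.text
    return out


record = ET.fromstring(
    "<Simulation><name>Example run</name>"
    "<description>Illustrative only</description>"
    "<participant role='contact'>A. Scientist</participant></Simulation>")
dc = to_dublin_core(record)
dc_text = ET.tostring(dc, encoding="unicode")
```

The conversion is lossy by design: details such as participant roles have no slot in the simpler target, which is why the richer collection-level record remains the source from which other formats are generated.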
Collaborations and Impact
• COLLABORATIONS
  - PCM/CCSM modeling community (“ESG schema”)
  - UK eScience office (“ESG schema”)
  - Unidata (NcML)
• FEDERATIONS
  - THREDDS servers
  - GCMD search and discovery engine
  - Digital libraries
• IMPACT
  - ESG schema could be adopted by a wide scientific community
  - NcML may become the standard for XML encoding of netCDF data
  - NcML will be used as the standard for the Unidata DODS aggregation server