NASA Earth Science Data and Information System (ESDIS) Project Preservation Activities – Software & Documentation H. K. “Rama” Ramapriyan Science Systems and Applications, Inc. & ESDIS Project, NASA Goddard Space Flight Center WGISS Data Stewardship Interest Group Meeting , September 30, 2015


NASA Earth Science Data and Information System (ESDIS) Project

Preservation Activities – Software & Documentation

H. K. “Rama” Ramapriyan
Science Systems and Applications, Inc. & ESDIS Project, NASA Goddard Space Flight Center

WGISS Data Stewardship Interest Group Meeting, September 30, 2015


Topics

Context - NASA Earth Science Data Preservation Content Specification
Software and document preservation - lessons learned
NASA Earth Science Data System Working Groups
  • 5 of 16 WGs in Data Stewardship Interest Area (2014-2015)
  • 4 of 16 WGs in Data Stewardship Interest Area (2015-2016)


Preservation

NASA is not a “preservation agency”, but…
  • It is essential for NASA to preserve all the data and associated content beyond the lives of NASA’s missions to meet NASA’s near-term objective of providing access to data and services for active scientific research. NASA also has to ensure that the data and associated content are preserved for transition to permanent archival agencies.

Preservation involves ensuring long-term protection of:
  • Bits
  • Discoverability and accessibility
  • Readability
  • Understandability
  • Usability
  • Reproducibility of results


Preservation Content Specification (PCS)

Has been in effect since November 2011; latest version dated January 2013
Covers eight categories of content plus a checklist (see next page)
Of necessity, rigor of application varies among completed, on-going and future missions
  • Completed missions – requirements had not been in place; several items may no longer be available for preservation; responsible individuals may not be accessible
  • On-going missions – requirements had not been in place; some of the relevant data and documentation generated early in mission may not be available easily; need additional work to reach responsible individuals
  • Future missions – requirements are in place; included as part of mission planning


Preservation Content Categories

1. Preflight/Pre-Operations: Instrument/Sensor characteristics including pre-flight/pre-operations performance measurements; calibration method; radiometric and spectral response; noise characteristics; detector offsets

2. Science Data Products: Raw instrument data, Level 0 through Level 4 data products and associated metadata

3. Science Data Product Documentation: Structure and format with definitions of all parameters and metadata fields; algorithm theoretical basis; processing history and product version history; quality assessment information

4. Mission Data Calibration: Instrument/sensor calibration method (in operation) and data; calibration software used to generate lookup tables; instrument and platform events and maneuvers

5. Science Data Product Software: Product generation software and software documentation

6. Science Data Product Algorithm Input: Any ancillary data or other data sets used in generation or calibration of the data or derived product; ancillary data description and documentation

7. Science Data Product Validation: Records, publications and data sets

8. Science Data Software Tools: Product access (reader) tools

9. Checklist: “Metadata” about the above 8 categories showing how and where items in each category are preserved
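The checklist in item 9 can be pictured as a small data structure that records, for each preserved item, its category and where it is kept. The following is only an illustrative sketch, not the PCS checklist format: the class names, field names, mission name and the `archive://` location are all hypothetical.

```python
from dataclasses import dataclass, field

# The eight content categories named in the list above (item 9 is the checklist itself).
PCS_CATEGORIES = (
    "Preflight/Pre-Operations",
    "Science Data Products",
    "Science Data Product Documentation",
    "Mission Data Calibration",
    "Science Data Product Software",
    "Science Data Product Algorithm Input",
    "Science Data Product Validation",
    "Science Data Software Tools",
)


@dataclass
class ChecklistEntry:
    """One preserved item: what it is and where it is kept (hypothetical fields)."""
    category: str
    title: str
    location: str  # e.g. an archive URL or repository identifier


@dataclass
class PreservationChecklist:
    mission: str
    entries: list = field(default_factory=list)

    def add(self, category, title, location):
        # Reject anything that is not one of the eight listed categories.
        if category not in PCS_CATEGORIES:
            raise ValueError(f"unknown PCS category: {category}")
        self.entries.append(ChecklistEntry(category, title, location))

    def missing_categories(self):
        """Categories for which nothing has been recorded yet."""
        covered = {entry.category for entry in self.entries}
        return [c for c in PCS_CATEGORIES if c not in covered]


# Hypothetical mission and location, for illustration only.
checklist = PreservationChecklist(mission="ExampleMission")
checklist.add(
    "Science Data Product Software",
    "Level 2 processing source code",
    "archive://example/software/l2",
)
print(checklist.missing_categories())
```

Listing the categories that still have no entries is one simple way such a checklist could be used to track progress toward a complete delivery.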


Organizations holding relevant content during project life cycle

[Diagram: organizations holding relevant content. Box labels:]
  • Instrument Teams / PIs
  • Instrument Developer / Manufacturer
  • Data gathering project (e.g., flight project)
  • Interdisciplinary Data User / PI
  • Product Generation Support Teams (SIPSs)
  • DAACs
  • Calibration Teams
  • Mission Operations Team
  • Validation Teams
  • USGS
  • NOAA
  • NASA Technical Report Server (NTRS)
  • NASA Aeronautics and Space Database (NA&SD)
  • General users
  • International Partner Archives


Use of PCS in NASA to-date

Distributed Active Archive Centers (DAACs) work with instrument teams, with higher priority to instruments at or near end-of-life
  • Using PCS as checklist
  • UARS (Sept. 1991), Earth Probe/TOMS (July 1996), AIRS, AMSR-E (EOS Aqua – May 2002), ICESat-1 (Jan. 2003), HIRDLS, MLS (EOS Aura – July 2004), LIS (TRMM – Nov. 1997)
  • Artifacts called for in PCS have been gathered for several of the above, organized by categories and archived (e.g., see http://disc.sci.gsfc.nasa.gov/Aura/additional/documentation/hirdls-preservation-documents)

New missions are required to plan to preserve and deliver to DAACs items listed in PCS
  • Included as a “Level 1” requirement for new missions since 2012
  • SMAP mission (launched Jan. 2015) has started preparing list of ancillary data and documentation to be preserved


Software

Missions are required to deliver product generation software (source code)

Purpose of preservation of software is primarily for users to understand exactly how products were generated
  • Algorithm Theoretical Basis Documents are generally not a precise description
  • PCS states “The final version of a derived product should be the version archived. If results reported in peer reviewed publications were based on earlier versions of the product, those versions or at least representative subsets of those versions should also be archived. At a minimum, the algorithm and software that generated such earlier versions should be archived.”
  • “Versions of science data product software should be archived for each major product release. A major product release is characterized by the appearance of peer reviewed publications where reported results are based on the product version.”

It is not expected that “heritage software” will necessarily be executable; it may take significant effort to regenerate products from preserved software

In some cases, software specification documents have been deemed acceptable as substitutes for source code
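The PCS wording quoted above amounts to a simple selection rule: archive the software for the final version, plus the software for every earlier version on which peer reviewed results were based. A minimal sketch of that rule follows, assuming a hypothetical per-version publication count is available; the `ProductVersion` type and the version labels are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductVersion:
    label: str                 # e.g. "V5" (hypothetical label)
    peer_reviewed_papers: int  # publications whose results used this version


def software_versions_to_archive(versions):
    """Return labels of versions whose generating software should be archived.

    Assumes `versions` is ordered oldest to newest, so the last entry is the
    final version. An earlier version is kept when at least one peer reviewed
    publication reports results based on it (a "major product release").
    """
    if not versions:
        return []
    keep = [v.label for v in versions[:-1] if v.peer_reviewed_papers > 0]
    keep.append(versions[-1].label)  # the final version is always archived
    return keep


# Made-up release history, for illustration only.
history = [
    ProductVersion("V1", 0),
    ProductVersion("V2", 3),
    ProductVersion("V3", 0),
    ProductVersion("V4", 1),
]
print(software_versions_to_archive(history))  # ['V2', 'V4']
```

In practice the publication count would have to come from a literature search or from the instrument team, which is part of the effort the slide alludes to for heritage missions.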


Documentation

PCS calls for several types of documentation covering project/data life cycles

DAACs archive and maintain checklists of specific documentation delivered by instrument teams and flight projects

Goddard DAAC uses Fedora Commons, an open-source repository management system
  • Simple web-based Graphical User Interface (GUI)
  • Allows entry of objects or data-streams (these can be of any type: document, image, source code, binary data, etc.)
  • The DAAC has developed a command line script to allow batch ingest of objects into the Fedora Repository

Public access documents are kept separate from restricted (sensitive or proprietary) documents

Heritage missions require extensive work for gathering and processing documents for preservation
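The DAAC's batch-ingest script is not shown in the presentation, so the following is not that script and makes no Fedora Commons calls. It is only a sketch of a step that such a workflow might plausibly include before ingest: walking a delivery directory, recording a checksum for each file (for later verification of the bits), and marking each file public or restricted. The `restricted` directory name and the manifest fields are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Checksum that can be used later to verify the stored bits are unchanged."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, restricted_dirname="restricted"):
    """List every file under `root` with its checksum and access level.

    Files below a directory named `restricted_dirname` are marked restricted;
    everything else is marked public. The directory name is an assumption.
    """
    root = Path(root)
    manifest = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        in_restricted = restricted_dirname in relative.parts[:-1]
        manifest.append({
            "path": relative.as_posix(),
            "sha256": sha256_of(path),
            "access": "restricted" if in_restricted else "public",
        })
    return manifest


# Small self-contained demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "restricted").mkdir()
    (base / "atbd.txt").write_text("public document")
    (base / "restricted" / "vendor_spec.txt").write_text("proprietary document")
    demo_manifest = build_manifest(base)

for item in demo_manifest:
    print(item["access"], item["path"])
```

A manifest like this keeps the public/restricted split explicit, so the two sets of documents can be routed to separate collections at ingest time.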


Examples for Scale of Effort

Category                                 Number of Items   Number of Items
                                         (HIRDLS)          (GLAS/ICESat)
Preflight/Pre-Operations Calibration     168               23
Product Documentation                    18                34
Mission Calibration                      10                12
Science Data Product Software            26*               5
Science Data Product Algorithm Inputs    1                 56
Science Data Product Validation          1**               3
Science Data Software Tools              1                 20
Total                                    225               153

*Includes source code and documentation
**List of published papers

Thousands of items had to be reviewed to decide what had to be preserved


Standard for Preservation Content (1 of 2)

NASA would like to see a broad international standard identifying preservation content – NASA’s PCS is a good starting point
  • NASA has drafted a TC 211 New Work Item Proposal (NWIP) for this

ISO/TC 211 had approved a NWIP for ISO 19165 before NASA’s draft was generated
  • “Geographic Information - Preservation of digital data and metadata” initiated by Prof. Wolfgang Kresse, Chair of International Society for Photogrammetry and Remote Sensing (ISPRS) Ad-hoc Group on Standards

Some overlap in interests between NASA’s draft NWIP and ISO 19165
  • ISO 19165 mainly driven by the interests of National Mapping and Cadastral Agencies (vector data)


Standard for Preservation Content (2 of 2)

Options
  • Include content similar to NASA’s PCS as a part of ISO 19165
  • Wait for ISO 19165 to be completed and initiate an extension (say 19165-2)

H. K. “Rama” Ramapriyan participating as U.S. expert in the Working Group (WG-7) working on ISO 19165
  • Discussion session held on June 8, 2015 during TC 211 Plenary Meeting held in Southampton, England
  • Rama recommended including language in the standard to suggest/require such content standards to be developed and open the door for supplementary standards – e.g., ISO 19165-2 (similarly to ISO 19115-2)
  • Awaiting revised draft and follow-up


ESDSWG – Data Stewardship Interest Area – Working Groups (April 2014 – March 2015) (1 of 3)

Data Preservation Practices WG
  • Mission: Collaborate with stakeholders to define and document an archive process, spanning all types of projects, that can be used to encourage the timely delivery of science data products and related documentation, as defined in the PCS document
  • Key Accomplishments: “Data Preservation Guidelines” document delivered to ESDIS Project – provides relationship between different project lifecycles and archive lifecycle and recommendations on when various artifacts should be collected for archival

Data Quality WG (continues into 2015-2016)
  • Mission: Assess the existing data quality standards and practices in the inter-agency and international arena to determine a working solution relevant to ESDIS, DAACs, and NASA Data Providers
  • Key Accomplishments: Analyzed 16 use cases. Arrived at over 90 recommendations. Integrated document with 12 high priority recommendations delivered to ESDIS Project. “Low-hanging fruits” identified. Analysis of implementation complexity in progress


ESDSWG – Data Stewardship Interest Area – Working Groups (April 2014 – March 2015) (2 of 3)

Dataset Interoperability WG (continues into 2015-2016)
  • Mission: Identify best practices to bridge or reduce gaps between NASA-stewarded data and data from outside NASA, and to ensure NASA data discoverability, maintainability and extensibility using CF, ISO, and Attribute Conventions for Data Discovery (ACDD) conventions
  • Key Accomplishments:
    1. Seven recommendations for Grid Structures in Earth science datasets
    2. Continued improvement of metadata compliance checking
    3. Continued engagement with CF community to exploit group hierarchies

Digital Object Identifiers WG (continues related work in Citations and Identifiers WG – 2015-2016)
  • Mission: Develop a method to promote consistency, discoverability, and usefulness across NASA DOI landing pages
  • Key Accomplishments: Developed list of minimal metadata elements needed to meet the needs of a DOI landing page, reviewed with all DAACs and made final recommendation to ESDIS Project. Made recommendations for improvements in ESDIS Project’s DOI registration process. (On-going work on identifiers for objects other than datasets.)


ESDSWG – Data Stewardship Interest Area – Working Groups (April 2014 – March 2015) (3 of 3)

PROV-ES WG (continues into 2015-2016)
  • Mission: Assess and determine an interoperable provenance standard for use in Earth Science Data Systems to enable the following:
    – Ensure capture of the increasing amount of contextual processing information of Earth Science Data Records (ESDRs).
    – Improve the understanding of the lineage and dependencies of ESDRs.
    – Provide an interoperable representation of provenance for NASA EOS missions that adheres to the NASA Preservation Information Architecture.
  • Key Accomplishments:
    – Defined extensions to W3C PROV to accommodate Earth science-specific processes (during 2013-2014)
    – Infused automatic PROV-ES generation into initial NASA data systems
    – Implemented faceted search interface to display and explore PROV-ES records
    – Identified use cases at DAACs to which PROV-ES will be applied
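To make the idea of lineage and dependencies concrete, the sketch below builds a provenance record for one processing run using only core W3C PROV terms (entity, activity, agent, used, wasGeneratedBy, wasAssociatedWith). It is loosely modeled on the PROV-JSON layout and has not been validated against it; the PROV-ES extensions themselves are not shown, and the `ex:` namespace and all identifiers are hypothetical.

```python
import json


def provenance_record(product_id, input_ids, software_id, run_id):
    """Build a PROV-style record for one processing run as a plain dict."""
    return {
        "prefix": {"ex": "http://example.org/"},  # hypothetical namespace
        "entity": {pid: {} for pid in [product_id, *input_ids]},
        "activity": {run_id: {}},
        "agent": {software_id: {"prov:type": "prov:SoftwareAgent"}},
        # The product was generated by the processing run.
        "wasGeneratedBy": {
            "_:gen1": {"prov:entity": product_id, "prov:activity": run_id}
        },
        # The run used each input (lower-level data, ancillary data).
        "used": {
            f"_:use{i}": {"prov:activity": run_id, "prov:entity": input_id}
            for i, input_id in enumerate(input_ids, start=1)
        },
        # The run was carried out by a particular software version.
        "wasAssociatedWith": {
            "_:assoc1": {"prov:activity": run_id, "prov:agent": software_id}
        },
    }


def lineage(record, product_id):
    """Return the inputs used by the activity that generated `product_id`."""
    runs = {
        gen["prov:activity"]
        for gen in record["wasGeneratedBy"].values()
        if gen["prov:entity"] == product_id
    }
    return sorted(
        use["prov:entity"]
        for use in record["used"].values()
        if use["prov:activity"] in runs
    )


# Hypothetical identifiers, for illustration only.
record = provenance_record(
    product_id="ex:level2_granule",
    input_ids=["ex:level1_granule", "ex:ancillary_weather"],
    software_id="ex:l2_processor_v5",
    run_id="ex:run_0001",
)
print(json.dumps(record["used"], indent=2))
print(lineage(record, "ex:level2_granule"))
```

Following the generation and usage links in this way is the kind of query a faceted provenance search could expose to users who want to know which inputs and which software version produced a given data record.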