
Instant Karma: Accessing Provenance Information for AMSR-E Science Data Products

AMSR-E Science Team Meeting
29 June 2011

UAHuntsville
The University of Alabama in Huntsville


Instant Karma Overview

[Diagram labels: Data Center Operations, Provenance Research, Earth Science, Instant Karma Project]

• Collaboration among
  – AMSR-E SIPS (MSFC Earth Science Office and UAHuntsville ITSC)
  – Provenance researchers at Indiana University’s Data to Insight Center
  – AMSR-E Sea Ice science team (GSFC)

• Primary goal is to improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community
  – Using Karma provenance tool
  – Initial focus on Sea Ice processing

AMSR-E Science Team Meeting, Asheville, NC


Team Members (name and role, grouped by organization)

• NASA MSFC
  – Michael Goodman: Principal Investigator

• NASA GSFC
  – Thorsten Markus: Co-Investigator – Science Team
  – Don Cavalieri: Science Algorithm Developer
  – Alvaro Ivanoff: Science Software Developer

• UAHuntsville Information Technology & Systems Center
  – Helen Conover: Co-Investigator – Project Manager
  – Bruce Beaumont: AMSR-E SIPS Implementation
  – Anurag Bhaskar: Student, Metadata Utilities
  – Ajinkya Kulkarni: Provenance Browser
  – Michael McEniry: Systems Administration
  – Kathryn Regner: AMSR-E SIPS Systems Engineer
  – Cara Stein: Provenance Browser

• Indiana University Data to Insight Center
  – Beth Plale: Co-Investigator – Karma System
  – Mehmet Aktas: Karma Database and Services
  – Scott Jensen: Karma Database and Services
  – Harsh Joshi: Provenance Collection Services
  – Robert Ping: Project Management
  – Prajakta Purohit: Provenance Collection Services
  – Yuan Luo: Karma System Monitoring

• UAHuntsville Earth System Science
  – Dawn Conway: AMSR-E Science Metadata Consultant


Key Responsibility at Data Creation

• Archivists are traditionally responsible for curation (i.e., adding metadata and provenance) but are not present at the creation of scientific data. The moment of creation is when the most knowledge about a product is available.

• Effective preservation therefore requires tools in the earliest stages of the data’s life.

Diagram from Berman et al. “Sustainable Economics for a Digital Planet”

[Diagram labels: Create, Archive, Distribute, Long Term Access; “Preservation Action” appears three times]


Instant Karma Provenance Research

• The how-tos of provenance capture are becoming better understood

• Collecting the right (and only the “right”) provenance is a harder challenge, and one we hope to make progress on in this project

• Delivering on representation and use case support simultaneously for today’s users and next-generation uses is an ongoing challenge
  – Sea ice research, climate, classroom, informatics ...


Types of Provenance Information

• Data lineage: product generation “recipe” (data inputs, software and hardware)

• Additional knowledge about science algorithms, instrument variations, etc.

• Lots of information already available, but scattered across multiple locations
  – Processing system configuration
  – Dataset and file level metadata
  – Processing history information
  – Quality assurance information
  – Software documentation (e.g., algorithm theoretical basis documents, release notes)
  – Data documentation (e.g., guide documents, README files)

• Instant Karma project aims to collate and organize information from multiple sources
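One way to picture this collation is a single per-granule record that gathers lineage plus pointers to the scattered sources. The sketch below is illustrative only: the class, its field names, and the sample values are hypothetical and do not represent the Karma data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GranuleProvenance:
    """Hypothetical per-granule record; not the Karma schema."""
    granule_id: str
    input_files: List[str] = field(default_factory=list)        # lineage: data inputs
    software: Dict[str, str] = field(default_factory=dict)      # component -> version
    hardware: str = ""                                           # processing platform
    documentation: Dict[str, str] = field(default_factory=dict)  # "source:kind" -> pointer

    def merge(self, source_name: str, info: dict) -> None:
        """Fold information harvested from one source into the record."""
        self.input_files.extend(info.get("input_files", []))
        self.software.update(info.get("software", {}))
        self.hardware = info.get("hardware", self.hardware)
        for kind, pointer in info.get("documentation", {}).items():
            # Prefix with the source name so the origin of each pointer is kept.
            self.documentation[f"{source_name}:{kind}"] = pointer


# Made-up sample values for two of the sources listed above.
record = GranuleProvenance(granule_id="example-granule")
record.merge("processing_history", {
    "input_files": ["l2a_pass_001", "l2a_pass_002"],
    "software": {"sea_ice_algorithm": "v1"},
})
record.merge("data_documentation", {"documentation": {"readme": "README.txt"}})
```

Keeping the source name on each documentation pointer means the record can still say where a given piece of information came from after it has been merged.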


SIPS-GHRC Processing Architecture

• Algorithm Packages from the Science Team and Team Lead of the Science Computing Facility

• Processing automation controlled by SIPS scripts

– Pass processing is data driven

– L3 product generation is scheduled after nominal availability of input products
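The two triggering styles above can be contrasted in a few lines. This is a minimal sketch, not the SIPS scripts (which the slides describe as Perl with C): the function names and the `nominal_delay` parameter are hypothetical, and the slides do not state the actual delay used.

```python
from datetime import datetime, timedelta


def on_pass_arrival(pass_file, run_pass_script):
    """Data driven: a pass is processed as soon as its input file arrives."""
    return run_pass_script(pass_file)


def l3_start_time(period_end: datetime, nominal_delay: timedelta) -> datetime:
    """Scheduled: L3 generation starts a fixed delay after the period closes,
    by which time the input products are nominally available."""
    return period_end + nominal_delay


def should_run_l3(now: datetime, period_end: datetime, nominal_delay: timedelta) -> bool:
    """True once the scheduled start time for the L3 product has been reached."""
    return now >= l3_start_time(period_end, nominal_delay)
```

The scheduled path does not check that every input actually arrived, which is why a record of which inputs were really used matters for the lineage of each L3 product.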

[Diagram: SIPS processing scripts and their products]
• Pass Script: L2A Brightness Temps; L2B Ocean, L2B Land, L2B Rain
• Daily Script: Ocean, Land, Snow, Sea Ice
• Pentad Script: Snow
• Weekly Script: Ocean
• Monthly Script: Ocean, Rain, Snow

[Diagram: Control Script and Delivered Algorithm Package]
• Inputs: Input Data and Metadata File(s), Ancillary File(s)
• Outputs: Science Data, Metadata, Processing History, Quality Assurance, Browse imagery, Custom subsets


Sea Ice Product Generation

[Diagram: Sea Ice product generation]
• Inputs: One day’s worth of Level-2A Tbs, Ice Mask, Snow Melt Mask
• Outputs: Sea Ice Products (6, 12, and 25 km), Metadata, Processing History, Quality Assurance, Browse imagery, Custom subsets
• Software components: Delivered Algorithm Package, Daily Processing Script, Sea Ice Algorithms
• Languages and providers: C and FORTRAN, provided by Science Team; Perl with C, provided by SIPS; Bourne shell, provided by SCF
• Environment: Hardware Platform; Operating System, Compilers; Software Libraries


AMSR-E Standard Sea Ice Products

Product                   25 km grid                   12.5 km grid          6.25 km grid
Brightness temperatures   6, 10, 18, 23, 36, 89 GHz    18, 23, 36, 89 GHz    89 GHz
Sea ice concentration     NASA Team 2 (NT2);           NASA Team 2 (NT2);    -
                          Bootstrap – NT2              Bootstrap – NT2
Snow depth on sea ice     -                            5-day product         -

All grids are polar stereographic. Except for the snow depth product, all products are daily averages using (a) ascending orbits only, (b) descending orbits only, (c) all orbits.


Perspectives on Processing Lineage

[Figure panels: Data Processing View; Provenance View]


AMSR-E Provenance Use Cases

• Browse provenance graphs: convey rich information about final data granule details [Use case 1]
  – Spatial location, time of observation, algorithms employed, input data and ancillary files
  – Provenance bundle to include pointers to relevant documentation

• Answer “Something isn’t right” question [Use case 1 variant]
  – E.g., did not receive data for several days so snow melt mask may be inaccurate.

• Compare two data granules [Use case 2]
  – Query system to get list of provenance differences (e.g., versions of software, number and versions of input files)

• General provenance graph for a given science process, e.g., Sea Ice processing [Use case 3]
  – Current algorithms and versions, nominal number and versions of input files, pointers to relevant documentation
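Use case 2 can be pictured as a key-by-key comparison of two provenance records. The following is a minimal sketch under stated assumptions: each granule's provenance is flattened into a plain dictionary, and the keys and values shown are hypothetical. It is not the Karma query interface.

```python
def provenance_diff(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    A key present in only one record is reported with None on the other side.
    """
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            diffs[key] = (a.get(key), b.get(key))
    return diffs


# Made-up provenance summaries for two granules.
granule_a = {"algorithm_version": "v1", "input_file_count": 28, "ice_mask": "mask_a"}
granule_b = {"algorithm_version": "v2", "input_file_count": 27, "ice_mask": "mask_a"}

differences = provenance_diff(granule_a, granule_b)
# differences holds the algorithm version and input file count entries;
# ice_mask is identical in both records and is therefore omitted.
```

A difference in the input file count is also the kind of signal the “Something isn’t right” variant of use case 1 looks for.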


Karma Provenance Repository

Provenance Collection, Storage and Access

• Instrumented AMSR-E SIPS processing workflow for sea ice in the testbed environment.
  – Provenance information is captured in experiment run log files.
  – Log files are parsed to generate provenance notifications.
  – These notifications are then imported into the Karma database.
  – The Karma Service Query API is used to generate OPM-compatible XML graphs, each corresponding to a processing run.
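The log-to-graph pipeline above can be sketched end to end with the Python standard library. Everything specific here is an assumption made for illustration: the log line format, the notification fields, and the XML layout are hypothetical. The XML is only loosely inspired by OPM's "used" and "wasGeneratedBy" relations and is neither the OPM schema nor the output of the Karma Service Query API.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical log line format: "<process> USED <file>" or "<process> GENERATED <file>".
LOG_LINE = re.compile(r"^(?P<process>\S+) (?P<relation>USED|GENERATED) (?P<artifact>\S+)$")


def parse_log(lines):
    """Turn matching log lines into notification dicts; skip everything else."""
    notifications = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            notifications.append(match.groupdict())
    return notifications


def to_graph_xml(run_id, notifications):
    """Build one small XML graph per processing run (simplified layout)."""
    root = ET.Element("graph", {"run": run_id})
    for note in notifications:
        tag = "used" if note["relation"] == "USED" else "wasGeneratedBy"
        ET.SubElement(root, tag, {"process": note["process"], "artifact": note["artifact"]})
    return ET.tostring(root, encoding="unicode")


# Made-up log excerpt; the middle line is ignored because it is not a provenance event.
log = [
    "daily_sea_ice USED l2a_tb_pass_001",
    "INFO starting run",
    "daily_sea_ice GENERATED sea_ice_12km",
]
notes = parse_log(log)
graph_xml = to_graph_xml("run-1", notes)
```

Parsing logs after the fact keeps the instrumentation out of the science code itself: the processing scripts only need to write log lines.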

[Diagram labels: Processing Testbed (Pass Script: L2A Brightness Temps, L2B Ocean, L2B Land, L2B Rain; Daily Script: Ocean, Land, Snow, Sea Ice; Pentad Script: Snow; Weekly Script: Ocean; Monthly Script: Ocean, Rain, Snow), Provenance Collection, Query API, Provenance Browser]


Additional Science-Relevant Provenance and Context Information

• Harvesting granule information from ECS metadata
• Also recording processing location associated with each data granule
• Working with AMSR-E Science Computing Facility to identify algorithm and data product information
  – Algorithm versions and descriptions
  – Parameters and data fields
  – Ancillary files
  – Flag values and explanations
  – Pointers to full documentation
• Defining how to harvest, transmit and display this information


Provenance Browser for AMSR-E Products

• Initial Prototype
  – Interactive browsing of provenance graphs and information about workflows and final data products
  – Uses Karma Service Query API to extract provenance graphs from the Karma provenance repository.
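Browsing a provenance graph largely amounts to walking edges backward from a final data product to everything it depends on. The sketch below shows that traversal on a plain dictionary. The graph representation and the node names are hypothetical, and the code does not call the Karma Service Query API or reproduce the browser's implementation.

```python
from collections import deque


def upstream(graph, node):
    """Return everything `node` depends on, nearest first (breadth-first)."""
    seen, order, queue = {node}, [], deque([node])
    while queue:
        current = queue.popleft()
        for parent in graph.get(current, []):
            if parent not in seen:  # guard against shared inputs and cycles
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


# Made-up graph: each key maps to the items it was derived from.
graph = {
    "sea_ice_product": ["daily_script_run"],
    "daily_script_run": ["l2a_tbs", "ice_mask", "snow_melt_mask"],
}
lineage = upstream(graph, "sea_ice_product")
```

Breadth-first order lists the most immediate contributors first, which suits an interactive view that expands outward from the selected granule.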


http://instantkarmatest.itsc.uah.edu/karmabrowser/view


Instant Karma Major Milestones


Project Schedule


Near Term Plans

• Begin routine daily Sea Ice processing with provenance collection in Testbed
• Enhance context metadata available to Karma system
• Refine Provenance Browser and Karma Query API
• Develop joint Karma project / SIPS approach for instrumenting additional processing workflows
• Work with other NASA Earth science provenance projects to develop common approaches to providing provenance information with science data files
• Begin to engage user community
  – Present to AMSR-E Science Team
  – Outreach to Sea Ice community at GSFC
  – Work with NSIDC DAAC to identify friendly users for beta testing


Conclusions

• Instant Karma project is on track to incorporate provenance capture into the AMSR-E SIPS

• Ready to begin testing the AMSR-E Provenance Browser with the wider user community

• Continuing to work with NASA ESDIS, ESDSWG and ESIP Federation in defining common provenance practices across NASA data systems


Instant Karma Project Contacts

• Instant Karma Project Web Site
  – http://instantkarmatest.itsc.uah.edu/
  – http://www.pti.iu.edu/d2i/provenance_instantkarma

• AMSR-E Provenance Browser
  – http://instantkarmatest.itsc.uah.edu/karmabrowser/view

• Contacts
  – Michael Goodman [email protected]
  – Helen Conover [email protected]
  – Beth Plale [email protected]
  – Thorsten Markus [email protected]


Acronyms

AMSR-E        Advanced Microwave Scanning Radiometer for EOS
API           Application Programming Interface
DAAC          Distributed Active Archive Center
ECS           EOSDIS Core System
EOS           Earth Observing System
EOSDIS        Earth Observing System Data and Information System
ESDSWG        Earth Science Data Systems Working Group
ESIP          Earth Science Information Partners
GSFC          Goddard Space Flight Center
ITSC          Information Technology & Systems Center
MSFC          Marshall Space Flight Center
MODIS         Moderate Resolution Imaging Spectroradiometer
NASA          National Aeronautics and Space Administration
NSIDC         National Snow and Ice Data Center
OPM           Open Provenance Model
SCF           Science Computing Facility
SIPS          Science Investigator-led Processing System
UAHuntsville  University of Alabama in Huntsville
XML           eXtensible Markup Language
