Instant Karma: Accessing Provenance Information for
AMSR-E Science Data Products
AMSR-E Science Team Meeting
29 June 2011
UAHuntsville
The University of Alabama in Huntsville
Instant Karma Overview
Data Center Operations
Provenance Research
Earth Science
Instant Karma Project
• Collaboration among
– AMSR-E SIPS (MSFC Earth Science Office and UAHuntsville ITSC)
– Provenance researchers at Indiana University’s Data to Insight Center
– AMSR-E Sea Ice science team (GSFC)
• Primary goal is to improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community
– Using Karma provenance tool
– Initial focus on Sea Ice processing
AMSR-E Science Team Meeting Asheville, NC
Team Members (Name: Role, grouped by Organization)
• NASA MSFC
– Michael Goodman: Principal Investigator
• NASA GSFC
– Thorsten Markus: Co-Investigator – Science Team
– Don Cavalieri: Science Algorithm Developer
– Alvaro Ivanoff: Science Software Developer
• UAHuntsville Information Technology & Systems Center
– Helen Conover: Co-Investigator – Project Manager
– Bruce Beaumont: AMSR-E SIPS Implementation
– Anurag Bhaskar: Student, Metadata Utilities
– Ajinkya Kulkarni: Provenance Browser
– Michael McEniry: Systems Administration
– Kathryn Regner: AMSR-E SIPS Systems Engineer
– Cara Stein: Provenance Browser
• Indiana University Data to Insight Center
– Beth Plale: Co-Investigator – Karma System
– Mehmet Aktas: Karma Database and Services
– Scott Jensen: Karma Database and Services
– Harsh Joshi: Provenance Collection Services
– Robert Ping: Project Management
– Prajakta Purohit: Provenance Collection Services
– Yuan Luo: Karma System Monitoring
• UAHuntsville Earth System Science
– Dawn Conway: AMSR-E Science Metadata Consultant
Key Responsibility at Data Creation
• Archivists are traditionally responsible for curation (i.e., adding metadata and provenance) but are not present at the creation of scientific data. The moment of creation is when the most knowledge about a product is available.
• To be effective, curation requires tools in the earliest stages of the data’s life that help with preservation.
Diagram from Berman et al. “Sustainable Economics for a Digital Planet”
[Diagram labels: Create; Archive; Distribute; Long Term Access; Preservation Action (three occurrences)]
Instant Karma Provenance Research
• The how-tos of provenance capture are becoming better understood
• Collecting the right (and only the “right”) provenance is a harder challenge, and one we hope to make progress on in this project
• Delivering on representation and use case support simultaneously for today’s users and next-generation uses is an ongoing challenge
– Sea ice research, climate, classroom, informatics ...
Types of Provenance Information
• Data lineage: product generation “recipe” (data inputs, software and hardware)
• Additional knowledge about science algorithms, instrument variations, etc.
• Lots of information already available, but scattered across multiple locations
– Processing system configuration
– Dataset and file level metadata
– Processing history information
– Quality assurance information
– Software documentation (e.g., algorithm theoretical basis documents, release notes)
– Data documentation (e.g., guide documents, README files)
• Instant Karma project aims to collate and organize information from multiple sources
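As one way to picture the collated information, the sketch below models a single granule's lineage "recipe" together with pointers to context held elsewhere. Every class name, field name, and sample value here is hypothetical and chosen for illustration; none of it is the Karma schema or actual SIPS metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LineageRecipe:
    """Product generation 'recipe': data inputs, software and hardware."""
    input_files: List[str]
    software: Dict[str, str]   # component name -> version string
    hardware: str


@dataclass
class ProvenanceRecord:
    """Lineage plus pointers to context scattered across other locations."""
    granule_id: str
    recipe: LineageRecipe
    context: Dict[str, List[str]] = field(default_factory=dict)

    def add_context(self, source: str, pointer: str) -> None:
        """Attach a pointer (path, URL, note) under a named source category."""
        self.context.setdefault(source, []).append(pointer)

    def sources(self) -> List[str]:
        """Return the context categories collected so far, sorted by name."""
        return sorted(self.context)


record = ProvenanceRecord(
    granule_id="example-sea-ice-granule",           # hypothetical identifier
    recipe=LineageRecipe(
        input_files=["example_L2A_pass_001.hdf"],   # hypothetical file name
        software={"sea_ice_algorithm": "vX"},       # placeholder version
        hardware="example-processing-host",
    ),
)
record.add_context("quality_assurance", "qa/example_report.txt")
record.add_context("software_documentation", "docs/release_notes.txt")
print(record.sources())
```

Keeping the lineage recipe separate from the free-form context pointers mirrors the two kinds of provenance listed above: the recipe is structured, while the additional knowledge is a set of references to documents that already exist.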
SIPS-GHRC Processing Architecture
• Algorithm Packages from the Science Team and Team Lead of the Science Computing Facility
• Processing automation controlled by SIPS scripts
– Pass processing is data driven
– L3 product generation is scheduled after nominal availability of input products
[Diagram: processing scripts and the products shown with each]
• Pass Script: L2A Brightness Temps; L2B Ocean, L2B Land, L2B Rain
• Daily Script: Ocean, Land, Snow, Sea Ice
• Pentad Script: Snow
• Weekly Script: Ocean
• Monthly Script: Ocean, Rain, Snow
[Diagram labels: Input Data and Metadata File(s); Ancillary File(s); Delivered Algorithm Package; Control Script; Science Data; Metadata; Processing History; Quality Assurance; Browse imagery; Custom subsets]
Sea Ice Product Generation
[Diagram labels]
• One day’s worth of Level-2A Tbs; Ice Mask; Snow Melt Mask
• Delivered Algorithm Package; Daily Processing Script; Sea Ice Algorithms
• Sea Ice Products (6, 12, and 25 km); Metadata; Processing History; Quality Assurance; Browse imagery; Custom subsets
• C and FORTRAN, provided by Science Team; Perl with C, provided by SIPS; Bourne shell, provided by SCF
• Hardware Platform; Operating System, Compilers; Software Libraries
AMSR-E Standard Sea Ice Products
All grids are polar stereographic.

Product                   25 km grid                  12.5 km grid            6.25 km grid
Brightness temperatures   6, 10, 18, 23, 36, 89 GHz   18, 23, 36, 89 GHz      89 GHz
Sea ice concentration     NASA Team 2 (NT2);          NASA Team 2 (NT2);      (none)
                          Bootstrap – NT2             Bootstrap – NT2
Snow depth on sea ice     (none)                      5-day product           (none)

Except for the snow depth product, all products are daily averages using (a) ascending orbits only, (b) descending orbits only, (c) all orbits.
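The three daily-average variants (ascending only, descending only, all orbits) can be illustrated with a toy example for a single grid cell. This is a sketch under stated assumptions: the sample structure and values are invented, and a plain arithmetic mean stands in for whatever averaging the operational sea ice software applies to gridded swath data.

```python
from statistics import mean
from typing import Iterable, Optional, Tuple

# Each sample is (orbit direction, value observed in one grid cell).
Sample = Tuple[str, float]


def daily_average(samples: Iterable[Sample],
                  direction: Optional[str] = None) -> Optional[float]:
    """Average one grid cell's samples for a day.

    direction: "ascending", "descending", or None to use all orbits.
    Returns None when no sample matches the requested direction.
    """
    values = [v for d, v in samples if direction is None or d == direction]
    return mean(values) if values else None


# Invented values for illustration only.
cell_samples = [
    ("ascending", 250.0),
    ("ascending", 252.0),
    ("descending", 246.0),
]
asc = daily_average(cell_samples, "ascending")    # mean of 250.0 and 252.0
desc = daily_average(cell_samples, "descending")  # the single value 246.0
both = daily_average(cell_samples)                # mean of all three values
print(asc, desc, both)
```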
Perspectives on Processing Lineage
[Figure panels: Data Processing View; Provenance View]
AMSR-E Provenance Use Cases
• Browse provenance graphs: convey rich information about final data granule details [Use case 1]
– Spatial location, time of observation, algorithms employed, input data and ancillary files
– Provenance bundle to include pointers to relevant documentation
• Answer “Something isn’t right” question [Use case 1 variant]
– E.g., did not receive data for several days so snow melt mask may be inaccurate.
• Compare two data granules [Use case 2]
– Query system to get list of provenance differences (e.g., versions of software, number and versions of input files)
• General provenance graph for a given science process, e.g., Sea Ice processing [Use case 3]
– Current algorithms and versions, nominal number and versions of input files, pointers to relevant documentation
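Use case 2 amounts to a structured diff of two provenance records. The sketch below assumes each granule's provenance has already been flattened into a dictionary; the keys, version strings, and file names are hypothetical, and the code does not call the Karma Query API.

```python
from typing import Any, Dict, Tuple


def provenance_diff(a: Dict[str, Any],
                    b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    A key missing from one record is reported with None on that side.
    """
    diff = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key), b.get(key)
        if left != right:
            diff[key] = (left, right)
    return diff


# Hypothetical flattened provenance for two granules.
granule_a = {
    "algorithm_version": "v1",
    "input_files": ["pass_001", "pass_002", "pass_003"],
    "snow_melt_mask": "mask_day_10",
}
granule_b = {
    "algorithm_version": "v2",
    "input_files": ["pass_001", "pass_002"],
    "snow_melt_mask": "mask_day_10",
}

for key, (left, right) in provenance_diff(granule_a, granule_b).items():
    print(f"{key}: {left!r} -> {right!r}")
```

The same comparison against a nominal record for the process (use case 3) would also cover the "Something isn't right" variant, for example by flagging a granule built from fewer input files than usual.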
Karma Provenance Repository
Provenance Collection, Storage and Access
• Instrumented AMSR-E SIPS processing workflow for sea ice in the testbed environment.
– Provenance information is captured in experiment run log files.
– Log files are parsed to generate provenance notifications.
– These notifications are then imported into the Karma database.
– The Karma Service Query API is used to generate OPM-compatible XML graphs, each corresponding to a processing run.
[Diagram labels: Processing Testbed (Pass, Daily, Pentad, Weekly and Monthly Scripts with their products); Provenance Collection; Query API; Provenance Browser]
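The capture path described on this slide (log file, then notifications, then a graph) can be sketched end to end. The log line format, event names, and field names below are invented for illustration, since the slides do not show the real SIPS log layout or the Karma notification schema. The edge labels "used" and "wasGeneratedBy" are borrowed from the Open Provenance Model.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical log format: "<timestamp> <EVENT> process=<name> file=<path>"
LOG_LINE = re.compile(
    r"^(?P<time>\S+)\s+(?P<event>READ|WRITE)\s+"
    r"process=(?P<process>\S+)\s+file=(?P<file>\S+)$"
)


def parse_log(lines: List[str]) -> List[Dict[str, str]]:
    """Turn matching log lines into notification dictionaries; skip the rest."""
    notifications = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            notifications.append(match.groupdict())
    return notifications


def to_edges(notifications: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Map notifications to (source, relation, target) edges using OPM terms."""
    edges = []
    for n in notifications:
        if n["event"] == "READ":
            # A process read a file: process --used--> artifact
            edges.append((n["process"], "used", n["file"]))
        else:
            # A process wrote a file: artifact --wasGeneratedBy--> process
            edges.append((n["file"], "wasGeneratedBy", n["process"]))
    return edges


# Invented log content for one processing run.
sample_log = [
    "2011-06-29T00:00:01Z READ process=daily_sea_ice file=l2a_pass_001.hdf",
    "2011-06-29T00:00:02Z READ process=daily_sea_ice file=snow_melt_mask.dat",
    "this line does not match the pattern and is ignored",
    "2011-06-29T00:05:00Z WRITE process=daily_sea_ice file=sea_ice_12km.hdf",
]
edges = to_edges(parse_log(sample_log))
for edge in edges:
    print(edge)
```

Parsing logs after the fact, rather than calling a provenance service from inside the science code, keeps the instrumentation outside the delivered algorithm package.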
Additional Science-Relevant Provenance and Context Information
• Harvesting granule information from ECS metadata
• Also recording processing location associated with each data granule
• Working with AMSR-E Science Computing Facility to identify algorithm and data product information
– Algorithm versions and descriptions
– Parameters and data fields
– Ancillary files
– Flag values and explanations
– Pointers to full documentation
• Defining how to harvest, transmit and display this information
Provenance Browser for AMSR-E Products
• Initial Prototype
– Interactive browsing of provenance graphs and information about workflows and final data products
– Uses Karma Service Query API to extract provenance graphs from the Karma provenance repository.
http://instantkarmatest.itsc.uah.edu/karmabrowser/view
Instant Karma Major Milestones
Project Schedule
Near Term Plans
• Begin routine daily Sea Ice processing with provenance collection in Testbed
• Enhance context metadata available to Karma system
• Refine Provenance Browser and Karma Query API
• Develop joint Karma project / SIPS approach for instrumenting additional processing workflows
• Work with other NASA Earth science provenance projects to develop common approaches to providing provenance information with science data files
• Begin to engage user community
– Present to AMSR-E Science Team
– Outreach to Sea Ice community at GSFC
– Work with NSIDC DAAC to identify friendly users for beta testing
Conclusions
• Instant Karma project is on track to incorporate provenance capture into the AMSR-E SIPS
• Ready to begin testing the AMSR-E Provenance Browser with the wider user community
• Continuing to work with NASA ESDIS, ESDSWG and ESIP Federation in defining common provenance practices across NASA data systems
Instant Karma Project Contacts
• Instant Karma Project Web Site
– http://instantkarmatest.itsc.uah.edu/
– http://www.pti.iu.edu/d2i/provenance_instantkarma
• AMSR-E Provenance Browser
– http://instantkarmatest.itsc.uah.edu/karmabrowser/view
• Contacts
– Michael Goodman [email protected]
– Helen Conover [email protected]
– Beth Plale [email protected]
– Thorsten Markus [email protected]
Acronyms
AMSR-E: Advanced Microwave Scanning Radiometer for EOS
API: Application Programming Interface
DAAC: Distributed Active Archive Center
ECS: EOSDIS Core System
EOS: Earth Observing System
EOSDIS: Earth Observing System Data and Information System
ESDSWG: Earth Science Data Systems Working Group
ESIP: Earth Science Information Partners
GSFC: Goddard Space Flight Center
ITSC: Information Technology & Systems Center
MSFC: Marshall Space Flight Center
MODIS: Moderate Resolution Imaging Spectroradiometer
NASA: National Aeronautics and Space Administration
NSIDC: National Snow and Ice Data Center
OPM: Open Provenance Model
SCF: Science Computing Facility
SIPS: Science Investigator-led Processing System
UAHuntsville: University of Alabama in Huntsville
XML: eXtensible Markup Language