
Page 1

Tracking Metadata and Lineage of the Data Processing Chain for Mapping Snow Cover Properties with the NASA MODIS

James Frew¹, Thomas H. Painter², Peter Slaughter¹, Jeff Dozier¹

¹ Donald Bren School of Environmental Science and Management, University of California, Santa Barbara
² National Snow and Ice Data Center, University of Colorado, Boulder

Page 2

Outline

Motivation
– Snow mapping product
– Implications for hydrologic modeling

Lineage Capture
– Wrapping: the ESSW experience
– Instrumenting, overriding, monitoring: the (ongoing) ES3 experience

Page 3

MODIS image – Sierra Nevada

EOS Terra MODIS, 07 March 2004
MOD09 Surface Reflectance, bands at 0.555, 0.645, and 0.858 µm

Page 4

Snow-covered area and grain size

Page 5

Hindu Kush

2003 DOY 070

Page 6

Colorado Rockies CLPX

13 March 2002

Page 7

Model structure: MODIS snow-area / albedo

[Diagram labels: Basin mask; Processing lineage; Watershed info; MODIS cloud mask (48 bits); MODIS 7 land bands (112 bits); MODIS quality flags; Topography; MODIS snow cover and grain size; MODIS view angles; Solar zenith, azimuth; Snow fraction; albedo; RMS error; Veg fraction; Soil fraction; Shade fraction; Open water fraction; Quality flag]

Page 8

Lineage Capture, Take 1

The ESSW experience

Page 9

Using Existing Science Applications

No “standard” Earth science computing environment
– commercial packages (ArcInfo, MATLAB, …)
– public packages/models (MM5, MODTRAN, …)
– locally-developed codes
– arbitrary combinations of the above

Example: SST from AVHRR
– commercial, standalone programs
– parameters highly customized for UCSB

How do we get these programs to communicate and cooperate with ESSW, without rewriting them?

[Diagram labels, AVHRR processing steps: Navigate (Manual/Automatic); Receive; Ingest and Calibrate; Rectify; Sea Surface Temp (SST); SST Maps]

Page 10

Lineage: Current Best Practice

Page 11

Earth System Science Workbench (ESSW)

Producer and consumer issues can both be addressed by a laboratory metaphor

Experiment
– Network of models …
– ingesting / synthesizing data …
– generating products

Laboratory
– Experiment execution environment
  – Computing + storage = accessibility + scalability

Lab Notebook
– Persistent storage that can be queried
– Keeps track of all experiments
  – Documentation + lineage = accountability

Page 12

Wrap Your App: Scripts Talk to ESSW

No changes, just additions

Wrapper scripts
– Make program (groups) look like ESSW experiments
– Use Perl API

Lab Notebook daemon
– Accepts API commands
– Creates XML documents
– Sends to database

ESSW database
– XML metadata & DTDs
– Tabular metadata
  – XML search terms
  – Lineage links

[Diagram labels: the SST processing steps (Navigate (Manual/Automatic); Receive; Ingest and Calibrate; Rectify; Sea Surface Temp (SST); SST Maps); Perl API; Lab Notebook daemon; XML + SQL; ESSW Database; MySQL; JDBC; Java; Perl]

Page 13

ESSW Metadata management

Lab Notebook daemon verifies XML metadata document

Experiment step metadata stored for product lineage tracking

Complete metadata document stored in custom database table
– XML DTD ← 1:1 → database table
– (n+1)th column is document itself

Some metadata values extracted into database tables
– DTD contains column names and types for some elements
– Always save all the XML, even if you don’t know how to “columnize” all of it
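The storage rule above (extract what you can into columns, always keep the whole document in the last column) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 in place of ESSW's MySQL, a plain dict in place of the column hints carried by the DTD, and a made-up sample document. It checks well-formedness only; it does not validate against a DTD.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical metadata document, loosely modeled on the avhrr_sst example.
# The element values are invented for illustration.
DOC = """<avhrr_sst>
  <scene_id>
    <satellite>NOAA-14</satellite>
    <pass_date>1999-07-04</pass_date>
  </scene_id>
  <comment>not mapped to any column</comment>
</avhrr_sst>"""

# Assumed mapping from column name to element path. In ESSW this
# information came from the DTD; here it is a plain dict.
COLUMNS = {"satellite": "scene_id/satellite", "pass_date": "scene_id/pass_date"}


def store(conn, table, xml_text, columns):
    """Extract the mapped values and save them next to the full document."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    names = list(columns)
    values = [root.findtext(path) for path in columns.values()]
    col_defs = ", ".join(f"{name} TEXT" for name in names)
    # n extracted columns, plus the (n+1)th column holding the document itself.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs}, document TEXT)")
    marks = ", ".join("?" for _ in range(len(names) + 1))
    conn.execute(f"INSERT INTO {table} VALUES ({marks})", values + [xml_text])


conn = sqlite3.connect(":memory:")
store(conn, "avhrr_sst", DOC, COLUMNS)
row = conn.execute("SELECT satellite, pass_date, document FROM avhrr_sst").fetchone()
```

The extracted columns make common searches cheap, while the untouched document column means nothing is lost when an element has no column mapping yet.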

Page 14

Wrapper Example: Input Dataset

# SST experiment wrapper
# $L1B is the input Level 1B AVHRR image file
# $SST is the output SST image file

# run legacy command "nitpix": creates SST image from L1B image
$base_temp = 5.0;
$temp_step = 0.1;
...
system("nitpix base_temp=$base_temp temp_step=$temp_step ... $L1B $SST");

# start recording ESSW metadata
beginXMLBld($ENV{USER}, "PRODUCTION");

# get metadata for input file
$L1B_ID = findSciObjFromFile($L1B);

[Diagram: avhrr_l1b (AVHRR Level 1B product) → avhrr_sstModel (Multi-channel sea surface temperature algorithm) → avhrr_sst (Sea surface temperature (SST))]

Page 15

Wrapper Example: Output Dataset

# create metadata for SST image
$SST_ID = createMetadata("avhrr_sst");
addValue($SST_ID, "avhrr_sst.scene_id.satellite", $satellite);
addValue($SST_ID, "avhrr_sst.scene_id.pass_date", $pass_date);
...
saveToDB($SST_ID, "avhrr_sst");
closeMetadata($SST_ID);
saveDigest($SST, $SST_ID);

[Diagram: avhrr_l1b (AVHRR Level 1B product) → avhrr_sstModel (Multi-channel sea surface temperature algorithm) → avhrr_sst (Sea surface temperature (SST))]

Page 16

Wrapper Example: Process

# create metadata for SST experiment
$exp = createExperimentMetadata("avhrr_sstModel");
$exp_step = createExpStepMetadata($exp, "avhrr_sstExpStp");
addValue($exp_step, "avhrr_sstExpStp.base_temp", $base_temp);
addValue($exp_step, "avhrr_sstExpStp.temp_step", $temp_step);
...
saveToDB($exp_step, "avhrr_sstExpStp");
closeMetadata($exp_step);

# connect input and output images to experiment
registerExperimentInputs($exp, $L1B_ID);
registerExperimentOutputs($exp, $SST_ID);

# finish recording ESSW metadata
endXMLBld();

[Diagram: avhrr_l1b (AVHRR Level 1B product) → avhrr_sstModel (Multi-channel sea surface temperature algorithm) → avhrr_sst (Sea surface temperature (SST))]

Page 17

Wrapper Example: Lineage Links

# connect input and output images to experiment
registerExperimentInputs($exp, $L1B_ID);
registerExperimentOutputs($exp, $SST_ID);

# finish recording ESSW metadata
endXMLBld();

[Diagram: avhrr_l1b (AVHRR Level 1B product) → avhrr_sstModel (Multi-channel sea surface temperature algorithm) → avhrr_sst (Sea surface temperature (SST))]

Page 18

Process graph reconstructed from ESSW database
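One way such a graph could be rebuilt from stored lineage links is sketched below. It is not the ESSW implementation. The record layout (experiment, inputs, outputs) is an assumption, and the names avhrr_calibrate, raw_pass, sst_mapper and sst_map are hypothetical; only avhrr_l1b, avhrr_sstModel and avhrr_sst come from the wrapper example.

```python
from collections import defaultdict

# Hypothetical lineage records: (experiment, input object IDs, output object IDs).
RECORDS = [
    ("avhrr_calibrate", ["raw_pass"], ["avhrr_l1b"]),
    ("avhrr_sstModel", ["avhrr_l1b"], ["avhrr_sst"]),
    ("sst_mapper", ["avhrr_sst"], ["sst_map"]),
]


def build_graph(records):
    """Map each product to its producing experiment, and each experiment to its inputs."""
    made_by = {}
    inputs_of = defaultdict(list)
    for experiment, inputs, outputs in records:
        inputs_of[experiment].extend(inputs)
        for product in outputs:
            made_by[product] = experiment
    return made_by, inputs_of


def ancestors(product, made_by, inputs_of):
    """Return every experiment and product upstream of `product`, nearest first."""
    found = []
    queue = [product]
    while queue:
        current = queue.pop(0)
        experiment = made_by.get(current)
        if experiment is None:
            continue  # a source object with no recorded producer
        if experiment not in found:
            found.append(experiment)
        for parent in inputs_of[experiment]:
            if parent not in found:
                found.append(parent)
                queue.append(parent)
    return found


made_by, inputs_of = build_graph(RECORDS)
lineage = ancestors("sst_map", made_by, inputs_of)
```

Walking backwards from a product answers the consumer's question ("where did this come from?"); the same two maps walked forwards would answer the producer's ("what depends on this input?").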

Page 19

ESSW Lessons

Providers are customers
– ESIPs aren’t much good unless scientists are happy to put information in them

A light touch is the right touch
– Wrapping is easier for scientists and their programmers to deal with than complete re-engineering

Scientists do write scripts, but not necessarily Perl
– Scripting (gluing stuff together) comes naturally to scientists

Scientists don’t write DTDs

Nobody calls metadata APIs

ESSW was automatic, but not automatic enough…

Page 20

Lineage Capture, Take 2

The ES3 experience

Page 21

ES3: Earth System Science Server

[Diagram labels]
Components: ESSW++ data lineage tracking; BUB data storage; ROCKS processing clusters
Back Up Brick (BUB), shown twice: cheap server with RAID 5 controller; cheap server (mirror) with RAID 5 controller; arrows labeled write, read, and read (backup)
Other labels: Alexandria Digital Library; Microsoft TerraServer; MODster; OpenDAP; MODIS; Corona; AVHRR; Watershed-scale snow product; Global-scale snow product

Page 22

From ESSW to ES3: Summary

Perl wrappers → “Probulators”
Perl API → web services + XML messages
MySQL → XML database(s)

Page 23

From Wrappers to Probulators

Wrappers: Active Lineage

Advantages (+)
– Complete control over what gets recorded
– Single language/API for all wrapped events
– Not tied to execution
  – You can even lie about what happened

Drawbacks (–)
– Must explicitly script everything
– Scripts can drift from reality
  – You can even lie about what happened

Page 24

From Wrappers to Probulators

Probulators: Passive Lineage

Advantages (+)
– Record what actually happened
  – Not just what you think happened
  – Not what didn’t happen
– Automatic: don’t have to write new scripts for everything

Drawbacks (–)
– Different flavors for different environments
  – Can’t just do everything in Perl…

Page 25

Probulator patterns

Instrumentation
– Insert lineage capture instructions directly into science codes
  – e.g. “I just created file ‘foo’”
– Typical implementation: preprocessor/precompiler

Overriding
– Replace standard routines/libraries with lineage-capturing versions
  – e.g. open(…) → snoopy_open(…)
– Typical implementation: modify execution environment
  – environment variables
  – configuration files

Passive monitoring
– Trace program execution
  – e.g. “called open() with args foo, bar, …”
– Typical implementation: strace’d shell
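The overriding pattern can be illustrated in Python, where the standard open routine is swapped for a lineage-capturing version without touching the science code. This is a sketch of the general idea only, not the ES3 mechanism. The name snoopy_open is borrowed from the slide; snoopy_open_installed, lineage_log and science_code are hypothetical names, and the in-memory list stands in for a real log file.

```python
import builtins
import contextlib
import os
import tempfile

lineage_log = []  # in-memory stand-in for a probulator log file


@contextlib.contextmanager
def snoopy_open_installed():
    """Temporarily replace builtins.open with a version that records each call."""
    real_open = builtins.open

    def snoopy_open(file, mode="r", *args, **kwargs):
        handle = real_open(file, mode, *args, **kwargs)
        # Log only after the open succeeded: record what actually happened.
        lineage_log.append({"routine": "open", "file": str(file), "mode": mode})
        return handle

    builtins.open = snoopy_open
    try:
        yield
    finally:
        builtins.open = real_open  # always restore the standard routine


def science_code(path):
    """Unmodified 'science code': it just calls open() as usual."""
    with open(path, "w") as out:
        out.write("42\n")
    with open(path) as src:
        return src.read()


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "foo")
    with snoopy_open_installed():
        result = science_code(target)
```

Because the replacement happens in the execution environment, science_code needs no edits; the trade-off is that each language or runtime needs its own way of installing the override.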

Page 26

ES3 Lineage Architecture

[Diagram labels: probulator 1 … probulator n; logfiles; logger; transmitter; ES3 core]

Page 27

Probulating IDL: Instrumenting the code

;edit
pro modscag_cleanse,prefix=prefix,ns=ns,nl=nl
HELP, NAMES="*", OUTPUT=ES3_ENVIROMENT & ES3_LOG, $
  ENTER="modscag_cleanse", ENVIROMENT=ES3_ENVIROMENT

; clean up {under,over}flow of MODSCAG run
;
; Input: prefix = prefix for all of the MODSCAG output filenames
;        ns = number of samples
;        nl = number of lines
; Output: rewrite of the MODSCAG files
;
; t.h.painter / 1.19.2005

; open snow file
ES3_openr,1,string(prefix,'snow.pic')
snow=fltarr(ns,nl)
readu,1,snow

[ blah blah blah ]

HELP, NAMES="*", OUTPUT=ES3_ENVIROMENT & ES3_LOG, LEAVE="modscag_cleanse", $
  ENVIROMENT=ES3_ENVIROMENT
END ; modscag_cleanse
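For comparison, the same enter/leave instrumentation can be sketched in Python with a decorator. This is an analogy under stated assumptions, not how the IDL probulator is built: probulated and es3_log are hypothetical names, the routine body is a placeholder, and the prefix value is invented.

```python
import functools

es3_log = []  # stand-in for the probulator log


def probulated(func):
    """Record an enter event with the call's keyword arguments, and a leave event afterwards."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        es3_log.append(("enter", func.__name__, dict(kwargs)))
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on normal return and on exceptions alike.
            es3_log.append(("leave", func.__name__))

    return wrapper


@probulated
def modscag_cleanse(prefix, ns, nl):
    """Placeholder body; the real routine rewrites the MODSCAG output files."""
    return ns * nl


cells = modscag_cleanse(prefix="till", ns=2, nl=2)
```

As in the IDL listing, the inserted lines bracket the routine and capture its environment on entry, while the original body stays unchanged between them.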

Page 28

Probulating IDL: Results

<init time="20050522T234606Z" pid="31002" stime="20050522T234604Z" pstime="20050522T234256Z" ppid="30920" language="idl" user="haavar" hostname="spitting-duck.bren.ucsb.edu">
  <enviroment>
    <variable name="!PATH" value="/home/haavar/probulator//idl:/home/rsi/idl_6.1/lib/hook:
    […]
  </enviroment>
  <mount-points>
    <mount share="dab15:/ed15/rsi" type="nfs">/home/rsi</mount>
  </mount-points>
</init>
<enter region="modscag_cleanse">
  <enviroment>
    <variable type="INT" name="NL" value="2"/>
    <variable type="INT" name="NS" value="2"/>
    […]
  </enviroment>
</enter>
<exec time="20050522T234610Z" routine="OPENR">
  <io>
    <file read="true">/home/haavar/painter/data/tillsnow.pic</file>
  </io>
</exec>

Page 29

Probulating bash: Passive Monitoring

cat /etc/passwd | grep haavar | sed -n 's/\(.*:\)\{2\}\([0-9]\+\).*/\2/p'

25232 1138336174.480079 open("/etc/ld.so.cache", O_RDONLY) = 3
25232 1138336174.480215 open("/lib/libm.so.6", O_RDONLY) = 3
[…]
25234 1138336178.887267 dup2(3, 255) = 255
25234 1138336178.887912 pipe([3, 4]) = 0
25234 1138336178.888257 clone(child_stack=0, […], child_tidptr=0xb7f2e708) = 25235
25235 1138336178.889366 dup2(4, 1) = 1
25235 1138336178.889975 pipe([3, 4]) = 0
25235 1138336178.890326 clone(child_stack=0, […], child_tidptr=0xb7f2e708) = 25236
25235 1138336178.891260 pipe([4, 5]) = 0
25235 1138336178.891756 clone(child_stack=0, […], child_tidptr=0xb7f2e708) = 25237
25235 1138336178.892753 clone(child_stack=0, […], child_tidptr=0xb7f2e708) = 25238
25238 1138336178.894266 dup2(4, 0) = 0
25236 1138336178.894726 dup2(4, 1) = 1
25237 1138336178.894763 dup2(3, 0) = 0
25237 1138336178.895581 dup2(5, 1) = 1
[…]
25238 1138336178.897006 execve("/bin/sed", ["sed", "-n", "s/\\(.*:\\)\\{2\\}\\([0-9]\\+\\).*/\\2/p"], ["HOSTNAME=rubber-duck.bren.ucsb.edu", "TERM=xterm-color", […]
25236 1138336178.900117 execve("/bin/cat", ["cat", "/etc/passwd"], […]
25237 1138336178.903342 execve("/bin/grep", ["grep", "haavar"], […]
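A trace like this can be turned into process relationships with a small parser. The sketch below is an assumption about how one might do it, not the ES3 bash probulator: it handles only the clone and execve lines, and it relies on the line layout shown on the slide (pid, epoch timestamp, call), which resembles strace output with pid and timestamp prefixes enabled. The sample lines are copied from the trace above.

```python
import re

# A few lines copied from the trace above, elisions included.
TRACE = """\
25234 1138336178.888257 clone(child_stack=0, […], child_tidptr=0xb7f2e708) = 25235
25235 1138336178.890326 clone(child_stack=0, […], child_tidptr=0xb7f2e708) = 25236
25235 1138336178.891756 clone(child_stack=0, […], child_tidptr=0xb7f2e708) = 25237
[…]
25236 1138336178.900117 execve("/bin/cat", ["cat", "/etc/passwd"], […]
25237 1138336178.903342 execve("/bin/grep", ["grep", "haavar"], […]
"""

# pid, timestamp, call name, then everything after the opening parenthesis.
LINE = re.compile(r"^(?P<pid>\d+) (?P<time>\d+\.\d+) (?P<call>\w+)\((?P<rest>.*)$")


def parse(trace):
    """Return (parent-by-child map, program-by-pid map) from strace-style lines."""
    parent_of, program_of = {}, {}
    for line in trace.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # skip elisions and anything we do not understand
        pid = int(match["pid"])
        if match["call"] == "clone":
            # The return value after the last '=' is the child's pid.
            child = int(match["rest"].rsplit("=", 1)[1])
            parent_of[child] = pid
        elif match["call"] == "execve":
            # The first quoted string is the path of the program being executed.
            program_of[pid] = match["rest"].split('"')[1]
    return parent_of, program_of


parent_of, program_of = parse(TRACE)
```

Joining the two maps gives the pid, ppid and routine for each executed program, which is the information a per-process report needs.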

Page 30

Probulating bash: Results

[… <init> same as IDL …]
<exec time="20060027T042938.900117Z" routine="/bin/cat" pid="25236" ppid="25235">
  <arguments>
    <argument>/etc/passwd</argument>
  </arguments>
  <io>
    <pipe read="true" id="std-in"/>
    <pipe write="true" id="3"/>
    <pipe write="true" id="std-err"/>
    <file read="true">/etc/ld.so.cache</file>
    […]
    <file read="true">/etc/passwd</file>
  </io>
</exec>
<exec time="20060027T042938.903342Z" routine="/bin/grep" pid="25237" ppid="25235">
  <arguments>
    <argument>haavar</argument>
  </arguments>
  <io>
    <pipe read="true" id="3"/>
    <pipe write="true" id="4"/>
    […]
  </io>
</exec>

Page 31

Now What?

Probulator reports not universally unique
– Q: How to hook separate reports together?
– A: Logger assigns UUIDs to
  – Data streams
  – Processes
  – Jobs (workflows)

Lineage not explicit
– Q: How to publish lineage?
– A: ES3 Core builds serialized graph
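The two answers above can be sketched together: a logger that hands out UUIDs, and a graph builder that links processes through the streams they share. Everything here is an assumption made for illustration. The report layout, the Logger class, build_graph and the JSON serialization are hypothetical, and the sample reports only paraphrase the cat and grep results from the bash example.

```python
import json
import uuid

# Hypothetical probulator reports carrying only local identifiers.
REPORTS = [
    {"host": "rubber-duck", "pid": 25236, "routine": "/bin/cat",
     "reads": ["/etc/passwd"], "writes": ["pipe:3"]},
    {"host": "rubber-duck", "pid": 25237, "routine": "/bin/grep",
     "reads": ["pipe:3"], "writes": ["pipe:4"]},
]


class Logger:
    """Assigns one UUID to the job, and one to each process and data stream."""

    def __init__(self):
        self.job_id = str(uuid.uuid4())
        self.ids = {}  # (kind, local key) -> UUID, so repeated mentions share an ID

    def uuid_for(self, kind, key):
        return self.ids.setdefault((kind, key), str(uuid.uuid4()))


def build_graph(logger, reports):
    """Stand-in for the ES3 core: link processes through the streams they share."""
    nodes, edges = {}, []
    for report in reports:
        proc = logger.uuid_for("process", (report["host"], report["pid"]))
        nodes[proc] = {"type": "process", "routine": report["routine"]}
        for direction in ("reads", "writes"):
            for name in report[direction]:
                stream = logger.uuid_for("stream", (report["host"], name))
                nodes[stream] = {"type": "stream", "name": name}
                # Edges point in the direction the data flows.
                if direction == "reads":
                    edges.append({"from": stream, "to": proc})
                else:
                    edges.append({"from": proc, "to": stream})
    return {"job": logger.job_id, "nodes": nodes, "edges": edges}


logger = Logger()
graph = build_graph(logger, REPORTS)
serialized = json.dumps(graph, indent=2)
```

Because both reports resolve "pipe:3" to the same UUID, the cat and grep processes end up connected in the graph even though neither report mentions the other.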

Page 32

Thanks to:

Current: Mike Colee, Stephane Maritorena, Dominic Metzger, Karl Rittger, Dave Siegel

Former: Anurag Acharya, Rajendra Bose, Scott Denning, Debbie Donahue, Jim Duff, Calin Duma, Erik Fields, Jim Gray, Steve Miley, Jordan Morris, Mark Pelletier, Pete Peterson, Walter Rosenthal, Klaus Schauser, Håvar Valeur

Page 33

To Probulate Further…

http://www.snow.ucsb.edu : Publications

Bose, R. and Frew, J., 2005. Lineage retrieval for scientific data processing: a survey. ACM Computing Surveys, vol. 37, no. 1, pp. 1-28. doi:10.1145/1057977.1057978

Dozier, J., and Painter, T.H., 2004. Multispectral and hyperspectral remote sensing of alpine snow properties. Annual Review of Earth and Planetary Sciences, vol. 32, pp. 465-494. doi:10.1146/annurev.earth.32.101802.120404

Molotch, N.P., Painter, T.H., Bales, R.C., and Dozier, J., 2004. Incorporating remotely sensed snow albedo into spatially distributed snowmelt modeling. Geophysical Research Letters, vol. 31, L03501. doi:10.1029/2003GL019063

Frew, J. and Bose, R., 2001. Earth System Science Workbench: a data management infrastructure for Earth science products. In: Kerschberg, L. and Kafatos, M. (eds.) 2001. Proceedings, 13th International Conference on Scientific and Statistical Database Management (SSDBM 2001), pp. 180-189. doi:10.1109/SSDM.2001.938550