Long Term Ecological Research Network Office EcoTrends Cyber-infrastructure Development Mark Servilla LTER Network Office LTER Information Managers Annual Meeting – San Jose, California 2 – 5 August 2007


Long Term Ecological Research Network Office

EcoTrends Cyber-infrastructure Development

Mark Servilla

LTER Network Office

LTER Information Managers Annual Meeting – San Jose, California

2 – 5 August 2007

LNO NIS

Building Blocks to Success

• EcoTrends NIS module

• PASTA NIS Module Framework

• Metacat/EML metadata and data management

• PostgreSQL RDBMS

• Java Servlet, JSP, and R programming

• Community support for data collection, documentation, and accessibility

[Figure: building-block diagram with layers labeled EcoTrends, PASTA, Metacat/EML, Community, and PostgreSQL/Java/Tomcat]

PASTA Architecture

[Figure: PASTA architecture diagram]

Components: Source A, Source B, Source C, Metacat-Harvester, EML, Workflow Engine, Parser-Loader, Dataset Registry, Cache, Metadata, Derived Data, Web API, HTML, SOAP, EML.xml

Annotations:

• Data loading for synthetic processing based on events (e.g., new data, metadata change)

• Existing LTER metadata infrastructure (Metacat and EML)

• Source data cache available to all workflow engines

• Support for multiple scientific workflow engines (e.g., R script, Kepler, Chimera, D2K)

• Metadata and derived data products; metadata as EML

• Standard interfaces to support various web portals (e.g., Trends, GEOSS, GEON, NEON, WATERS) and web service APIs

• Metadata describing derived data, including data provenance and data versioning; expand on community provenance research

Legend: Derived data management; Site data/metadata; Existing infrastructure; New infrastructure; Pluggable work flows; User interfaces

EcoTrends Development 2007

[Figure: PASTA architecture diagram (Source A, Source B, Source C, Metacat-Harvester, EML, Workflow Engine, Parser-Loader, Dataset Registry, Cache, Metadata, Derived Data, Web API, HTML, SOAP, EML.xml) with one region labeled "EcoTrends development realm"]

Development Process

[Figure: development cycle labeled "ITERATIVE SOLUTIONS" with stages Use-case, Project Plan, Requirements, Coding, Testing, Release, and Milestones]

Participants noted on the diagram:

• Editorial and technical committees, and LNO (shown twice)

• Technical committee, NISAC, and LNO

Major Milestones

• EML generation

• Derived data loading

• Website presentation/integration

• Data discovery and presentation

– Browse (by site, by topic/sub-topic)

– Search (simple keyword, advanced)

– Result (result set display, dataset display, plot display)

• Data exploration

– Graphing (single and multiple datasets)

– Aggregation (temporal)

– Download (data and metadata)

• Site auditing/DAS

– Web page, data access, and plot auditing

– Use-statistics and data access policy conformance

EML Generation

Step 1: Core Metadata

– Define core metadata (e.g., contact information) that is repeated in all EML documents

Step 2: File Name Parsing

– Parse the derived data file names for site/station, variable, unit, and timescale metadata

Step 3: Derived Data Analysis

– Analyze derived data for temporal coverage and data value bounds

Step 4: R Script Analysis and Inclusion

– Include in the methods section of EML the R script used to generate derived data and any annotation associated with a specific derived data product

Step 5: Manual Documentation

– Include both non-automated metadata and tacit-knowledge metadata in the EML
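
Steps 2 and 3 above lend themselves to a short illustration. The following is a minimal Python sketch, not the EcoTrends implementation: the slides do not give the actual file naming convention, so the pattern SITE_STATION_VARIABLE_UNIT_TIMESCALE.csv and the function names parse_file_name and analyze_records are assumptions made purely for this example.

```python
import re
from datetime import date

# ASSUMPTION: hypothetical naming convention SITE_STATION_VARIABLE_UNIT_TIMESCALE.csv;
# the real EcoTrends derived-data file names are not shown in the slides.
FILENAME_PATTERN = re.compile(
    r"^(?P<site>[A-Za-z]+)_(?P<station>[A-Za-z0-9]+)_(?P<variable>[A-Za-z0-9]+)"
    r"_(?P<unit>[A-Za-z0-9]+)_(?P<timescale>[A-Za-z]+)\.csv$"
)


def parse_file_name(file_name):
    """Step 2: pull site/station, variable, unit, and timescale from a file name."""
    match = FILENAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unrecognized file name: {file_name}")
    return match.groupdict()


def analyze_records(records):
    """Step 3: temporal coverage and value bounds for (date, value) records."""
    if not records:
        raise ValueError("no records to analyze")
    dates = [d for d, _ in records]
    # Missing observations (None) count toward coverage but not toward bounds.
    values = [v for _, v in records if v is not None]
    return {
        "begin_date": min(dates),
        "end_date": max(dates),
        "minimum": min(values) if values else None,
        "maximum": max(values) if values else None,
    }
```

The two dictionaries returned here would supply the automatically generated parts of an EML document; core metadata (Step 1) and manual documentation (Step 5) would still come from elsewhere.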

Derived Data Loading

• Parse data and load relational database

• Record level attributes:

PRIMARY_KEY :: INTEGER
START_DATE :: DATESTAMP
END_DATE :: DATESTAMP
OBS :: FLOAT
N_EXPECTED :: INTEGER
S_DEV :: FLOAT
S_ERR :: FLOAT
PROP_MISSING :: FLOAT
PROP_QUESTIONABLE :: FLOAT
PROP_ESTIMATED :: FLOAT
PROP_TRACE :: FLOAT
PROP_INVALID :: FLOAT
COMMENT :: TEXT
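
As a rough illustration of the record-level attributes listed on this slide, the sketch below creates a matching table and loads one parsed row. It is only a sketch under stated assumptions: the project uses PostgreSQL, whereas this example uses Python's built-in sqlite3 so that it runs without a server; the table name derived_record and the helper load_records are hypothetical; and the slide's DATESTAMP type is stored as ISO 8601 text.

```python
import sqlite3

# ASSUMPTION: SQLite stands in for PostgreSQL; DATESTAMP becomes TEXT (ISO 8601)
# and FLOAT becomes REAL. The table name is hypothetical.
DDL = """
CREATE TABLE derived_record (
    primary_key       INTEGER PRIMARY KEY,
    start_date        TEXT NOT NULL,
    end_date          TEXT NOT NULL,
    obs               REAL,
    n_expected        INTEGER,
    s_dev             REAL,
    s_err             REAL,
    prop_missing      REAL,
    prop_questionable REAL,
    prop_estimated    REAL,
    prop_trace        REAL,
    prop_invalid      REAL,
    comment           TEXT
)
"""

COLUMNS = [
    "start_date", "end_date", "obs", "n_expected", "s_dev", "s_err",
    "prop_missing", "prop_questionable", "prop_estimated",
    "prop_trace", "prop_invalid", "comment",
]


def load_records(conn, rows):
    """Insert parsed rows (dicts keyed by column name); absent keys become NULL."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT INTO derived_record ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    conn.executemany(sql, [[row.get(c) for c in COLUMNS] for row in rows])
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
load_records(conn, [{"start_date": "2000-01-01", "end_date": "2000-01-31",
                     "obs": 3.2, "n_expected": 31, "prop_missing": 0.0}])
```

Parameter placeholders keep the loader independent of how the derived data files were parsed.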

Website Presentation

• Initial design and development

– EcoTrends editorial committee

– Electric Sage Designs, LLC

– Laura Downey, Usability Engineer, SEEK Project

Website Integration

Stage 1: Apache, PHP, CSS, JavaScript, and MySQL

Refactor (Stage 1 to Stage 2): refactor original website to reflect consistency and modularity; modify CSS for application-specific design (e.g., table layout)

Stage 2: Apache, PHP, CSS, JavaScript, and MySQL

Refactor (Stage 2 to Stage 3): convert all PHP functionality to equivalent Java Server Page (JSP); integrate Metacat-based content

Stage 3: Tomcat, Servlet, JSP, CSS, JavaScript, and Metacat

Data Discovery and Presentation

Data Exploration

• Graphing (single and multiple datasets)

• Aggregation (temporal)

• Download (data and metadata)

Site Auditing/DAS

• Web page auditing

• Data access auditing

• Plot auditing

• Use-statistics and data access policy conformance

Parting shot…

PASTA Application Stack

[Figure: layered application stack; vertical axis labeled "Scale (temporal-spatial-organizational)"]

Layers: EML, Metacat, and Harvester; Registry/Parser/Loader; Workflow Engine; Cache Database; Derived Database; Metadata Harvest; Web API - Portal; Site Data and EML Metadata

Annotations: Network-level Synthesis; Site-level data archive; Data transformation and integration; Standardized data products; Dataset identification and loading; Network interface; Existing EML Harvesting

Generalized Workflow

1. Sites collect and document time-series observation data (e.g., climate, social-economics, …)

2. Sites update EML with a new revision indicating new data

3. EML is harvested into Metacat

4. EML Loader/Parser loads new/updated dataset into “cache” database

5. Workflow Engine transforms “cache” data into “derived” data

6. Transformed data is stored in “derived” database

7. EML is generated for derived data and is stored in Metacat

8. Derived data is made available through web portal
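
To make the hand-offs between steps 3 through 8 concrete, here is a toy Python sketch in which plain dictionaries stand in for Metacat, the "cache" database, and the "derived" database. Every function name is hypothetical, the transformation (a simple mean) is a placeholder rather than anything the slides specify, and the real system uses Metacat, PostgreSQL, and a workflow engine instead of in-memory objects.

```python
def harvest(metacat, package_id, eml_doc):
    """Step 3: store a site's EML revision in the metadata catalog."""
    metacat[package_id] = eml_doc


def load_cache(cache, package_id, rows):
    """Step 4: load the dataset into the cache, keyed by package ID."""
    cache[package_id] = list(rows)


def transform(rows):
    """Step 5: placeholder transformation (ASSUMPTION: mean of the observations)."""
    return [{"value": sum(rows) / len(rows)}]


def run_pipeline(metacat, cache, derived, package_id, eml_doc, rows, derived_id):
    """Run steps 3 through 8 once for a single new site revision."""
    harvest(metacat, package_id, eml_doc)
    load_cache(cache, package_id, rows)
    derived[derived_id] = transform(cache[package_id])             # steps 5 and 6
    metacat[derived_id] = {"title": f"Derived from {package_id}"}  # step 7
    return derived[derived_id]                                     # step 8: what a portal would serve
```

The point of the sketch is the ordering: each stage reads only what the previous stage stored, which is what allows a new revision to drive the rest of the chain.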

Decomposed Workflow

LTER Site Data Collection

• Time-series data

– Physical environment (e.g., climate, …)

– Human population and economy

– Biogeochemistry

– Biotic structure

• Data/metadata

– Relational Database

– Spreadsheet

– Text file

– HTML/XML

EML, Metacat, and the Harvester

• EML Package ID: knb-lter-site.XX.YY, for example

knb-lter-sev.354.1
knb-lter-sev.354.2
knb-lter-sev.354.3

• Metacat stores the XML of EML; new revisions take precedence; old revisions are deprecated, but not deleted

• Harvester is a time-based update process that “pulls” site EML and inserts it into Metacat

“existing LTER investment in technology”

[Figure: architecture detail showing Source A, Source B, Source C, Metacat-Harvester, and EML]
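
The package ID convention and the "new revisions take precedence" rule on this slide can be sketched in a few lines of Python. This is an illustration only, not Metacat's own logic; PackageId, parse_package_id, and latest_revisions are hypothetical names, and the sketch assumes the last two dot-separated fields are always integers.

```python
from typing import NamedTuple


class PackageId(NamedTuple):
    scope: str
    identifier: int
    revision: int


def parse_package_id(package_id):
    """Split 'scope.identifier.revision', e.g. 'knb-lter-sev.354.3'."""
    scope, identifier, revision = package_id.rsplit(".", 2)
    return PackageId(scope, int(identifier), int(revision))


def latest_revisions(package_ids):
    """Return the highest revision seen for each (scope, identifier) pair."""
    latest = {}
    for pid in map(parse_package_id, package_ids):
        key = (pid.scope, pid.identifier)
        if key not in latest or pid.revision > latest[key].revision:
            latest[key] = pid
    return latest
```

Older revisions are simply ignored by latest_revisions rather than removed from the input, which mirrors the "deprecated, but not deleted" behavior described above.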

EML Loader/Parser

• Dataset registry identifies Trends data in Metacat

• New revisions assert a “new” data load. The EML parser/loader*

– Translates the site EML into the RDBMS DDL

– Creates a new DB table in the primary database based on the revision

– Loads the new data into the primary database

– Triggers the workflow to continue

*Collaboration with NCEAS/SEEK

[Figure: architecture detail showing Source A, Source B, Source C, Metacat-Harvester, EML, Parser-Loader, Dataset Registry, and Cache]
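
The "translate the site EML into the RDBMS DDL" step can be illustrated with a small Python sketch. It does not parse real EML and is not the NCEAS/SEEK parser/loader: it assumes the attribute names and simplified types have already been extracted from the EML attribute list, and the type mapping, the function name build_ddl, and the type labels are all assumptions made for the example.

```python
import re

# ASSUMPTION: a simplified type vocabulary; real EML describes attribute
# storage types and measurement scales in considerably more detail.
TYPE_MAP = {
    "string": "TEXT",
    "integer": "INTEGER",
    "float": "DOUBLE PRECISION",
    "datetime": "TIMESTAMP",
}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_ddl(table, attributes):
    """Build a CREATE TABLE statement from (name, type) attribute pairs."""
    if not IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    columns = []
    for name, attr_type in attributes:
        # Reject names that could not be used as plain SQL identifiers.
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe column name: {name!r}")
        columns.append(f"    {name} {TYPE_MAP[attr_type]}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"
```

Creating one table per revision, as the slide describes, would then amount to calling build_ddl with a table name derived from the package ID of the new revision.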

Workflow Data Transformation

• “Cache” database stores site data in the native site schema, based on a snapshot version

• Workflow Engine

– reads native schema

– performs transformation/integration

– writes to global schema

– produces EML metadata

• “Derived” database stores derived data in a consistent global schema

[Figure: architecture detail showing Cache, Workflow Engine, Metadata, and Derived Data]

Site to Global Schema Mapping

Site schema: MCM Canada Glacier Wind

Attribute    Description
wspdmax      Maximum wind speed (meters/second)
wspdmin      Minimum wind speed (meters/second)
wspd         Wind speed (meters/second)
wdirstd      Standard deviation of wind direction
wdir         Wind direction (azimuth)
date_time    Timestamp of observation (15 min interval)

Global schema: derived datasets, each with the columns "value" and "Timestamp (daily)"

Dataset                   Package ID
Wind direction            knb-eco-trends.1.1
Wind direction std dev    knb-eco-trends.2.1
Wind speed max            knb-eco-trends.5.1

“triggered by data load”
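
One of the mappings on this slide, from the 15-minute wspdmax attribute to a daily "Wind speed max" dataset, can be sketched as follows. The slides do not state how the daily values are computed, so taking the maximum of each day's observations is an assumption, as is the function name daily_max; wind direction is deliberately left out because averaging angles needs circular statistics.

```python
from collections import defaultdict
from datetime import datetime


def daily_max(rows, column):
    """Reduce 15-minute site records to one (date, max value) pair per day.

    ASSUMPTION: the daily value is the maximum of that day's observations.
    """
    by_day = defaultdict(list)
    for row in rows:
        value = row.get(column)
        if value is not None:  # skip missing observations
            by_day[row["date_time"].date()].append(value)
    return [(day, max(values)) for day, values in sorted(by_day.items())]
```

Each returned pair corresponds to one row of a derived dataset with the columns "Timestamp (daily)" and "value".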

Global Schema

Table name: knb_eco_trends_1_1

– scope: knb_eco_trends

– identifier: 1

– revision: 1
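
The table name on this slide appears to follow from the package ID knb-eco-trends.1.1 by replacing hyphens and dots with underscores. A minimal Python sketch of that mapping and its inverse is given below; the function names are hypothetical, and the inverse assumes that scopes never contain underscores of their own.

```python
def to_table_name(package_id):
    """Map a package ID such as 'knb-eco-trends.1.1' to 'knb_eco_trends_1_1'."""
    scope, identifier, revision = package_id.rsplit(".", 2)
    return f"{scope.replace('-', '_')}_{int(identifier)}_{int(revision)}"


def from_table_name(table_name):
    """Inverse mapping. ASSUMPTION: the scope itself contains no underscores."""
    scope, identifier, revision = table_name.rsplit("_", 2)
    return f"{scope.replace('_', '-')}.{int(identifier)}.{int(revision)}"
```

Keeping the revision in the table name means each new revision of a derived dataset gets its own table, in line with the versioning described for the cache.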

EML for “derived” data

• EML metadata is generated for the derived data and inserted into Metacat

• Derived data is now accessible through “all” Metacat user interfaces

[Figure: architecture detail showing Metacat-Harvester, EML, Workflow Engine, Metadata, Derived Data, and EML.xml]
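
For readers unfamiliar with what a generated EML document looks like, the following Python sketch builds a skeletal one with the standard library. It is not the EcoTrends generator and the output is not schema-valid EML (required elements such as creator and contact are omitted); the EML 2.0.1 namespace, the function name build_eml, and the choice of fields are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: EML 2.0.1 namespace; the slides do not state the EML version used.
EML_NS = "eml://ecoinformatics.org/eml-2.0.1"


def build_eml(package_id, title, begin_date, end_date):
    """Build a skeletal, NOT schema-valid EML document for a derived dataset."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "knb"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    # Temporal coverage as found by analyzing the derived data.
    coverage = ET.SubElement(dataset, "coverage")
    temporal = ET.SubElement(coverage, "temporalCoverage")
    dates = ET.SubElement(temporal, "rangeOfDates")
    for tag, value in (("beginDate", begin_date), ("endDate", end_date)):
        ET.SubElement(ET.SubElement(dates, tag), "calendarDate").text = value
    return ET.tostring(root, encoding="unicode")
```

A document produced this way would still need the core metadata, methods, and manually documented sections before it could be inserted into Metacat.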

Web API

• Store Front provides API to derived data products in secondary DB

• HTML – today

• Web service – tomorrow

• Issues:

– Authentication

– Authorization

– Provenance

– Quality

– Interactive Plots

http://www.ecotrends.info (beta site location)

[Figure: architecture detail showing Metacat-Harvester, EML, Metadata, Derived Data, Web API, HTML, SOAP, and EML.xml]
