Long Term Ecological Research Network Office
EcoTrends Cyber-infrastructure Development
Mark Servilla, LTER Network Office
LTER Information Managers Annual Meeting – San Jose, California
2 – 5 August 2007
LNO NIS
Building Blocks to Success
• EcoTrends NIS module
• PASTA NIS Module Framework
• Metacat/EML metadata and data management
• PostgreSQL RDBMS
• Java Servlet, JSP, and R programming
• Community support for data collection, documentation, and accessibility
[Diagram: building-block stack of EcoTrends, PASTA, Metacat/EML, Community, and PostgreSQL/Java/Tomcat]
PASTA Architecture
[Diagram components: Source A, Source B, Source C; Metacat-Harvester; EML; Workflow Engine; Parser-Loader; Dataset Registry; Cache; Metadata; Derived Data; Web API; HTML; SOAP; EML.xml]
• Data loading for synthetic processing based on events (e.g., new data, metadata change)
• Existing LTER metadata infrastructure (Metacat and EML)
• Source data cache available to all workflow engines
• Support for multiple scientific workflow engines (e.g., R script, Kepler, Chimera, D2K)
• Metadata and derived data products; metadata as EML
• Standard interfaces to support various web portals (e.g., Trends, GEOSS, GEON, NEON, WATERS) and web service APIs
• Metadata describing derived data, including data provenance and data versioning – expand on community provenance research
Diagram legend: Derived data management; Site data/metadata; Existing infrastructure; New infrastructure; Pluggable workflows; User interfaces
EcoTrends Development 2007
[PASTA architecture diagram, repeated, with the “EcoTrends development realm” marked]
Development Process
Use-case
Project Plan
Requirements
Coding
Testing
Release
Milestones
ITERATIVE
SOLUTIONS
Editorial and technical committees, and LNO
Technical committee, NISAC, and LNO
Editorial and technical committees, and LNO
Major Milestones
• EML generation
• Derived data loading
• Website presentation/integration
• Data discovery and presentation
– Browse (by site, by topic/sub-topic)
– Search (simple keyword, advanced)
– Result (result set display, dataset display, plot display)
• Data exploration
– Graphing (single and multiple datasets)
– Aggregation (temporal)
– Download (data and metadata)
• Site auditing/DAS
– Web page, data access, and plot auditing
– Use-statistics and data access policy conformance
EML Generation
Step 1: Core Metadata
– Define core metadata (e.g., contact information) that is repeated in all EML documents
Step 2: File Name Parsing
– Parse the derived data file names for site/station, variable, unit, and timescale metadata
Step 3: Derived Data Analysis
– Analyze derived data for temporal coverage and data value bounds
Step 4: R Script Analysis and Inclusion
– Include in the methods section of EML the R script used to generate derived data and any annotation associated with a specific derived data product
Step 5: Manual Documentation
– Include both non-automated metadata and tacit-knowledge metadata in the EML
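Step 3 could be sketched roughly as follows. This is a minimal illustration, not the EcoTrends implementation: the function name `summarize_derived_data`, the `(start_date, end_date, obs)` record layout, and the sample values are all assumptions made for the example.

```python
from datetime import date

def summarize_derived_data(records):
    """Return temporal coverage and value bounds for (start_date, end_date, obs) records.

    Records with a missing observation (None) still count toward the temporal
    coverage but are skipped when computing the value bounds.
    """
    if not records:
        raise ValueError("no records to summarize")
    begin = min(start for start, _end, _obs in records)
    end = max(end for _start, end, _obs in records)
    values = [obs for _start, _end, obs in records if obs is not None]
    return {
        "begin_date": begin.isoformat(),
        "end_date": end.isoformat(),
        "minimum": min(values) if values else None,
        "maximum": max(values) if values else None,
    }

# Hypothetical annual records, invented for illustration only.
sample = [
    (date(2001, 1, 1), date(2001, 12, 31), 12.5),
    (date(2002, 1, 1), date(2002, 12, 31), None),
    (date(2003, 1, 1), date(2003, 12, 31), 9.75),
]
summary = summarize_derived_data(sample)
print(summary)
```

The resulting dictionary holds the kind of values that would feed the temporal coverage and data value bounds of a generated EML document.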
Derived Data Loading
• Parse data and load relational database
• Record-level attributes:
PRIMARY_KEY :: INTEGER
START_DATE :: DATESTAMP
END_DATE :: DATESTAMP
OBS :: FLOAT
N_EXPECTED :: INTEGER
S_DEV :: FLOAT
S_ERR :: FLOAT
PROP_MISSING :: FLOAT
PROP_QUESTIONABLE :: FLOAT
PROP_ESTIMATED :: FLOAT
PROP_TRACE :: FLOAT
PROP_INVALID :: FLOAT
COMMENT :: TEXT
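The record-level attributes could map onto a relational table along these lines. This is a sketch only: it uses Python's built-in SQLite as a stand-in for the PostgreSQL RDBMS named earlier, and the table name `derived_record`, the SQLite column types, and the sample row are assumptions, not the project's schema.

```python
import sqlite3

# Column names follow the slide; the table name and SQLite types are assumptions.
DDL = """
CREATE TABLE derived_record (
    primary_key        INTEGER PRIMARY KEY,
    start_date         TEXT NOT NULL,  -- DATESTAMP on the slide, stored as ISO 8601 text
    end_date           TEXT NOT NULL,
    obs                REAL,
    n_expected         INTEGER,
    s_dev              REAL,
    s_err              REAL,
    prop_missing       REAL,
    prop_questionable  REAL,
    prop_estimated     REAL,
    prop_trace         REAL,
    prop_invalid       REAL,
    comment            TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# Hypothetical record, invented for illustration only.
conn.execute(
    "INSERT INTO derived_record"
    " (start_date, end_date, obs, n_expected, prop_missing, comment)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    ("2001-01-01", "2001-12-31", 12.5, 365, 0.02, "illustrative row"),
)
conn.commit()

columns = [row[1] for row in conn.execute("PRAGMA table_info(derived_record)")]
row_count = conn.execute("SELECT COUNT(*) FROM derived_record").fetchone()[0]
print(columns, row_count)
```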
Website Presentation
• Initial design and development
– EcoTrends editorial committee
– Electric Sage Designs, LLC
– Laura Downey, Usability Engineer, SEEK Project
Website Integration
Stage 1: Apache, PHP, CSS, JavaScript, and MySQL
Refactor (Stage 1 → Stage 2): Refactor original website to reflect consistency and modularity; modify CSS for application-specific design (e.g., table layout)
Stage 2: Apache, PHP, CSS, JavaScript, and MySQL
Refactor (Stage 2 → Stage 3): Convert all PHP functionality to equivalent JavaServer Pages (JSP); integrate Metacat-based content
Stage 3: Tomcat, Servlet, JSP, CSS, JavaScript, and Metacat
Data Discovery and Presentation
Data Exploration
• Graphing (single and multiple datasets)
• Aggregation (temporal)
• Download (data and metadata)
Site Auditing/DAS
• Web page auditing
• Data access auditing
• Plot auditing
• Use-statistics and data access policy conformance
Parting shot…
PASTA Application Stack
EML, Metacat, and Harvester
Registry/Parser/Loader
Workflow Engine
Cache Database
Derived Database
Metadata Harvest
Web API - Portal
Site Data and EML Metadata
EML, Metacat, and Harvester
Network-level Synthesis
Site-level data archive
Data transformation and integration
Standardized data products
Dataset identification and loading
Network interface
Existing EML Harvesting
Scale (temporal-spatial-organizational)
Generalized Workflow
1. Sites collect and document time-series observation data (e.g., climate, socioeconomics, …)
2. Sites update EML with a new revision indicating new data
3. EML is harvested into Metacat
4. EML Loader/Parser loads new/updated dataset into “cache” database
5. Workflow Engine transforms “cache” data into “derived” data
6. Transformed data is stored in “derived” database
7. EML is generated for derived data and is stored in Metacat
8. Derived data is made available through web portal
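The event-driven character of this workflow (a new EML revision triggers the downstream steps) might be sketched as below. Everything here is hypothetical: the function names, the step labels, and the package keys are invented for illustration, and a real deployment would call the Harvester, Parser-Loader, and Workflow Engine rather than append to a log.

```python
def run_on_new_revisions(harvested, loaded, steps):
    """Run every step for each package whose harvested revision is newer than the loaded one.

    `harvested` and `loaded` map a package key to a revision number;
    `loaded` is updated in place once all steps have run for a package.
    """
    triggered = []
    for package, revision in sorted(harvested.items()):
        if revision > loaded.get(package, 0):
            for step in steps:
                step(package, revision)
            loaded[package] = revision
            triggered.append((package, revision))
    return triggered

log = []

def make_step(label):
    """Build a placeholder step that only records what it would do."""
    def step(package, revision):
        log.append(f"{label}: {package}.{revision}")
    return step

# Placeholder steps standing in for workflow steps 4 to 7.
steps = [
    make_step("load into cache"),
    make_step("transform to derived"),
    make_step("generate derived EML"),
]

# Hypothetical harvest state: one package has a newer revision, one does not.
harvested = {"knb-lter-sev.354": 3, "knb-lter-site.1": 1}
loaded = {"knb-lter-sev.354": 2, "knb-lter-site.1": 1}
triggered = run_on_new_revisions(harvested, loaded, steps)
print(triggered, log)
```

Only the package with a newer revision runs through the steps; the unchanged package is skipped.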
Decomposed Workflow
LTER Site Data Collection
• Time-series data
– Physical environment (e.g., climate, …)
– Human population and economy
– Biogeochemistry
– Biotic structure
• Data/metadata
– Relational Database
– Spreadsheet
– Text file
– HTML/XML
EML, Metacat, and the Harvester
• EML Package ID: knb-lter-site.XX.YY
– knb-lter-sev.354.1
– knb-lter-sev.354.2
– knb-lter-sev.354.3
• Metacat stores the XML of EML; new revisions take precedence – old revisions are deprecated, but not deleted
• The Harvester is a time-based update process that “pulls” site EML and inserts it into Metacat
“existing LTER investment in technology”
[Diagram components: Source A, Source B, Source C; Metacat-Harvester; EML]
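The package ID convention above (scope, identifier, revision) and the rule that new revisions take precedence can be illustrated with a short sketch. The names `PackageId`, `parse_package_id`, and `latest_revisions` are assumptions made for this example and are not part of Metacat.

```python
from typing import NamedTuple

class PackageId(NamedTuple):
    scope: str
    identifier: int
    revision: int

def parse_package_id(package_id: str) -> PackageId:
    """Split 'scope.identifier.revision' on its last two dots."""
    scope, identifier, revision = package_id.rsplit(".", 2)
    return PackageId(scope, int(identifier), int(revision))

def latest_revisions(package_ids):
    """Keep only the highest revision seen for each (scope, identifier) pair."""
    latest = {}
    for pid in map(parse_package_id, package_ids):
        key = (pid.scope, pid.identifier)
        if key not in latest or pid.revision > latest[key].revision:
            latest[key] = pid
    return latest

ids = ["knb-lter-sev.354.1", "knb-lter-sev.354.3", "knb-lter-sev.354.2"]
current = latest_revisions(ids)[("knb-lter-sev", 354)]
print(current)
```

Older revisions are simply not selected here, which mirrors "deprecated, but not deleted": nothing is removed from the input list.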
EML Loader/Parser
• Dataset registry identifies Trends data in Metacat
• New revisions assert a “new” data load. The EML parser/loader*
– Translates the site EML into the RDBMS DDL
– Creates a new DB table in the primary (“cache”) database based on the revision
– Loads the new data into the primary database
– Triggers the workflow to continue
*Collaboration with NCEAS/SEEK
[Diagram components: Source A, Source B, Source C; Metacat-Harvester; EML; Parser-Loader; Dataset Registry; Cache]
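The translate, create, and load steps might look roughly like this. It is a sketch under several assumptions: the attribute list is a simplified stand-in for what a real EML parser would extract, the type mapping `SQL_TYPES` is invented, the package ID `knb-lter-site.1.2` and the data rows are hypothetical, SQLite replaces PostgreSQL, and naming the cache table after the package ID is only a guess modeled on the `knb_eco_trends_1_1` table name shown for the global schema.

```python
import sqlite3

# Assumed mapping from simplified attribute kinds to SQL column types.
SQL_TYPES = {"datetime": "TEXT", "float": "REAL", "integer": "INTEGER", "string": "TEXT"}

def table_name_for(package_id: str) -> str:
    """Derive a per-revision table name, e.g. 'knb-lter-site.1.2' -> 'knb_lter_site_1_2'."""
    return package_id.replace("-", "_").replace(".", "_")

def build_ddl(package_id, attributes):
    """Translate (name, kind) attribute pairs into a CREATE TABLE statement."""
    columns = ", ".join(f'"{name}" {SQL_TYPES[kind]}' for name, kind in attributes)
    return f'CREATE TABLE "{table_name_for(package_id)}" ({columns})'

def load_revision(conn, package_id, attributes, rows):
    """Create the table for this revision and load its rows; return the row count."""
    conn.execute(build_ddl(package_id, attributes))
    placeholders = ", ".join("?" for _ in attributes)
    conn.executemany(
        f'INSERT INTO "{table_name_for(package_id)}" VALUES ({placeholders})', rows
    )
    conn.commit()
    return len(rows)

# Hypothetical attributes and rows, invented for illustration only.
attributes = [("date_time", "datetime"), ("wspd", "float")]
rows = [("2007-01-01T00:00", 3.2), ("2007-01-01T00:15", 4.1)]

conn = sqlite3.connect(":memory:")
ddl = build_ddl("knb-lter-site.1.2", attributes)
loaded = load_revision(conn, "knb-lter-site.1.2", attributes, rows)
print(ddl, loaded)
```

Creating one table per revision keeps each data load as a separate snapshot, so an earlier revision's table is left untouched when a new one arrives.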
Workflow Data Transformation
• “Cache” database stores site data in the native site schema, based on a snapshot version
• Workflow Engine
– reads native schema
– performs transformation/integration
– writes to global schema
– produces EML metadata
• “Derived” database stores derived data in a consistent global schema
[Diagram components: Workflow Engine; Cache; Metadata; Derived Data]
Site to Global Schema Mapping
Site schema: MCM Canada Glacier Wind
date_time: Timestamp of observation (15 min interval)
wdir: Wind direction (azimuth)
wdirstd: Standard deviation of wind direction
wspd: Wind speed (meters/second)
wpsdmin: Minimum wind speed (meters/second)
wspdmax: Maximum wind speed (meters/second)
Global schema:
Wind direction (knb-eco-trends.1.1): Timestamp (daily), value
Wind direction std dev (knb-eco-trends.2.1): Timestamp (daily), value
Wind speed max (knb-eco-trends.5.1): Timestamp (daily), value
…
“triggered by data load”
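One way to picture the site-to-global mapping is the reduction of 15-minute maximum wind speed readings to a daily series of (timestamp, value) pairs. The sketch below assumes the daily value is the maximum of the day's readings; the slides do not state the aggregation rule, and the function name `daily_maximum` and the sample readings are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

def daily_maximum(observations):
    """Reduce (timestamp, value) readings to one (day, maximum) pair per day.

    Missing readings (None) are ignored; days with no valid reading are omitted.
    """
    by_day = defaultdict(list)
    for timestamp, value in observations:
        if value is not None:
            by_day[timestamp.date()].append(value)
    return [(day, max(values)) for day, values in sorted(by_day.items())]

# Hypothetical 15-minute maximum wind speed readings in meters/second.
site_rows = [
    (datetime(2007, 1, 1, 0, 0), 3.2),
    (datetime(2007, 1, 1, 0, 15), 5.8),
    (datetime(2007, 1, 1, 0, 30), None),
    (datetime(2007, 1, 2, 0, 0), 4.4),
]
derived = daily_maximum(site_rows)
print(derived)
```

A directional variable such as wind direction would need a different aggregation (an arithmetic rule like this one does not suit angles), which is one reason each derived product gets its own mapping.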
Global Schema
knb_eco_trends_1_1
Table name components: scope (knb_eco_trends), identifier (1), revision (1)
EML for “derived” data
• Workflow Engine generates EML metadata for the derived data and inserts it into Metacat
• Derived data is now accessible through “all” Metacat user interfaces
[Diagram components: Metacat-Harvester; EML; Workflow Engine; Metadata; Derived Data; EML.xml]
Web API
• Store Front provides an API to derived data products in the secondary DB
• HTML – today
• Web service – tomorrow
• Issues:
– Authentication
– Authorization
– Provenance
– Quality
– Interactive Plots
http://www.ecotrends.info (beta site location)
[Diagram components: Metacat-Harvester; EML; Metadata; Derived Data; Web API; HTML; SOAP; EML.xml]