
Page 1: CLIMDB/HYDRODB: A Web Harvester And Data Warehouse Approach To Building A Cross-site Climate And Hydrology Database

EAP ILTER, 9 July 2007

Don Henshaw
Andrews Experimental Forest LTER
Pacific Northwest Research Station, USFS Forest Service
Oregon State University
Corvallis, Oregon

Page 2: Long-Term Research

Long-Term Ecological Research (LTER)
– The 20-year review of LTER challenges the network to enhance its inter-site research activities by adopting a strategy for network-based research

U.S. Forest Service Research (USFS)
– USFS Research intends to increase collaboration and develop network products for existing Experimental Forests/Watersheds

International LTER (ILTER)
– International LTER collaboration

Page 3: LTER Network Information System (NIS) Goals

Allow and enhance discovery and access of information
– Foster development of network-level datasets
– Commit to populate climate and hydrology datasets

Facilitate synthesis and integration of information
– Improve discovery, access, aggregation, and visualization of data across multiple sites
– Overcome diversity in individual site information systems

Promote collaboration and community-building
– Develop partnerships between Information Technology and science communities

LTER Network Information System Advisory Committee, 2003, 2004

Page 4: ClimDB/HydroDB Objectives

Improve access to long-term collections of climatic and hydrological data
– Long-Term Ecological Research (LTER)
  26 NSF-funded sites
  Taiwan Ecological Research Network (ILTER)
– U.S. Forest Service Research Experimental Forests / Experimental Watersheds

Use web technologies to facilitate synthetic research
– Maintain a current data warehouse of multi-site, multi-network, long-term data
– Provide single portal accessibility with a query interface to download and graphically display data

Page 6: ClimDB/HydroDB Harvester / Database / Query Interface

[Diagram with three columns: Data Providers, Central Site, Public User.
Data Providers: LTER Data, USFS Data, USGS Data, NWS Data, supplied in the Exchange Format.
Central Site: Harvester (triggers: on-demand, auto-harvest, HTTP Post) feeding the Centralized ClimDB/HydroDB Database (Data Warehouse), with a Query interface.
Public User: Web Page (display, graph, download); Web Services (SOAP, WSDL); Access Tools (site-specific data mining).]

Page 7: ClimDB/HydroDB Components: Data Providers

Individual sites
– Participating sites manage and control original source data within their local information systems
– Sites provide data as a static or dynamically created file

Exchange format
– Consistent, comma-delimited file
– Flexibility allows contributors to add or remove parameters from harvest files at any time
– Attributes and units standardized and based on a controlled vocabulary
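The exchange-format idea above (a consistent comma-delimited file, parameters that can come and go, names and units fixed by a controlled vocabulary) can be illustrated with a short Python sketch. The column names, the vocabulary entries, and the sample rows are all made up for illustration; the deck does not give the actual ClimDB/HydroDB file layout, so this is not the real format.

```python
import csv
import io

# Hypothetical controlled vocabulary: attribute name -> unit.
# Illustrative only; not the actual ClimDB/HydroDB vocabulary.
VOCABULARY = {
    "AIRTEMP_MEAN": "C",
    "PRECIP_TOTAL": "mm",
    "DISCHARGE_MEAN": "L/s",
}
KEY_COLUMNS = ["SITE", "STATION", "DATE"]  # assumed identifying columns


def read_exchange(text):
    """Parse a comma-delimited exchange file into long-format records.

    Any subset of vocabulary attributes may appear as columns, so a site
    can add or drop parameters without changing the reader.
    """
    reader = csv.DictReader(io.StringIO(text))
    params = [c for c in reader.fieldnames if c not in KEY_COLUMNS]
    unknown = [p for p in params if p not in VOCABULARY]
    if unknown:
        raise ValueError(f"attributes not in vocabulary: {unknown}")
    records = []
    for row in reader:
        for p in params:
            if row[p] == "":
                continue  # missing value: nothing to record
            records.append(
                (row["SITE"], row["STATION"], row["DATE"], p,
                 float(row[p]), VOCABULARY[p])
            )
    return records


# Made-up sample rows; the second row has no precipitation value.
SAMPLE = """SITE,STATION,DATE,AIRTEMP_MEAN,PRECIP_TOTAL
AND,PRIMET,2007-07-09,18.4,0.0
AND,PRIMET,2007-07-10,19.1,
"""

if __name__ == "__main__":
    for record in read_exchange(SAMPLE):
        print(record)
```

Because the reader takes the parameter list from the header row, a file with one parameter column and a file with ten go through the same code path; only names missing from the vocabulary are rejected.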

Page 8: “Harvester” Mechanics

[Diagram: Exchange Data is harvested by the Harvester, passes through Transform, QA, Load, and enters the Centralized ClimDB/HydroDB Database (Data Warehouse). Other labels: Feedback; Error logs; Site contact; ClimHy Admin.]

The Quality Assurance (QA)/Feedback System:

• Provides feedback through error and warning messages directly to the client’s browser and through e-mail
• Specifies errors in exchange format
• Identifies data limit and integrity errors
• Enables sites to quickly modify their datasets for successful re-harvesting

Page 9: Participant Web Page

http://www.fsl.orst.edu/climhy/harvest/harvest.htm

Page 10: Duplicate records found

Page 11: Illegal number of data fields in exchange file

Page 12: Failed min<mean<max relationship
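The three harvest messages on the preceding pages (duplicate records, illegal number of data fields, failed min<mean<max relationship) correspond to simple row-level checks. The sketch below shows one way such checks could be written. The six-field row layout is hypothetical, and the comparison is non-strict (min <= mean <= max) as an assumption, so that a day with a constant value passes. It is not the harvester's actual code.

```python
N_FIELDS = 6  # assumed layout: station, date, min, mean, max, flag


def qa_check(lines):
    """Return a list of (line_number, message) for rows that fail QA."""
    problems = []
    seen = set()
    for lineno, line in enumerate(lines, start=1):
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != N_FIELDS:
            problems.append(
                (lineno, "Illegal number of data fields in exchange file"))
            continue
        station, date, vmin, vmean, vmax, _flag = fields
        key = (station, date)
        if key in seen:
            problems.append((lineno, "Duplicate records found"))
            continue
        seen.add(key)
        try:
            lo, mid, hi = float(vmin), float(vmean), float(vmax)
        except ValueError:
            problems.append((lineno, "Non-numeric value"))
            continue
        # Non-strict comparison is an assumption of this sketch.
        if not (lo <= mid <= hi):
            problems.append((lineno, "Failed min<mean<max relationship"))
    return problems


if __name__ == "__main__":
    rows = [
        "PRIMET,2007-07-09,10.0,15.0,20.0,",
        "PRIMET,2007-07-09,10.0,15.0,20.0,",  # same station and date
        "PRIMET,2007-07-10,10.0,15.0",        # too few fields
        "PRIMET,2007-07-11,12.0,11.0,20.0,",  # mean below min
    ]
    for lineno, message in qa_check(rows):
        print(lineno, message)
```

Returning line numbers with each message is what would let a site find and fix the offending rows before re-harvesting.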

Page 13: Georgia Coastal Ecosystems LTER Collaboration

Allows HydroDB to directly harvest U.S. Geological Survey (USGS) gauging station data from the USGS web pages

Captures near-real-time provisional USGS hydrological data on a weekly schedule

Harvests USGS historical data and replaces the provisional data with final archived versions on a regular basis

Generalized as a service to the broader LTER community
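The rule that final archived USGS values replace provisional ones can be expressed as a small merge step. The following is a minimal sketch under stated assumptions: records are held in a dictionary keyed by (station, date), the status labels "provisional" and "final" are names chosen here, and the station ID and values are invented.

```python
def merge_usgs(warehouse, incoming):
    """Merge harvested records into the warehouse dict in place.

    Both arguments map (station, date) -> (value, status), where status
    is "provisional" or "final" (labels assumed for this sketch).
    A final value always replaces a provisional one. A provisional value
    may refresh an older provisional value but never overwrites a final one.
    """
    for key, (value, status) in incoming.items():
        current = warehouse.get(key)
        if (current is not None
                and current[1] == "final"
                and status == "provisional"):
            continue  # keep the archived value
        warehouse[key] = (value, status)
    return warehouse


if __name__ == "__main__":
    warehouse = {
        ("STN1", "2007-07-01"): (100.0, "provisional"),
        ("STN1", "2007-07-02"): (90.0, "final"),
    }
    incoming = {
        ("STN1", "2007-07-01"): (98.5, "final"),        # replaces provisional
        ("STN1", "2007-07-02"): (95.0, "provisional"),  # ignored
        ("STN1", "2007-07-03"): (80.0, "provisional"),  # new record
    }
    for key, record in sorted(merge_usgs(warehouse, incoming).items()):
        print(key, record)
```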

Page 14: Georgia Coastal Ecosystems LTER Collaboration: USGS Data Harvesting Service

Page 15: Centralized Architecture

Source data is loaded into a global schema in the relational database (RDBMS)
– Calculates and loads aggregated data (monthly, annual)

The global schema for the data warehouse is based on highly normalized tables within the database
– allows simple structures to house all site data and metadata
– is extensible to additional daily measurements

The central data warehouse is persistent and participants can continually update and replace harvested data

[Diagram: Harvester; Transform, QA, Load; Centralized ClimDB/HydroDB Database (Data Warehouse)]
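A normalized global schema with derived monthly aggregates can be sketched with Python's built-in sqlite3 module. The table and column names and the sample values below are hypothetical; the deck does not show the actual ClimDB/HydroDB schema or name its RDBMS, so this only illustrates the general pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE station (station_id INTEGER PRIMARY KEY, site TEXT, name TEXT);
CREATE TABLE variable (variable_id INTEGER PRIMARY KEY, name TEXT, unit TEXT);
CREATE TABLE daily_value (
    station_id INTEGER REFERENCES station(station_id),
    variable_id INTEGER REFERENCES variable(variable_id),
    obs_date TEXT,          -- ISO date, YYYY-MM-DD
    value REAL,
    PRIMARY KEY (station_id, variable_id, obs_date)
);
""")
conn.execute("INSERT INTO station VALUES (1, 'AND', 'PRIMET')")
conn.execute("INSERT INTO variable VALUES (1, 'PRECIP_TOTAL', 'mm')")

# Made-up daily values spanning two months.
rows = [(1, 1, "2007-06-29", 2.0), (1, 1, "2007-06-30", 3.5),
        (1, 1, "2007-07-01", 0.0), (1, 1, "2007-07-02", 1.5)]
# INSERT OR REPLACE lets a re-harvest overwrite values for the same key.
conn.executemany("INSERT OR REPLACE INTO daily_value VALUES (?, ?, ?, ?)", rows)


def monthly_totals(conn, station_id, variable_id):
    """Return (month, total, number_of_days) tuples, ordered by month."""
    cur = conn.execute(
        """SELECT substr(obs_date, 1, 7) AS month, SUM(value), COUNT(*)
           FROM daily_value
           WHERE station_id = ? AND variable_id = ?
           GROUP BY month ORDER BY month""",
        (station_id, variable_id))
    return cur.fetchall()


if __name__ == "__main__":
    print(monthly_totals(conn, 1, 1))
```

In this layout a new daily measurement needs only a new row in the variable table rather than a new column, which is one way to read the extensibility point above. The composite primary key is what makes replacing harvested data a plain overwrite.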

Page 16: Data Access Page

Public Access Web Page

http://www.fsl.orst.edu/climhy

Page 17: Data Acquisition

Download or Graphical Display

Page 19: Metadata Reports

Detailed information for the general site, all stations, and all parameters.

Metadata descriptions can also be downloaded as a PDF

Page 20: Georgia Coastal Ecosystems (GCE) Matlab Data Toolbox

GUI dialog for retrieving ClimDB/HydroDB data

From Wade Sheldon (GCE)

Page 21:

Imported data set (GCE tools editor window)

Imported data set (GCE data grid view, with flagged values displayed)

ClimDB/HydroDB Metadata template

Page 23: Web Services demonstration of ClimDB

[Diagram modified from Longjiang Ding, SDSC. Labels: Client (Harvester); LTER Site ClimDB with Climate Data; ClimDB Web Services (Data Service, Metadata Service, Notification Service) using SOAP, WSDL, UDDI; XML ClimDB Config File; EML Resource Description Wizard; Centralized ClimDB Database (Andrews LTER) with ClimDB Web Services (Data Service) using SOAP, WSDL, UDDI.]

1. Harvester sends XML request for data to Web Service
2. One Web Service queries an LTER Site database, another exports the data, and another issues an e-mail to the LTER Site data manager detailing success of query
3. LTER Site climate data are returned to harvester in XML
4. Web Service wraps the entire centralized ClimDB database
5. The centralized ClimDB database at Andrews LTER is populated
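The request and response exchange in this demonstration can be illustrated with Python's standard xml.etree.ElementTree module. The element and attribute names below (climdbRequest, climdbResponse, value, and so on) are invented for this sketch, and no SOAP envelope is shown; the deck does not give the actual message schema, so this is not the demonstration's real interface.

```python
import xml.etree.ElementTree as ET


def build_request(site, station, start, end):
    """Build a hypothetical XML data request as a string."""
    root = ET.Element("climdbRequest")
    ET.SubElement(root, "site").text = site
    ET.SubElement(root, "station").text = station
    ET.SubElement(root, "start").text = start
    ET.SubElement(root, "end").text = end
    return ET.tostring(root, encoding="unicode")


def parse_response(xml_text):
    """Extract (date, variable, value) tuples from a hypothetical response."""
    root = ET.fromstring(xml_text)
    return [(rec.get("date"), rec.get("variable"), float(rec.text))
            for rec in root.iter("value")]


if __name__ == "__main__":
    print(build_request("AND", "PRIMET", "2007-07-01", "2007-07-09"))
    # Made-up response payload for illustration.
    response = (
        "<climdbResponse>"
        '<value date="2007-07-09" variable="AIRTEMP_MEAN">18.4</value>'
        "</climdbResponse>"
    )
    print(parse_response(response))
```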

Page 24: Site Contribution

Site contributions have increased dramatically in the past year for air temperature, precipitation, and stream discharge.

Participation includes:
• 40 total sites
  – 24 LTER sites + 2 International LTER sites
  – 22 USFS sites
  – 11 sites include USGS gauging stations
• 281 total measurement stations
  – 143 meteorological, 138 stream gauging (59 USGS)
• 21 daily measurement parameters
• 7,200,000 daily values

Page 25: Data Warehouse Content

Parameter (Daily values)     % by Measured Parameter
Stream Discharge             29
Precipitation                26
Air Temperature              22
Relative Humidity            4
Global Radiation             4
Soil Temperature             3
Resultant Wind Speed         3
Resultant Wind Direction     2
Other                        7

Rows are marked on the slide as "Primary emphasis" or "Secondary emphasis".

Observations:
• Coverage of precipitation, discharge, and air temperature data is strong across sites.
• We encourage sites to contribute relative humidity, soil temperature, wind speed & direction, and global radiation in datasets.

Page 26: ClimDB/HydroDB Web Access Summary

Type of download     Files     Plots     Displays     Total
Downloads            38%       50%       12%          6700

Values based on data from February 2003 - August 2006

Visitors to the ClimDB/HydroDB web interface are increasing and currently average 30 sessions per day.

Page 27: Status of Type of Use

Type of Use           % of Total
Research              40% (60% general research)
Education             35% (90% students)
Testing/Exploring     25% (50% testing by participants)

Values based on data plots from January - March 2004

Page 28: Keys to Successful Implementation

Scientific interest
– Scientist/modeler demand for current and comparable data
– Need for synthetic data products

Organizational
– Commitment to building network databases
– Information management (15% LTER site budget)
– Data access / release policies
– Data collection standards
– Planning meetings included Climatologists, Information Managers, Data Users/Modelers, and Field Technician participation

Incentives
– Financial incentives
– Value-added products returned to participating sites
  Easy access, aggregated data, graphical displays, QA checks

Host site commitment
– Leadership, time, resources

Page 29: Conclusions

The ClimDB/HydroDB approach is an effective bridge technology between older, more rigid data distribution models and modern service-oriented architectures

Establishes software and service development at the central node, permitting rapid adaptation to changing needs

Maintains low overhead, flexibility, and technological neutrality for data providers

Additional "concentrator nodes" and middleware services can also be deployed very easily and rapidly within this model to improve efficiency and build bridges to other federated databases

Page 30: Acknowledgement

Funding was provided by
• National Science Foundation (NSF)
  – Long-Term Ecological Research (LTER) supplemental funding
• U.S. Forest Service Research and Development
  – Forest Health Monitoring (FHM) program
  – Pacific Northwest Research Station (PNW)

…to the Andrews Forest LTER at Oregon State University for ClimDB/HydroDB development

…to individual sites for the preparation of climate and hydrology data

Visit ClimDB/HydroDB at http://www.fsl.orst.edu/climhy

Page 31: User Guide, Section 1.3: Required Steps for Site Participation

To participate, the site will:

Provide the research areas, meteorological stations, gauged watersheds, and gauging station names and code names

Restructure local site data into a standardized daily exchange format

Use the online metadata forms to provide metadata for the overall research area, for every weather station, and for every parameter

Harvest data