EAP ILTER, 9 July 2007
Don Henshaw
Andrews Experimental Forest LTER
Pacific Northwest Research Station, USFS Forest Service
Oregon State University
Corvallis, Oregon
CLIMDB/HYDRODB: A Web Harvester And Data Warehouse Approach To Building A Cross-site Climate And Hydrology Database
Long-Term Research
• Long-Term Ecological Research (LTER)
– The 20-year review of LTER challenges the network to enhance its inter-site research activities by adopting a strategy for network-based research
• U.S. Forest Service Research (USFS)
– USFS Research intends to increase collaboration and develop network products for existing Experimental Forests/Watersheds
• International LTER (ILTER)
– International LTER collaboration
LTER Network Information System (NIS) Goals
• Allow and enhance discovery and access of information
– Foster development of network-level datasets
– Commit to populate climate and hydrology datasets
• Facilitate synthesis and integration of information
– Improve discovery, access, aggregation, and visualization of data across multiple sites
– Overcome diversity in individual site information systems
• Promote collaboration and community-building
– Develop partnerships between Information Technology and science communities
LTER Network Information System Advisory Committee, 2003, 2004
ClimDB/HydroDB Objectives
• Improve access to long-term collections of climatic and hydrological data
– Long-Term Ecological Research (LTER): 26 NSF-funded sites; Taiwan Ecological Research Network (ILTER)
– U.S. Forest Service Research Experimental Forests / Experimental Watersheds
• Use web technologies to facilitate synthetic research
– Maintain a current data warehouse of multi-site, multi-network, long-term data
– Provide single portal accessibility with a query interface to download and graphically display data
ClimDB/HydroDB Harvester / Database / Query Interface
[Diagram]
Data Providers: LTER data, USFS data, USGS data, NWS data (exchange format)
Central Site: Harvester (triggers: on-demand, auto-harvest; HTTP POST); Data Warehouse (centralized ClimDB/HydroDB database); Query interface
Public User: Web page (display, graph, download); Web services (SOAP, WSDL); Access tools (site-specific data mining)
ClimDB/HydroDB Components
Data Providers
• Individual sites
– Participating sites manage and control original source data within their local information systems
– Sites provide data as a static or dynamically created file
• Exchange format
– Consistent, comma-delimited file
– Flexibility allows contributors to add or remove parameters from harvest files at any time
– Attributes and units standardized and based on a controlled vocabulary
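As a rough illustration of the exchange-format bullets above, the sketch below reads a comma-delimited file whose parameter columns may vary from harvest to harvest and checks each column name against a controlled vocabulary. The column names, the vocabulary entries, the units, and the function name `read_exchange` are all assumptions made for this example; the actual ClimDB/HydroDB exchange format is not specified in these slides.

```python
import csv
import io

# Hypothetical controlled vocabulary: parameter name -> standardized unit.
VOCABULARY = {
    "AIRTEMP_MEAN": "deg C",
    "PRECIP_TOTAL": "mm",
    "DISCHARGE_MEAN": "L/s",
}
KEY_COLUMNS = ["SITE", "STATION", "DATE"]  # assumed identifying columns


def read_exchange(text):
    """Parse a comma-delimited exchange file into one record per value.

    Any subset of vocabulary parameters may appear as columns, so a site
    can add or drop parameters between harvests without a format change.
    """
    reader = csv.DictReader(io.StringIO(text))
    fields = reader.fieldnames or []
    missing = [c for c in KEY_COLUMNS if c not in fields]
    if missing:
        raise ValueError(f"missing key columns: {missing}")
    params = [c for c in fields if c not in KEY_COLUMNS]
    unknown = [p for p in params if p not in VOCABULARY]
    if unknown:
        raise ValueError(f"parameters not in vocabulary: {unknown}")
    records = []
    for row in reader:
        for p in params:
            if row[p] in ("", None):  # blank or absent cell: no value that day
                continue
            records.append({
                "site": row["SITE"], "station": row["STATION"],
                "date": row["DATE"], "parameter": p,
                "value": float(row[p]), "unit": VOCABULARY[p],
            })
    return records
```

Because the parameter list is taken from the header row, a site in this sketch could add a `DISCHARGE_MEAN` column in a later file without any change to the reader.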
“Harvester” Mechanics
[Diagram: Exchange data → Harvester (transform, QA, load) → Data Warehouse (centralized ClimDB/HydroDB database); feedback (error logs) goes to the site contact and the ClimHy admin]
The Quality Assurance (QA)/Feedback System:
• Provides feedback through error and warning messages directly to the client’s browser and through e-mail
• Specifies errors in exchange format
• Identifies data limit and integrity errors
• Enables sites to quickly modify their datasets for successful re-harvesting
Participant Web Page
http://www.fsl.orst.edu/climhy/harvest/harvest.htm
Duplicate records found
Illegal number of data fields in exchange file
Failed min<mean<max relationship
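The three error messages above can be illustrated with a small checker. This is only a sketch under stated assumptions: the six-field row layout (site, station, date, min, mean, max), the function name `qa_check`, and the extra "Non-numeric value" message are invented for the example, and the comparison allows equality (min <= mean <= max) even though the slide writes the relationship with strict inequalities.

```python
import csv
import io

EXPECTED_FIELDS = 6  # assumed layout: site, station, date, min, mean, max


def qa_check(text):
    """Return a list of (line_number, message) problems found in the file."""
    problems = []
    seen = set()
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != EXPECTED_FIELDS:
            problems.append(
                (line_no, "Illegal number of data fields in exchange file"))
            continue
        site, station, date, lo, mean, hi = row
        key = (site, station, date)
        if key in seen:
            problems.append((line_no, "Duplicate records found"))
        seen.add(key)
        try:
            lo, mean, hi = float(lo), float(mean), float(hi)
        except ValueError:
            # Message invented for this sketch, not taken from the slides.
            problems.append((line_no, "Non-numeric value"))
            continue
        if not (lo <= mean <= hi):
            problems.append((line_no, "Failed min<mean<max relationship"))
    return problems
```

Returning line numbers with each message mirrors the goal stated earlier of letting a site correct its file quickly and re-harvest.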
Georgia Coastal Ecosystems LTER Collaboration: USGS Data Harvesting Service
• Allows HydroDB to directly harvest U.S. Geological Survey (USGS) gauging station data from their webpage
• Captures near real-time provisional USGS hydrological data on a weekly schedule
• Harvests USGS historical data and replaces the provisional data with final archived versions on a regular basis
• Generalized as a service to the broader LTER community
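The replacement of provisional values by final archived ones can be sketched as a merge rule. The data layout here (a dictionary keyed by station and date, holding a value and a status string) and the function name `merge_usgs` are assumptions for illustration only; the slides do not describe how the service stores or labels these records.

```python
def merge_usgs(stored, incoming):
    """Merge incoming daily values into stored ones, keyed by (station, date).

    Each value is a (discharge, status) pair with status "provisional" or
    "final". A final value always replaces a provisional one; a provisional
    value never overwrites a final one.
    """
    merged = dict(stored)
    for key, (value, status) in incoming.items():
        current = merged.get(key)
        if (current is not None and current[1] == "final"
                and status == "provisional"):
            continue  # keep the archived value
        merged[key] = (value, status)
    return merged
```

Under this rule a weekly harvest of provisional data and a periodic harvest of historical data can run in any order without a final value being lost.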
Centralized Architecture
• Source data is loaded into a global schema in the relational database (RDBMS)
– Calculates and loads aggregated data (monthly, annual)
• The global schema for the data warehouse is based on highly normalized tables within the database
– Allows simple structures to house all site data and metadata
– Is extensible to additional daily measurements
• The central data warehouse is persistent and participants can continually update and replace harvested data
[Diagram: Harvester (transform, QA, load) → Data Warehouse (centralized ClimDB/HydroDB database)]
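A minimal sketch of what a normalized global schema with replaceable daily values and a monthly aggregate might look like, using SQLite through Python's standard library. The table and column names and the sample rows are hypothetical; the slides do not give the actual ClimDB/HydroDB schema or name the RDBMS product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE station   (station_id INTEGER PRIMARY KEY, site TEXT, name TEXT);
CREATE TABLE parameter (parameter_id INTEGER PRIMARY KEY, name TEXT, unit TEXT);
CREATE TABLE daily_value (
    station_id   INTEGER REFERENCES station(station_id),
    parameter_id INTEGER REFERENCES parameter(parameter_id),
    obs_date     TEXT,   -- ISO date, YYYY-MM-DD
    value        REAL,
    PRIMARY KEY (station_id, parameter_id, obs_date)
);
""")
conn.execute("INSERT INTO station VALUES (1, 'SITE1', 'STN01')")
conn.execute("INSERT INTO parameter VALUES (1, 'PRECIP_TOTAL', 'mm')")


def load(rows):
    # INSERT OR REPLACE lets a re-harvest overwrite earlier values.
    conn.executemany(
        "INSERT OR REPLACE INTO daily_value VALUES (?, ?, ?, ?)", rows)


load([(1, 1, "2007-07-01", 2.0),
      (1, 1, "2007-07-02", 3.5),
      (1, 1, "2007-08-01", 1.0)])
load([(1, 1, "2007-07-02", 4.0)])  # corrected value from a later harvest

# Monthly totals derived from the daily values.
monthly = conn.execute("""
    SELECT substr(obs_date, 1, 7) AS month, SUM(value)
    FROM daily_value
    WHERE parameter_id = 1
    GROUP BY month
    ORDER BY month
""").fetchall()
```

In this layout a new daily measurement type is one more row in `parameter` rather than a new column, which is one way to read the "extensible to additional daily measurements" bullet.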
Data Access Page
Public Access Web Page
http://www.fsl.orst.edu/climhy
Data Acquisition
Download or Graphical Display
Metadata Reports
Detailed information for the general site, all stations, and all parameters.
Metadata descriptions can also be downloaded as a PDF.
Georgia Coastal Ecosystems (GCE) Matlab Data Toolbox
GUI dialog for retrieving ClimDB/HydroDB data
From Wade Sheldon (GCE)
Imported data set (GCE tools editor window)
Imported data set (GCE data grid view, with flagged values displayed)
ClimDB/HydroDB Metadata template
Web Services demonstration of ClimDB
1. Harvester sends XML request for data to Web Service
2. One Web Service queries an LTER Site database, another exports the data, and another issues an email to the LTER Site data manager detailing success of query
3. LTER Site climate data are returned to harvester in XML
4. Web Service wraps the entire centralized ClimDB database
5. The centralized ClimDB database at Andrews LTER is populated
[Diagram components: Client (Harvester); ClimDB Web Services (Data Service, Metadata Service, Notification Service) using SOAP, WSDL, UDDI; LTER Site climate data; XML ClimDB config file; EML Resource Description Wizard; centralized ClimDB database (Andrews LTER) behind a ClimDB Web Services Data Service using SOAP, WSDL, UDDI]
Diagram modified from Longjiang Ding, SDSC
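The XML request and response exchange in this demonstration can be sketched with Python's standard `xml.etree.ElementTree` module. Every element and attribute name below (`dataRequest`, `site`, `obs`, and so on) is invented for illustration, and the SOAP envelope is omitted; the slides do not show the real message schema.

```python
import xml.etree.ElementTree as ET


def build_request(site, parameter, start, end):
    """Build a hypothetical XML data request (element names are invented)."""
    req = ET.Element("dataRequest")
    ET.SubElement(req, "site").text = site
    ET.SubElement(req, "parameter").text = parameter
    ET.SubElement(req, "start").text = start
    ET.SubElement(req, "end").text = end
    return ET.tostring(req, encoding="unicode")


def parse_response(xml_text):
    """Extract (date, value) pairs from a hypothetical XML response."""
    root = ET.fromstring(xml_text)
    return [(obs.get("date"), float(obs.text)) for obs in root.iter("obs")]
```

A harvester built this way would send the string from `build_request` to the site's data service and pass the reply to `parse_response` before loading the values into the central database.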
Site Contribution
Site contributions have increased dramatically in the past year for air temperature, precipitation, and stream discharge.
Participation includes:
• 40 total sites
• 24 LTER sites + 2 International LTER sites
• 22 USFS sites
• 11 sites include USGS gauging stations
• 281 total measurement stations
• 143 meteorological, 138 stream gauging (59 USGS)
• 21 daily measurement parameters
• 7,200,000 daily values
Data Warehouse Content
Parameter (daily values)      % by measured parameter
Stream Discharge              29
Precipitation                 26
Air Temperature               22
Relative Humidity             4
Global Radiation              4
Soil Temperature              3
Resultant Wind Speed          3
Resultant Wind Direction      2
Other                         7
[Table legend: Primary emphasis; Secondary emphasis]
Observations:
• Coverage of precipitation, discharge, and air temperature data is strong across sites.
• We encourage sites to contribute relative humidity, soil temperature, wind speed & direction, and global radiation in datasets.
ClimDB/HydroDB Web Access Summary
Type of download      Downloads
Files                 38%
Plots                 50%
Displays              12%
Total                 6700
Values based on data from February 2003 - August 2006
Visitors to the ClimDB/HydroDB web interface are increasing and currently average 30 sessions per day.
Status of Type of Use
Type of Use           % of Total
Research              40% (60% general research)
Education             35% (90% students)
Testing/Exploring     25% (50% testing by participants)
Values based on data plots from January - March 2004
Keys to Successful Implementation
• Scientific interest
– Scientist/modeler demand for current and comparable data
– Need for synthetic data products
• Organizational
– Commitment to building network databases
– Information management (15% LTER site budget)
– Data access / release policies
– Data collection standards
– Planning meetings included Climatologists, Information Managers, Data Users/Modelers, and Field Technician participation
• Incentives
– Financial incentives
– Value-added products returned to participating sites: easy access, aggregated data, graphical displays, QA checks
• Host site commitment
– Leadership, time, resources
Conclusions
• The ClimDB/HydroDB approach is an effective bridge technology between older, more rigid data distribution models and modern service-oriented architectures
• Establishes software and service development at the central node, permitting rapid adaptation to changing needs
• Maintains low overhead, flexibility, and technological neutrality for data providers
• Additional "concentrator nodes" and middleware services can also be deployed very easily and rapidly within this model to improve efficiency and build bridges to other federated databases
Acknowledgement
Funding was provided by
• National Science Foundation (NSF)
• Long-Term Ecological Research (LTER) supplemental funding
• U.S. Forest Service Research and Development
• Forest Health Monitoring (FHM) program
• Pacific Northwest Research Station (PNW)
…to the Andrews Forest LTER at Oregon State University for ClimDB/HydroDB development
…to individual sites for the preparation of climate and hydrology data
Visit ClimDB/HydroDB at http://www.fsl.orst.edu/climhy
User Guide
Section 1.3 Required Steps for Site Participation
To participate, the site will:
• Provide the research areas, meteorological stations, gauged watersheds, and gauging station names and code names
• Restructure local site data into a standardized daily exchange format
• Use the online metadata forms to provide metadata for the overall research area, for every weather station, and for every parameter
• Harvest data