
Upload: marisa

Post on 06-Jan-2016


Page 1: Sharing and publishing data using CUAHSI HIS

Sharing and publishing data using CUAHSI HIS

Outline

• HIS data publication system

• WaterML and WaterOneFlow web services

• Observations data model (ODM)

• Data loading

• Data editing and quality control

• Controlled vocabularies

• HIS central registration and tagging

Page 2: Sharing and publishing data using CUAHSI HIS

HIS Data Publication System

[Diagram: components of the HIS data publication system]
• Data collection: Sensors, Telemetry Network, Base Station Computer(s)
• Loading: Streaming Data Loader (streaming data), ODM Data Loader (Excel, text)
• Storage: ODM Database; query, visualize, and edit data using ODM Tools
• Publication: WaterOneFlow Web Service (GetSites, GetSiteInfo, GetVariableInfo, GetValues), returning WaterML
• HIS Central: Service Registry, Hydrotagger, Harvester, Water Metadata Catalog
• Discovery: Hydroseek
• Access: HydroExcel, HydroGet, HydroLink, HydroObjects
• Analysis: GIS, Matlab, Splus, R, IDL, Java, C++, VB
• Contribute your ODM

Page 3: Sharing and publishing data using CUAHSI HIS

Steps in publishing data

1. Establish an HIS Server

2. Load observations into an ODM database

3. Provide access to data through web services (http://<your-server>/<your-network>/cuahsi_1_0.asmx?WSDL)

4. Index the resulting water data service at HIS Central (http://hiscentral.cuahsi.org)

Page 4: Sharing and publishing data using CUAHSI HIS

Establishing an HIS Server

• Windows server platform

• Base software: Microsoft SQL Server and ArcGIS Server

• HIS Server applications

– WaterOneFlow web services

– ODM + tools

– DASH

• HIS Data

http://his.cuahsi.org/hisserver.html

Page 5: Sharing and publishing data using CUAHSI HIS

Load Observations into an ODM Database

[Figure: soil moisture data, streamflow, flux tower data, groundwater levels, water quality, and precipitation & climate observations all load into one ODM database.]


Page 7: Sharing and publishing data using CUAHSI HIS

WaterML and WaterOneFlow

WaterML is an XML language for communicating water data.

WaterOneFlow is a set of web services based on WaterML.

[Diagram: data from repositories (TCEQ, UT, USGS) is extracted, transformed, and loaded; a WaterOneFlow Web Service answers client queries about locations, variables, and time (GetSiteInfo, GetVariableInfo, GetValues) and returns WaterML.]

Slide from David Valentine

Page 8: Sharing and publishing data using CUAHSI HIS

WaterOneFlow Web Services

Web Services Library

Web Application: Data Portal

Your application
• Excel, ArcGIS, Matlab
• Fortran, C/C++, Visual Basic
• Hydrologic model
• …

Your operating system
• Windows, Unix, Linux, Mac

Internet: Simple Object Access Protocol

Slide from David Valentine

Page 9: Sharing and publishing data using CUAHSI HIS

WaterOneFlow

• Set of query functions

• Returns data in WaterML

NWIS Daily Values (discharge), NWIS Ground Water, NWIS Unit Values (real time), NWIS Instantaneous Irregular Data, EPA STORET, NCDC ASOS, DAYMET, MODIS, NAM12K, USGS SNOTEL, ODM (multiple sites)

Slide from David Valentine

Page 10: Sharing and publishing data using CUAHSI HIS

WaterML design principles

• Goal: capture the semantics of hydrologic observation discovery and retrieval
• Role: exchange schema for CUAHSI web services
• Driven by
– Hydrologists (community review)
– ODM
– USGS NWIS, EPA STORET, academic sources
• Conformance with Open Geospatial Consortium standards: http://www.opengeospatial.org/
• For XSD pros, the WaterML schema is at http://his.cuahsi.org/wofws.html

Slide from David Valentine

Page 11: Sharing and publishing data using CUAHSI HIS

Point Observations Information Model

Hierarchy, with an example at each level:
• Data Source: Utah State University
• Network: Little Bear River
• Sites: Little Bear River at Mendon Rd
• Variables: Dissolved Oxygen
• Values {Value, Time, Qualifier, Offset}: 9.78 mg/L, 1 October 2007, 6PM

Web service methods: GetSites, GetSiteInfo, GetVariableInfo, GetValues

• A data source operates and provides data to an observation network
• A network is a set of observation sites (stored in a single ODM instance)
• A site is a point location where one or more variables are measured
• A variable is a measured property (e.g. describing the flow or quality of water)
• A value is an observation of a variable at a particular time
• A qualifier is a symbol that provides additional information about the value
• An offset allows specification of measurements at various depths in water
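The hierarchy of the point observations information model can be sketched as nested data classes. This is only an illustration: the class and field names below are assumptions chosen to mirror the slide, not the ODM schema or the WaterML element names.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Value:
    """One observation of a variable at a particular time."""
    value: float
    time: datetime
    qualifier: Optional[str] = None  # symbol with extra information about the value
    offset: Optional[float] = None   # e.g. depth in water


@dataclass
class Variable:
    """A measured property, with its observed values."""
    name: str
    units: str
    values: List[Value] = field(default_factory=list)


@dataclass
class Site:
    """A point location where one or more variables are measured."""
    name: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class Network:
    """A set of observation sites."""
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class DataSource:
    """An organization that operates one or more networks."""
    name: str
    networks: List[Network] = field(default_factory=list)


# The example from the slide, built bottom-up.
observation = Value(9.78, datetime(2007, 10, 1, 18, 0))
dissolved_oxygen = Variable("Dissolved Oxygen", "mg/L", [observation])
site = Site("Little Bear River at Mendon Rd", [dissolved_oxygen])
network = Network("Little Bear River", [site])
source = DataSource("Utah State University", [network])
```

Walking from `source` down to `observation` follows the same path a client takes when it calls GetSites, then GetSiteInfo, then GetValues.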

Page 12: Sharing and publishing data using CUAHSI HIS

Building Blocks of WaterML Responses

• Response types
– Sites (GetSites, GetSiteInfo)
– Variables (GetVariableInfo)
– TimeSeries (GetValues)
• Key elements
– site
– sourceInfo
– seriesCatalog
– variable
– value
– queryInfo

Slide from David Valentine

Page 13: Sharing and publishing data using CUAHSI HIS

Sites response

• queryInfo
• site
– name
– code
– location
• seriesCatalog
– variables
– series: how many, when (TimePeriodType)

Slide from David Valentine

Page 14: Sharing and publishing data using CUAHSI HIS

VariablesResponseType

• variable – same as in series element

• Code, name, units

Slide from David Valentine

Page 15: Sharing and publishing data using CUAHSI HIS

GetValues response - timeSeries

• queryInfo
• timeSeries
– sourceInfo – “where”
– variable – “what”
– values

Slide from David Valentine

Page 16: Sharing and publishing data using CUAHSI HIS

Values

• Each time series value is recorded in a value element

• The timestamp, plus metadata for the value, is recorded in the element’s attributes

[Figure labels: ISO time, value, qualifier]

Slide from David Valentine
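As a rough illustration of reading values out of a GetValues response, the sketch below parses a small hand-written XML fragment with Python's standard library. The fragment is a simplified stand-in: the element and attribute names follow the slides (timeSeries, sourceInfo, variable, values, value), but a real WaterML document uses XML namespaces and carries more metadata, so treat the exact layout as an assumption.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment; not a complete WaterML document.
SAMPLE = """
<timeSeries>
  <sourceInfo><siteName>Little Bear River at Mendon Rd</siteName></sourceInfo>
  <variable><variableName>Dissolved Oxygen</variableName><units>mg/L</units></variable>
  <values>
    <value dateTime="2007-10-01T18:00:00" qualifiers="P">9.78</value>
    <value dateTime="2007-10-01T18:30:00">9.81</value>
  </values>
</timeSeries>
"""


def read_values(xml_text):
    """Return (timestamp, value, qualifier) tuples from the fragment."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for element in root.iter("value"):
        rows.append((
            element.get("dateTime"),    # ISO timestamp stored as an attribute
            float(element.text),        # the observation is the element text
            element.get("qualifiers"),  # None when no qualifier is present
        ))
    return rows


rows = read_values(SAMPLE)
```

The timestamp and qualifier come from attributes while the number itself is the element text, which matches the description on the slide.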


Page 18: Sharing and publishing data using CUAHSI HIS

Why an Observations Data Model

• Syntactic heterogeneity (file types and formats)
• Semantic heterogeneity
– Language for observation attributes (structural)
– Language to encode observation attribute values (contextual)
• Publishing and sharing research data
• Metadata to facilitate unambiguous interpretation
• Enhance analysis capability

Page 19: Sharing and publishing data using CUAHSI HIS

Scope

• Focus on hydrologic observations made at a point
• Exclude remote sensing or grid data. These are part of a digital watershed but not suitable for an atomic database model and individual value queries.
• Primarily store raw observations and simple derived information to get data into its most usable form.
• Limit inclusion of extensively synthesized information and model outputs at this stage.

Page 20: Sharing and publishing data using CUAHSI HIS

What are the basic attributes to be associated with each single data value and how can these best be organized?

Value

DateTime

Variable

Location

Units

Interval (support)

Accuracy

Offset

OffsetType/ Reference Point

Source/Organization

Censoring

Data Qualifying Comments

Method

Quality Control Level

Sample Medium

Value Type

Data Type

Page 21: Sharing and publishing data using CUAHSI HIS

CUAHSI Observations Data Model

[Figure: streamflow, flux tower data, precipitation & climate, groundwater levels, water quality, and soil moisture data feed a single ODM.]

• A relational database at the single observation level (atomic model)
• Stores observation data made at points
• Metadata for unambiguous interpretation
• Traceable heritage from raw measurements to usable information
• Standard format for data sharing
• Cross dimension retrieval and analysis

[Figure: a data value v_i(s, t) located in a cube whose axes are Space, S (“Where”), Time, T (“When”), and Variables, V (“What”).]

Page 22: Sharing and publishing data using CUAHSI HIS

CUAHSI Observations Data Model

http://www.cuahsi.org/his/odm.html

Page 23: Sharing and publishing data using CUAHSI HIS

Site Attributes

• SiteCode, e.g. NWIS:10109000
• SiteName, e.g. Logan River Near Logan, UT
• Latitude, Longitude: geographic coordinates of site
• LatLongDatum: spatial reference system of latitude and longitude
• Elevation_m: elevation of the site
• VerticalDatum: datum of the site elevation
• Local X, Local Y: local coordinates of site
• LocalProjection: spatial reference system of local coordinates
• PosAccuracy_m: positional accuracy
• State, e.g. Utah
• County, e.g. Cache

Page 24: Sharing and publishing data using CUAHSI HIS

ODM and Arc Hydro

The Observations Data Model is independent of, but can be coupled to, a geographic representation.

[Diagram: the ODM Sites table (SiteID, SiteCode, SiteName, Latitude, Longitude, …) is related to Arc Hydro features, either through a CouplingTable (SiteID, HydroID) or (“OR”) a one-to-one relationship. The Arc Hydro side shows Waterbody, HydroPoint, Watershed, HydroEdge (Flowline, Shoreline), and HydroJunction classes within a HydroNetwork, each identified by HydroID and HydroCode.]

Page 25: Sharing and publishing data using CUAHSI HIS

Variable attributes

• VariableName, e.g. discharge
• VariableCode, e.g. NWIS:00060
• SampleMedium, e.g. water
• ValueType, e.g. field observation, laboratory sample
• IsRegular, e.g. Yes for regular or No for intermittent
• TimeSupport (averaging interval for observation)
• DataType, e.g. Continuous, Instantaneous, Categorical
• GeneralCategory, e.g. Climate, Water Quality
• NoDataValue, e.g. -9999

Example units: m3/s (flow, cubic meters per second)

Page 26: Sharing and publishing data using CUAHSI HIS

Scale issues in the interpretation of data

The scale triplet: a) Extent, b) Spacing, c) Support

[Figure: three panels, each plotting quantity against length or time, illustrating extent, spacing, and support.]

From: Blöschl, G. (1996), Scale and Scaling in Hydrology, Habilitationsschrift, Wiener Mitteilungen Wasser Abwasser Gewässer, Wien, 346 p.

Page 27: Sharing and publishing data using CUAHSI HIS

The effect of sampling for measurement scales not commensurate with the process scale

[Figure: three panels: (a) spacing too large – noise (aliasing); (b) extent too small – trend; (c) support too large – smoothing out.]

From: Blöschl, G. (1996), Scale and Scaling in Hydrology, Habilitationsschrift, Wiener Mitteilungen Wasser Abwasser Gewässer, Wien, 346 p.

Page 28: Sharing and publishing data using CUAHSI HIS

Discharge, Stage, Concentration and Daily Average Example

Page 29: Sharing and publishing data using CUAHSI HIS

Data Types

• Continuous (frequent sampling - fine spacing)
• Sporadic (spot sampling - coarse spacing)
• Cumulative: $V(t) = \int_0^t Q(\tau)\,d\tau$
• Incremental: $\Delta V(t) = \int_t^{t+\Delta t} Q(\tau)\,d\tau$
• Average: $\bar{Q}(t) = \dfrac{\Delta V(t)}{\Delta t}$
• Maximum
• Minimum
• Constant over Interval
• Categorical
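The cumulative, incremental, and average data types are related by integration of a rate Q over time. A minimal numerical sketch, assuming regularly spaced flow samples and a trapezoidal approximation of the integral (both are assumptions made for illustration; the sample values are invented):

```python
def incremental_volumes(flows, dt):
    """Volume passing in each interval, from flow samples spaced dt apart."""
    return [0.5 * (flows[i] + flows[i + 1]) * dt for i in range(len(flows) - 1)]


def cumulative_volumes(increments):
    """Running total of the incremental volumes since the first sample."""
    total = 0.0
    totals = []
    for increment in increments:
        total += increment
        totals.append(total)
    return totals


def interval_averages(increments, dt):
    """Average rate over each interval: incremental volume divided by dt."""
    return [increment / dt for increment in increments]


flows = [2.0, 4.0, 4.0, 2.0]  # hypothetical discharge samples, m3/s
dt = 900.0                    # 15 minutes, in seconds

increments = incremental_volumes(flows, dt)    # [2700.0, 3600.0, 2700.0] m3
cumulative = cumulative_volumes(increments)    # [2700.0, 6300.0, 9000.0] m3
averages = interval_averages(increments, dt)   # [3.0, 4.0, 3.0] m3/s
```

The same underlying signal therefore yields three different series depending on the data type, which is why the data type has to be stored with the variable.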

Page 30: Sharing and publishing data using CUAHSI HIS

15 min Precipitation from NCDC

Incomplete or inexact daily total occurring. Value is not a true 24-hour amount. One or more periods are missing and/or an accumulated amount has begun but not ended during the daily period.

Page 31: Sharing and publishing data using CUAHSI HIS

Irregularly sampled groundwater level

Page 32: Sharing and publishing data using CUAHSI HIS

Offset

OffsetValue: distance from a datum or control point at which an observation was made

OffsetType defines the type of offset, e.g. distance below water level, distance above ground surface, or distance from bank of river

Page 33: Sharing and publishing data using CUAHSI HIS

Water Chemistry from a profile in a lake

Page 34: Sharing and publishing data using CUAHSI HIS

Groups and Derived From Associations

Page 35: Sharing and publishing data using CUAHSI HIS

Stage and Streamflow Example

Page 36: Sharing and publishing data using CUAHSI HIS

Daily Average Discharge Example

Daily Average Discharge Derived from 15 Minute Discharge Data
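Deriving a daily average series from 15 minute data amounts to grouping values by calendar day and averaging each group. The sketch below uses invented discharge values and plain Python; it does not show how ODM records the "derived from" link between the two series.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_average(series):
    """Average (timestamp, value) pairs by calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in series:
        by_day[timestamp.date()].append(value)
    return {day: sum(values) / len(values) for day, values in sorted(by_day.items())}


# Two days of hypothetical 15 minute discharge: 96 values per day.
start = datetime(2007, 10, 1)
series = [
    (start + timedelta(minutes=15 * i), 10.0 if i < 96 else 20.0)
    for i in range(192)
]

daily = daily_average(series)
```

Here `daily` holds one value per day (10.0 for 1 October and 20.0 for 2 October). In ODM terms the result would be a separate data series whose data type is Average and whose time support is one day.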

Page 37: Sharing and publishing data using CUAHSI HIS

Methods and Samples

Method specifies the method whereby an observation is measured, e.g. Streamflow using a V notch weir, TDS using a Hydrolab, sample collected in auto-sampler

SampleID is used for observations based on the laboratory analysis of a physical sample and identifies the sample from which the observation was derived. This keys to a unique LabSampleID (e.g. bottle number) and name and description of the analytical method used by a processing lab.

Page 38: Sharing and publishing data using CUAHSI HIS

Water Chemistry from Laboratory Sample

Page 39: Sharing and publishing data using CUAHSI HIS

ValueAccuracy

A numeric value that quantifies measurement accuracy, defined as the nearness of a measurement to the standard or true value. This may be quantified as an average or root mean square error relative to the true value. Since the true value is not known, this should be estimated based on knowledge of the method and measurement instrument. Accuracy is distinct from precision, which quantifies reproducibility but does not refer to the standard or true value.

[Figure: three panels labeled “Accurate”, “Low accuracy, but precise”, and “Low accuracy”.]
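The distinction between accuracy and precision can be made concrete with a small calculation. In this sketch the "true" value and the repeated measurements are invented; accuracy is summarized as root mean square error against the true value and precision as the spread of the measurements around their own mean.

```python
import math
import statistics


def rmse(measurements, true_value):
    """Root mean square error relative to a known reference value."""
    squared = [(m - true_value) ** 2 for m in measurements]
    return math.sqrt(sum(squared) / len(squared))


def spread(measurements):
    """Population standard deviation: reproducibility, ignoring the true value."""
    return statistics.pstdev(measurements)


true_value = 10.0                             # hypothetical reference value
precise_but_biased = [12.0, 12.1, 11.9, 12.0]  # tight cluster, far from 10

accuracy_error = rmse(precise_but_biased, true_value)  # about 2.0
precision = spread(precise_but_biased)                 # about 0.07
```

The measurements are highly reproducible (small spread) yet inaccurate (large error), which corresponds to the "low accuracy, but precise" case.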

Page 40: Sharing and publishing data using CUAHSI HIS

Data Quality

Qualifier Code and Description provide qualifying information about the observations, e.g. Estimated, Provisional, Derived, Holding time for analysis exceeded

QualityControlLevel records the level of quality control that the data have been subjected to.
- Level 0. Raw Data
- Level 1. Quality Controlled Data
- Level 2. Derived Products
- Level 3. Interpreted Products
- Level 4. Knowledge Products

Page 41: Sharing and publishing data using CUAHSI HIS

Series of Observations

A “Data Series” is a set of all the observations of a particular variable at a site.

The SeriesCatalog is programmatically generated to provide users with the ability to do data discovery (i.e. what data is available and where) without formulating complex queries or hitting the DataValues table which can get very large.
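One way to picture how a series catalog can be generated programmatically is a GROUP BY over the values table. The sketch below uses an in-memory SQLite database and a deliberately simplified schema; the table layouts, site and variable codes, and data are assumptions for illustration and do not reproduce the full ODM tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the DataValues table (real ODM has many more columns).
conn.execute(
    "CREATE TABLE DataValues ("
    "SiteCode TEXT, VariableCode TEXT, LocalDateTime TEXT, DataValue REAL)"
)
conn.executemany(
    "INSERT INTO DataValues VALUES (?, ?, ?, ?)",
    [
        ("SITE_A", "DO", "2007-10-01T18:00:00", 9.78),
        ("SITE_A", "DO", "2007-10-01T18:30:00", 9.81),
        ("SITE_A", "TEMP", "2007-10-01T18:00:00", 11.2),
    ],
)

# One catalog row per (site, variable) pair: period of record and value count.
conn.execute(
    """
    CREATE TABLE SeriesCatalog AS
    SELECT SiteCode,
           VariableCode,
           MIN(LocalDateTime) AS BeginDateTime,
           MAX(LocalDateTime) AS EndDateTime,
           COUNT(*) AS ValueCount
    FROM DataValues
    GROUP BY SiteCode, VariableCode
    """
)

catalog = conn.execute(
    "SELECT * FROM SeriesCatalog ORDER BY SiteCode, VariableCode"
).fetchall()
```

Discovery queries can then read the small `SeriesCatalog` table instead of scanning `DataValues`.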


Page 43: Sharing and publishing data using CUAHSI HIS

Loading data into ODM

• Interactive ODM Data Loader
– Loads data from spreadsheets and comma-separated tables in a simple format
• Streaming Data Loader (SDL)
– Loads data from datalogger files on a prescribed schedule
– Interactive configuration
• SQL Server Integration Services (SSIS)
– Microsoft application accompanying SQL Server, useful for programming complex loading or data management functions

Page 44: Sharing and publishing data using CUAHSI HIS

[Diagram: remote monitoring sites in a sensor network send data through radio repeaters to a base station computer; the ODM Streaming Data Loader loads the data into the central observations database (ODM); applications provide data discovery, visualization, and analysis through Internet-enabled applications.]

From Jeff Horsburgh

Page 45: Sharing and publishing data using CUAHSI HIS

ODM Streaming Data Loader: Loading the Little Bear Sensor Data Into ODM

• Automate the data loading process via scheduled updates

• Map datalogger files to the ODM schema and controlled vocabularies

ODM SDL manages the periodic insertion of the streaming data into the ODM database using the mappings stored in the XML configuration file.

[Diagram components: Base Station Computer(s), Streaming Data Text Files, ODM SDL Mapping Wizard, XML Config File, ODM SDL Import Application, ODM]

From Jeff Horsburgh
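The idea of mapping datalogger columns onto ODM variables can be sketched as follows. This is not the SDL's configuration format (which is XML) or its import logic; the column names, variable names, and file contents are hypothetical, and a plain dictionary stands in for the stored mappings.

```python
import csv
import io

# Hypothetical mapping from datalogger column names to variable names.
COLUMN_MAP = {"DO_mgL": "DissolvedOxygen", "WTemp_C": "Temperature"}

# Hypothetical datalogger file contents.
LOGGER_FILE = (
    "TIMESTAMP,DO_mgL,WTemp_C\n"
    "2007-10-01 18:00,9.78,11.2\n"
    "2007-10-01 18:30,9.81,11.0\n"
)


def to_records(text, column_map, time_column="TIMESTAMP"):
    """Turn each mapped column of each row into one observation record."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, variable in column_map.items():
            records.append({
                "LocalDateTime": row[time_column],
                "Variable": variable,
                "DataValue": float(row[column]),
            })
    return records


records = to_records(LOGGER_FILE, COLUMN_MAP)
```

Each wide datalogger row becomes several single-value records, matching the one-row-per-observation (atomic) layout of ODM.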

Page 46: Sharing and publishing data using CUAHSI HIS

CUAHSI Observations Data Model

http://www.cuahsi.org/his/odm.html

[Figure: ODM schema diagram annotated with a loading order, steps 1 to 7: “Work from Out to In”, “At last …”, “And don’t forget …”]

Page 47: Sharing and publishing data using CUAHSI HIS

Managing Data Within ODM - ODM Tools

• Query and export – export data series and metadata

• Visualize – plot and summarize data series

• Edit – delete, modify, adjust, interpolate, average, etc.


Page 49: Sharing and publishing data using CUAHSI HIS

Syntactic Heterogeneity

Multiple data sources with multiple formats

[Diagram: Excel files, Access files, text files, and data logger files all feed the ODM observations database.]

From Jeff Horsburgh

Page 50: Sharing and publishing data using CUAHSI HIS

Semantic Heterogeneity

General description of attribute | USGS NWIS (a) | EPA STORET (b)

Structural heterogeneity
Code for location at which data are collected | "site_no" | "Station ID"
Name of location at which data are collected | "Site" OR "Gage" | "Station Name"
Code for measured variable | "Parameter" | ? (c)
Name of measured variable | "Description" | "Characteristic Name"
Time at which the observation was made | "datetime" | "Activity Start"
Code that identifies the agency that collected the data | "agency_cd" | "Org ID"

Contextual semantic heterogeneity
Name of measured variable | "Discharge" | "Flow"
Units of measured variable | "cubic feet per second" | "cfs"
Time at which the observation was made | "2008-01-01" | "2006-04-04 00:00:00"
Latitude of location at which data are collected | "41°44'36" | "41.7188889"
Type of monitoring site | "Spring, Estuary, Lake, Surface Water" | "River/Stream"

(a) United States Geological Survey National Water Information System (http://waterdata.usgs.gov/nwis/).
(b) United States Environmental Protection Agency Storage and Retrieval System (http://www.epa.gov/storet/).
(c) An equivalent to the USGS parameter code does not exist in data retrieved from EPA STORET.
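Both kinds of heterogeneity in the table can be handled by small normalization steps: renaming source-specific field names to one common name (structural), and converting values to one common encoding (contextual). The sketch below is illustrative only. The source field names come from the table; the target name "SourceCode" and the parsing rule for latitude are assumptions.

```python
import re

# Structural: map source-specific field names onto common attribute names.
FIELD_MAP = {
    "site_no": "SiteCode", "Station ID": "SiteCode",
    "datetime": "DateTime", "Activity Start": "DateTime",
    "agency_cd": "SourceCode", "Org ID": "SourceCode",  # "SourceCode" is hypothetical
}


def normalize_fields(record):
    """Rename known source fields; leave unknown fields unchanged."""
    return {FIELD_MAP.get(key, key): value for key, value in record.items()}


def dms_to_decimal(text):
    """Contextual: convert degrees-minutes-seconds text to decimal degrees."""
    cleaned = text.strip().rstrip('"')
    match = re.fullmatch(r"(\d+)°(\d+)'(\d+(?:\.\d+)?)", cleaned)
    if match is None:
        raise ValueError(f"not a degrees-minutes-seconds value: {text!r}")
    degrees, minutes, seconds = (float(part) for part in match.groups())
    return degrees + minutes / 60 + seconds / 3600


nwis_record = normalize_fields({"site_no": "10109000", "agency_cd": "USGS"})
latitude = dms_to_decimal("41°44'36")
```

After normalization, records from either source can be compared field by field, and latitudes share one numeric encoding.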

From Jeff Horsburgh

Page 51: Sharing and publishing data using CUAHSI HIS

Overcoming Semantic Heterogeneity

• ODM Controlled Vocabulary System
– ODM CV central database
– Online submission and editing of CV terms
– Web services for broadcasting CVs

Variable Name
• Investigator 1: “Temperature, water”
• Investigator 2: “Water Temperature”
• Investigator 3: “Temperature”
• Investigator 4: “Temp.”

ODM VariableNameCV terms: …, Sunshine duration, Temperature, Turbidity

From Jeff Horsburgh
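A controlled vocabulary resolves the four investigators' spellings to a single term. The sketch below shows the idea with a tiny excerpt of the vocabulary and a local synonym table; the synonym table is a hypothetical convenience for illustration and is not part of the ODM CV system.

```python
# Excerpt of controlled terms taken from the slide.
VARIABLE_NAME_CV = {"Sunshine duration", "Temperature", "Turbidity"}

# Hypothetical local synonyms that a data manager might maintain.
SYNONYMS = {
    "temperature, water": "Temperature",
    "water temperature": "Temperature",
    "temp.": "Temperature",
}


def to_cv_term(raw_name):
    """Return the controlled term for a free-text variable name."""
    key = raw_name.strip().lower()
    for term in VARIABLE_NAME_CV:
        if term.lower() == key:
            return term
    if key in SYNONYMS:
        return SYNONYMS[key]
    raise ValueError(f"no controlled vocabulary term for {raw_name!r}")


investigator_names = ["Temperature, water", "Water Temperature", "Temperature", "Temp."]
resolved = [to_cv_term(name) for name in investigator_names]
```

Names that match neither the vocabulary nor a synonym raise an error, which corresponds to the case where a new term would need to be submitted for moderation.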

Page 52: Sharing and publishing data using CUAHSI HIS

Dynamic controlled vocabulary moderation system

[Diagram components: Local Server, Local ODM Database, ODM Tools, ODM Data Manager, ODM Website, Master ODM Controlled Vocabulary, ODM Controlled Vocabulary Moderator, ODM Controlled Vocabulary Web Services (exchanging XML)]

http://his.cuahsi.org/mastercvreg.html

From Jeff Horsburgh


Page 54: Sharing and publishing data using CUAHSI HIS

Registering Web Services with HIS Central

• Listing of all public data services

• Enables applications like Hydroseek to discover data

Page 55: Sharing and publishing data using CUAHSI HIS

Tagging Variables for Data Discovery Through a Metadata Catalog

Ontology: A hierarchy of concepts

Each Variable in your data is connected to a corresponding Concept

From Michael Piasecki

Page 56: Sharing and publishing data using CUAHSI HIS

Department of Civil, Architectural & Environmental Engineering, 04/20/23

Tagging variables in Ontology

WATERS Network Information System

Steps

1. The WSDL for a set of ODM web services is registered in the WSDL Registry

2. The “harvester” jumps into action and trawls through the web services at the WSDL to find and identify new variables

3. It returns i) data updating information and ii) variable names used and compares these to those used by HydroSeek.

From Michael Piasecki

Page 57: Sharing and publishing data using CUAHSI HIS


Mapping onto Ontology

Steps contd.

4. New variables are manually mapped onto appropriate ontology concept.

5. HydroSeek catalogue is updated.

Test-Bed | VarName | Site exist? | VarName? | Content | Action
CCBay | DOConcSuf | Y | Y | new data | update Cat (Time)
CCBay | DOConcBot | Y | N | new variable | place in TaggerBin => DO
CCBay | DOConcMid | N | Y | new data | update Cat (Site+Time)
SRBHOS | DO_Water | Y | Y | new data | update Cat (Time)
Minnehaha | TempSurf | Y | N | new variable | place in TaggerBin => Temp
Minnehaha | StreamDOCon | Y | N | new variable | place in TaggerBin => DO
SantaFe | WaterDOCon | Y | N | new variable | place in TaggerBin => DO
SantaFe | GoldConc | Y | N | new var/no conc | place in TaggerBin => ??

From Michael Piasecki
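The action column of the harvester table follows from two checks: whether the site already exists in the catalog and whether the variable name is already known. A minimal sketch of that decision, inferred from the rows of the table (the function name and return strings are hypothetical, and the case where neither the site nor the variable is known does not appear in the table, so its handling here is an assumption):

```python
def harvest_action(site_exists, variable_known):
    """Decide what to do with a harvested (site, variable) pair."""
    if not variable_known:
        # New variable: queue it for manual mapping onto an ontology concept.
        return "place in TaggerBin"
    if site_exists:
        # Known site and variable: only the period of record changes.
        return "update catalog (Time)"
    # Known variable at a new site: add the site and the period of record.
    return "update catalog (Site+Time)"


# Rows from the table: (test bed, variable, site exists?, variable known?)
harvested = [
    ("CCBay", "DOConcSuf", True, True),
    ("CCBay", "DOConcBot", True, False),
    ("CCBay", "DOConcMid", False, True),
]
actions = [harvest_action(site, known) for _, _, site, known in harvested]
```

Only the "place in TaggerBin" outcome requires a person, which is step 4 of the workflow.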

Page 58: Sharing and publishing data using CUAHSI HIS

Hydroseek

http://www.hydroseek.org

Supports search by location and type of data across multiple observation networks including NWIS, STORET, and university data

Page 59: Sharing and publishing data using CUAHSI HIS

Summary

• Generic method for publishing observational data
– Supports many types of point observational data
– Overcomes syntactic and semantic heterogeneity using a standard data model and controlled vocabularies
– Supports a national network of observatory test beds but can grow!
• Web services provide programmatic machine access to data
– Work with the data in your data analysis software of choice
• Internet-based applications provide user interfaces for the data and geographic context for monitoring sites