hic06 spatial interpolation

Numerically Optimized Empirical Modeling of Highly Dynamic, Spatially Expansive, and Behaviorally Heterogeneous Hydrologic Systems – Part 2

Jana Stewart, U.S. Geological Survey, Middleton, WI
Matthew Mitro, Wisconsin DNR, Madison, WI
Ed Roehl, Advanced Data Mining, LLC, Greer, SC
John Risley, U.S. Geological Survey, Portland, OR

Part 1

International Environmental Modelling and Software Society 2006, Burlington VT

Upper Floridan Aquifer, Suwannee River Valley, Florida

[Figure: well locations (100 x 100 miles) and 16-year hydrographs]

• Research: use MLP ANNs to spatially interpolate
• Well behaviors are highly spatially discontinuous
  – MLP ANNs are continuous functions
  – Optimally segment well behaviors?
• High temporal variability

Western Oregon Stream Temperature Modeling

• Thermal TMDL (total maximum daily load)
• Modeled output: stream temperature (ST), hourly time series, Jun–Oct 1999, at 146 “pristine” sites
• Potential inputs
  – STATIC: 34 variables, including stream shading and basin forestation
  – CLIMATE TIME SERIES: 65 hourly series of air temperature, dew point, solar radiation, barometric pressure, snowpack, and precipitation from 25 locations

[Figure: map of western Oregon showing ST sites and climatic sites; the Coast Range, Cascades, Klamath Mountains, and Willamette Valley ecoregions; the cities of Portland, Corvallis, Eugene, and Ashland; and the Pacific Ocean]
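The method's "Decorrelate input variables as needed" step matters for inputs like these, since climate variables such as air temperature and dew point tend to move together. The source does not say which decorrelation technique was used; the sketch below assumes one simple option, replacing the second variable with its least-squares residual against the first. The function name `decorrelate` and the sample values are hypothetical.

```python
def decorrelate(a, b):
    """Return the part of b that is not linearly explained by a.

    Fits b ~ alpha + beta * a by ordinary least squares and returns the
    residuals, which have zero sample covariance with a.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sxx = sum((x - mean_a) ** 2 for x in a)
    sxy = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    beta = sxy / sxx  # OLS slope; a must not be constant
    return [y - (mean_b + beta * (x - mean_a)) for x, y in zip(a, b)]


# Hypothetical hourly air temperature and dew point at one climate site
air = [14.0, 16.5, 19.0, 22.5, 24.0, 21.0, 17.5, 15.0]
dew = [9.0, 10.0, 11.5, 12.0, 13.5, 12.5, 11.0, 9.5]

# Use air and dew_resid as ANN inputs instead of air and dew
dew_resid = decorrelate(air, dew)
```

In a deployed model the fitted mean and slope would be stored and reapplied to new data; this sketch recomputes them for brevity.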

Objectives

• Model highly dynamic, spatially expansive, and behaviorally heterogeneous hydrologic systems
• Divide and conquer: transform one big problem into multiple small problems
• Use a sequence of numerically optimized algorithms
  – minimize subjectivity

Steps (divide and conquer)

1. SEGMENT DATA into behavioral classes
• Cluster time series (k-means, SOM)
  – intermediate cross-correlation matrix
• Bonus: identifies redundant and unique sites for network optimization

2. MODEL EACH BEHAVIORAL CLASS separately
• Process signals to separate low- and high-frequency components
• “Stacked” data set for training
• Decorrelate input variables as needed
• ANNs: multivariate, non-linear curve fitting
• Sub-models of the low- and high-frequency components; their combined predictions form a “super model”
• Sensitivity analysis determines which static and time series variables are predictive
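A minimal sketch of Step 1, assuming equal-length, gap-free series: each site's row of the intermediate cross-correlation matrix is treated as a feature vector and grouped with plain k-means. The farthest-point initialization, the function names, and the synthetic hydrographs are assumptions for illustration; the SOM option mentioned above is not shown.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def correlation_matrix(series):
    """Row i holds the correlations of site i with every site."""
    return [[pearson(s, t) for t in series] for s in series]


def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical hydrographs: two in-phase wells and two anti-phase wells
t = range(48)
base = [math.sin(2 * math.pi * i / 12) for i in t]
wells = [
    base,
    [1.5 * x + 0.1 * math.cos(3 * i) for i, x in zip(t, base)],
    [-x for x in base],
    [-0.5 * x + 0.1 * math.sin(5 * i) for i, x in zip(t, base)],
]
labels = kmeans(correlation_matrix(wells), k=2)
print(labels)  # [0, 0, 1, 1]
```

Clustering on correlation rows groups wells by the shape of their behavior rather than by absolute water level, which is why the in-phase pair lands together even though their amplitudes differ.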

Steps – cont.

3. BUILD CLASSIFIER to link static site characteristics to dynamic behaviors (classes)
• static inputs ⇒ mapping function ⇒ class id
• kriging in the Floridan Aquifer (x, y, class id)
• classification model
  – nearest neighbor classifier (linear)
  – ANN classifier (non-linear)

4. RUN MODEL
i. Input the new site's vector of static inputs
ii. Run the classifier to select the behavioral model
iii. Run the behavioral model
iv. Write output
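Steps 3 and 4 can be sketched as a small run-time loop. This assumes a 1-nearest-neighbor classifier over pre-scaled static variables (the simpler option named above) and treats each behavioral model as a plain function of the climate inputs; `classify_site`, `run_model`, the two toy class models, and all values are hypothetical stand-ins for the trained ANNs.

```python
import math


def classify_site(static, training_static, training_class):
    """1-nearest-neighbor: class id of the closest training site.

    Static variables are assumed to be scaled to comparable ranges.
    """
    nearest = min(range(len(training_static)),
                  key=lambda i: math.dist(static, training_static[i]))
    return training_class[nearest]


def run_model(static, training_static, training_class, class_models, climate):
    """Classify a new site, then run that class's behavioral model."""
    class_id = classify_site(static, training_static, training_class)  # step ii
    prediction = class_models[class_id](climate)                       # step iii
    return class_id, prediction


# Hypothetical training sites: scaled (forest fraction, agriculture fraction)
training_static = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
training_class = [1, 1, 2, 2]

# Toy stand-ins for trained behavioral models: daily air temperature -> ST
class_models = {
    1: lambda air: [0.5 * t + 4.0 for t in air],
    2: lambda air: [0.75 * t + 1.0 for t in air],
}

new_site = (0.85, 0.15)  # step i: static inputs for the new site
class_id, st = run_model(new_site, training_static, training_class,
                         class_models, [20.0, 22.0])
print(class_id, st)  # step iv: 1 [14.0, 15.0]
```

Swapping `classify_site` for an ANN classifier would leave `run_model` unchanged, which is the point of keeping classification and behavioral modeling as separate stages.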

Clustering Results – Floridan Aquifer

12 classes – probably more than necessary
Indicates well redundancy
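The well redundancy noted above (the Step 1 bonus) can be screened directly from pairwise correlations: wells whose records are almost perfectly correlated add little independent information. A minimal sketch; the 0.95 threshold, the well names, and the synthetic records are assumptions, not values from the study.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var = sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var)


def redundant_pairs(records, threshold=0.95):
    """Return (well, well, r) for every pair correlated at or above threshold."""
    pairs = []
    for w1, w2 in combinations(records, 2):
        r = pearson(records[w1], records[w2])
        if r >= threshold:  # strongly in-phase records are removal candidates
            pairs.append((w1, w2, round(r, 3)))
    return pairs


t = range(48)
records = {
    "W1": [math.sin(2 * math.pi * i / 12) for i in t],
    "W2": [2.0 * math.sin(2 * math.pi * i / 12) + 10.0 for i in t],
    "W3": [math.cos(2 * math.pi * i / 12) for i in t],
}
print(redundant_pairs(records))  # [('W1', 'W2', 1.0)]
```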

Accuracy by Cluster

[Figure: actual vs. predicted normalized water level above sea level for clusters C1, C3, C6, and C10; history from Apr 1982 to Oct 1998]

Super Model Prediction

[Figure: run-time application display; map of the Suwannee River area and the Gulf of Mexico; max elevation above sea level ~180 feet]

Western Oregon – 1 of 6 validation sites not used for training

[Figure: stream temperature time series at a validation site, late June through mid-September]

Western Oregon – another validation site

• Good dynamics
• Static inputs are the primary source of error

[Figure: stream temperature time series at a second validation site, late June through mid-September]

Part 2 – HIC06

Wisconsin Temperature Modeling

• Fisheries management
• Modeled output
  – 254 ST daily time series measured Jun–Aug, 1990–2002
  – temporally discontinuous: different sites were measured in different years
• Potential inputs
  – STATIC: 42 variables, including land cover, drainage area, and streambed characteristics
  – CLIMATE TIME SERIES: 353 daily series of air temperature, dew point, solar radiation, barometric pressure, and precipitation from 25 locations
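Because different Wisconsin sites were measured in different years, one way to build the “stacked” training set from the modeling step is to pool rows from every site in a behavioral class, pairing each site's static vector with the climate inputs on each date that site was measured. A minimal sketch; the dictionary layout, the key names `static` and `st`, and all values are hypothetical.

```python
def stack_sites(sites, climate):
    """Pool (static + climate-on-date, measured ST) rows from one class.

    sites:   list of dicts with hypothetical keys "static" (list of floats)
             and "st" (dict ISO date -> measured daily stream temperature)
    climate: dict ISO date -> list of daily climate inputs for that date
    Each site contributes only the dates on which it was measured.
    """
    rows = []
    for site in sites:
        for date in sorted(site["st"]):
            if date in climate:  # skip dates without climate inputs
                rows.append((site["static"] + climate[date], site["st"][date]))
    return rows


# Two hypothetical sites in the same class, measured in different years
sites = [
    {"static": [0.4, 12.0], "st": {"1998-06-01": 15.2, "1998-06-02": 15.9}},
    {"static": [0.1, 85.0], "st": {"2001-07-10": 19.4}},
]
climate = {
    "1998-06-01": [21.0, 12.5],
    "1998-06-02": [23.5, 13.0],
    "2001-07-10": [27.0, 16.1],
}
rows = stack_sites(sites, climate)
print(len(rows))  # 3
```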

Asynchronous Site Monitoring

• Modified time series clustering method
• Steps
  a) Compile populations of sites having overlapping signals; 1998 to 2002 made up 241 of the 254 sites
  b) Estimate the number of classes per population, then choose the same k for all populations (k = 3 for the Wisconsin model)
  c) Apply the standard time series clustering algorithm to each population using k
  d) Perform sensitivity analyses with prototype ANN classification models to determine the best static variables
  e) Determine the overall best static variables
  f) Cluster all sites using the best static variables
  g) Build ANN dynamic models of each behavioral class as before
  h) Build ANN classification models as before for “new sites”
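Compiling populations with overlapping signals, and correlating sites only where their records overlap, can be sketched with records keyed by time step. This assumes a population is the set of sites whose records cover every step of a common window; `population`, `overlap_correlation`, `min_overlap`, and the toy records are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var = sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var)


def population(records, window):
    """Sites whose records cover every time step in the window."""
    return [site for site, rec in records.items()
            if all(t in rec for t in window)]


def overlap_correlation(rec_a, rec_b, min_overlap=30):
    """Correlate two records over shared time steps; None if overlap is short."""
    shared = sorted(set(rec_a) & set(rec_b))
    if len(shared) < min_overlap:
        return None
    return pearson([rec_a[t] for t in shared], [rec_b[t] for t in shared])


# Hypothetical records keyed by time step; S3 never overlaps S1 or S2
records = {
    "S1": {t: math.sin(t / 5) for t in range(0, 100)},
    "S2": {t: 2 * math.sin(t / 5) + 1 for t in range(50, 150)},
    "S3": {t: math.sin(t / 5) for t in range(200, 260)},
}
print(population(records, range(50, 100)))               # ['S1', 'S2']
print(overlap_correlation(records["S1"], records["S3"]))  # None
```

Each population found this way could then be clustered on its own with the same k, as the steps above describe.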

Best Static Variables

Top variables (* = variable is among the top 6, 10, or 14)

Variable description                                Top 6  Top 10  Top 14
Land cover–agriculture (W)                            *      *       *
Area–drainage area (W)                                *      *       *
Land cover–forest (W)                                 *      *       *
Bedrock depth–depth to bedrock (0–50 feet) (W)        *      *       *
Surficial deposit texture–medium (W)                  *      *       *
Stream network–downstream link (S)                    *      *       *
Stream network–gradient (S)                                  *       *
Land cover–wetland (W)                                       *       *
Darcy value–darcy (W)                                        *       *
Bedrock depth–depth to bedrock (51–100 feet) (W)             *       *
Land cover–urban (W)                                                 *
Surficial deposit texture–fine (W)                                   *
Bedrock type–sandstone (W)                                           *
Bedrock depth–depth to bedrock (101–200 feet) (W)                    *
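The ranking above came from sensitivity analyses with prototype ANN classifiers. As an illustration only, the sketch below swaps in a simpler technique: a nearest-centroid classifier with permutation sensitivity, where each static variable is shuffled in turn and the mean drop in classification accuracy ranks it. The classifier, `repeats`, `seed`, the variable names, and the data are all assumptions.

```python
import random


def fit_centroids(X, y):
    """Mean static-variable vector for each class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}


def predict(centroids, row):
    """Class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


def accuracy(centroids, X, y):
    return sum(predict(centroids, r) == lab for r, lab in zip(X, y)) / len(y)


def rank_static_variables(X, y, names, repeats=20, seed=0):
    """Permutation sensitivity: mean accuracy drop when one variable is shuffled."""
    rng = random.Random(seed)
    centroids = fit_centroids(X, y)
    baseline = accuracy(centroids, X, y)
    scores = {}
    for j, name in enumerate(names):
        drops = []
        for _ in range(repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, shuffled, y))
        scores[name] = sum(drops) / repeats
    return sorted(names, key=scores.get, reverse=True), scores


# Hypothetical scaled static variables for six sites in two classes
names = ["forest", "noise"]
X = [[0.90, 0.3], [0.80, 0.7], [0.85, 0.5],
     [0.10, 0.4], [0.20, 0.6], [0.15, 0.2]]
y = [1, 1, 1, 2, 2, 2]
ranking, scores = rank_static_variables(X, y, names)
print(ranking)
```

Taking the first 6, 10, or 14 names from such a ranking would give nested top-variable sets like the columns of the table.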

Measured & Predicted Class 1 Stream Temperatures

• 14 “test” sites not used to train the ANNs
  – concatenated
  – June–August
• R² = 0.66
• Dynamically good
• Offsets (high or low) come from the static variables

[Figure: measured vs. predicted stream temperature]
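A score for concatenated test sites can be computed as the coefficient of determination, 1 - SS_res/SS_tot. The slide does not say whether R² was computed this way or as a squared correlation, so treat that as an assumption. The toy example also shows why offsets matter: hypothetical site B is predicted with the right dynamics but 2 units low, and that offset alone pulls R² below 1.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical test sites: A is predicted exactly, B has a constant -2 offset
site_a_meas, site_a_pred = [14.0, 16.0, 18.0], [14.0, 16.0, 18.0]
site_b_meas, site_b_pred = [20.0, 22.0, 24.0], [18.0, 20.0, 22.0]

# Concatenate the test sites before scoring
measured = site_a_meas + site_b_meas
predicted = site_a_pred + site_b_pred
print(round(r_squared(measured, predicted), 2))  # 0.83
```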

Conclusions

• Numerical methods
  1. Signal processing, e.g., spectral filtering
  2. Clustering, e.g., k-means
  3. ANN non-linear, dynamic sub-models of behavioral components assembled into a super model
  4. Classification, e.g., ANN non-linear classifier
• Approach uses all available static and time series data
• Divide and conquer makes big problems tractable
• Near-optimal results, limited by data quality
• Compact finished model
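Spectral filtering and the super model can be sketched together: split a series into low- and high-frequency components with a discrete Fourier transform, model each component separately, and add the component predictions. A minimal pure-Python sketch; the cutoff bin, the direct O(n²) DFT (fine for short illustrative series, too slow for long records), and the synthetic slow-plus-fast signal are assumptions. Here the “sub-model predictions” are simply the filtered components themselves.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform, O(n^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; keeps the real part (spectra here are conjugate-symmetric)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def split_frequencies(x, cutoff):
    """Low component keeps DFT bins within `cutoff` of zero frequency; high = x - low."""
    n = len(x)
    spectrum = dft(x)
    low_spec = [c if min(k, n - k) <= cutoff else 0j
                for k, c in enumerate(spectrum)]
    low = idft(low_spec)
    high = [xi - li for xi, li in zip(x, low)]
    return low, high


def super_model(low_prediction, high_prediction):
    """Combine low- and high-frequency sub-model predictions by summation."""
    return [lo + hi for lo, hi in zip(low_prediction, high_prediction)]


n = 48
slow = [math.sin(2 * math.pi * t / n) for t in range(n)]             # 1 cycle
fast = [0.3 * math.sin(2 * math.pi * 12 * t / n) for t in range(n)]  # 12 cycles
signal = [s + f for s, f in zip(slow, fast)]

low, high = split_frequencies(signal, cutoff=3)
combined = super_model(low, high)
print(max(abs(a - b) for a, b in zip(low, slow)) < 1e-9)  # True
```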

Florida Everglades Water Levels

• In progress
• Water management
• Modeled output: 260 real-time water level (WL) gages
• Potential inputs
  – STATIC: 6 variables (x, y, and 4 vegetation variables)
  – WL TIME SERIES: 260 real-time WL gages
• Autoregressive
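The Everglades inputs are autoregressive: the water-level series used as inputs come from the same 260 gages being modeled. One way to build training rows is to pair a target gage's static vector (x, y, and four vegetation variables) with lagged water levels from the network. A minimal sketch; the lag choice, dictionary layout, function name, and values are hypothetical, and the in-progress model may be configured differently.

```python
def autoregressive_rows(gages, static, target, lags):
    """One (inputs, target WL) row per time step for the target gage.

    gages:  dict gage id -> list of water levels on a common time base
    static: dict gage id -> static vector (x, y, veg1, veg2, veg3, veg4)
    lags:   positive lags, in time steps, taken from every gage
    """
    n = len(gages[target])
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, n):
        lagged = [gages[g][t - lag] for g in sorted(gages) for lag in lags]
        rows.append((list(static[target]) + lagged, gages[target][t]))
    return rows


# Hypothetical two-gage network
gages = {"G1": [1, 2, 3, 4, 5], "G2": [10, 20, 30, 40, 50]}
static = {"G1": (0.0, 1.0, 0.2, 0.3, 0.1, 0.4)}
rows = autoregressive_rows(gages, static, target="G1", lags=[1, 2])
print(len(rows))  # 3
```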
