hic06 spatial interpolation
Numerically Optimized Empirical Modeling of Highly Dynamic, Spatially Expansive, and Behaviorally Heterogeneous Hydrologic Systems – Part 2
Jana Stewart, U.S. Geological Survey, Middleton, WI
Matthew Mitro, Wisconsin DNR, Madison, WI
Ed Roehl, Advanced Data Mining, LLC, Greer, SC
John Risley, U.S. Geological Survey, Portland, OR
Part 1
International Environmental Modelling and Software Society 2006, Burlington VT
Upper Floridan Aquifer, Suwannee River Valley, Florida
[Figures: well locations (100 x 100 mile area); 16-year hydrographs]
• Research – use MLP (multilayer perceptron) ANNs to spatially interpolate
• Highly spatially discontinuous
  – MLP ANNs fit continuous functions
  – Optimally segment well behaviors?
• High temporal variability
Western Oregon Stream Temperature Modeling
• Thermal TMDL (total maximum daily load)
• Modeled output – stream temperature (ST) hourly time series, Jun–Oct 1999, at 146 “pristine” sites
• Potential inputs
  – STATIC – 34 variables, including stream shading and basin forestation
  – CLIMATE TIME SERIES – 65 hourly air temperature, dew-point, solar radiation, barometric pressure, snowpack, and precipitation series from 25 locations
[Map: ST sites and climatic sites in western Oregon across the Coast Range, Cascades, Klamath Mountains, and Willamette Valley ecoregions; Portland, Corvallis, Eugene, and Ashland marked; Pacific Ocean]
Objectives
• Model Highly Dynamic, Spatially Expansive, and Behaviorally Heterogeneous Hydrologic Systems
• Divide and conquer – big problem transformed into multiple small problems
• Use a sequence of numerically optimized algorithms to minimize subjectivity
Steps (divide and conquer)
1. SEGMENT DATA into behavioral classes
• Cluster time series (k-means, SOM [self-organizing map])
  – Intermediate cross-correlation matrix
• Bonus – identifies redundant/unique sites for network optimization
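The clustering step might look like the sketch below, assuming every site's record is complete, equal in length, and on a common time axis (real well records would need gap handling first). Each row of the Pearson cross-correlation matrix serves as that site's feature vector for plain k-means (Lloyd's algorithm) with a deterministic farthest-point start. The function names and the 0.95 redundancy threshold are illustrative assumptions, not values from the study; a SOM could take the place of k-means.

```python
import numpy as np


def cross_correlation_matrix(series):
    """Pearson correlation between every pair of sites.

    series: array of shape (n_sites, n_times), one complete record per row.
    """
    return np.corrcoef(np.asarray(series, dtype=float))


def kmeans(features, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point start; returns a class id per site."""
    features = np.asarray(features, dtype=float)
    # Seed with the first site, then add the site farthest from every centroid chosen so far.
    centroids = [features[0]]
    for _ in range(1, k):
        d2 = np.min([((features - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(features[np.argmax(d2)])
    centroids = np.array(centroids)
    labels = np.zeros(len(features), dtype=int)
    for it in range(n_iter):
        # Assign each site to its nearest centroid (squared Euclidean distance).
        d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its member sites; leave empty classes in place.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels


def redundant_pairs(corr, threshold=0.95):
    """Site pairs correlated above a (hypothetical) threshold: candidates for network pruning."""
    n = corr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n) if corr[i, j] >= threshold]
```

Clustering rows of the correlation matrix groups sites that correlate similarly with the whole network, and the same matrix directly flags near-duplicate sites.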
2. MODEL EACH BEHAVIORAL CLASS separately
• Process signals to separate low- and high-frequency components
• “Stacked” data set for training
• Decorrelate input variables as needed
• ANNs – multivariate, non-linear curve fitting
• Sub-models of low- and high-frequency components; combine predictions = “super model”
• Sensitivity analysis determines which static and time-series variables are predictive
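A minimal sketch of the frequency split, assuming an FFT low-pass filter at a chosen cutoff (the cutoff and function names are hypothetical; the filter actually used is not specified here). Because the high-frequency part is defined as the residual, the two components sum back to the original signal, so this sketch assumes the super model combines the sub-model predictions by simple addition.

```python
import numpy as np


def split_frequencies(x, cutoff):
    """Split a 1-D signal into low- and high-frequency components.

    cutoff is in cycles per sample; FFT bins at or below it form the low component.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x))  # cycles per sample for each rfft bin
    low = np.fft.irfft(np.where(freqs <= cutoff, spectrum, 0), n=len(x))
    high = x - low  # residual, so low + high reproduces x
    return low, high


def super_model_prediction(low_prediction, high_prediction):
    """Combine sub-model outputs; the split is additive, so the recombination is a sum."""
    return np.asarray(low_prediction) + np.asarray(high_prediction)
```

Each component would then get its own ANN sub-model; only their outputs are summed.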
Steps – cont.
3. BUILD CLASSIFIER – to link static site characteristics to dynamic behaviors (classes)
• Static inputs ⇒ mapping function ⇒ class id
• Kriging in Floridan Aquifer (x, y, class id)
• Classification model
  – Nearest-neighbor classifier (linear)
  – ANN classifier (non-linear)
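The nearest-neighbor option can be sketched as below. It assumes the static inputs are z-score standardized so that a large-valued variable such as drainage area does not dominate the distance; the class and method names are hypothetical. An ANN classifier would replace `predict_class` with a trained network.

```python
import numpy as np


class NearestNeighborClassifier:
    """1-nearest-neighbor mapping from static site inputs to a behavioral class id."""

    def fit(self, static_inputs, class_ids):
        X = np.asarray(static_inputs, dtype=float)
        # Standardize each static variable (z-score) using the training sites.
        self.mean = X.mean(axis=0)
        self.std = X.std(axis=0)
        self.std[self.std == 0] = 1.0  # guard against constant variables
        self.X = (X - self.mean) / self.std
        self.y = np.asarray(class_ids)
        return self

    def predict_class(self, site_vector):
        """Class id of the training site closest to the new site in standardized space."""
        z = (np.asarray(site_vector, dtype=float) - self.mean) / self.std
        distances = np.sqrt(((self.X - z) ** 2).sum(axis=1))
        return self.y[np.argmin(distances)]
```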
4. RUN MODEL
i. Input the new site's vector of static inputs
ii. Run classifier to select behavioral model
iii. Run behavioral model
iv. Write output
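The four run-time steps chain together as sketched below. The classifier and per-class models are stand-in callables with hypothetical names, since the trained ANNs are not part of this transcript.

```python
from typing import Callable, Dict, Sequence


def run_model(
    site_static_inputs: Sequence[float],
    climate_inputs: Sequence[float],
    classify: Callable[[Sequence[float]], int],
    behavioral_models: Dict[int, Callable[[Sequence[float], Sequence[float]], list]],
) -> list:
    # i. The new site's vector of static inputs arrives as site_static_inputs.
    # ii. Run the classifier to select the behavioral model.
    class_id = classify(site_static_inputs)
    model = behavioral_models[class_id]
    # iii. Run the behavioral model on the static and climate inputs.
    prediction = model(site_static_inputs, climate_inputs)
    # iv. Write (here: return) the output.
    return prediction
```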
Clustering Results – Floridan Aquifer
12 classes – probably more than necessary, indicating well redundancy
Accuracy by Cluster
[Plots: actual vs. predicted normalized water level above sea level, history from Apr 1982 to Oct 1998, for clusters C1, C3, C6, and C10]
Super Model Prediction
[Map from the run-time application display: Suwannee River and Gulf of Mexico; max elevation above sea level ~180 feet]
Western Oregon – 1 of 6 validation sites not used for training
[Plot: stream temperature, late June to mid-September]
Western Oregon – another validation site
• Good dynamics
• Static inputs are the primary source of error
[Plot: stream temperature, late June to mid-September]
Wisconsin Temperature Modeling
• Fisheries management
• Modeled output
  – 254 ST daily time series measured Jun–Aug, 1990–2002
  – Temporally discontinuous: different sites were measured in different years
• Potential inputs
  – STATIC – 42 variables, including land cover, drainage area, and streambed characteristics
  – CLIMATE TIME SERIES – 353 daily air temperature, dew-point, solar radiation, barometric pressure, and precipitation series from 25 locations
Asynchronous Site Monitoring
• Modified time series clustering method
• Steps
a) Compile populations of sites having overlapping signals
  • 1998 to 2002 made up 241 of the 254 sites
b) Estimate the number of classes per population, then choose the same k for all populations (k = 3 for the Wisconsin model)
c) Apply the standard time series clustering algorithm to each population using k
d) Perform sensitivity analyses with prototype ANN classification models to determine the best static variables
e) Determine the overall best static variables
f) Cluster all sites using the best static variables
g) Build ANN dynamic models of each behavioral class, as before
h) Build ANN classification models, as before, for “new sites”
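Step a) might be sketched as grouping sites into one population per measurement year, on the assumption that sites measured in the same summer have overlapping signals and can be clustered together. The one-population-per-year rule, the function names, and the site ids are illustrative assumptions; the study's exact overlap criterion is not given here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def compile_populations(site_years: Dict[str, Set[int]], years: Iterable[int]) -> Dict[int, List[str]]:
    """Group sites into one population per year (hypothetical rule).

    Sites measured in the same summer have overlapping signals, so each
    population can be clustered with the standard time series method.
    """
    wanted = set(years)
    populations: Dict[int, List[str]] = defaultdict(list)
    for site, measured in sorted(site_years.items()):
        for year in sorted(measured & wanted):
            populations[year].append(site)
    return dict(populations)


def sites_covered(populations: Dict[int, List[str]]) -> int:
    """Number of distinct sites that belong to at least one population."""
    return len({site for members in populations.values() for site in members})
```

A count such as 241 of 254 sites for 1998 to 2002 corresponds to `sites_covered` over those years' populations.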
Best Static Variables
Top variables (* = included in the top 6, top 10, or top 14 set)
Variable description                                  Top 6  Top 10  Top 14
Land cover–agriculture (W)                              *      *       *
Area–drainage area (W)                                  *      *       *
Land cover–forest (W)                                   *      *       *
Bedrock depth–depth to bedrock (0–50 feet) (W)          *      *       *
Surficial deposit texture–medium (W)                    *      *       *
Stream network–downstream link (S)                      *      *       *
Stream network–gradient (S)                                    *       *
Land cover–wetland (W)                                         *       *
Darcy value–darcy (W)                                          *       *
Bedrock depth–depth to bedrock (51–100 feet) (W)               *       *
Land cover–urban (W)                                                   *
Surficial deposit texture–fine (W)                                     *
Bedrock type–sandstone (W)                                             *
Bedrock depth–depth to bedrock (101–200 feet) (W)                      *
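Nested top-6/10/14 sets like these come from ranking variables by a sensitivity score. One common way to score them, sketched here as an assumption rather than the authors' documented procedure, is permutation sensitivity: shuffle one static variable across sites and measure how much a prototype classifier's accuracy drops. The `predict` callable stands in for a trained ANN classifier; all names are hypothetical.

```python
import numpy as np


def permutation_sensitivity(predict, X, y, seed=0):
    """Drop in classification accuracy when each static variable is shuffled across sites."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    scores = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        # Break the link between variable j and the class while keeping its distribution.
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        scores.append(baseline - np.mean(predict(X_perm) == y))
    return np.array(scores)


def top_variables(scores, names, n):
    """Names of the n most influential variables, highest sensitivity first."""
    order = np.argsort(scores)[::-1]
    return [names[i] for i in order[:n]]
```

Calling `top_variables` with n = 6, 10, and 14 yields nested sets analogous to the table's columns.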
Measured & Predicted Class 1 Stream Temps
• 14 “test” sites not used to train ANNs
  – concatenated
  – June–August
• R² = 0.66
• Dynamically good
• Offsets (high or low) from static variables
[Plot: measured vs. predicted]
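The slide does not say whether R² is the coefficient of determination or the squared Pearson correlation; this sketch assumes the former and uses hypothetical function names. It also shows why offsets from static inputs lower R² even when dynamics are good: shifting every prediction by a constant keeps the correlation at 1 but adds to the residual sum of squares.

```python
import numpy as np


def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot, over concatenated test-site records."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def concatenate_sites(records):
    """Join per-site June-August series end to end, as in the test-site comparison."""
    return np.concatenate([np.asarray(r, dtype=float) for r in records])
```

For example, a prediction that tracks the measured series exactly but sits 1 degree too high scores well below 1.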
Conclusions
• Numerical methods
1. Signal processing, e.g., spectral filtering
2. Clustering, e.g., k-means
3. ANN non-linear, dynamic sub-models of behavioral components assembled into super-model
4. Classification, e.g., ANN non-linear classifier
• Approach uses all available static and time series data
• Divide and conquer makes big problems tractable
• Near-optimal results – limited by data quality
• Compact finished model