Development of a Machine Learning SWE Estimation Model for BC: Early Results
Drew Snauffer, William Hsieh, Alex Cannon
Outline
● Gridded product evaluation
  ● across BC
  ● by physiographic regions
  ● by survey
● Machine learning approach: ANN
  ● testing and validation
  ● bagging
  ● predictors
● Results
  ● overview of runs
  ● example traces: Kwadacha River
SWE Gridded Products
● ERA-Interim: ECMWF's global atmospheric reanalysis 1979-present
● ERA-Interim/Land: HTESSEL LSM + precip adjustments GPCP v2.1
● MERRA: NASA's global atmospheric reanalysis 1979-present
● MERRA-Land: offline replay of MERRA's land model + merge of gauge-based data & updated Catchment LSM
● GLDAS-1: assimilation of satellite & ground based obs using Noah LSM; mix of meteorological forcings
● GLDAS-2: bias corrected Princeton meteorological forcings (1948-2010)
● CMC: snow depths from 3 observational sets + densities from snow climate classes
● GlobSnow: forward model of satellite microwave TBs + synoptic station obs
Stats
● Pearson Correlation
● Mean Absolute Error
● Bias
● Relative Bias
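The four statistics above can be sketched in a few lines of Python. This is only an illustration: the function name `evaluation_stats` is hypothetical, the input values are made up, and the definition of relative bias (mean error divided by mean observed SWE) is an assumption, since the slides do not spell it out.

```python
import numpy as np

def evaluation_stats(predicted, observed):
    """Return Pearson r, MAE, bias and relative bias for paired SWE values."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    error = predicted - observed
    r = np.corrcoef(predicted, observed)[0, 1]
    mae = np.mean(np.abs(error))
    bias = np.mean(error)
    # Assumption: relative bias is the mean error divided by the mean observation.
    relative_bias = bias / np.mean(observed)
    return {"r": r, "mae": mae, "bias": bias, "relative_bias": relative_bias}

# Made-up values in mm, for illustration only.
stats = evaluation_stats([100.0, 200.0, 300.0], [200.0, 400.0, 600.0])
```

With these toy inputs the product tracks the observations perfectly in correlation terms but underestimates them by half, which is the kind of pattern a negative bias alongside a reasonable correlation would indicate.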
Gridded product evaluation
● All evaluated products:
  ● high MAE (300-400 mm)
  ● strong negative bias (200-300 mm)
  ● ERA-Interim/Land had best MAE & bias, followed by MERRA & GLDAS-2
● Correlations:
  ● ERA-Interim/Land had highest, followed by MERRA, CMC and GLDAS-2
  ● GlobSnow and ERA-Interim had lowest
BC's Physiographic Regions:
● Coast Mountains and Islands (CMI)
● Interior Plateau (IP)
● Northern and Central Plateaus and Mountains (NPM)
● Great Plains (GP)
● Columbia Mountains and Southern Rockies (CR)
ANNs in SWE retrievals
● Tong et al., 2010 built ANNs to retrieve SWE in the Quesnel River Basin
● AMSR-E passive microwave + in situ snow pillow data
● Stations were close to but not necessarily at average grid cell elevation
● Filtered high SWE values
● Land cover not included in model
● ANNs using all channels outperformed those using only 37-19GHz (a & b)
● ANNs trained on data from 2 stations performed poorly at a 3rd nearby station because of different land surface characteristics (c)
[Figure: panels a), b) and c); correlations shown: r=0.95, r=-0.44, r=0.81]
Artificial Neural Network (ANN) setup
● R package “monmlp”: builds 1 or 2 hidden layer monotonic (or not) ANN
● inputs:
  ● SWE gridded products
  ● covariates: date (survey #), elevation, lat, lon, station - mean grid cell elevation difference
● target: manual snow survey measurements
● output: predicted SWE
● testing: withhold x% of stations from ANN training, predict & compare
● 256 “long record” stations / 8 test splits = 32 stations/split
● validation: during training, withhold y% of data for evaluating model complexity
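The station-wise testing scheme can be illustrated with a small Python sketch (the deck itself uses R). The function `station_test_splits` and the integer station IDs are hypothetical; the only numbers taken from the slide are 256 stations and 8 splits. The point is that whole stations, not individual records, are withheld.

```python
import numpy as np

def station_test_splits(station_ids, n_splits=8, seed=0):
    """Partition unique station IDs into n_splits disjoint test groups."""
    rng = np.random.default_rng(seed)
    stations = rng.permutation(np.unique(station_ids))
    return np.array_split(stations, n_splits)

station_ids = np.arange(256)  # hypothetical IDs for the 256 long-record stations
splits = station_test_splits(station_ids)

fold_sizes = []
for test_stations in splits:
    train_stations = np.setdiff1d(station_ids, test_stations)
    # A model would be fit on records from train_stations
    # and evaluated on records from test_stations.
    fold_sizes.append((len(train_stations), len(test_stations)))
```

Each of the 8 folds withholds 32 stations and trains on the remaining 224, so every station is used for testing exactly once.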
Model tuning: Bagging
● Bootstrap aggregating “bagging”
  ● training dataset created by extracting records uniformly and with replacement
  ● size of record extraction is determined by block length
  ● out-of-bootstrap samples are used to find validation error
● block length of 1 (an individual observation)
  ● very complex networks and high test set errors
  ● likely due to temporal autocorrelation
● blocking by station and season didn't improve much
● blocking by station significantly mitigates temporal autocorrelation
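Blocking by station can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the monmlp implementation: the function `station_block_bootstrap` and the toy record table are hypothetical. Whole stations are drawn with replacement, and the records of stations that were never drawn serve as the out-of-bootstrap validation set.

```python
import numpy as np

def station_block_bootstrap(station_ids, seed=0):
    """Resample whole stations with replacement.

    Returns row indices for the in-bag (training) sample and the
    out-of-bootstrap (validation) sample.
    """
    rng = np.random.default_rng(seed)
    station_ids = np.asarray(station_ids)
    stations = np.unique(station_ids)
    drawn = rng.choice(stations, size=len(stations), replace=True)
    # All records of a drawn station enter the training sample together,
    # once for each time the station was drawn.
    in_bag = np.concatenate([np.flatnonzero(station_ids == s) for s in drawn])
    # Records of stations that were never drawn form the validation set.
    out_of_bag = np.flatnonzero(~np.isin(station_ids, drawn))
    return in_bag, out_of_bag

# Hypothetical record table: 6 stations with 4 survey dates each.
station_ids = np.repeat(np.arange(6), 4)
in_bag, out_of_bag = station_block_bootstrap(station_ids)
```

Because a station's records are never split between the two samples, autocorrelated observations from the same station cannot make the validation error look better than it is.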
Gridded products as predictors
● Need to have overlap in availability of dates (+/- 1 day) except for monthly products
● Another possible approach: use mean and std dev of an ensemble of products as predictors
Product       num valid points   cum overlap
ERAInterim    34678              34678
MERRA         34678              34678
MERRALand     33488              33488
GLDAS2        33185              31949
ERALand       33182              31949
GlobSnow      29252              28438
GLDAS1        10412              8371
CMC           3197               2241
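One way to compute a table like this is sketched below in Python. It assumes that "cum overlap" counts the survey records for which a product and every product listed above it are all valid; the slides do not define the column, so that reading is an assumption. The function `cumulative_overlap` and the toy masks are hypothetical.

```python
import numpy as np

def cumulative_overlap(valid):
    """valid maps product name -> boolean array marking usable survey records.

    Products are ordered by their own valid count (descending); each is then
    intersected with all products before it.
    """
    order = sorted(valid, key=lambda name: int(valid[name].sum()), reverse=True)
    running = np.ones(len(valid[order[0]]), dtype=bool)
    rows = []
    for name in order:
        running &= valid[name]
        rows.append((name, int(valid[name].sum()), int(running.sum())))
    return rows

# Toy masks over five survey records (hypothetical, not the values in the table).
valid = {
    "A": np.array([True, True, True, True, True]),
    "B": np.array([True, True, True, True, False]),
    "C": np.array([False, True, True, False, True]),
}
rows = cumulative_overlap(valid)
```

Under that reading, each added product can only shrink the usable training set, which is why products with short records are costly to include as predictors.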
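Results
Table columns (original order not recoverable): ERA-Interim, ERA-Interim Land, MERRA, MERRA-Land, GLDAS-1, GLDAS-2, GlobSnow, CMC, Products averaged, ANN RMSE
Leading values (column assignment not recoverable): 443.86 334.59 351.51 401.97 326.50 345.58 439.65 313.15 311.13
Rows as extracted (✔ marks a ticked column, followed by ANN RMSE; column positions of the ✔ marks not recoverable):
✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ 228.79
✔ 212.35
✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ 205.58
✔ ✔ ✔ ✔ ✔ ✔ ✔ 204.71
✔ ✔ 203.60
✔ ✔ ✔ ✔ ✔ ✔ 201.28
✔ ✔ ✔ ✔ 200.56
✔ ✔ ✔ ✔ ✔ 199.91
✔ ✔ ✔ ✔ ✔ ✔ 199.02
✔ ✔ ✔ 196.14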
Thank you! Questions?
Drew Snauffer - [email protected]
Summary
● early runs of a simple ANN show promise for improving representation of SWE in BC
● ANNs perform better using predictors individually rather than averaging
● best ANN appears to use neither 1 product nor all products
Future work
● weight different gridded products (best linear unbiased estimator "BLUE")
● addition of data sources (PM, SCA) and covariates (slope, aspect, ground cover)
● incorporation of snow models
● break out ML models by region
● other ML techniques (support vector regression, random forests, etc.)
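For the BLUE item under future work, a minimal Python sketch of the textbook weighting is given here. It assumes each product's error is unbiased with a known error covariance, so the products would first need bias correction. The function `blue_weights`, the variances and the SWE values are all hypothetical.

```python
import numpy as np

def blue_weights(error_cov):
    """Weights minimising the variance of a weighted mean of unbiased estimates."""
    error_cov = np.asarray(error_cov, dtype=float)
    ones = np.ones(error_cov.shape[0])
    solved = np.linalg.solve(error_cov, ones)  # C^{-1} 1
    return solved / solved.sum()               # normalise so the weights sum to 1

# Hypothetical error variances (mm^2) for three products with uncorrelated errors.
cov = np.diag([100.0, 400.0, 400.0])
weights = blue_weights(cov)

estimates = np.array([310.0, 280.0, 340.0])  # made-up SWE values in mm
combined = float(weights @ estimates)
```

With uncorrelated errors this reduces to inverse-variance weighting, so the product with the smallest error variance receives the largest weight.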