
Development of a Machine Learning SWE Estimation Model for BC: Early Results

Drew Snauffer, William Hsieh, Alex Cannon


Outline

● Gridded product evaluation
    ● across BC
    ● by physiographic regions
    ● by survey

● Machine learning approach: ANN
    ● testing and validation
    ● bagging
    ● predictors

● Results
    ● overview of runs
    ● example traces: Kwadacha River


SWE Gridded Products

● ERA-Interim: ECMWF's global atmospheric reanalysis 1979-present

● ERA-Interim/Land: HTESSEL LSM + precip adjustments GPCP v2.1

● MERRA: NASA's global atmospheric reanalysis 1979-present

● MERRA-Land: offline replay of MERRA's land model + merge of gauge-based data & updated Catchment LSM

● GLDAS-1: assimilation of satellite & ground based obs using Noah LSM; mix of meteorological forcings

● GLDAS-2: bias corrected Princeton meteorological forcings (1948-2010)

● CMC: snow depths from 3 observational sets + densities from snow climate classes

● GlobSnow: forward model of satellite microwave TBs + synoptic station obs


Stats

● Pearson Correlation

● Mean Absolute Error

● Bias

● Relative Bias
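
The four statistics above can be sketched in a few lines of Python. The slides do not give the formulas, so the sign convention (estimate minus observation, so a negative bias means the product underestimates SWE) and the normalisation of relative bias by the mean observed value are assumptions, as are the function names and the toy values.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def pearson_correlation(observed, estimated):
    """Pearson correlation between observed and estimated SWE."""
    mo, me = _mean(observed), _mean(estimated)
    cov = sum((o - mo) * (e - me) for o, e in zip(observed, estimated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_e = sum((e - me) ** 2 for e in estimated)
    return cov / math.sqrt(var_o * var_e)


def mean_absolute_error(observed, estimated):
    """Mean of |estimate - observation|."""
    return _mean([abs(e - o) for o, e in zip(observed, estimated)])


def bias(observed, estimated):
    """Mean of (estimate - observation); assumed sign convention."""
    return _mean([e - o for o, e in zip(observed, estimated)])


def relative_bias(observed, estimated):
    """Bias divided by the mean observation; assumed normalisation."""
    return bias(observed, estimated) / _mean(observed)


# Hypothetical values in mm, for illustration only.
observed_swe = [100.0, 200.0, 300.0, 400.0]
gridded_swe = [50.0, 120.0, 200.0, 250.0]

print(pearson_correlation(observed_swe, gridded_swe))
print(mean_absolute_error(observed_swe, gridded_swe))  # 95.0
print(bias(observed_swe, gridded_swe))                 # -95.0
print(relative_bias(observed_swe, gridded_swe))
```

With these toy values the product tracks the observations closely (high correlation) while still underestimating them, which is the pattern the evaluation slides describe for the gridded products.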


Gridded product evaluation

● All evaluated products:
    ● high MAE (300-400 mm)
    ● strong negative bias (200-300 mm)
    ● ERALand had best MAE & bias, followed by MERRA & GLDAS-2

● Correlations:
    ● ERALand had highest, followed by MERRA, CMC and GLDAS-2
    ● GlobSnow and ERA-Interim had lowest


BC's Physiographic Regions:

● Coast Mountains and Islands (CMI)
● Interior Plateau (IP)
● Northern and Central Plateaus and Mountains (NPM)
● Great Plains (GP)
● Columbia Mountains and Southern Rockies (CR)



ANNs in SWE retrievals

● Tong et al., 2010 built ANNs to retrieve SWE in the Quesnel River Basin

● AMSR-E passive microwave + in situ snow pillow data

● Stations were close to but not necessarily at average grid cell elevation

● Filtered high SWE values

● Land cover not included in model

● ANNs using all channels outperformed those using only 37-19GHz (a & b)

● ANNs trained on data from 2 stations performed poorly at a 3rd nearby station because of different land surface characteristics (c)

[Figure panels a), b), c); panel annotations: r=0.95, r=-0.44, r=0.81]


Artificial Neural Network (ANN) setup

● R package “monmlp”: builds 1 or 2 hidden layer monotonic (or not) ANN

● inputs:
    ● SWE gridded products
    ● covariates: date (survey #), elevation, lat, lon, difference between station elevation and mean grid cell elevation

● target: manual snow survey measurements

● output: predicted SWE

● testing: withhold x% of stations from ANN training, predict & compare
    ● 256 “long record” stations / 8 test splits = 32 stations/split

● validation: during training, withhold y% of data for evaluating model complexity
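
The station-wise testing scheme above can be illustrated with a short sketch. The slides do not say how stations were assigned to splits, so the random shuffle, the seed, the station IDs, and the record layout (a dict with a `"station"` key) are all assumptions made for illustration.

```python
import random


def station_test_splits(station_ids, n_splits=8, seed=0):
    """Partition stations into n_splits disjoint test groups.

    Random assignment with a fixed seed is an assumption; the slides
    only give the counts (256 stations / 8 splits = 32 per split).
    """
    ids = sorted(set(station_ids))
    random.Random(seed).shuffle(ids)
    return [ids[i::n_splits] for i in range(n_splits)]


def train_test_records(records, test_stations):
    """Withhold every record from the test stations."""
    held_out = set(test_stations)
    train = [r for r in records if r["station"] not in held_out]
    test = [r for r in records if r["station"] in held_out]
    return train, test


# Hypothetical station IDs standing in for the 256 long-record stations.
stations = [f"S{i:03d}" for i in range(256)]
splits = station_test_splits(stations)
print(len(splits), [len(s) for s in splits])  # 8 splits of 32 stations

# Hypothetical records: two surveys per station.
records = [{"station": s, "survey": k} for s in stations for k in (1, 2)]
train, test = train_test_records(records, splits[0])
print(len(train), len(test))  # 448 64
```

Splitting by station rather than by individual record keeps all surveys from a withheld station out of training, so the test error reflects performance at unseen locations.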


Model tuning: Bagging

● Bootstrap aggregating “bagging”
    ● training dataset created by extracting records uniformly and with replacement
    ● size of record extraction is determined by block length
    ● out-of-bootstrap samples are used to find validation error

● block length of 1 (an individual observation)
    ● very complex networks and high test set errors
    ● likely due to temporal autocorrelation

● blocking by station and season didn't improve much

● blocking by station significantly mitigates temporal autocorrelation
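
A minimal sketch of bootstrap resampling with station blocks, assuming each record is a dict with a `"station"` key (a hypothetical layout). It is not the monmlp implementation; it only illustrates how drawing whole stations with replacement leaves the undrawn stations available as out-of-bootstrap validation data.

```python
import random
from collections import defaultdict


def station_block_bootstrap(records, rng):
    """Draw stations (blocks) uniformly with replacement.

    Returns (in_bag, out_of_bag): all records of every drawn station,
    repeated once per draw, and all records of stations never drawn.
    """
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec["station"]].append(rec)
    stations = sorted(blocks)
    drawn = [rng.choice(stations) for _ in stations]
    in_bag = [rec for s in drawn for rec in blocks[s]]
    drawn_set = set(drawn)
    out_of_bag = [rec for s in stations if s not in drawn_set
                  for rec in blocks[s]]
    return in_bag, out_of_bag


def bagged_prediction(member_predictions):
    """Average the predictions of the ensemble members."""
    return sum(member_predictions) / len(member_predictions)


# Hypothetical data: 10 stations with 5 surveys each.
records = [{"station": f"S{i}", "survey": k}
           for i in range(10) for k in range(5)]
in_bag, out_of_bag = station_block_bootstrap(records, random.Random(42))
print(len(in_bag), len(out_of_bag))
print(bagged_prediction([210.0, 190.0, 200.0]))  # 200.0
```

Because a station's surveys always stay together, autocorrelated observations from one station cannot appear in both the training sample and the validation sample of the same ensemble member.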


Gridded products as predictors

● Need to have overlap in availability of dates (+/- 1 day) except for monthly products

● Another possible approach: use mean and std dev of an ensemble of products as predictors

Product      Num valid points   Cum overlap
ERAInterim   34678              34678
MERRA        34678              34678
MERRALand    33488              33488
GLDAS2       33185              31949
ERALand      33182              31949
GlobSnow     29252              28438
GLDAS1       10412              8371
CMC          3197               2241
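
The "cum overlap" column can be read as the number of survey records for which every product added so far has a valid value. A minimal sketch of that bookkeeping follows; the record IDs and product names are hypothetical, and the +/- 1 day date matching is assumed to have been done already.

```python
def cumulative_overlap(valid_by_product, order):
    """For each product in order, report (name, valid count, cumulative overlap).

    valid_by_product maps a product name to the set of record IDs for
    which that product has a valid value.
    """
    common = None
    rows = []
    for name in order:
        valid = set(valid_by_product[name])
        common = valid if common is None else common & valid
        rows.append((name, len(valid), len(common)))
    return rows


# Hypothetical record IDs, for illustration only.
valid_by_product = {
    "A": {1, 2, 3, 4, 5},
    "B": {1, 2, 3, 4},
    "C": {2, 3, 5, 6},
}
for row in cumulative_overlap(valid_by_product, ["A", "B", "C"]):
    print(row)
# ('A', 5, 5)
# ('B', 4, 4)
# ('C', 4, 2)
```

The overlap can only shrink as products are added, which is why including a sparsely available product sharply reduces the records usable for training.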


Results

Table columns (✔ = used in run): ERA-Interim, ERA-Interim Land, MERRA, MERRA-Land, GLDAS-1, GLDAS-2, GlobSnow, CMC, Products averaged, ANN RMSE

Values without ✔ marks: 443.86 334.59 351.51 401.97 326.50 345.58 439.65 313.15 311.13

Number of ✔   ANN RMSE
9             228.79
1             212.35
8             205.58
7             204.71
2             203.60
6             201.28
4             200.56
5             199.91
6             199.02
3             196.14


Thank you! Questions?

Drew Snauffer - [email protected]

Summary

● early runs of a simple ANN show promise for improving representation of SWE in BC
● ANNs perform better using predictors individually rather than averaging
● best ANN appears to use neither 1 product nor all products

Future work

● weight different gridded products (best linear unbiased estimator "BLUE")
● addition of data sources (PM, SCA) and covariates (slope, aspect, ground cover)
● incorporation of snow models
● break out ML models by region
● other ML techniques (support vector regression, random forests, etc.)
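
As a rough illustration of the BLUE weighting idea, the sketch below uses the textbook special case in which each product's errors are unbiased and mutually uncorrelated, so the weights are proportional to the inverse error variances. Both conditions are assumptions; since the evaluation found strong negative bias in the products, bias would need to be removed first. The function names and numbers are hypothetical.

```python
def blue_weights(error_variances):
    """Inverse-variance weights, normalised to sum to 1.

    Valid as BLUE weights only for unbiased, uncorrelated errors
    (an assumption made for this sketch).
    """
    inverse = [1.0 / v for v in error_variances]
    total = sum(inverse)
    return [w / total for w in inverse]


def blue_estimate(estimates, error_variances):
    """Weighted combination of product estimates."""
    weights = blue_weights(error_variances)
    return sum(w * x for w, x in zip(weights, estimates))


# Hypothetical SWE estimates (mm) and error variances (mm^2) for two products.
print(blue_weights([100.0, 400.0]))
print(blue_estimate([300.0, 350.0], [100.0, 400.0]))
```

Here the product with one quarter of the error variance receives four times the weight, so the combined estimate sits much closer to its value.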
