Bayesian Hierarchical Modeling of Hydroclimate Problems

Balaji Rajagopalan
Department of Civil, Environmental and Architectural Engineering and Cooperative Institute for Research in Environmental Sciences (CIRES)
University of Colorado, Boulder, CO, USA

Bayes by the Bay Conference, Pondicherry, January 7, 2013

Co-authors & Collaborators
- Upmanu Lall and Naresh Devineni, Columbia University, NY
- Hyun-Han Kwon, Chonbuk National University, South Korea
- Carlos Lima, Universidade de Brasília, Brazil
- Pablo Mendoza, James McCreight and Will Kleiber, University of Colorado, Boulder, CO
- Richard Katz, NCAR, Boulder, CO
- NSF, NOAA, US Bureau of Reclamation and Korean Science Foundation

Outline
- Bayesian Hierarchical Modeling (BHM): introduction from GLM
- Hydroclimate applications of BHM: contrast with near-Bayesian models currently in vogue
- Stochastic rainfall generator: BHM (Lima and Lall, 2009, WRR); Latent Gaussian Process Model (Kleiber et al., 2012, WRR)
- Riverflow forecasting (Kwon et al., 2009, Hydrologic Sciences): seasonal flow; flow extremes
- Paleo reconstruction of climate (Devineni and Lall, 2012, J. Climate)

Linear Regression Models
- Suppose the model relating the regressors to the response is $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \epsilon_i$, $i = 1, \dots, n$.
- In matrix notation this model can be written as $y = X\beta + \epsilon$, where $y$ is the $n \times 1$ vector of observations, $X$ is the $n \times (k+1)$ matrix of regressor values (with a leading column of ones), $\beta$ is the $(k+1) \times 1$ vector of regression coefficients and $\epsilon$ is the $n \times 1$ vector of random errors.
- We wish to find the vector of least squares estimators that minimizes $L = \sum_{i=1}^{n} \epsilon_i^2 = (y - X\beta)'(y - X\beta)$.
- The resulting least squares estimate is $\hat{\beta} = (X'X)^{-1} X'y$.

12-1 Multiple Linear Regression Models: Properties of the Least Squares Estimators
- Unbiased estimators: $E(\hat{\beta}) = \beta$
- Covariance matrix: $C = (X'X)^{-1}$
- Individual variances and covariances: $V(\hat{\beta}_j) = \sigma^2 C_{jj}$ and $\mathrm{cov}(\hat{\beta}_i, \hat{\beta}_j) = \sigma^2 C_{ij}$ for $i \neq j$
- In general, $\mathrm{Cov}(\hat{\beta}) = \sigma^2 (X'X)^{-1} = \sigma^2 C$

Generalized Linear Model (GLM): Bayesian Perspective
- Linear regression is not appropriate when the dependent variable y is not Normal
- Transformations of y to Normal are not always possible
- This occurs in several situations (rainfall occurrence; number of wet/dry days; etc.)
Hence, GLM
- A linear model is fitted to a suitably transformed variable of y
- A linear model is fitted to the parameters of the assumed distribution of y
- Likelihood

Generalized Linear Model (GLM): Bayesian Perspective
- Noninformative prior on the model parameters
- Assuming a Normal distribution for Y, g(.) is the identity: linear regression
- Exponential family PDF and its parameters: all of these distributions arise from it (Normal, Exponential, Gamma, Binomial, Poisson, etc.)
- Canonical link functions: log and logit
- Inverse Chi-Square

Summary
- GLM is hierarchical: a specific distribution and a link function
- With one simple additional step, i.e., providing priors and computing the likelihood/posterior, GLM becomes BHM
- Assuming a Normal distribution for the dependent variable and uninformative priors, BHM collapses to a standard linear regression model
- Thus BHM is a generalized framework
- Uncertainty in the model parameters and in the model structure is obtained automatically

Generalized Linear Model (GLM) Example: Bayesian Hierarchical Model
- Hard to sample from the posterior: use MCMC

Stochastic Weather Generators
- Precipitation occurrence, rain onset day (Lima and Lall, 2009)
- Precipitation occurrence and amounts (Kleiber, 2012)

[Figure: historical data; synthetic series conditional on climate information; process model; frequency distribution of outcomes]
- Users are most interested in sectoral/process outcomes (streamflows, crop yields, risk of disease X, etc.)
- Need for a robust spatial weather generator

Need for Downscaling
- Seasonal climate forecasts and future climate model projections often have coarse scales
  - Spatial: regional
  - Temporal: seasonal, monthly
- Process models (hydrologic models, ecological models, crop growth models) often require daily weather data for a given location
- There is a scale mismatch
- Stochastic weather generators can help bridge this scale gap
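The GLM slides above note that adding priors to a GLM gives a Bayesian model whose posterior is hard to sample directly, so MCMC is used. A minimal sketch of that recipe for a wet/dry (Bernoulli, logit link) response is given here. Everything in it is an illustrative assumption rather than the models of the cited papers: the single covariate, the synthetic data, the Normal(0, 10^2) priors, the random-walk Metropolis sampler and all function names.

```python
import math
import random


def log_posterior(beta, xs, ys, prior_sd=10.0):
    """Unnormalized log posterior of a Bernoulli GLM with logit link.

    Assumed priors (not from the talk): independent Normal(0, prior_sd^2).
    """
    b0, b1 = beta
    lp = -(b0 * b0 + b1 * b1) / (2.0 * prior_sd ** 2)
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        # Bernoulli log-likelihood: y*eta - log(1 + exp(eta)), computed stably
        lp += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return lp


def simulate(n=300, b0=-0.5, b1=1.2, seed=0):
    """Synthetic wet/dry data from known coefficients (hypothetical values)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(b0 + b1 * x))) else 0
          for x in xs]
    return xs, ys


def metropolis(xs, ys, n_iter=4000, step=0.3, seed=1):
    """Random-walk Metropolis; returns all draws and the acceptance rate."""
    rng = random.Random(seed)
    beta = [0.0, 0.0]
    current = log_posterior(beta, xs, ys)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, posterior ratio)
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        draws.append(tuple(beta))
    return draws, accepted / n_iter


xs, ys = simulate()
draws, acc_rate = metropolis(xs, ys)
kept = draws[1000:]  # discard burn-in
post_mean = [sum(d[j] for d in kept) / len(kept) for j in range(2)]
slopes = sorted(d[1] for d in kept)
interval = (slopes[int(0.025 * len(slopes))], slopes[int(0.975 * len(slopes))])
print("acceptance rate:", round(acc_rate, 2))
print("posterior mean (b0, b1):", [round(v, 2) for v in post_mean])
print("95% interval for b1:", tuple(round(v, 2) for v in interval))
```

The retained draws give the parameter uncertainty directly, which is the property the summary slide highlights. In practice a tool such as WinBUGS (used later in this talk) would replace the hand-written sampler.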
Precipitation Occurrence (Lima and Lall, WRR, 2009)
- 504 stations in Brazil (latitude and longitude shown in figure)

Modeling Occurrence at a Site
- Rainfall occurrence (0 = dry, 1 = rain; P = 0.254 mm threshold) is modeled with a probabilistic model (logistic regression)
- y_st(n) is a non-homogeneous Bernoulli random variable for station s, day n and year t, equal to 1 for a wet state and 0 for a dry state
- p_st(n) is the rainfall probability for station s and day n of year t
- The seasonal cycle is modeled through Fourier harmonics

[Figure: results from Site #3; outlier?]

Bayesian Hierarchical Model (BHM)
- But rainfall occurrence is correlated in space. How to model this? Partial BHM
- Shrinks parameters towards a common mean and reduces uncertainty, since more information is used to estimate the model parameters
- Parameter uncertainties are fully accounted for during simulations

Bayesian Hierarchical Model (BHM)
- Likelihood function
- Posterior distribution (Bayes' theorem)
- MCMC to obtain the posterior distribution

[Figure: results for Station #3, yearly probability of rainfall]
[Figure: results for Station #3, average probability of rainfall]
[Figures: clusters on average day of max probability; max probability of rainfall; day of max probability of rainfall]
- Max probability of rainfall is correlated with climate variables (ENSO, etc.)
- Characterize rainfall onset
- Prediction of onset
Lima and Lall (2009, WRR)

Space-time Precipitation Generator: Latent Gaussian Process (Kleiber et al., 2012, WRR)

Latent Gaussian Process
- Fit a GLM for precipitation occurrence and amounts at each location independently
  - Occurrence: logistic regression-based
  - Amounts: Gamma link function
- Spatial process to smooth the GLM coefficients in space
- Almost Bayesian Hierarchical Modeling
- Alpha, gamma: shape and scale parameters of the Gamma distribution

Latent Gaussian Process: Occurrence Model

Latent Gaussian Process: Parameter Estimation
- MLE, two step: GLM + Latent Gaussian Process
Kleiber et al. (2012)

For Max and Min Temperature
- Models conditioned on the precipitation model, using a Latent Gaussian Process
Kleiber et al. (2013, Annals of App. Statistics, in press)

Seasonal Average and Maximum Streamflow Forecasting (Kwon et al., 2009, Hydrologic Sciences)

Streamflow Forecasting at Three Gorges Dam: Identify Predictors
- Correlate seasonal streamflow with large-scale climate variables from preceding seasons (JJA flow with MAM climate)
- Select regions of strong correlation as predictors (Grantz et al., 2005)

Streamflow Forecasting at Three Gorges Dam: BHM for Seasonal Streamflow
- Model
- Half-Cauchy prior with parameter 25, mildly informative (Gelman, 2006, Bayesian Analysis)
- MCMC is used to obtain the posterior distributions
- Data showed mild nonlinearity: quadratic terms in the model

Streamflow Forecasting at Three Gorges Dam
- Predictors 2, 3, 4 and 5 show tighter bounds
- Uncertainty in the predictors (i.e., the model) is obtained and propagated into the forecasts
- PCA, stepwise selection, etc. can be used to reduce the number of predictors (this can be crude)

Streamflow Forecasting at Three Gorges Dam: Maximum Seasonal Streamflow

Extreme Value Analysis: Floods (Kwon et al., 2010, Hydrologic Sciences)

[Figure: American River at Fair Oaks, annual maximum flood; 100-yr flood estimated from 21- and 51-yr moving windows]

Floods
- The time-varying (nonstationary) nature of hydrologic (flood) frequency (few examples)
- Climate variability and climate change: climate mechanisms that lead to changes in flood statistics
- Adaptation strategy: adaptive flood risk estimation
  - Nonstationary flood frequency estimation
  - Seasonal to inter-annual forecasts and climate change
  - Improved infrastructure management
- Summary / climate questions and issues related to hydrologic extremes

[Figure: flood mean given DJF NINO3 and PDO; flood variance given DJF NINO3 and PDO (axes: NINO3, PDO). Derived using weighted local regression with 30 neighbors. Correlations: Log(Q) vs DJF NINO vs DJF PDO. Jain & Lall, 2000]

[Figure: atmospheric river generates flooding. Russian River flooding in Monte Rio, California, 18 February 2004 (photo courtesy of David Kingsmill). Russian River, CA flood event of 18-Feb-04: GPS IWV data from near CZD (Bodega Bay, Cloverdale), Feb 2004; 10 in. of rain at CZD in ~48 hours; axes: IWV (cm), IWV (inches). Slide from Paul Neiman's talk]

Flood Estimation Under Nonstationarity
- Significant interannual/interdecadal variability of floods
- Stationarity assumptions (i.i.d.) are invalid
- Large-scale climate features in the ocean-atmosphere-land system orchestrate floods at all time scales
- Need tools that can capture the nonstationarity and incorporate large-scale climate information
  - Year-to-year time scale (climate variability): flood mitigation planning, reservoir operations
  - Interdecadal time scale (climate variability and change): facility design, planning and management

Extreme Value Distributions
- Exponential (light, shape = 0), Pareto (heavy, shape > 0) and Beta (bounded, shape < 0)
- The generalized extreme value (GEV) distribution can be used to characterize the extreme flow distribution (Katz et al., 2002)
- 3 model parameters
  - Location parameter μ: where the distribution is centered
  - Scale parameter σ: spread of the distribution
  - Shape parameter ξ: behavior of the distribution tail
- Gumbel, Fréchet, Weibull (Coles, 2001)
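To make the three GEV parameters concrete, the sketch below evaluates GEV quantiles (return levels) in the standard parameterization of Coles (2001), and lets the location parameter depend linearly on a covariate, as a stand-in for a nonstationary flood model. The function names, the covariate and every numeric value are hypothetical and chosen only for illustration; they are not estimates from any of the studies cited in this talk.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z) for location mu, scale sigma, shape xi."""
    if abs(xi) < 1e-12:  # Gumbel limit (xi -> 0)
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_quantile(p, mu, sigma, xi):
    """Value z such that G(z) = p, for a non-exceedance probability 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    y = -math.log(p)
    if abs(xi) < 1e-12:  # Gumbel limit
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


def return_level(period, mu, sigma, xi):
    """T-year return level: exceeded with probability 1/T in any given year."""
    return gev_quantile(1.0 - 1.0 / period, mu, sigma, xi)


def conditional_return_level(period, x, beta0, beta1, sigma, xi):
    """Return level when the location is mu = beta0 + beta1 * x.

    x is a hypothetical climate covariate (for example a standardized index).
    """
    return return_level(period, beta0 + beta1 * x, sigma, xi)


# Hypothetical parameter values, for illustration only
MU, SIGMA, XI = 1000.0, 300.0, 0.1
for period in (10, 50, 100):
    print(period, "yr level:", round(return_level(period, MU, SIGMA, XI), 1))

# The same 50-year event under two covariate states: only the location shifts
low = conditional_return_level(50, -1.0, beta0=MU, beta1=150.0, sigma=SIGMA, xi=XI)
high = conditional_return_level(50, 1.0, beta0=MU, beta1=150.0, sigma=SIGMA, xi=XI)
print("50 yr level, x = -1:", round(low, 1), "| x = +1:", round(high, 1))
```

With a covariate only in the location, the whole distribution shifts by beta1 per unit of x, so a "dynamic" T-year flood moves up and down with the climate state while the stationary estimate stays fixed.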
Unconditional GEV
- Incorporate covariates into the GEV parameters to account for nonstationarity
- Could apply to any parameter, but location is most intuitive
- GLM framework
- Hierarchical Bayesian modeling is a natural and attractive alternative
- GEV fit using the extRemes toolkit in R (Gilleland and Katz, 2005, 2011)

Streamflow Forecasting at Three Gorges Dam: BHM for Seasonal Maximum Flow
- Model
- Half-Cauchy prior with parameter 25, mildly informative (Gelman, 2006, Bayesian Analysis)
- MCMC is used to obtain the posterior distributions
- Data showed mild nonlinearity: quadratic terms in the model

Streamflow Forecasting at Three Gorges Dam
- Predictors 3 and 5 show tighter bounds

Nonstationary Flood Risk at Three Gorges Dam
[Figure: dynamic 50-year flood from BHM and stationary 50-year flood]

Conditional (Nonstationary) Extremes in Water Quality (Towler et al., 2009, WRR)

Case study location: PWB (Towler et al., 2009)
Forest to Faucet
- Rain
- Runoff
- Storage (2 reservoirs)
- Chemical disinfection (Cl2, NH3)
- No physical filtration (unfiltered)
- Distribution

Case study location: PWB
- Exceedances (SWTR criterion: turbidity < 5 NTU)
- Precipitation events
- High flows
- Back-up groundwater source (pumping $$)

GEV model    Uncond       Cond T        Cond R       Cond RT         Cond R+T
Location     β0           β0 + β1 T     β0 + β1 R    β0 + β1 (RT)    β0 + β1 R + β2 T
β0 (se)      1924 (120)   1930 (1000)   1739 (410)   611.4 (150)     1911 (880)
β1 (se)                   (27)          61.08 (32)   3.716 (0.36)    141.2 (14)
β2 (se)                                                              (24)
σ (se)       1245 (84)    1220 (81)     1246 (160)   923.7 (69)      968.5 (74)
ξ (se)       (0.065)      (0.065)       (0.084)      (0.082)         (0.075)
K            1            2             2            2               3

Model comparison rows: llh, AIC, M0*, D, Sig**, ***. M0*: -, Uncond, Cond R. Sig**: -, No (0.635), Yes (0.000).
* Nested model to which the model is compared in the likelihood ratio test
** Significance is tested at the α = 0.05 level, and ( ) indicates the p-value
*** Correlation between the cross-validated z90 estimates and the observed maximum values

[Figure: conditional quantiles correspond well to the observed record; axes: Year, Maximum Streamflow (cfs)]
- Uses concurrent climate, but could also be used with a seasonal forecast
- GEV distribution can be compared for specific historic times

Climate Change Projections
- P and T climate change projections from IPCC AR4 are readily available
- 12 km² resolution (1/8° grid cell)
- Bias correct P & T to historic data for the PWB watershed area

[Figure: results indicate increasing maximum streamflow anomalies; legend: observed, 16 GCM models, GCM model average; axes: Year, Maximum Streamflow Anomaly (%)]
[Figure: streamflow quantiles shift higher under CC projections; legend: observed, 16 GCM models]
[Figure: probability of a turbidity spike given a certain maximum flow; likelihood of turbidity spike (Ang and Tang, 2007); axes: Maximum Flow (cfs), Conditional P(E)]
[Figure: likelihood of a turbidity spike increases under CC projections; legend: observed, 16 GCM models; box plot percentiles (top whisker, box top, box middle); axis: P(E)]
- Small shifts in risk can result in high expected loss
- Expected loss can be high, especially for the risk averse

Summary: Bayesian Hierarchical Modeling
- Powerful tool for all functional (regression) estimation problems (which is most of forecasting/simulation)
- Provides model and parameter uncertainties
- Obviates the need for discarding covariates
- Enables incorporation of expert opinions
- Enables modeling a rich variety of variable types (continuous, skewed, bounded, categorical, discrete, etc.) and distributions (Binomial, Poisson, Gamma, GEV)
- Generalized framework: traditional linear models are a subset

Paleo Hydrology Reconstruction (Devineni and Lall, 2012, J. Climate, accepted)

Motivation
[Figure: Colorado River demand and supply; UC CRSS stream gauges; LC CRSS stream gauges]

Paleo Hydrology: Colorado River Example

Streamflow and Tree Ring Data
- Variable-length streamflow record (Y_t) (5 sites)
- 246-year chronology (X_t) (8 tree ring chronologies)
- Average summer (JJA) flows as predictand
- Annual tree ring growth index (chronology) as predictor
- 246 years of common data
- Summer flow = f(tree rings) + error

Preliminary Data Analysis: Bayesian Hypothesis
- Correlation of tree chronology vs average summer seasonal flow
- Hypothesis (a): no shrinkage of regression coefficients (no pooling), i.e., traditional regression
- Hypothesis (b): shrinkage of regression coefficients across sites (partial pooling), i.e., hierarchical model
- Station-tree correlations are similar: pooling?

Bayesian Hierarchical Models: Partial Pooling
- Hierarchical model: shrinkage on the coefficients to incorporate the predictive ability of each tree chronology on multiple stations
- Streamflow: log-normal distribution
- Regression coefficients of the hierarchical model: multivariate normal distribution
Key ideas:
1. Streamflow at each site comes from a pdf
2. Parameters of each pdf are informed by each tree
3. Common multivariate distribution of parameters across trees
4. Noninformative prior for the parameters of the multivariate distribution
5. MCMC for parameter estimation

Delaware River Reconstruction and Performance
- Models developed
  - Hierarchical Bayesian regression (partial pooling)
  - Linear regression (no pooling)
- Model simulations
  - WinBUGS: Bayesian inference Using Gibbs Sampling
  - 7500 simulations with 3 chains and convergence tests
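The contrast between no pooling and partial pooling described above can be illustrated with a much simpler calculation than the full hierarchical regression. The sketch below is an assumption-laden stand-in, not the WinBUGS model of the talk: it takes per-site coefficient estimates with known standard errors, assumes the true site coefficients come from a common Normal distribution with a fixed between-site standard deviation tau, and computes the resulting precision-weighted shrinkage in closed form. The site values, tau and function names are all hypothetical.

```python
def pooled_mean(estimates, std_errors, tau):
    """Precision-weighted estimate of the common mean across sites."""
    weights = [1.0 / (se ** 2 + tau ** 2) for se in std_errors]
    return sum(w * b for w, b in zip(weights, estimates)) / sum(weights)


def partial_pool(estimates, std_errors, tau):
    """Shrink each site estimate towards the common mean.

    Assumed model (a simplification, with tau treated as known):
      estimate_j ~ Normal(theta_j, se_j^2),  theta_j ~ Normal(mu, tau^2).
    Returns (mu, list of conditional posterior means of theta_j).
    """
    if tau <= 0.0:
        raise ValueError("tau must be positive")
    mu = pooled_mean(estimates, std_errors, tau)
    shrunk = []
    for b, se in zip(estimates, std_errors):
        w_site = 1.0 / se ** 2    # precision of the site's own estimate
        w_group = 1.0 / tau ** 2  # precision contributed by the group level
        shrunk.append((w_site * b + w_group * mu) / (w_site + w_group))
    return mu, shrunk


# Hypothetical regression coefficients for five sites (illustration only)
site_estimates = [0.42, 0.55, 0.30, 0.61, 0.48]
site_std_errors = [0.10, 0.08, 0.20, 0.15, 0.09]

mu, shrunk = partial_pool(site_estimates, site_std_errors, tau=0.10)
print("common mean:", round(mu, 3))
for raw, se, post in zip(site_estimates, site_std_errors, shrunk):
    print("no pooling:", raw, "se:", se, "partial pooling:", round(post, 3))
```

Sites with larger standard errors are pulled more strongly towards the common mean, which is the "borrowing strength" behavior the hierarchical model is meant to deliver. A very large tau recovers the no-pooling estimates, and a very small tau collapses all sites onto the common mean. In the full Bayesian treatment tau and the common mean would be given priors and sampled by MCMC along with the site coefficients.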
Cross-Validated Performance Metrics
- Reduction of Error (RE)
- Coefficient of Efficiency (CE)

[Figure: Delaware River Reconstruction and Performance, posterior PDF (model level 1)]
[Figure: Delaware River Reconstruction and Performance, regression coefficients (model level 2): no pooling vs partial pooling]
[Figure: Delaware River Reconstruction, cross-validated performance: Cannonsville, Pepacton]

Paleo Hydrology Reconstruction: Traditional Methods
- Linear/nonlinear regression
- PCA of tree rings
- Regression on leading PCs

Objective 1: Tree-ring Reconstructions
- LCBR
- Naturalized streamflow
- 9 nodes in CRSS
- 5 are well correlated with precipitation (>0.5), referred to as good nodes (blue)
- 4 are not correlated (