data, models and reality: a modern statistical perspective
TRANSCRIPT
![Page 1: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/1.jpg)
Data, models and reality: a modern statisticalperspective
Richard E. Chandler
Department of Statistical Science, University College London([email protected])
ICRC-CORDEX, Stockholm, May 2016
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 1 / 50
![Page 2: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/2.jpg)
Prelude: climate modelling for impact assessment
My starting point:
Climate science is funded because we must live with the weather:we must prepare for weather-related risks and hazards, and adapt asthey change in the future, and we can also exploit the weather e.g. for
energy production and crop growth.
Corollary
Climate scientists need to think about users’ information requirements
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 2 / 50
![Page 3: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/3.jpg)
Prelude: climate modelling for impact assessment
My starting point:
Climate science is funded because we must live with the weather:we must prepare for weather-related risks and hazards, and adapt asthey change in the future, and we can also exploit the weather e.g. for
energy production and crop growth.
Corollary
Climate scientists need to think about users’ information requirements
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 2 / 50
![Page 4: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/4.jpg)
Examples of user information requirements
Historical weather observations for assessing risk, variability andchange at specific locations, e.g.:
to meet engineering design standards (return period estimation etc.);to evaluate power output from wind farmsto set insurance premiums
Historical weather observations to be used as inputs for ‘downstream’models, e.g.:
calibrating or validating hydrological models for assessing flood risk orwater resourcescalibrating or validating crop models for assessing agricultural productivity
Future projections to be used as inputs for ‘downstream’ models, e.g.to assess future changes in flood risk or agricultural productivityto inform policy development and mitigation / adaptation strategies
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 3 / 50
![Page 5: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/5.jpg)
Examples of user information requirements
Historical weather observations for assessing risk, variability andchange at specific locations, e.g.:
to meet engineering design standards (return period estimation etc.);to evaluate power output from wind farmsto set insurance premiums
Historical weather observations to be used as inputs for ‘downstream’models, e.g.:
calibrating or validating hydrological models for assessing flood risk orwater resourcescalibrating or validating crop models for assessing agricultural productivity
Future projections to be used as inputs for ‘downstream’ models, e.g.to assess future changes in flood risk or agricultural productivityto inform policy development and mitigation / adaptation strategies
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 3 / 50
![Page 6: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/6.jpg)
Examples of user information requirements
Historical weather observations for assessing risk, variability andchange at specific locations, e.g.:
to meet engineering design standards (return period estimation etc.);to evaluate power output from wind farmsto set insurance premiums
Historical weather observations to be used as inputs for ‘downstream’models, e.g.:
calibrating or validating hydrological models for assessing flood risk orwater resourcescalibrating or validating crop models for assessing agricultural productivity
Future projections to be used as inputs for ‘downstream’ models, e.g.to assess future changes in flood risk or agricultural productivityto inform policy development and mitigation / adaptation strategies
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 3 / 50
![Page 7: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/7.jpg)
User requirements: some challenges for climate science
1 Historical observations are often incomplete, subject to errors andchanges in measurement practice / technology and often unavailable atlocations of interest . . .
2 Data requirements of ‘downstream’ models may not coincide withavailable observations
3 Multiple imperfect sources of partial information may be available e.g.satellites, station observations etc. — how to combine these?
4 Future projections are usually derived from GCMs but can varysubstantially between models — what can we say usefully about thefuture, therefore?
Aim of presentation:
Aim to indicate potential for use of modern statisticalmethods to meet these challenges
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 4 / 50
![Page 8: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/8.jpg)
User requirements: some challenges for climate science
1 Historical observations are often incomplete, subject to errors andchanges in measurement practice / technology and often unavailable atlocations of interest . . .
2 Data requirements of ‘downstream’ models may not coincide withavailable observations
3 Multiple imperfect sources of partial information may be available e.g.satellites, station observations etc. — how to combine these?
4 Future projections are usually derived from GCMs but can varysubstantially between models — what can we say usefully about thefuture, therefore?
Aim of presentation:
Aim to indicate potential for use of modern statisticalmethods to meet these challenges
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 4 / 50
![Page 9: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/9.jpg)
Why statistics?
Statistics is about:1 Identifying what data are required to answer a question with a
specified degree of precisionOr what questions you can answer with the data you have
2 Interpreting information in the presence of uncertainty
Note that previous challenges all involve interpretation of informationunder uncertainty
Note also: opportunity to think about what additional data / model runswill be most helpful in plugging ‘information gaps’ (not covered today)
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 5 / 50
![Page 10: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/10.jpg)
Why statistics?
Statistics is about:1 Identifying what data are required to answer a question with a
specified degree of precisionOr what questions you can answer with the data you have
2 Interpreting information in the presence of uncertainty
Note that previous challenges all involve interpretation of informationunder uncertainty
Note also: opportunity to think about what additional data / model runswill be most helpful in plugging ‘information gaps’ (not covered today)
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 5 / 50
![Page 11: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/11.jpg)
Remainder of presentation
1 Model-based statistical thinking and someimplications for climate scientists
2 Working with incomplete data3 Combining information from multiple sources
4 Summary
![Page 12: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/12.jpg)
Model-based statistics
![Page 13: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/13.jpg)
Model-based statistical thinking
Most modern statistics is built around probability models that allow / enforce:
Framing of scientific questions in presence of uncertainty — questionsrelate to parameters / properties of model
Clarity of aims — analyst is forced to formulate questions precisely
Clarity of assumptions and well understood diagnostic techniques forchecking them
Accounting for all important aspects of a complex system includingchanging relationships
Optimal use of data in making inference / combining information
Explicit quantification of uncertainty in conclusions
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 8 / 50
![Page 14: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/14.jpg)
Model-based thinking: some consequences
Statistical techniques are more than just a ‘toolkit’: they require carefulthought about underlying model structures and whether they areappropriate for the problem at hand
Can reduce the need for extensive comparison exercises by focusing onlyon techniques that were designed for use in a specific situation
Awareness of modelling options provides opportunity to incorporateprocess understanding into statistical analyses
BUT:It’s hard /
Effective use of modern methods requires detailed and criticalunderstanding of (sometimes subtle) model properties
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 9 / 50
![Page 15: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/15.jpg)
Model-based thinking: some consequences
Statistical techniques are more than just a ‘toolkit’: they require carefulthought about underlying model structures and whether they areappropriate for the problem at hand
Can reduce the need for extensive comparison exercises by focusing onlyon techniques that were designed for use in a specific situation
Awareness of modelling options provides opportunity to incorporateprocess understanding into statistical analyses
BUT:It’s hard /
Effective use of modern methods requires detailed and criticalunderstanding of (sometimes subtle) model properties
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 9 / 50
![Page 16: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/16.jpg)
Model-based thinking: application to weather generators
A simple precipitation generator
‘Weather generators’ (WGs) often used to generate daily sequencesof precipitation, temperature etc. in climate impacts studiesMany generators still use variant of WGEN (Richardson, 1981, Wat.Resour. Res.):
1 Generate daily precipitation sequence using Markov chain foroccurrence and some distribution for intensity on ‘wet’ days
2 Generate other variables conditional on precipitation e.g. usingregression relationships
Markov model ensures dependence between successive days’precipitation occurrence.
Precipitation intensity distribution often taken as e.g. exponential,gamma etc. and sampled independently on each wet day
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 10 / 50
![Page 17: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/17.jpg)
Model-based thinking: application to weather generators
Precipitation generator: some known deficiencies
1 Spell lengths: “One criticism of the Richardson-type WG is its failureto describe adequately the length of dry and wet series (i.e.persistent events such as drought and prolonged rainfall). These canbe very important in some applications (e.g. agricultural impacts).”(IPCC WG guidelines atwww.ipcc-data.org/guidelines/dgm_no2_v1_09_2004.pdf)
2 Overdispersion: tendency to underestimate variability of seasonalmeans / totals — see, e.g., Katz & Parlange (J. Climate, 1998).
3 Extremal properties: underestimation of high return levels such as100-year daily maxima (independent exponential / gamma intensitydistributions do not yield ‘heavy tailed’ extreme distributions observedin daily rainfall data e.g. Katz et al. (Adv. Water Resources, 2002))
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 11 / 50
![Page 18: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/18.jpg)
Model-based thinking: application to weather generators
Remedying WG deficiencies: some suggestions from literature
Higher-order Markov chains to improve wet and dry spell performance
Heavy-tailed intensity distributions to improve extremal behaviour
Introduce latent classes with separate parameter sets, to increasevariability in seasonal means
Nonparametric modelling to avoid specific distributional assumptions
Etc. etc.
An elephant in the room?
What about autocorrelationbetween successive days’precipitation intensities?
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 12 / 50
![Page 19: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/19.jpg)
Model-based thinking: application to weather generators
Remedying WG deficiencies: some suggestions from literature
Higher-order Markov chains to improve wet and dry spell performance
Heavy-tailed intensity distributions to improve extremal behaviour
Introduce latent classes with separate parameter sets, to increasevariability in seasonal means
Nonparametric modelling to avoid specific distributional assumptions
Etc. etc.
An elephant in the room?
What about autocorrelationbetween successive days’precipitation intensities?
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 12 / 50
![Page 20: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/20.jpg)
Model-based thinking: application to weather generators
Autocorrelated intensities: rationale, consequences and a question
Autocorrelation is expected on physical grounds
Autocorrelation explains overdispersion: variance of seasonal total is
Var(T
∑t=1
Yt) =T
∑t=1
Var(Yt) + 2T−1
∑t1=1
T
∑t2=t1+1
Cov(Yt1 ,Yt2)
NB second term is zero for independent intensities but > 0 forcorrelated intensitiesEven weak correlations have surprisingly large effects, because oflarge number (T (T −1)/2) of contributions to sum in second term
Autocorrelated exponential / gamma distributions can lead toheavy-tailed distributions of extremes (see later)
Question: how to do it?
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 13 / 50
![Page 21: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/21.jpg)
Model-based thinking: application to weather generators
Autocorrelated intensities: rationale, consequences and a question
Autocorrelation is expected on physical grounds
Autocorrelation explains overdispersion: variance of seasonal total is
Var(T
∑t=1
Yt) =T
∑t=1
Var(Yt) + 2T−1
∑t1=1
T
∑t2=t1+1
Cov(Yt1 ,Yt2)
NB second term is zero for independent intensities but > 0 forcorrelated intensitiesEven weak correlations have surprisingly large effects, because oflarge number (T (T −1)/2) of contributions to sum in second term
Autocorrelated exponential / gamma distributions can lead toheavy-tailed distributions of extremes (see later)
Question: how to do it?
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 13 / 50
![Page 22: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/22.jpg)
Model-based thinking: application to weather generators
Autocorrelated intensities: rationale, consequences and a question
Autocorrelation is expected on physical grounds
Autocorrelation explains overdispersion: variance of seasonal total is
Var(T
∑t=1
Yt) =T
∑t=1
Var(Yt) + 2T−1
∑t1=1
T
∑t2=t1+1
Cov(Yt1 ,Yt2)
NB second term is zero for independent intensities but > 0 forcorrelated intensitiesEven weak correlations have surprisingly large effects, because oflarge number (T (T −1)/2) of contributions to sum in second term
Autocorrelated exponential / gamma distributions can lead toheavy-tailed distributions of extremes (see later)
Question: how to do it?
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 13 / 50
![Page 23: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/23.jpg)
Model-based thinking: application to weather generators
Autocorrelated intensities: rationale, consequences and a question
Autocorrelation is expected on physical grounds
Autocorrelation explains overdispersion: variance of seasonal total is
Var(T
∑t=1
Yt) =T
∑t=1
Var(Yt) + 2T−1
∑t1=1
T
∑t2=t1+1
Cov(Yt1 ,Yt2)
NB second term is zero for independent intensities but > 0 forcorrelated intensitiesEven weak correlations have surprisingly large effects, because oflarge number (T (T −1)/2) of contributions to sum in second term
Autocorrelated exponential / gamma distributions can lead toheavy-tailed distributions of extremes (see later)
Question: how to do it?
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 13 / 50
![Page 24: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/24.jpg)
Model-based thinking: application to weather generators
Autocorrelated intensities: how to do itIdea: embed ‘classical’ generator within wider class of models
Grunwald & Jones (Environmetrics, 2000) showed that Markov-basedmodels are special cases of Generalised Linear Models (GLMs)
Generic formulation of GLM for arbitrary variable Yt
{Yt} considered drawn from common family of distributions (normal,gamma, Poisson, Bernoulli, . . .)Conditional on covariate vector xt , expected value of Yt isµt = E(Yt |xt )µt related to linear predictor ηt = xt β via relationship g(µt ) = ηt forlink function g(·).
Extends linear regression model (normal distributions, g(µt) = µt ).E.g. use gamma GLM with log link for precipitation intensity
Can model autocorrelated intensities by including Yt−1 in xt — easy!
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 14 / 50
![Page 25: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/25.jpg)
Model-based thinking: application to weather generators
Autocorrelated intensities: how to do itIdea: embed ‘classical’ generator within wider class of models
Grunwald & Jones (Environmetrics, 2000) showed that Markov-basedmodels are special cases of Generalised Linear Models (GLMs)Generic formulation of GLM for arbitrary variable Yt
{Yt} considered drawn from common family of distributions (normal,gamma, Poisson, Bernoulli, . . .)Conditional on covariate vector xt , expected value of Yt isµt = E(Yt |xt )µt related to linear predictor ηt = xt β via relationship g(µt ) = ηt forlink function g(·).
Extends linear regression model (normal distributions, g(µt) = µt ).E.g. use gamma GLM with log link for precipitation intensity
Can model autocorrelated intensities by including Yt−1 in xt — easy!
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 14 / 50
![Page 26: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/26.jpg)
Model-based thinking: application to weather generators
Autocorrelated intensities: how to do itIdea: embed ‘classical’ generator within wider class of models
Grunwald & Jones (Environmetrics, 2000) showed that Markov-basedmodels are special cases of Generalised Linear Models (GLMs)Generic formulation of GLM for arbitrary variable Yt
{Yt} considered drawn from common family of distributions (normal,gamma, Poisson, Bernoulli, . . .)Conditional on covariate vector xt , expected value of Yt isµt = E(Yt |xt )µt related to linear predictor ηt = xt β via relationship g(µt ) = ηt forlink function g(·).
Extends linear regression model (normal distributions, g(µt) = µt ).
E.g. use gamma GLM with log link for precipitation intensityCan model autocorrelated intensities by including Yt−1 in xt — easy!
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 14 / 50
![Page 27: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/27.jpg)
Model-based thinking: application to weather generators
Autocorrelated intensities: how to do itIdea: embed ‘classical’ generator within wider class of models
Grunwald & Jones (Environmetrics, 2000) showed that Markov-basedmodels are special cases of Generalised Linear Models (GLMs)Generic formulation of GLM for arbitrary variable Yt
{Yt} considered drawn from common family of distributions (normal,gamma, Poisson, Bernoulli, . . .)Conditional on covariate vector xt , expected value of Yt isµt = E(Yt |xt )µt related to linear predictor ηt = xt β via relationship g(µt ) = ηt forlink function g(·).
Extends linear regression model (normal distributions, g(µt) = µt ).E.g. use gamma GLM with log link for precipitation intensity
Can model autocorrelated intensities by including Yt−1 in xt — easy!
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 14 / 50
![Page 28: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/28.jpg)
Model-based thinking: application to weather generators
GLMs: further commentsCornerstone of modern statistical practice in all application areas
Unifying approach for many statistical techniques e.g. can also writeMarkov Chain for occurrence as GLM using logistic regressionChanging relationships represented using interaction terms that allowone variable to modulate the effect of another
No need for separate fitting to different subsets of observations (e.g.separate models for different seasons)
Standard array of diagnostic tools for checking assumptions
Framework allows development of statistical models that reflectsystem structure / process understanding
More details for climate scientists: Chandler (2005, Environmetrics).
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 15 / 50
![Page 29: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/29.jpg)
Model-based thinking: application to weather generators
GLM example: precipitation intensity over the Thames catchment, UKResponse variable: Precipitation amount (mm)
Coefficient Std Err T-stat Pr(|T|>t)Constant 8.1527 0.4787 17.0303 < 2.2e-16
1 Legendre polynomial 1 for Latitude -0.2249 0.0237 -9.4788 < 2.2e-162 Legendre polynomial 1 for Longitud 0.0027 0.0118 0.2288 0.81902333 IVT (code 55) 0.0001 0.0001 1.0902 0.27564774 Distance-weighted mean of Ln(1+Prec 0.0968 0.0066 14.7621 < 2.2e-165 Distance-weighted mean of Air press -0.0664 0.0047 -13.9768 < 2.2e-166 Daily seasonal effect, cosine compo 0.0062 0.0087 0.7141 0.47518597 Daily seasonal effect, sine compone -0.0909 0.0093 -9.7930 < 2.2e-168 Daily half-year cycle, cosine compo 0.0328 0.0084 3.8877 0.00010139 Daily half-year cycle, sine compone -0.0084 0.0086 -0.9775 0.3283467
[snip snip]
Spatial dependence structure:-----------------------------Structure used: Exponential correlation function: exp[-phi*d] for Anscom
Correlation decay rate, phi: 7.1193
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 16 / 50
![Page 30: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/30.jpg)
Model-based thinking: application to weather generators
GLM example: structure of multisite, multivariate WG for the Thames
Mean sea level pressure (MSLP)Mean sea level pressure (MSLP)
Mean 2m air temperatureMean 2m air temperature
Mean relative humidity at ~1000hPa
Mean relative humidity at ~1000hPa
Atmospheric river frequency (moisture content >300 kg m-1 s-1)
Atmospheric river frequency (moisture content >300 kg m-1 s-1)
Integrated vapour transport during atmospheric river events
Integrated vapour transport during atmospheric river events
Monthly indices of large-scale structure
Means are for region 50°-60°N, 0°-10°WAir pressureAir pressure
PrecipitationPrecipitation
Air temperatureAir temperature
Wind speedWind speed
Wet bulb temperatureWet bulb temperature
Local variables in weather generator
Short wave radiationShort wave radiation
Cloud coverCloud cover
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 17 / 50
![Page 31: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/31.jpg)
Model-based thinking: application to weather generators
GLM example: specimen WG outputs for the Thames (temperature)
2 4 6 8 10 12
280
285
290
Mean
Month
K
2 4 6 8 10 12
2.0
3.0
4.0
Std Dev
MonthK
2 4 6 8 10 12
285
295
Maximum
Month
K
2 4 6 8 10 12
265
275
285
Minimum
Month
K
2 4 6 8 10 12
0.4
0.6
0.8
Lag 1 ACF
Month
Cor
rela
tion
2 4 6 8 10 12
0.1
0.3
0.5
Lag 2 ACF
Month
Cor
rela
tion
2 4 6 8 10 12−
0.4
0.0
0.4
Correlation withwind speed
Month
Cor
rela
tion
2 4 6 8 10 12−0.
6−
0.2
0.2
Correlation withprecipitation
Month
Cor
rela
tion
All GLM output from Rglimclim
(http://www.homepages.ucl.ac.uk/˜ucakarc/work/glimclim.html)
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 18 / 50
![Page 32: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/32.jpg)
Model-based thinking: application to weather generators
Autocorrelated intensities: effect on extremes
Toy example: Yt has exponentialdistribution with meanµt = 1 + φ(Yt−1−1), for some φ ∈ [0,1).
φ controls autocorrelation: successivevalues independent when φ = 0.Can show E(Yt ) = 1 for any φ.Example falls within GLM framework.
Simulate series of length 365 000 (“1000years”) with different values of φ; plotestimated PDFs of daily values andannual maxima from simulated series.
Note: increasing φ has no perceptibleeffect on daily distributions of daily valuesbut can produce much larger extremes
0 10 20 30 40 50 60 70
0.0
0.1
0.2
0.3
0.4
0.5
0.6
Density of daily values
x
f(x)
Value of φ (mean of distribution)
0.00 (1.0) 0.15 (1.0) 0.30 (1.0) 0.45 (1.0)
10 20 30 40 50 60 70
0.00
0.05
0.10
0.15
0.20
0.25
Density of annual maxima
x
f(x)
Value of φ (mean of distribution)
0.00 (6.5) 0.15 (6.9) 0.30 (8.2) 0.45 (10.8)
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 19 / 50
![Page 33: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/33.jpg)
Model-based thinking: take-home messages
Modern statistics is based on models that describe / explain systembehaviourModel-based thinking allows incorporation of process understanding intostatistical analysis (e.g. autocorrelation in precipitation intensities)
Analysis starts with “how does process behave?”, not “what statisticalmethod should I use?”Allows improvements to target cause rather than symptom (e.g.autocorrelation, overdispersion)
Effective use of model-based thinking requires good awareness ofavailable techniques:
Familiarity with GLMs wouldn’t be a bad starting point — many “standard”climatological techniques are special cases but GLMs offer greaterflexibility (e.g. autocorrelation in intensities again)
Caveat: model-based thinking also requires good understanding ofmodel properties, especially for complex systems
Find a competent and friendly statistician, or spend time with a good book. . .
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 20 / 50
![Page 34: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/34.jpg)
Model-based thinking: take-home messages
Modern statistics is based on models that describe / explain systembehaviourModel-based thinking allows incorporation of process understanding intostatistical analysis (e.g. autocorrelation in precipitation intensities)
Analysis starts with “how does process behave?”, not “what statisticalmethod should I use?”Allows improvements to target cause rather than symptom (e.g.autocorrelation, overdispersion)
Effective use of model-based thinking requires good awareness ofavailable techniques:
Familiarity with GLMs wouldn’t be a bad starting point — many “standard”climatological techniques are special cases but GLMs offer greaterflexibility (e.g. autocorrelation in intensities again)
Caveat: model-based thinking also requires good understanding ofmodel properties, especially for complex systems
Find a competent and friendly statistician, or spend time with a good book. . .
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 20 / 50
![Page 35: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/35.jpg)
Model-based thinking: take-home messages
Modern statistics is based on models that describe / explain systembehaviourModel-based thinking allows incorporation of process understanding intostatistical analysis (e.g. autocorrelation in precipitation intensities)
Analysis starts with “how does process behave?”, not “what statisticalmethod should I use?”Allows improvements to target cause rather than symptom (e.g.autocorrelation, overdispersion)
Effective use of model-based thinking requires good awareness ofavailable techniques:
Familiarity with GLMs wouldn’t be a bad starting point — many “standard”climatological techniques are special cases but GLMs offer greaterflexibility (e.g. autocorrelation in intensities again)
Caveat: model-based thinking also requires good understanding ofmodel properties, especially for complex systems
Find a competent and friendly statistician, or spend time with a good book. . .
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 20 / 50
![Page 36: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/36.jpg)
A good book?
Statistical Methods for Trend Detection and Analysis In the Environmental Sciences
RICHARD CHANDLERMARIAN SCOTT
STATISTICS IN PRACTICE
CHANDLER
SCOTT
Statistical Meth
ods for Trend D
etection an
d An
alysisIn the Environm
ental Sciences
Cover design by www.theoakstudio.co.uk
A series of practical books outlining the use of statistical techniques in a wide range of applications areas:
• HUMAN AND BIOLOGICAL SCIENCES
• EARTH AND ENVIRONMENTAL SCIENCES
• INDUSTRY, COMMERCE AND FINANCE
STATISTICS IN PRACTICE
Quality of Life (QoL) outcomes or Person/Patient Reported Outcome Measures (PROMs) are nowfrequently being used in randomised controlled trials (RCTs) and observational studies. This bookprovides a practical guide to the design, analysis and interpretation of studies that use such outcomes.
QoL outcomes tend to generate data with discrete, bounded and skewed distributions. Many investigatorsare concerned about the appropriateness of using standard statistical methods to analyse QoL data andwant guidance on what methods to use. QoL outcomes are frequently used in cross-sectional surveys andnon-randomised health-care evaluations.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
• Provides a user-friendly guide to the design and analysis of clinical trials and observational studies in relation to QoL outcomes.
• Discusses the problems caused by QoL outcomes and presents intervention options to help tackle them.
• Guides the reader step-by-step through the selection of appropriate QoLs.
• Features exercises and solutions and a supporting website (www.wiley.com/go/walters) providingdownloadable data files.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Illustrated throughout with examples and case studies drawn from the author’s experience, this book offersstatisticians and clinicians guidance on choosing between the numerous available QoL instruments.
STEPHEN J. WALTERSSchool of Health and Related Research, University of Sheffield, UK
Statistical Methods for Trend Detection and AnalysisIn the Environmental Sciences
Red box rules are for proof stage only. Delete before final printing.
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 21 / 50
![Page 37: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/37.jpg)
Working with incomplete data
![Page 38: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/38.jpg)
Incomplete data: a ‘typical’ example?
The Thames again
Largest catchment in UK(∼ 10000km2)
Users (hydrologists)wanted hourlysequences, 8 variables,1km2 resolutionthroughout catchment
●
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ●
● ● ● ● ●●
●Stations used for calibration Kennet grid nodes
0 50 100km
Negotiated settlement: daily sequences, 5×5km2 resolution, Kennetsubcatchment (186 grid nodes)Data on (most) variables nominally available from 157 stations, 1970onwards
Other variables derived using standard procedures from literature
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 23 / 50
![Page 39: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/39.jpg)
Incomplete data: a ‘typical’ example?
Data availability (II)
Numbers of stations with data (out of 157)
Rainfall Pressure Temperature Wind speed SWR
71 52 140 135 22
Many stations have short /incomplete/ patchy records
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 24 / 50
![Page 40: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/40.jpg)
Incomplete data: the problem and some solutions
Incomplete data might be:Individual observations / blocks missing from otherwise complete recordDifferent record lengths (short records have missing ends)Absence of recording stations at required locations (e.g. subcatchmentcentres, nodes of regular grid)
Possible solutions:Preferred solution: work just with data available if statistical methodologyallows itInterpolation: estimate missing values (e.g. kriging, inverse distanceweighting, splines, reanalysis etc.)
Note: all gridded climate datasets are based on some form of interpolation
Strong recommendation:
NEVER, on any account, work with interpolatedprecipitation data!!!
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 25 / 50
![Page 41: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/41.jpg)
Incomplete data: the problem and some solutions
Incomplete data might be:Individual observations / blocks missing from otherwise complete recordDifferent record lengths (short records have missing ends)Absence of recording stations at required locations (e.g. subcatchmentcentres, nodes of regular grid)
Possible solutions:Preferred solution: work just with data available if statistical methodologyallows itInterpolation: estimate missing values (e.g. kriging, inverse distanceweighting, splines, reanalysis etc.)
Note: all gridded climate datasets are based on some form of interpolation
Strong recommendation:
NEVER, on any account, work with interpolatedprecipitation data!!!
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 25 / 50
![Page 42: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/42.jpg)
Incomplete data: the problem and some solutions
Incomplete data might be:Individual observations / blocks missing from otherwise complete recordDifferent record lengths (short records have missing ends)Absence of recording stations at required locations (e.g. subcatchmentcentres, nodes of regular grid)
Possible solutions:Preferred solution: work just with data available if statistical methodologyallows itInterpolation: estimate missing values (e.g. kriging, inverse distanceweighting, splines, reanalysis etc.)
Note: all gridded climate datasets are based on some form of interpolation
Strong recommendation:
NEVER, on any account, work with interpolatedprecipitation data!!!
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 25 / 50
![Page 43: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/43.jpg)
Interpolation / gridded data: what’s the problem?
“Interpolation” here means using ‘best’ estimates of missing values
Interpolated values are smoothed⇒ variability reduced (affects, e.g.,extremes)
Interpolation introduces artificial inhomogeneities e.g. due to differentdistances from nearest neighbouring gauges . . .
and it gives false impression of reduced uncertainty
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 26 / 50
![Page 44: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/44.jpg)
Interpolation / gridded data: what’s the problem?
“Interpolation” here means using ‘best’ estimates of missing values
Interpolated values are smoothed⇒ variability reduced (affects, e.g.,extremes)
Interpolation introduces artificial inhomogeneities e.g. due to differentdistances from nearest neighbouring gauges . . .
and it gives false impression of reduced uncertainty
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 26 / 50
![Page 45: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/45.jpg)
Interpolation / gridded data: what’s the problem?
“Interpolation” here means using ‘best’ estimates of missing values
Interpolated values are smoothed⇒ variability reduced (affects, e.g.,extremes)
Interpolation introduces artificial inhomogeneities e.g. due to differentdistances from nearest neighbouring gauges . . .
and it gives false impression of reduced uncertainty
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 26 / 50
![Page 46: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/46.jpg)
Interpolation / gridded data: what’s the problem?
“Interpolation” here means using ‘best’ estimates of missing values
Interpolated values are smoothed⇒ variability reduced (affects, e.g.,extremes)
Interpolation introduces artificial inhomogeneities e.g. due to differentdistances from nearest neighbouring gauges . . .
and it gives false impression of reduced uncertainty
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 26 / 50
![Page 47: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/47.jpg)
Interpolation and reality: a simple gridded dataset
Example: mapping 2009 winter rainfall in southwest Western Australia
Data available: daily rainfall records from 60 raingauges.
Kriging used (log scale) to interpolate to regular grid
Uses geoR package in R— produces ‘bestestimate’, uncertaintiesand sampling fromuncertainty distributionof rainfall surface
Uncertainty samples are‘plausible realities’; ‘bestestimate’ resemblesnone of them.
116 118 120 122
−34
−32
−30
−28
Best estimate
Longitude (°E)La
titud
e (°
S)
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
● ●●
●
●●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
116 118 120 122
−34
−32
−30
−28
Coefficient of variation
Longitude (°E)
Latit
ude
(°S
)
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
● ●●
●
●●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
Static display: separate maps of best estimate and uncertainty.
For uncertainty animation algorithm, see http://www.stats.gla.ac.uk/
˜adrian/papers/graphics-for-uncertainty-paper.pdf
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 27 / 50
![Page 48: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/48.jpg)
Interpolation and reality: a simple gridded dataset
Example: mapping 2009 winter rainfall in southwest Western Australia
Data available: daily rainfall records from 60 raingauges.
Kriging used (log scale) to interpolate to regular grid
Uses geoR package in R— produces ‘bestestimate’, uncertaintiesand sampling fromuncertainty distributionof rainfall surface
Uncertainty samples are‘plausible realities’; ‘bestestimate’ resemblesnone of them.
116 118 120 122
−34
−32
−30
−28
Longitude (°E)
Latit
ude
(°S
)
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●●
●
●●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
250
250
500
1000
Static display: areas of high uncertainty masked by ‘fog’
For uncertainty animation algorithm, see http://www.stats.gla.ac.uk/
˜adrian/papers/graphics-for-uncertainty-paper.pdf
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 27 / 50
![Page 49: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/49.jpg)
Interpolation and reality: a simple gridded dataset
Example: mapping 2009 winter rainfall in southwest Western Australia
Data available: daily rainfall records from 60 raingauges.
Kriging used (log scale) to interpolate to regular grid
Uses geoR package in R— produces ‘bestestimate’, uncertaintiesand sampling fromuncertainty distributionof rainfall surface
Uncertainty samples are‘plausible realities’; ‘bestestimate’ resemblesnone of them. Dynamic display: possible maps that are consistent with data
For uncertainty animation algorithm, see http://www.stats.gla.ac.uk/
˜adrian/papers/graphics-for-uncertainty-paper.pdf
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 27 / 50
![Page 50: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/50.jpg)
Dangers of interpolation: a simulation experiment
Design of the experiment
Simulate 30-year sequences at 12 locations (blue triangles):
Multi-site GLM used: identicalstructure at all sites
Sequences ‘typical’ of SEEngland
Spatial scale: ∼ 75% of dayshave sites all wet or all dry,wet-day inter-site correlations∼ 0.6–0.8.
1 2
3 4
5
6
7
8
9
10
11
12
1 2 3 4
5 6 7 8
9 10 11 12
Gauge Grid node
Map of simulation region
Use kriging to create gridded daily dataset from simulations
Regular grid: 12 nodes (red squares)
Compare annual maxima / return levels for original & gridded data
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 28 / 50
![Page 51: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/51.jpg)
Dangers of interpolation: a simulation experiment
Design of the experiment
Simulate 30-year sequences at 12 locations (blue triangles):
Multi-site GLM used: identicalstructure at all sites
Sequences ‘typical’ of SEEngland
Spatial scale: ∼ 75% of dayshave sites all wet or all dry,wet-day inter-site correlations∼ 0.6–0.8.
1 2
3 4
5
6
7
8
9
10
11
12
1 2 3 4
5 6 7 8
9 10 11 12
Gauge Grid node
Map of simulation region
Use kriging to create gridded daily dataset from simulations
Regular grid: 12 nodes (red squares)
Compare annual maxima / return levels for original & gridded data
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 28 / 50
![Page 52: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/52.jpg)
Dangers of interpolation: a simulation experiment
Design of the experiment
Simulate 30-year sequences at 12 locations (blue triangles):
Multi-site GLM used: identicalstructure at all sites
Sequences ‘typical’ of SEEngland
Spatial scale: ∼ 75% of dayshave sites all wet or all dry,wet-day inter-site correlations∼ 0.6–0.8.
1 2
3 4
5
6
7
8
9
10
11
12
1 2 3 4
5 6 7 8
9 10 11 12
Gauge Grid node
Map of simulation region
Use kriging to create gridded daily dataset from simulations
Regular grid: 12 nodes (red squares)
Compare annual maxima / return levels for original & gridded data
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 28 / 50
![Page 53: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/53.jpg)
Dangers of interpolation: a simulation experiment
Design of the experiment
Simulate 30-year sequences at 12 locations (blue triangles):
Multi-site GLM used: identicalstructure at all sites
Sequences ‘typical’ of SEEngland
Spatial scale: ∼ 75% of dayshave sites all wet or all dry,wet-day inter-site correlations∼ 0.6–0.8.
1 2
3 4
5
6
7
8
9
10
11
12
1 2 3 4
5 6 7 8
9 10 11 12
Gauge Grid node
Map of simulation region
Use kriging to create gridded daily dataset from simulations
Regular grid: 12 nodes (red squares)
Compare annual maxima / return levels for original & gridded data
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 28 / 50
![Page 54: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/54.jpg)
Dangers of interpolation: a simulation experiment
Results: distributions of annual maxima, and pooled return levelestimates
●●●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●●
●
●●●
●
S00
1
S00
2
S00
3
S00
4
S00
5
S00
6
S00
7
S00
8
S00
9
S01
0
S01
1
S01
2
NA
.1
NA
.2
NA
.3
NA
.4
NA
.5
NA
.6
NA
.7
NA
.8
NA
.9
NA
.10
NA
.11
0
20
40
60
80
100
Location
mm
Simulation experiment: distributions of annual maxima in 30−year period
SitesGrid nodes
Return Estimate (mm)period Original Gridded10 yr 44.0 38.050 yr 57.8 49.4100 yr 63.9 54.4
Actual return periods for griddedestimates: 5, 19 and 34 years
Maxima for gridded data are smaller and less variable
Gridding reduces return level estimates by ∼ 15%
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 29 / 50
![Page 55: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/55.jpg)
Dangers of interpolation: a simulation experiment
Results: distributions of annual maxima, and pooled return levelestimates
●●●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●●
●
●●●
●
S00
1
S00
2
S00
3
S00
4
S00
5
S00
6
S00
7
S00
8
S00
9
S01
0
S01
1
S01
2
NA
.1
NA
.2
NA
.3
NA
.4
NA
.5
NA
.6
NA
.7
NA
.8
NA
.9
NA
.10
NA
.11
0
20
40
60
80
100
Location
mm
Simulation experiment: distributions of annual maxima in 30−year period
SitesGrid nodes
Return Estimate (mm)period Original Gridded10 yr 44.0 38.050 yr 57.8 49.4100 yr 63.9 54.4
Actual return periods for griddedestimates: 5, 19 and 34 years
Maxima for gridded data are smaller and less variable
Gridding reduces return level estimates by ∼ 15%
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 29 / 50
![Page 56: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/56.jpg)
Dangers of interpolation: a simulation experiment
Results: distributions of annual maxima, and pooled return levelestimates
●●●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●●
●
●●●
●
S00
1
S00
2
S00
3
S00
4
S00
5
S00
6
S00
7
S00
8
S00
9
S01
0
S01
1
S01
2
NA
.1
NA
.2
NA
.3
NA
.4
NA
.5
NA
.6
NA
.7
NA
.8
NA
.9
NA
.10
NA
.11
0
20
40
60
80
100
Location
mm
Simulation experiment: distributions of annual maxima in 30−year period
SitesGrid nodes
Return Estimate (mm)period Original Gridded10 yr 44.0 38.050 yr 57.8 49.4100 yr 63.9 54.4
Actual return periods for griddedestimates: 5, 19 and 34 years
Maxima for gridded data are smaller and less variable
Gridding reduces return level estimates by ∼ 15%
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 29 / 50
![Page 57: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/57.jpg)
Dangers of interpolation: a simulation experiment
Results: distributions of annual maxima, and pooled return levelestimates
●●●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●●
●
●●●
●
S00
1
S00
2
S00
3
S00
4
S00
5
S00
6
S00
7
S00
8
S00
9
S01
0
S01
1
S01
2
NA
.1
NA
.2
NA
.3
NA
.4
NA
.5
NA
.6
NA
.7
NA
.8
NA
.9
NA
.10
NA
.11
0
20
40
60
80
100
Location
mm
Simulation experiment: distributions of annual maxima in 30−year period
SitesGrid nodes
Return Estimate (mm)period Original Gridded10 yr 44.0 38.050 yr 57.8 49.4100 yr 63.9 54.4
Actual return periods for griddedestimates: 5, 19 and 34 years
Maxima for gridded data are smaller and less variable
Gridding reduces return level estimates by ∼ 15%
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 29 / 50
![Page 58: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/58.jpg)
Incomplete data: summary and comments
Data availability is key consideration in choice of statistical methodologyPreferred solution: use methods that can work with information available(e.g. GLMs can handle missing responses but not missing covariates)
May need to compromise on ‘process-based modelling’ e.g for ThamesWG, physics suggests pressure is of ‘primary’ importance and should bemodelled as driving other variables — but availability and quality ofpressure data is poor
Alternatively, propagate interpolation (or other) uncertainties throughsubsequent analysis (VERY HARD!!! — but see literature on multipleimputation or survey non-response)
Similar comments apply to other forms of ‘preprocessing’ e.g. datahomogenisation: may be easiest to work with original observations andincorporate homogenisation adjustments directly into statistical model (e.g.Ambrosino & Chandler, J. Appl. Meteorol. Climatol. 2013).
NB interpolation / gridding is less dangerous for quantities that varysmoothly in space and time e.g. temperature / pressure fields — but stillneed to acknowledge uncertainties
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 30 / 50
![Page 59: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/59.jpg)
Incomplete data: summary and comments
Data availability is key consideration in choice of statistical methodologyPreferred solution: use methods that can work with information available(e.g. GLMs can handle missing responses but not missing covariates)
May need to compromise on ‘process-based modelling’ e.g for ThamesWG, physics suggests pressure is of ‘primary’ importance and should bemodelled as driving other variables — but availability and quality ofpressure data is poor
Alternatively, propagate interpolation (or other) uncertainties throughsubsequent analysis (VERY HARD!!! — but see literature on multipleimputation or survey non-response)
Similar comments apply to other forms of ‘preprocessing’ e.g. datahomogenisation: may be easiest to work with original observations andincorporate homogenisation adjustments directly into statistical model (e.g.Ambrosino & Chandler, J. Appl. Meteorol. Climatol. 2013).
NB interpolation / gridding is less dangerous for quantities that varysmoothly in space and time e.g. temperature / pressure fields — but stillneed to acknowledge uncertainties
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 30 / 50
![Page 60: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/60.jpg)
Incomplete data: summary and comments
Data availability is key consideration in choice of statistical methodologyPreferred solution: use methods that can work with information available(e.g. GLMs can handle missing responses but not missing covariates)
May need to compromise on ‘process-based modelling’ e.g for ThamesWG, physics suggests pressure is of ‘primary’ importance and should bemodelled as driving other variables — but availability and quality ofpressure data is poor
Alternatively, propagate interpolation (or other) uncertainties throughsubsequent analysis (VERY HARD!!! — but see literature on multipleimputation or survey non-response)
Similar comments apply to other forms of ‘preprocessing’ e.g. datahomogenisation: may be easiest to work with original observations andincorporate homogenisation adjustments directly into statistical model (e.g.Ambrosino & Chandler, J. Appl. Meteorol. Climatol. 2013).
NB interpolation / gridding is less dangerous for quantities that varysmoothly in space and time e.g. temperature / pressure fields — but stillneed to acknowledge uncertainties
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 30 / 50
![Page 61: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/61.jpg)
Combining information
![Page 62: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/62.jpg)
Combining information
Comprehensive analysis may require synthesis of information frommultiple data sources and multiple models.
Examples: combining satellite and station data, combining projectionsfrom multiple GCMs
Evidence from different sources may be apparently contradictory
Challenge: how to combine and integrate information — and providedefensible assessments of uncertainty?
ContentionRequires careful thought about structure of problem
Graphical methods help to clarify structure
Graphical methods widely used to simplify analysis of complex systems
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 32 / 50
![Page 63: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/63.jpg)
Combining information
Comprehensive analysis may require synthesis of information frommultiple data sources and multiple models.
Examples: combining satellite and station data, combining projectionsfrom multiple GCMs
Evidence from different sources may be apparently contradictory
Challenge: how to combine and integrate information — and providedefensible assessments of uncertainty?
ContentionRequires careful thought about structure of problem
Graphical methods help to clarify structure
Graphical methods widely used to simplify analysis of complex systems
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 32 / 50
![Page 64: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/64.jpg)
Combining information
Comprehensive analysis may require synthesis of information frommultiple data sources and multiple models.
Examples: combining satellite and station data, combining projectionsfrom multiple GCMs
Evidence from different sources may be apparently contradictory
Challenge: how to combine and integrate information — and providedefensible assessments of uncertainty?
ContentionRequires careful thought about structure of problem
Graphical methods help to clarify structure
Graphical methods widely used to simplify analysis of complex systems
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 32 / 50
![Page 65: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/65.jpg)
Graphical methods for problem structuring
Idea: break down complex problem into manageable chunks usingconditional independence judgements:
Judgements take the form “when using X to predict Y , does Z provide anyadditional information?”Often straightforward to make these judgements on subject-matter groundswhen combining information
Judgements encoded in DirectedAcyclic Graph (DAG) with nodesrepresenting all relevant quantities
Arrows represent conditionalprobability distributions — need tospecify models for these
(a) (b)
MW
ML
M̃W
M̃L
MW
ML
β
ψζ
B M̃W
νψε
M̃L
Nobs
ψγ
Mmin
i = 1, . . . , n
Probability calculus then gives distribution of quantities of interest givenall observed nodesAnalysis usually easiest in Bayesian framework
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 33 / 50
![Page 66: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/66.jpg)
Graphical methods for problem structuring
Idea: break down complex problem into manageable chunks usingconditional independence judgements:
Judgements take the form “when using X to predict Y , does Z provide anyadditional information?”Often straightforward to make these judgements on subject-matter groundswhen combining information
Judgements encoded in DirectedAcyclic Graph (DAG) with nodesrepresenting all relevant quantities
Arrows represent conditionalprobability distributions — need tospecify models for these
(a) (b)
MW
ML
M̃W
M̃L
MW
ML
β
ψζ
B M̃W
νψε
M̃L
Nobs
ψγ
Mmin
i = 1, . . . , n
Probability calculus then gives distribution of quantities of interest givenall observed nodesAnalysis usually easiest in Bayesian framework
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 33 / 50
![Page 67: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/67.jpg)
Graphical methods for problem structuring
Idea: break down complex problem into manageable chunks usingconditional independence judgements:
Judgements take the form “when using X to predict Y , does Z provide anyadditional information?”Often straightforward to make these judgements on subject-matter groundswhen combining information
Judgements encoded in DirectedAcyclic Graph (DAG) with nodesrepresenting all relevant quantities
Arrows represent conditionalprobability distributions — need tospecify models for these
(a) (b)
MW
ML
M̃W
M̃L
MW
ML
β
ψζ
B M̃W
νψε
M̃L
Nobs
ψγ
Mmin
i = 1, . . . , n
Probability calculus then gives distribution of quantities of interest givenall observed nodesAnalysis usually easiest in Bayesian framework
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 33 / 50
![Page 68: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/68.jpg)
Graphical methods: a simple example
Example: measurement error in regression models
Consider data generated as Yi = β0 + β1xi + εi (i = 1, . . . ,n), butcovariate measured as xobs
i = xi + ui where ui is measurement error
Regressing Y upon xobs givesregression dilution bias, i.e.systematic underestimation of slope(NB even if measurement errors arezero on average)
This doesn’t matter if you only wantto predict using xobs
−0.2 0.0 0.2 0.4 0.6 0.8
02
46
x.obs
y
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
● ●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
True relationshipRegression on true x valuesRegression on observed x values
It does matter if you want to infer true relationship between x and Y
Few methods available to handle this, that are both user-friendly andgenerally applicable
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 34 / 50
![Page 69: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/69.jpg)
Graphical methods: a simple example
Example: measurement error in regression models
Consider data generated as Yi = β0 + β1xi + εi (i = 1, . . . ,n), butcovariate measured as xobs
i = xi + ui where ui is measurement error
Regressing Y upon xobs givesregression dilution bias, i.e.systematic underestimation of slope(NB even if measurement errors arezero on average)
This doesn’t matter if you only wantto predict using xobs
−0.2 0.0 0.2 0.4 0.6 0.8
02
46
x.obs
y
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
● ●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
True relationshipRegression on true x valuesRegression on observed x values
It does matter if you want to infer true relationship between x and Y
Few methods available to handle this, that are both user-friendly andgenerally applicable
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 34 / 50
![Page 70: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/70.jpg)
Graphical methods: a simple example
Example: measurement error in regression models
Consider data generated as Yi = β0 + β1xi + εi (i = 1, . . . ,n), butcovariate measured as xobs
i = xi + ui where ui is measurement error
Regressing Y upon xobs givesregression dilution bias, i.e.systematic underestimation of slope(NB even if measurement errors arezero on average)
This doesn’t matter if you only wantto predict using xobs
−0.2 0.0 0.2 0.4 0.6 0.8
02
46
x.obs
y
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
● ●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
True relationshipRegression on true x valuesRegression on observed x values
It does matter if you want to infer true relationship between x and Y
Few methods available to handle this, that are both user-friendly andgenerally applicable
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 34 / 50
![Page 71: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/71.jpg)
Graphical methods: a simple example
Example: measurement error in regression models
Consider data generated as Yi = β0 + β1xi + εi (i = 1, . . . ,n), butcovariate measured as xobs
i = xi + ui where ui is measurement error
Regressing Y upon xobs givesregression dilution bias, i.e.systematic underestimation of slope(NB even if measurement errors arezero on average)
This doesn’t matter if you only wantto predict using xobs
−0.2 0.0 0.2 0.4 0.6 0.8
02
46
x.obs
y
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
● ●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
True relationshipRegression on true x valuesRegression on observed x values
It does matter if you want to infer true relationship between x and Y
Few methods available to handle this, that are both user-friendly andgenerally applicable
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 34 / 50
![Page 72: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/72.jpg)
Graphical methods: a simple example
Example: measurement error in regression models
Consider data generated as Yi = β0 + β1xi + εi (i = 1, . . . ,n), butcovariate measured as xobs
i = xi + ui where ui is measurement error
Regressing Y upon xobs givesregression dilution bias, i.e.systematic underestimation of slope(NB even if measurement errors arezero on average)
This doesn’t matter if you only wantto predict using xobs
−0.2 0.0 0.2 0.4 0.6 0.8
02
46
x.obs
y
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
● ●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
True relationshipRegression on true x valuesRegression on observed x values
It does matter if you want to infer true relationship between x and Y
Few methods available to handle this, that are both user-friendly andgenerally applicable
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 34 / 50
![Page 73: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/73.jpg)
Graphical methods: a simple example
Measurement error: setting up the graphical structure
Start with true relationship: Yidepends on xi
Black node: observedBlue node: unobserved
Also, xobsi (measurement)
depends on xi (true value)Next, specify dependence ofeach node upon its ‘parents’:
Yi ∼ N(β0 + β1xi ,σ2)
xobsi ∼ N(xi ,τ
2)
Yixi
i = 1, . . . ,N
x(obs)i
β
σ2
τ2
Dependence specified using models with unknown parameters: τ2
(measurement error variance — may be known), β, σ2
Red node: quantity of interest
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 35 / 50
![Page 74: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/74.jpg)
Graphical methods: a simple example
Measurement error: setting up the graphical structure
Start with true relationship: Yidepends on xi
Black node: observedBlue node: unobserved
Also, xobsi (measurement)
depends on xi (true value)
Next, specify dependence ofeach node upon its ‘parents’:
Yi ∼ N(β0 + β1xi ,σ2)
xobsi ∼ N(xi ,τ
2)
Yixi
i = 1, . . . ,N
x(obs)i
β
σ2
τ2
Dependence specified using models with unknown parameters: τ2
(measurement error variance — may be known), β, σ2
Red node: quantity of interest
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 35 / 50
![Page 75: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/75.jpg)
Graphical methods: a simple example
Measurement error: setting up the graphical structure
Start with true relationship: Yidepends on xi
Black node: observedBlue node: unobserved
Also, xobsi (measurement)
depends on xi (true value)Next, specify dependence ofeach node upon its ‘parents’:
Yi ∼ N(β0 + β1xi ,σ2)
xobsi ∼ N(xi ,τ
2)
Yixi
i = 1, . . . ,N
x(obs)i
β
σ2
τ2
Dependence specified using models with unknown parameters: τ2
(measurement error variance — may be known), β, σ2
Red node: quantity of interest
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 35 / 50
![Page 76: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/76.jpg)
Graphical methods: a simple example
Measurement error: setting up the graphical structure
Start with true relationship: Yidepends on xi
Black node: observedBlue node: unobserved
Also, xobsi (measurement)
depends on xi (true value)Next, specify dependence ofeach node upon its ‘parents’:
Yi ∼ N(β0 + β1xi ,σ2)
xobsi ∼ N(xi ,τ
2)
Yixi
i = 1, . . . ,N
x(obs)i
β
σ2
τ2
Dependence specified using models with unknown parameters: τ2
(measurement error variance — may be known), β, σ2
Red node: quantity of interest
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 35 / 50
![Page 77: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/77.jpg)
Graphical methods: a simple example
Measurement error: prior distributions
Bayesian treatment needs prior distribution for unobserved ‘orphan’nodes (β, σ2, xi & possibly τ2)Use subject-matterconsiderations where possibleIf not, ‘uninformative’ priors e.g.
Normal distributions withlarge variances for regressioncoefficientsGamma distributions forprecision parameters(precision = 1/variance —standard choice)
Yixi
i = 1, . . . ,N
x(obs)i
β
σ2
τ2
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 36 / 50
![Page 78: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/78.jpg)
Graphical methods: comments
Posterior distribution for unknown quantities usually obtained most easilywith Markov Chain Monte Carlo (MCMC) methods e.g. using OpenBUGS(http://www.openbugs.net/), JAGS(http://mcmc-jags.sourceforge.net/), BayesX(http://www.stat.uni-muenchen.de/˜bayesx/bayesx.html) etc.
MCMC methods produce sample from posterior distributionPosterior incorporates all sources of uncertainty represented in graph
Structure of problem automatically determines how to combine / weightinformation — eliminates need for heuristic proceduresBut . . .
Careful checks needed for MCMC outputsResults may be sensitive to prior if model too complex for available dataEffective use requires good understanding of model properties and theirimplications
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 37 / 50
![Page 79: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/79.jpg)
Graphical methods: comments
Posterior distribution for unknown quantities usually obtained most easilywith Markov Chain Monte Carlo (MCMC) methods e.g. using OpenBUGS(http://www.openbugs.net/), JAGS(http://mcmc-jags.sourceforge.net/), BayesX(http://www.stat.uni-muenchen.de/˜bayesx/bayesx.html) etc.
MCMC methods produce sample from posterior distributionPosterior incorporates all sources of uncertainty represented in graph
Structure of problem automatically determines how to combine / weightinformation — eliminates need for heuristic procedures
But . . .Careful checks needed for MCMC outputsResults may be sensitive to prior if model too complex for available dataEffective use requires good understanding of model properties and theirimplications
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 37 / 50
![Page 80: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/80.jpg)
Graphical methods: comments
Posterior distribution for unknown quantities usually obtained most easilywith Markov Chain Monte Carlo (MCMC) methods e.g. using OpenBUGS(http://www.openbugs.net/), JAGS(http://mcmc-jags.sourceforge.net/), BayesX(http://www.stat.uni-muenchen.de/˜bayesx/bayesx.html) etc.
MCMC methods produce sample from posterior distributionPosterior incorporates all sources of uncertainty represented in graph
Structure of problem automatically determines how to combine / weightinformation — eliminates need for heuristic proceduresBut . . .
Careful checks needed for MCMC outputsResults may be sensitive to prior if model too complex for available dataEffective use requires good understanding of model properties and theirimplications
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 37 / 50
![Page 81: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/81.jpg)
Combining information: multimodel ensembles
Multimodel ensembles in climate scienceClimate impact assessments / adaptation strategies mostly based onprojections derived from climate simulators e.g. GCMs, RCMs
Projections vary between simulators⇒ choice of simulator(particularly GCM) often represents significant source of uncertainty
Users advised to consider information from several simulators —multimodel ensemble (MME)
Simulators too computationally expensive for most users to run —usually use existing ensembles e.g. CMIP5
No statistical design — unsystematic sampling of simulators, choice ofruns etc., “ensemble of opportunity”
Questions:How to combine information from multiple simulators?
How to provide decision-relevant uncertainty assessment?
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 38 / 50
![Page 82: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/82.jpg)
Combining information: multimodel ensembles
Multimodel ensembles in climate scienceClimate impact assessments / adaptation strategies mostly based onprojections derived from climate simulators e.g. GCMs, RCMsProjections vary between simulators⇒ choice of simulator(particularly GCM) often represents significant source of uncertainty
Users advised to consider information from several simulators —multimodel ensemble (MME)
Simulators too computationally expensive for most users to run —usually use existing ensembles e.g. CMIP5
No statistical design — unsystematic sampling of simulators, choice ofruns etc., “ensemble of opportunity”
Questions:How to combine information from multiple simulators?
How to provide decision-relevant uncertainty assessment?
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 38 / 50
![Page 83: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/83.jpg)
Combining information: multimodel ensembles
Multimodel ensembles in climate scienceClimate impact assessments / adaptation strategies mostly based onprojections derived from climate simulators e.g. GCMs, RCMsProjections vary between simulators⇒ choice of simulator(particularly GCM) often represents significant source of uncertainty
Users advised to consider information from several simulators —multimodel ensemble (MME)
Simulators too computationally expensive for most users to run —usually use existing ensembles e.g. CMIP5
No statistical design — unsystematic sampling of simulators, choice ofruns etc., “ensemble of opportunity”
Questions:How to combine information from multiple simulators?
How to provide decision-relevant uncertainty assessment?
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 38 / 50
![Page 84: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/84.jpg)
Combining information: multimodel ensembles
Multimodel ensembles in climate scienceClimate impact assessments / adaptation strategies mostly based onprojections derived from climate simulators e.g. GCMs, RCMsProjections vary between simulators⇒ choice of simulator(particularly GCM) often represents significant source of uncertainty
Users advised to consider information from several simulators —multimodel ensemble (MME)
Simulators too computationally expensive for most users to run —usually use existing ensembles e.g. CMIP5
No statistical design — unsystematic sampling of simulators, choice ofruns etc., “ensemble of opportunity”
Questions:How to combine information from multiple simulators?
How to provide decision-relevant uncertainty assessment?
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 38 / 50
![Page 85: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/85.jpg)
Combining information: multimodel ensembles
Example: global temperature projections, 2016–2035
Question: what will global temperatures be in the future?
Information available:‘Observed’temperatures,1986–2005(HadCRUT3)Outputs from 32GCMs, 1986–2005(CMIP5)2016–2035 GCMoutputs (RCP8.5)
1990 2000 2010 2020 2030
1314
1516
17
Global mean temperature
Year
°C
ObservedACCESS1−0ACCESS1−3bcc−csm1−1−mbcc−csm1−1BNU−ESM
CanESM2CCSM4CESM1−BGCCESM1−CAM5CESM1−WACCMCMCC−CESM
CMCC−CMCMCC−CMSCNRM−CM5CSIRO−Mk3−6−0FIO−ESMGFDL−CM3
GFDL−ESM2GGFDL−ESM2MHadGEM2−AOinmcm4IPSL−CM5A−LRIPSL−CM5A−MR
IPSL−CM5B−LRMIROC−ESM−CHEMMIROC−ESMMIROC5MPI−ESM−LRMPI−ESM−MR
MRI−CGCM3NorESM1−MNorESM1−ME
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 39 / 50
![Page 86: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/86.jpg)
Combining information: multimodel ensembles
Multimodel ensembles: issues to considerIndividual simulators have strengths and weaknesses according todevelopers’ priorities
Implication: common approach of assigning (scalar) weights tosimulators is wrong (see next slide)
Simulators share code & have evolved from ‘common ancestors’Implication: simulators share discrepancies with real climate system
Simulators are expensive to develop⇒ full range of modellingdecisions not sampled (∼40 GCMs currently available worldwide)
Implication: must acknowledge potential for as-yet-undevelopedsimulators (as in random effects modelling)
Projections are not predictions: aim is to reproduce statisticalproperties of actual climate
Implication: combining information must focus on statisticaldescriptors of system behaviour
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 40 / 50
![Page 87: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/87.jpg)
Combining information: multimodel ensembles
Multimodel ensembles: issues to considerIndividual simulators have strengths and weaknesses according todevelopers’ priorities
Implication: common approach of assigning (scalar) weights tosimulators is wrong (see next slide)
Simulators share code & have evolved from ‘common ancestors’Implication: simulators share discrepancies with real climate system
Simulators are expensive to develop⇒ full range of modellingdecisions not sampled (∼40 GCMs currently available worldwide)
Implication: must acknowledge potential for as-yet-undevelopedsimulators (as in random effects modelling)
Projections are not predictions: aim is to reproduce statisticalproperties of actual climate
Implication: combining information must focus on statisticaldescriptors of system behaviour
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 40 / 50
![Page 88: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/88.jpg)
Combining information: multimodel ensembles
Multimodel ensembles: issues to considerIndividual simulators have strengths and weaknesses according todevelopers’ priorities
Implication: common approach of assigning (scalar) weights tosimulators is wrong (see next slide)
Simulators share code & have evolved from ‘common ancestors’Implication: simulators share discrepancies with real climate system
Simulators are expensive to develop⇒ full range of modellingdecisions not sampled (∼40 GCMs currently available worldwide)
Implication: must acknowledge potential for as-yet-undevelopedsimulators (as in random effects modelling)
Projections are not predictions: aim is to reproduce statisticalproperties of actual climate
Implication: combining information must focus on statisticaldescriptors of system behaviour
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 40 / 50
![Page 89: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/89.jpg)
Combining information: multimodel ensembles
Multimodel ensembles: issues to considerIndividual simulators have strengths and weaknesses according todevelopers’ priorities
Implication: common approach of assigning (scalar) weights tosimulators is wrong (see next slide)
Simulators share code & have evolved from ‘common ancestors’Implication: simulators share discrepancies with real climate system
Simulators are expensive to develop⇒ full range of modellingdecisions not sampled (∼40 GCMs currently available worldwide)
Implication: must acknowledge potential for as-yet-undevelopedsimulators (as in random effects modelling)
Projections are not predictions: aim is to reproduce statisticalproperties of actual climate
Implication: combining information must focus on statisticaldescriptors of system behaviour
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 40 / 50
![Page 90: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/90.jpg)
Combining information: multimodel ensembles
Multimodel ensembles: why (scalar) weights are silly
Toy example: two simulators Application: length of growingseason (monthly temp≥ 12◦C)
Simulator 1: reasonable meantemp, hopeless seasonality
Simulator 2: vice versa
Both underestimate growingseason length⇒ simulatorweighting alwaysunderestimates (unlessnegative weights allowed . . .)
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 41 / 50
![Page 91: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/91.jpg)
Combining information: multimodel ensembles
Multimodel ensembles: formalising the problem (I)
Any statistical descriptor is parameter vector θ in a statistical modelFit statistical models with common structure (‘mimics’) separately toclimate observations and to outputs from each simulator
Obtain estimates θ̂0 (for real climate system), θ̂1, . . . , θ̂m (for msimulators)
E.g. global temperatures show lineartrend in each 20-year period
Linear trend suggests mimic:
Yt = µ0 +µ1 (t− t)+εt , εt ∼N(0,σ2) .1990 2000 2010 2020 2030
1314
1516
17
Global mean temperature
Year
°C
ObservedACCESS1−0ACCESS1−3bcc−csm1−1−mbcc−csm1−1BNU−ESM
CanESM2CCSM4CESM1−BGCCESM1−CAM5CESM1−WACCMCMCC−CESM
CMCC−CMCMCC−CMSCNRM−CM5CSIRO−Mk3−6−0FIO−ESMGFDL−CM3
GFDL−ESM2GGFDL−ESM2MHadGEM2−AOinmcm4IPSL−CM5A−LRIPSL−CM5A−MR
IPSL−CM5B−LRMIROC−ESM−CHEMMIROC−ESMMIROC5MPI−ESM−LRMPI−ESM−MR
MRI−CGCM3NorESM1−MNorESM1−ME
Descriptor vector is
θ =(
µ(historical)0 µ(historical)
1 σ(historical) µ(future)0 µ(future)
1 σ(future))
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 42 / 50
![Page 92: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/92.jpg)
Combining information: multimodel ensembles
Multimodel ensembles: formalising the problem (I)
Any statistical descriptor is parameter vector θ in a statistical modelFit statistical models with common structure (‘mimics’) separately toclimate observations and to outputs from each simulator
Obtain estimates θ̂0 (for real climate system), θ̂1, . . . , θ̂m (for msimulators)
E.g. global temperatures show lineartrend in each 20-year period
Linear trend suggests mimic:
Yt = µ0 +µ1 (t− t)+εt , εt ∼N(0,σ2) .1990 2000 2010 2020 2030
1314
1516
17
Global mean temperature
Year
°C
ObservedACCESS1−0ACCESS1−3bcc−csm1−1−mbcc−csm1−1BNU−ESM
CanESM2CCSM4CESM1−BGCCESM1−CAM5CESM1−WACCMCMCC−CESM
CMCC−CMCMCC−CMSCNRM−CM5CSIRO−Mk3−6−0FIO−ESMGFDL−CM3
GFDL−ESM2GGFDL−ESM2MHadGEM2−AOinmcm4IPSL−CM5A−LRIPSL−CM5A−MR
IPSL−CM5B−LRMIROC−ESM−CHEMMIROC−ESMMIROC5MPI−ESM−LRMPI−ESM−MR
MRI−CGCM3NorESM1−MNorESM1−ME
Descriptor vector is
θ =(
µ(historical)0 µ(historical)
1 σ(historical) µ(future)0 µ(future)
1 σ(future))
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 42 / 50
![Page 93: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/93.jpg)
Combining information: multimodel ensembles
Multimodel ensembles: formalising the problem (II)
Graphical structure of multimodel climate projection problem
b θ0
Reality
bb θ̂0bθ1
bb θ̂1
b
θ2
bb θ̂2
b θ3
bb θ̂3
b θ4
bbθ̂4
bθ5bb θ̂5
Simulators
AYe root of afl evifl
r θ0 +AWant to use estimates θ̂0, θ̂1, . . . , θ̂m to learn about ‘reality’ θ0
Bayesian analysis gives posterior distribution π
(θ0|θ̂0, θ̂1, . . . , θ̂m
)NB explicit representation of shared simulator discrepancyA
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 43 / 50
![Page 94: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/94.jpg)
Combining information: multimodel ensembles
Multimodel ensembles: Gaussian conditional distributionsSimplest case: everything Gaussian:
θ̂i ∼MVN(θi ,Ji) i = 0, . . . ,m
θi = θ0 + δi , δi ∼MVN(A,Ci
)i = 1, . . . ,m
θ0 ∼MVN(µ0,Σ0) , A∼MVN(0,Λ)
{δi} mutually independent givenA
Interpretation of covariance matrices:
{Ji} measures internal variability from each data source
Ci measures (lack of) consensus for simulator i
Λ measures collective tendency of simulators to deviate from reality
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 44 / 50
![Page 95: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/95.jpg)
Combining information: multimodel ensembles
Multimodel ensembles: Gaussian conditional distributionsSimplest case: everything Gaussian:
θ̂i ∼MVN(θi ,Ji) i = 0, . . . ,m
θi = θ0 + δi , δi ∼MVN(A,Ci
)i = 1, . . . ,m
θ0 ∼MVN(µ0,Σ0) , A∼MVN(0,Λ)
{δi} mutually independent givenA
Interpretation of covariance matrices:
{Ji} measures internal variability from each data source
Ci measures (lack of) consensus for simulator i
Λ measures collective tendency of simulators to deviate from reality
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 44 / 50
![Page 96: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/96.jpg)
Combining information: multimodel ensembles
Bayesian analysis of the Gaussian case
Posterior turns out to be MVN(τ,S), where:
S−1 = Σ−10 + J−1
0 +m
∑i=1
W−1i ;
τ = S
[Σ−1
0 µ0 + J−10 θ̂0 +
m
∑i=1
W−1i θ̂i
];
Wi = Di
(I +
m
∑k=1
D−1k Λ
); and Di = Ji + Ci
Colour coding: prior, observations, simulators.
NB Bayes estimate of θ0 is posterior mean τ
S−1 is posterior precision (precision ≡ inverse of variance)
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 45 / 50
![Page 97: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/97.jpg)
Combining information: multimodel ensembles
Bayesian analysis of the Gaussian case
Posterior turns out to be MVN(τ,S), where:
S−1 = Σ−10 + J−1
0 +m
∑i=1
W−1i ;
τ = S
[Σ−1
0 µ0 + J−10 θ̂0 +
m
∑i=1
W−1i θ̂i
];
Wi = Di
(I +
m
∑k=1
D−1k Λ
); and Di = Ji + Ci
Colour coding: prior, observations, simulators.
NB Bayes estimate of θ0 is posterior mean τ
S−1 is posterior precision (precision ≡ inverse of variance)
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 45 / 50
![Page 98: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/98.jpg)
Combining information: multimodel ensembles
Comments on the Gaussian posterior
Can avoid MCMC here — just plug covariance matrix estimates intoformulae for τ and S (easy to implement)
Can estimate covariance matrices using historical data & simulatorruns (Marianna Demetriou, UCL PhD thesis 2016)But MCMC needed to account for uncertainty in this estimation
τ is matrix-weighted average of simulator outputs and observationsMatrix-valued weights mean each simulator contributes only whereinformative (‘exploit strength, discount weakness’)
Posterior covariance matrix shows explicitly that uncertainty cannotbe reduced indefinitely in presence ofA
Implications for design of climate simulator experiments
Everything is entirely dictated by graphical structure of problem
More details: Chandler, Phil. Trans. Roy. Soc. A, 2013
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 46 / 50
![Page 99: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/99.jpg)
Combining information: multimodel ensembles
Comments on the Gaussian posterior
Can avoid MCMC here — just plug covariance matrix estimates intoformulae for τ and S (easy to implement)
Can estimate covariance matrices using historical data & simulatorruns (Marianna Demetriou, UCL PhD thesis 2016)But MCMC needed to account for uncertainty in this estimation
τ is matrix-weighted average of simulator outputs and observationsMatrix-valued weights mean each simulator contributes only whereinformative (‘exploit strength, discount weakness’)
Posterior covariance matrix shows explicitly that uncertainty cannotbe reduced indefinitely in presence ofA
Implications for design of climate simulator experiments
Everything is entirely dictated by graphical structure of problem
More details: Chandler, Phil. Trans. Roy. Soc. A, 2013
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 46 / 50
![Page 100: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/100.jpg)
Combining information: multimodel ensembles
Comments on the Gaussian posterior
Can avoid MCMC here — just plug covariance matrix estimates intoformulae for τ and S (easy to implement)
Can estimate covariance matrices using historical data & simulatorruns (Marianna Demetriou, UCL PhD thesis 2016)But MCMC needed to account for uncertainty in this estimation
τ is matrix-weighted average of simulator outputs and observationsMatrix-valued weights mean each simulator contributes only whereinformative (‘exploit strength, discount weakness’)
Posterior covariance matrix shows explicitly that uncertainty cannotbe reduced indefinitely in presence ofA
Implications for design of climate simulator experiments
Everything is entirely dictated by graphical structure of problem
More details: Chandler, Phil. Trans. Roy. Soc. A, 2013
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 46 / 50
![Page 101: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/101.jpg)
Combining information: multimodel ensembles
Comments on the Gaussian posterior
Can avoid MCMC here — just plug covariance matrix estimates intoformulae for τ and S (easy to implement)
Can estimate covariance matrices using historical data & simulatorruns (Marianna Demetriou, UCL PhD thesis 2016)But MCMC needed to account for uncertainty in this estimation
τ is matrix-weighted average of simulator outputs and observationsMatrix-valued weights mean each simulator contributes only whereinformative (‘exploit strength, discount weakness’)
Posterior covariance matrix shows explicitly that uncertainty cannotbe reduced indefinitely in presence ofA
Implications for design of climate simulator experiments
Everything is entirely dictated by graphical structure of problem
More details: Chandler, Phil. Trans. Roy. Soc. A, 2013
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 46 / 50
![Page 102: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/102.jpg)
Combining information: multimodel ensembles
Comments on the Gaussian posterior
Can avoid MCMC here — just plug covariance matrix estimates intoformulae for τ and S (easy to implement)
Can estimate covariance matrices using historical data & simulatorruns (Marianna Demetriou, UCL PhD thesis 2016)But MCMC needed to account for uncertainty in this estimation
τ is matrix-weighted average of simulator outputs and observationsMatrix-valued weights mean each simulator contributes only whereinformative (‘exploit strength, discount weakness’)
Posterior covariance matrix shows explicitly that uncertainty cannotbe reduced indefinitely in presence ofA
Implications for design of climate simulator experiments
Everything is entirely dictated by graphical structure of problem
More details: Chandler, Phil. Trans. Roy. Soc. A, 2013
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 46 / 50
![Page 103: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/103.jpg)
Combining information: multimodel ensembles
Incorporating ensemble genealogy
Simulators unlikely to be independent given consensus: GCMs haveevolved in ‘families’ & some have different ‘variants’ (e.g. Knutti et al.,2013, Geophys. Res. Lett.)
Suggests extending framework to accountfor ensemble genealogy
Graphical structure conceptuallystraightforward
Estimation hard due to sparse groupingstructure for covariance estimation e.g.some families with single members . . .
The CMIP5 ‘family tree’
From Knutti et al., 2013,Geophys. Res. Lett.
Obvious with hindsight: extension makes almost no diffferenceunless (a) simulator grouping structure is clear (b)A is small relativeto total variation (Marianna Demetriou, UCL PhD thesis 2016)
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 47 / 50
![Page 104: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/104.jpg)
Combining information: multimodel ensembles
Incorporating ensemble genealogy
Simulators unlikely to be independent given consensus: GCMs haveevolved in ‘families’ & some have different ‘variants’ (e.g. Knutti et al.,2013, Geophys. Res. Lett.)
Suggests extending framework to accountfor ensemble genealogy
Graphical structure conceptuallystraightforward
Estimation hard due to sparse groupingstructure for covariance estimation e.g.some families with single members . . .
The CMIP5 ‘family tree’
From Knutti et al., 2013,Geophys. Res. Lett.
Obvious with hindsight: extension makes almost no diffferenceunless (a) simulator grouping structure is clear (b)A is small relativeto total variation (Marianna Demetriou, UCL PhD thesis 2016)
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 47 / 50
![Page 105: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/105.jpg)
Combining information: multimodel ensembles
Incorporating ensemble genealogy
Simulators unlikely to be independent given consensus: GCMs haveevolved in ‘families’ & some have different ‘variants’ (e.g. Knutti et al.,2013, Geophys. Res. Lett.)
Suggests extending framework to accountfor ensemble genealogy
Graphical structure conceptuallystraightforward
Estimation hard due to sparse groupingstructure for covariance estimation e.g.some families with single members . . .
The CMIP5 ‘family tree’
From Knutti et al., 2013,Geophys. Res. Lett.
Obvious with hindsight: extension makes almost no diffferenceunless (a) simulator grouping structure is clear (b)A is small relativeto total variation (Marianna Demetriou, UCL PhD thesis 2016)
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 47 / 50
![Page 106: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/106.jpg)
Combining information: multimodel ensembles
Incorporating ensemble genealogy
Simulators unlikely to be independent given consensus: GCMs haveevolved in ‘families’ & some have different ‘variants’ (e.g. Knutti et al.,2013, Geophys. Res. Lett.)
Suggests extending framework to accountfor ensemble genealogy
Graphical structure conceptuallystraightforward
Estimation hard due to sparse groupingstructure for covariance estimation e.g.some families with single members . . .
The CMIP5 ‘family tree’
From Knutti et al., 2013,Geophys. Res. Lett.
Obvious with hindsight: extension makes almost no diffferenceunless (a) simulator grouping structure is clear (b)A is small relativeto total variation (Marianna Demetriou, UCL PhD thesis 2016)
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 47 / 50
![Page 107: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/107.jpg)
Combining information: multimodel ensembles
CMIP5 global temperatures under RCP8.5: future reality
1990 2000 2010 2020 2030
1314
1516
17
Observations and model outputs
Year
°C
ObservedACCESS1−0ACCESS1−3bcc−csm1−1−mbcc−csm1−1BNU−ESMCanESM2CCSM4CESM1−BGC
CESM1−CAM5CESM1−WACCMCMCC−CESMCMCC−CMCMCC−CMSCNRM−CM5CSIRO−Mk3−6−0FIO−ESMGFDL−CM3
GFDL−ESM2GGFDL−ESM2MHadGEM2−AOinmcm4IPSL−CM5A−LRIPSL−CM5A−MRIPSL−CM5B−LRMIROC−ESM−CHEMMIROC−ESM
MIROC5MPI−ESM−LRMPI−ESM−MRMRI−CGCM3NorESM1−MNorESM1−ME
1990 2000 2010 2020 203013
1415
1617
Predictive distribution for future
Year
°C
ObservedBest estimateRange
98% interval90% interval80% interval
50% interval
Global temperature model outputs and projections, 1986 to 2035
Static display: showing intervals of specified probability
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 48 / 50
![Page 108: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/108.jpg)
Combining information: multimodel ensembles
CMIP5 global temperatures under RCP8.5: future reality
Dynamic display: showing what might happen
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 48 / 50
![Page 109: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/109.jpg)
Combining information: take-home messages
Graphical methods provide powerful tool for thinking about evidencesynthesis
Note ability to distinguish clearly between simulator outputs, data andreality
Optimum combination of information follows from problem structure — noneed for heuristics
Also offers insights into what additional information / simulator runs wouldbe most useful
Simplified representation of data structure may be adequate if somesources of variation are relatively unimportant (e.g. ensemblegenealogies)
Thorough exploratory analysis can save effort in the long run . . .
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 49 / 50
![Page 110: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/110.jpg)
Combining information: take-home messages
Graphical methods provide powerful tool for thinking about evidencesynthesis
Note ability to distinguish clearly between simulator outputs, data andreality
Optimum combination of information follows from problem structure — noneed for heuristics
Also offers insights into what additional information / simulator runs wouldbe most useful
Simplified representation of data structure may be adequate if somesources of variation are relatively unimportant (e.g. ensemblegenealogies)
Thorough exploratory analysis can save effort in the long run . . .
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 49 / 50
![Page 111: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/111.jpg)
Combining information: take-home messages
Graphical methods provide powerful tool for thinking about evidencesynthesis
Note ability to distinguish clearly between simulator outputs, data andreality
Optimum combination of information follows from problem structure — noneed for heuristics
Also offers insights into what additional information / simulator runs wouldbe most useful
Simplified representation of data structure may be adequate if somesources of variation are relatively unimportant (e.g. ensemblegenealogies)
Thorough exploratory analysis can save effort in the long run . . .
Richard Chandler ([email protected]) Data, models and reality CORDEX 2016 49 / 50
![Page 112: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/112.jpg)
Summary
Modern statistics involves consideration of data structure and underlyingprocess mechanisms
Model-based statistical thinking helps to tackle complex problemsconvincingly
NB also provides complete transparency because assumptions are definedby model
Most appropriate statistical methodology requires consideration of dataavailability as well as process mechanisms
If interpolation / gridding is required to use a statistical method, askwhether there is a more appropriate method
Note some ideas for visualising uncertainty e.g. fog, animations
It’s hard but worth it!
, Thank you for your attention ,
![Page 113: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/113.jpg)
Summary
Modern statistics involves consideration of data structure and underlyingprocess mechanismsModel-based statistical thinking helps to tackle complex problemsconvincingly
NB also provides complete transparency because assumptions are definedby model
Most appropriate statistical methodology requires consideration of dataavailability as well as process mechanisms
If interpolation / gridding is required to use a statistical method, askwhether there is a more appropriate method
Note some ideas for visualising uncertainty e.g. fog, animations
It’s hard but worth it!
, Thank you for your attention ,
![Page 114: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/114.jpg)
Summary
Modern statistics involves consideration of data structure and underlyingprocess mechanismsModel-based statistical thinking helps to tackle complex problemsconvincingly
NB also provides complete transparency because assumptions are definedby model
Most appropriate statistical methodology requires consideration of dataavailability as well as process mechanisms
If interpolation / gridding is required to use a statistical method, askwhether there is a more appropriate method
Note some ideas for visualising uncertainty e.g. fog, animations
It’s hard but worth it!
, Thank you for your attention ,
![Page 115: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/115.jpg)
Summary
Modern statistics involves consideration of data structure and underlyingprocess mechanismsModel-based statistical thinking helps to tackle complex problemsconvincingly
NB also provides complete transparency because assumptions are definedby model
Most appropriate statistical methodology requires consideration of dataavailability as well as process mechanisms
If interpolation / gridding is required to use a statistical method, askwhether there is a more appropriate method
Note some ideas for visualising uncertainty e.g. fog, animations
It’s hard but worth it!
, Thank you for your attention ,
![Page 116: Data, models and reality: a modern statistical perspective](https://reader033.vdocuments.mx/reader033/viewer/2022051504/5868d4b21a28ab847d8b82f6/html5/thumbnails/116.jpg)
Summary
Modern statistics involves consideration of data structure and underlyingprocess mechanismsModel-based statistical thinking helps to tackle complex problemsconvincingly
NB also provides complete transparency because assumptions are definedby model
Most appropriate statistical methodology requires consideration of dataavailability as well as process mechanisms
If interpolation / gridding is required to use a statistical method, askwhether there is a more appropriate method
Note some ideas for visualising uncertainty e.g. fog, animations
It’s hard but worth it!
, Thank you for your attention ,