Introduction to modelling extremes Marian Scott (with thanks to Clive Anderson, Trevor Hoey) NERC August 2009


Introduction

• Examples of extremes in environmental contexts

• Some statistical models for extremes

– Block maxima, Peak over threshold

– Including return levels

– Return period

• Statistical models for extremes are concerned with the tails of the distributions

Problems

• Normal distribution inappropriate

• Bulk of data not informing us about extremes

• Extremes are rare, so not much data

• But there are some special statistical models for extremes

– Block maxima, Peak over threshold

• These require parameter estimation, which may prove difficult

Introduction

• We model extremes because we need to know about maxima and minima in many environmental systems, to ensure that we know

– How strong to make buildings

– How high to make sea walls

– How to plan for floods

– etc.

Stream flow

www.nerc-wallingford.ac.uk/ih/nrfa/river_flow_data

Background

• Typically we assume that we have a time series of observations (e.g. maximum daily temperature for the last 20 years)

• Assume that the data are independent and identically distributed (e.g. might a Normal or Exponential distribution be sensible, or do we need other types of distribution?)

• Interest is in predicting unusually high (or low) temperatures

• Our statistical model needs to be good for the ‘tails’ of the distribution

• To do this we need the distribution function and the cumulative distribution function

Background

• The usual notation: we assume a series of random variables $X_1, X_2, \dots$ each with cumulative distribution function F

• Then F(x) is the probability Prob(X <= x)

• Values $x_p$ with a specified probability p of values lying above them in a distribution are known as quantiles

– $x_p$ is the (1-p) quantile

• The inverse cumulative distribution function $F^{-1}$ gives $x_p = F^{-1}(1-p)$, i.e. $x_p$ is the value of X such that Prob(X <= $x_p$) = 1-p
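
The quantile notation above can be illustrated with a minimal sketch. It assumes, purely for illustration, that X is exponential with mean 10, so that F(x) = 1 - exp(-x/mean); the function names and the mean are not from the slides.

```python
import math

def exp_cdf(x, mean):
    """F(x) = Prob(X <= x) for an exponential distribution with the given mean."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

def exp_quantile(p_exceed, mean):
    """x_p = F^{-1}(1 - p): the level exceeded with probability p."""
    return -mean * math.log(p_exceed)

mean = 10.0  # assumed value, for illustration only
for p in (0.5, 0.1, 0.01):
    x_p = exp_quantile(p, mean)
    # F(x_p) should come back as 1 - p
    print(f"p = {p:<5} x_p = {x_p:7.3f}  F(x_p) = {exp_cdf(x_p, mean):.3f}")
```

The smaller p is, the further into the upper tail $x_p$ lies, which is why the fit of F in the tail matters so much.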

How to communicate risk

• Return level $x_p$ is the value associated with the return period 1/p. That is, $x_p$ is the level expected to be exceeded on average once every 1/p years.

• $x_p = F^{-1}(1-p)$

• p = 0.01 corresponds to the 100-year return period

• The return level and return period are some of the most important quantities to derive from the fitted model (and, being estimates, they are subject to uncertainty).

• A plot of $x_p$ vs $-\log(-\log(1-p))$ is called a return level plot
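
As a small sketch of the quantities above, the following computes the return period 1/p and the horizontal coordinate of the return level plot for a few values of p. The function names are illustrative only.

```python
import math

def return_period(p):
    """Return period (in years) for annual exceedance probability p."""
    return 1.0 / p

def reduced_variate(p):
    """Horizontal coordinate of a return level plot: -log(-log(1 - p))."""
    return -math.log(-math.log(1.0 - p))

for p in (0.5, 0.1, 0.01, 0.001):
    print(f"p = {p:<6} return period = {return_period(p):7.1f} years  "
          f"-log(-log(1-p)) = {reduced_variate(p):6.3f}  "
          f"log(1/p) = {math.log(1.0 / p):6.3f}")
```

For small p the plotting coordinate is close to log(1/p), so the horizontal axis is roughly the return period on a log scale.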

Background

• There exists a class of statistical models developed specifically for dealing with this situation

• The Generalised Extreme Value (GEV) distribution has three parameters; depending on their values it simplifies to the Gumbel, Fréchet or Weibull distribution for the maximum over particular blocks of time.

• Assumptions relating to the original time series: it should be stationary (i.e. no trend)

Some simulations

• From the extremes script

– A) simulate 1000 values from different distributions and draw histograms

– Expect to see very different shapes

– B) use block maxima to look at the distributional shapes for the maximum
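
The extremes script itself is not reproduced here. The following is only a rough sketch of steps A and B, assuming Normal and Exponential parents and a block size of 50 (both choices are illustrative); it prints summaries rather than drawing histograms.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

def block_maxima(values, block_size):
    """Maximum of each complete block of `block_size` consecutive values."""
    n_blocks = len(values) // block_size
    return [max(values[i * block_size:(i + 1) * block_size])
            for i in range(n_blocks)]

# A) 1000 values from two parent distributions with very different shapes
samples = {
    "normal": [random.gauss(0.0, 1.0) for _ in range(1000)],
    "exponential": [random.expovariate(1.0) for _ in range(1000)],
}

# B) maxima over blocks of 50 values, giving 20 maxima per sample
for name, values in samples.items():
    maxima = block_maxima(values, 50)
    print(f"{name:12s} parent mean = {statistics.mean(values):6.3f}  "
          f"n maxima = {len(maxima)}  "
          f"mean of maxima = {statistics.mean(maxima):6.3f}  "
          f"sd of maxima = {statistics.stdev(maxima):6.3f}")
```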

GEV distribution

• The Generalised Extreme Value (GEV) distribution has three parameters: location, scale and shape (usually written as $\mu$, $\sigma$ ($\sigma > 0$) and $\xi$)

• $G(z) = \exp\{-[1 + \xi (z - \mu)/\sigma]^{-1/\xi}\}$

• The Gumbel, Fréchet and Weibull are all special cases depending on the value of $\xi$
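
A minimal sketch of the GEV distribution function and its inverse follows, with the Gumbel case taken as the limit $\xi = 0$ ($\xi > 0$ gives the Fréchet type and $\xi < 0$ the Weibull type). Inverting G(z) = 1 - p gives the return level $z_p = \mu - (\sigma/\xi)[1 - \{-\log(1-p)\}^{-\xi}]$. The parameter values are hypothetical, chosen only to show the effect of the shape parameter.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """G(z) for the GEV distribution; xi == 0 is the Gumbel limit."""
    if xi == 0.0:
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # outside the support: below the lower end point when xi > 0,
        # above the upper end point when xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_return_level(p, mu, sigma, xi):
    """z_p such that G(z_p) = 1 - p."""
    y = -math.log(1.0 - p)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

mu, sigma = 30.0, 5.0  # hypothetical location and scale
for xi in (-0.2, 0.0, 0.2):
    z100 = gev_return_level(0.01, mu, sigma, xi)
    print(f"xi = {xi:+.1f}  100-year return level = {z100:6.2f}  "
          f"G(z) = {gev_cdf(z100, mu, sigma, xi):.4f}")
```

With location and scale held fixed, the 100-year level grows as $\xi$ increases, which is why the shape parameter dominates extrapolation to long return periods.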

Block maxima

• We can also break our time series $X_1, X_2, \dots$ into blocks of size n and only deal with the maximum or minimum in each block.

• E.g. if we have a daily series for 50 years, we could calculate the annual maximum and fit one of the statistical models mentioned earlier to the 50 realisations of the maxima.

• GEV can then be applied to the block maxima etc.

• Quite wasteful of data (throws lots away)

Fitting and model diagnostics for GEV

• Fitting by maximum likelihood (may need to be done numerically, so convergence can be an issue)

• Diagnostic plots

– Probability plot

– Quantile plot

– Return level plot

– Density plot

• Probability and quantile plots should be straight lines.

• All possible in the ismev library
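
To show what numerical maximum likelihood involves, here is a sketch restricted to the Gumbel case ($\xi = 0$) so that it needs only the standard library. It is not the algorithm used by ismev, and the data are simulated with assumed true values $\mu = 10$, $\sigma = 2$. Setting the derivatives of the Gumbel log-likelihood to zero leaves one equation in $\sigma$, solved below by bisection, after which $\mu$ follows in closed form.

```python
import math
import random

def gumbel_mle(x):
    """Maximum likelihood estimates (mu, sigma) for the Gumbel distribution."""
    n = len(x)
    xbar = sum(x) / n
    xmin, xmax = min(x), max(x)

    def score(sigma):
        # Likelihood equation for sigma; weights are shifted by xmin so the
        # exponentials stay between 0 and 1 and cannot overflow.
        w = [math.exp(-(v - xmin) / sigma) for v in x]
        return sigma - xbar + sum(wi * v for wi, v in zip(w, x)) / sum(w)

    # score(lo) < 0 <= score(hi) for any sample that is not constant
    lo, hi = 1e-8 * (xmax - xmin), xmax - xmin
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    w_mean = sum(math.exp(-(v - xmin) / sigma) for v in x) / n
    mu = xmin - sigma * math.log(w_mean)
    return mu, sigma

random.seed(42)
true_mu, true_sigma = 10.0, 2.0  # assumed values for the simulation
# Inverse-transform sampling from the Gumbel distribution
sample = [true_mu - true_sigma * math.log(-math.log(random.uniform(1e-12, 1 - 1e-12)))
          for _ in range(5000)]
mu_hat, sigma_hat = gumbel_mle(sample)
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```

For the full three-parameter GEV there is no such reduction to one equation, and a general-purpose optimiser is needed, which is where the convergence issues mentioned above arise.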

POT modelling

• There exists another type of statistical model developed specifically for dealing with this situation, known as Peak over threshold (POT) modelling

• Again we assume that we have a time series of observations, and define (somehow) a threshold u.

• Typical distributions used here are the Pareto, Beta and Exponential, which arise as special cases of the Generalised Pareto distribution (GPD) for the exceedances

• How to define the threshold u is a practical issue.

GPD model

• Asymptotic result (so as $u \to \infty$): the distribution of the excess $y = X - u$ (given $X > u$) is

• $H(y) = 1 - (1 + \xi y/\sigma)^{-1/\xi}$

• $\xi$ and $\sigma$ are shape and scale parameters

• $\xi = 0$ gives the exponential distribution with mean $\sigma$

• How to define the threshold u is the big practical question
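
The $\xi = 0$ case can be checked by simulation. The sketch below assumes exponential data with mean 3 (an illustrative choice) and extracts the excesses over several thresholds; because the exponential is memoryless, the excesses should again have mean close to 3 whatever the threshold.

```python
import random

def excesses_over(values, u):
    """Excesses y = x - u for the observations x that exceed the threshold u."""
    return [x - u for x in values if x > u]

random.seed(7)
scale = 3.0  # assumed exponential mean, for illustration only
data = [random.expovariate(1.0 / scale) for _ in range(20000)]

for u in (0.0, 3.0, 6.0):
    y = excesses_over(data, u)
    print(f"u = {u:4.1f}  n_exceed = {len(y):6d}  mean excess = {sum(y) / len(y):.2f}")
```

Raising u leaves fewer exceedances to fit, which is the practical trade-off in choosing the threshold.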

Definition of return levels for POT

• The level $x_m$ that is exceeded on average once every m observations is the solution of

$\zeta_u [1 + \xi (x_m - u)/\sigma]^{-1/\xi} = 1/m$

• where $\zeta_u = \Pr(X > u)$

• Choose u such that GPD is a ‘good’ fit
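
Solving the equation above for the level gives $x_m = u + (\sigma/\xi)[(m \zeta_u)^{\xi} - 1]$, or $x_m = u + \sigma \log(m \zeta_u)$ when $\xi = 0$, valid once m is large enough that $m \zeta_u > 1$. A minimal sketch follows; all parameter values are hypothetical, and 365 observations per year is assumed to convert a 100-year period into a number of observations.

```python
import math

def pot_return_level(m, u, sigma, xi, zeta_u):
    """Level exceeded on average once every m observations (needs m * zeta_u > 1)."""
    if xi == 0.0:
        return u + sigma * math.log(m * zeta_u)
    return u + (sigma / xi) * ((m * zeta_u) ** xi - 1.0)

def exceed_prob(x, u, sigma, xi, zeta_u):
    """Pr(X > x) for x > u under the GPD tail model."""
    if xi == 0.0:
        return zeta_u * math.exp(-(x - u) / sigma)
    return zeta_u * (1.0 + xi * (x - u) / sigma) ** (-1.0 / xi)

# Hypothetical fitted values, for illustration only
u, sigma, xi, zeta_u = 30.0, 7.0, 0.1, 0.02
m = 365 * 100  # 100 years of daily observations (assumed 365 per year)

x_m = pot_return_level(m, u, sigma, xi, zeta_u)
print(f"100-year level = {x_m:.2f}")
print(f"check: m * Pr(X > x_m) = {m * exceed_prob(x_m, u, sigma, xi, zeta_u):.6f}")
```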

Issues

• Non-stationarity, e.g. under climate change there are trends in the frequency and intensity of extreme weather events

• There are cycles (annual, diurnal etc.); these are rather common

• Other.

• What should be done?

• If there is a trend or cyclical component, then we need to de-trend/deseasonalise

• Perhaps introduce covariates that can explain the non-stationarity

Issues specifically for POT modelling

• Often threshold exceedances are not independent.

• Various ways to deal with this

– Model the dependence

– Declustering

• Another approach (depending on the application) might be to model the frequency and intensity of threshold excesses

• Mean number of events in an interval [0, T] is $\lambda T$, where $\lambda$ is the frequency of occurrence of an event (so a rate)
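
The slides do not say which declustering scheme is meant. One common choice, assumed here, is runs declustering: exceedances belong to the same cluster until r consecutive values fall at or below the threshold, and only each cluster's maximum is kept. The series and the values of u and r below are made up.

```python
def decluster_runs(values, u, r):
    """Cluster maxima: exceedances of u stay in one cluster until
    r consecutive values fall at or below u."""
    maxima = []
    current = None  # running maximum of the open cluster, if any
    below = 0       # consecutive non-exceedances since the last exceedance
    for x in values:
        if x > u:
            current = x if current is None else max(current, x)
            below = 0
        elif current is not None:
            below += 1
            if below >= r:
                maxima.append(current)
                current = None
    if current is not None:
        maxima.append(current)  # close a cluster still open at the end
    return maxima

series = [1, 6, 7, 2, 8, 1, 1, 1, 9, 1, 5, 1, 1]  # made-up values
u = 4
for r in (1, 2):
    clusters = decluster_runs(series, u, r)
    # events per observation: a crude estimate of the rate lambda
    print(f"r = {r}: cluster maxima = {clusters}, rate = {len(clusters) / len(series):.3f}")
```

The cluster maxima are then treated as approximately independent, and the number of clusters per unit time gives an estimate of the rate $\lambda$.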

Example: Flood Estimation

• AIM: to estimate the probability of an extreme event occurring in a given time period

• In hydrology, there is a long history of methods designed to deal with extremes

Annual Floods

• $p_q$ = the probability that discharge equals or exceeds q at least once in any given year; $p_q$ is the annual exceedance probability

• $(1 - p_q)$ = probability that this flood does NOT occur in a given year

• Assume: stationarity; no long memory
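
A short worked consequence, under the assumptions above and with the further assumption that years are independent of one another: the chance of seeing the flood at least once in n years follows from multiplying the yearly non-occurrence probabilities.

```latex
\[
  \Pr(\text{no exceedance in } n \text{ years}) = (1 - p_q)^n ,
  \qquad
  \Pr(\text{at least one exceedance in } n \text{ years}) = 1 - (1 - p_q)^n .
\]
\[
  \text{For } p_q = 0.01,\ n = 100:\quad 1 - 0.99^{100} \approx 0.634 .
\]
```

So a flood with annual exceedance probability 0.01 has roughly a 63% chance of occurring at least once in any 100-year period, not a certainty.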

Recurrence Interval

• We often refer to the recurrence interval of floods (e.g. the 1-in-200-year flood)

• Recurrence interval: the average time between floods equalling or exceeding q

• Recurrence interval ($RI_q$) is the inverse of the exceedance probability ($RI_q = 1/p_q$)

Flow frequency distributions

River Dove

Flow frequency distributions

Estimating RIq

• One approach to estimating the q-year flood from N years of data

• Rank the data from highest ($q_1$) to lowest ($q_N$)

• The exceedance probability and recurrence interval can be estimated from the rank order

• With N = 50, what is the rarest flood that can be estimated?

Estimating Extremes: Graphical Method

• Rank the data from highest (rank=1) to lowest (rank=N)

• Estimate plotting positions from the ranks

• Compute recurrence intervals

• Plot of q(m) vs RIq(m)

• Fit a line to the data

• Extrapolate the best-fit line to the required RI
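
The steps above can be sketched as follows. The slides do not say which plotting position formula is used; the Weibull formula p = m/(N+1) is assumed here, and the line is assumed to be fitted to q against log10(RI). The flow values are made up for illustration.

```python
import math

def plotting_positions(flows):
    """Rank flows from highest (rank 1) to lowest (rank N) and attach the
    Weibull plotting position p = m / (N + 1) and recurrence interval 1 / p."""
    n = len(flows)
    ranked = sorted(flows, reverse=True)
    return [(q, m / (n + 1), (n + 1) / m) for m, q in enumerate(ranked, start=1)]

def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b * x; returns (a, b)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    return ybar - b * xbar, b

# Made-up annual maximum flows (m^3/s), for illustration only
annual_max = [212, 340, 185, 410, 298, 255, 520, 230, 305, 275]

table = plotting_positions(annual_max)
for q, p, ri in table:
    print(f"q = {q:4d}  p = {p:.3f}  RI = {ri:5.2f} years")

a, b = fit_line([math.log10(ri) for _, _, ri in table], [q for q, _, _ in table])
q50 = a + b * math.log10(50)  # extrapolation well beyond the largest observed RI
print(f"extrapolated 50-year flood = {q50:.0f} m^3/s")
```

Under this formula the largest observed flood is assigned RI = N + 1 years, so with N = 50 the rarest flood that can be read directly from the data has a recurrence interval of 51 years; anything rarer relies on extrapolating the line.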

Example: annual maximum data, Skykomish R, Gold Bar

http://web.mst.edu/~rogersda/umrcourses/ge301/press&siever13.15.png

Analytical Techniques

• Fit an appropriate cumulative distribution function (CDF) to the data

• Fitting requires use of estimation procedures (distribution shapes are not known in advance)

• Use the CDF to estimate the discharge for a particular RI

Example

Gulungul Ck example

• Use the extremes.r script to try out some of the simpler of these analyses

Summary

• Estimating extremes is inherently unreliable, even with large data sets

• Many environmental data sets are short

• Various distributions may be used for estimation; which ones fit best in a particular situation is difficult to assess, but diagnostic tools exist

• Data are assumed to be stationary; changing driving conditions and long-memory processes may violate this assumption for many environmental data

Software and references

• Some R packages are available: ismev, evd

• Good book: Coles S (2001), An Introduction to Statistical Modeling of Extreme Values, Springer

• Lots of very recent work looking at statistical models for extremes over space and time