parameter and quantile estimation for the generalized extreme-value distribution

ENVIRONMETRICS, VOL. 5, 417-432 (1994)

PARAMETER AND QUANTILE ESTIMATION FOR THE GENERALIZED EXTREME-VALUE DISTRIBUTION

ENRIQUE CASTILLO Department of Applied Mathematics and Computational Sciences, University of Cantabria, 39005 Santander, Spain

AND

ALI S. HADI Department of Statistics, Cornell University, U.S.A.

SUMMARY

The generalized extreme-value distribution (GEVD) was introduced by Jenkinson (1955). It is now widely used to model extremes of natural and environmental data. The GEVD has three parameters: a location parameter ($-\infty < \lambda < \infty$), a scale parameter ($\alpha > 0$) and a shape parameter ($-\infty < k < \infty$). The traditional methods of estimation (e.g., the maximum likelihood and the moments-based methods) have problems either because the range of the distribution depends on the parameters, or because the mean and higher moments do not exist when $k \le -1$. The currently favoured estimators are those obtained by the method of probability-weighted moments (PWM). The PWM estimators are good for cases where $-1/2 < k < 1/2$. Outside this range of $k$, the PWM estimates may not exist, and if they do exist they cannot be recommended because their performance worsens as $k$ increases. In this paper, we propose a method for estimating the parameters and quantiles of the GEVD. The estimators are well defined for all possible combinations of parameter and sample values. They are also easy to compute as they are based on equations which involve only one variable (rather than three). A simulation study is implemented to evaluate the performance of the proposed method and to compare it with the PWM. The simulation results seem to indicate that the proposed method is comparable to the PWM for $-1/2 < k < 1/2$, but outside this range it gives a better performance. Two real-life environmental data sets are used to illustrate the methodology.

KEY WORDS Least median of squares; Maximum likelihood; Median; Method of moments; Order statistics; Probability-weighted moments

1. INTRODUCTION

The generalized extreme-value distribution (GEVD) is a three-parameter distribution whose cumulative distribution function (CDF) is given by

$$
F(x; k, \lambda, \alpha) =
\begin{cases}
\exp\{-[1 - k(x - \lambda)/\alpha]^{1/k}\}, & k \ne 0, \\
\exp\{-\exp[-(x - \lambda)/\alpha]\}, & k = 0.
\end{cases}
\tag{1}
$$

Here, $\lambda$, $\alpha > 0$ and $k$ are the location, scale and shape parameters, respectively. The range of $x$ is $x < \lambda + \alpha/k$ for $k > 0$ and $x > \lambda + \alpha/k$ for $k < 0$. The corresponding inverse distribution

CCC 1180-4009/94/040417-16 © 1994 by John Wiley & Sons, Ltd.

Received 13 January 1994 Revised 30 May 1994


function is given by

$$
x(p) =
\begin{cases}
\lambda + \alpha\{1 - (-\log p)^{k}\}/k, & k \ne 0, \\
\lambda - \alpha \log(-\log p), & k = 0,
\end{cases}
\tag{2}
$$

where $0 \le p \le 1$. This distribution was introduced by Jenkinson.¹ The Gumbel distribution is obtained as a special case of the GEVD when $k = 0$. The GEVD has been used to model extremes of natural and environmental data. For example, according to Hosking et al.,² the GEVD was recommended by the Flood Studies Report of the Natural Environment Research Council³ for modelling the distribution of annual maxima of daily streamflows of British rivers. Other examples and uses of the GEVD (e.g. floods, waves, winds, etc.) are given in Castillo⁴ and in several articles in Tiago de Oliveira;⁵ see also Castillo.⁶
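As an illustration, the CDF in (1) and the quantile function in (2) can be sketched in Python as follows. The function names `gev_cdf` and `gev_quantile` are our own hypothetical choices and do not come from the paper; the treatment of points outside the range of the distribution (returning 0 or 1) is likewise an assumption of this sketch.

```python
import math


def gev_cdf(x, k, lam, alpha):
    """CDF of the GEVD as in equation (1); lam = location, alpha = scale > 0."""
    if k == 0.0:
        return math.exp(-math.exp(-(x - lam) / alpha))
    z = 1.0 - k * (x - lam) / alpha
    if z <= 0.0:
        # Outside the range: above the upper end point when k > 0,
        # below the lower end point when k < 0 (assumed convention).
        return 1.0 if k > 0 else 0.0
    return math.exp(-(z ** (1.0 / k)))


def gev_quantile(p, k, lam, alpha):
    """Quantile function of the GEVD as in equation (2), for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k == 0.0:
        return lam - alpha * math.log(-math.log(p))
    return lam + alpha * (1.0 - (-math.log(p)) ** k) / k
```

Applying `gev_cdf` to the output of `gev_quantile` with the same parameters should return the original probability, which gives a simple consistency check between (1) and (2).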

Several methods have been proposed for estimating the parameters of the GEVD. Jenkinson⁷ uses the method of sextiles to estimate the parameters and quantiles of the GEVD. The maximum likelihood (ML) method has been considered by Jenkinson⁷ and Prescott and Walden.⁸,⁹ Smith¹⁰ considers the applicability of ML and discusses non-regular cases. The maximum likelihood estimates (MLE) require numerical solutions and, for some samples, the likelihood may not have a local maximum. Furthermore, for $k > 1$, the likelihood can be made infinite and hence the MLE do not exist. Hosking et al.² suggest estimating the parameters and quantiles of the GEVD by the method of probability-weighted moments (PWM), which was introduced by Greenwood et al.¹¹ Hosking et al.² compare the sextiles, ML and PWM methods in a simulation study. They find that the PWM outperforms the sextiles and the ML in many cases. Hosking et al.,² however, consider only cases where the shape parameter $k$ lies in the range $-1/2 < k < 1/2$ because it has been observed in practice that $k$ usually lies in this range. While the PWM performs quite admirably within this restricted range of $k$, it has problems outside it. First, when $k \le -1$, the mean and other moments of the GEVD do not exist. Second, the PWM estimators for the GEVD as suggested by Hosking et al.² are based on low-order polynomial approximations which are approximately linear only over this restricted range of $k$. The simulation study in Section 3 indicates that the performance of the PWM deteriorates as $k$ increases.

In this paper, we introduce an alternative method for estimating the parameters and quantiles of the GEVD. The method is applicable for all possible values of the parameters, even in those cases where the moments do not exist. This method is presented in Section 2. In Section 3, we carry out a simulation experiment to study the performance of the proposed estimators and to compare them with the currently favoured PWM estimators. The simulation results seem to indicate that: (a) within the above restricted range, the performance of the proposed estimators is comparable with that of the PWM estimators, although the PWM estimators may be preferred for some values of $k$ within this range; (b) outside the above restricted range, the proposed estimators outperform the PWM estimators. Two examples (one for modelling maximum significant wave heights and the other for modelling yearly maximum water discharge of a river) illustrating the proposed method are presented in Section 4. A summary and specific recommendations are given in Section 5.

2. THE PROPOSED METHOD

The proposed method is a two-stage procedure for estimating the parameters and quantiles of the GEVD. The estimators are based on the order statistics. They are obtained by first equating the CDF evaluated at the observed order statistics to their corresponding percentile values and then using the resulting equations as a basis for obtaining initial estimates of the parameters (first stage). These estimates are then combined to obtain statistically more efficient estimates of the parameters (second stage).

2.1. The first stage: initial estimates

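Let $x_{1:n} \le x_{2:n} \le \cdots \le x_{n:n}$ be the order statistics obtained from a random sample from $F(x; k, \lambda, \alpha)$ in (1). Let $I = \{i, j, r\}$ be a set of three distinct indices, where $i < j < r \in \{1, 2, \ldots, n\}$. Then, we write

$$
F(x_{s:n}; k, \lambda, \alpha) \simeq p_{s:n}, \quad s \in I, \tag{3}
$$

where the $p_{s:n}$ are suitable plotting positions (4); the simulations in Section 3 use $p_{i:n} = (i - 0.35)/n$. The system in (3) is a set of three independent equations in three unknowns, $k$, $\lambda$ and $\alpha$. Estimates of $k$, $\lambda$ and $\alpha$ can then be obtained by solving (3) for $k$, $\lambda$ and $\alpha$. Substituting each of the three elements of $I$ in (3), we obtain

$$
\begin{aligned}
F(x_{i:n}; k, \lambda, \alpha) - p_{i:n} &\simeq 0, \\
F(x_{j:n}; k, \lambda, \alpha) - p_{j:n} &\simeq 0, \\
F(x_{r:n}; k, \lambda, \alpha) - p_{r:n} &\simeq 0.
\end{aligned}
\tag{5}
$$

Substituting (1) in (5) and arranging terms, we obtain

$$
\begin{aligned}
x_{i:n} &\simeq \lambda + \alpha\{1 - (-\log p_{i:n})^{k}\}/k, \\
x_{j:n} &\simeq \lambda + \alpha\{1 - (-\log p_{j:n})^{k}\}/k, \\
x_{r:n} &\simeq \lambda + \alpha\{1 - (-\log p_{r:n})^{k}\}/k,
\end{aligned}
\tag{6}
$$

for $k \ne 0$, and

$$
\begin{aligned}
x_{i:n} &\simeq \lambda - \alpha \log(-\log p_{i:n}), \\
x_{j:n} &\simeq \lambda - \alpha \log(-\log p_{j:n}),
\end{aligned}
\tag{7}
$$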

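if $k$ is known to be zero. Note that when $k = 0$, the GEVD is the Gumbel distribution, which depends on only two parameters; hence we need only two equations, as in (7).

Let us first consider the case where $k = 0$ is not assumed. The solution of (6) is obtained by the elimination method as follows. Eliminating $\lambda$ and $\alpha$, we obtain

$$
\frac{x_{j:n} - x_{r:n}}{x_{i:n} - x_{r:n}} \simeq \frac{1 - A_{jr}^{k}}{1 - A_{ir}^{k}}, \tag{8}
$$

where $C_i = -\log(p_{i:n})$ and $A_{ir} = C_i/C_r$. An initial estimate of $k$, $\hat{k}_{ijr}$, which depends on $x_{i:n}$, $x_{j:n}$ and $x_{r:n}$, is obtained by solving (8). Equation (8) involves only one variable, hence it can be solved easily using the bisection method as outlined in the Appendix. Once $\hat{k}_{ijr}$ is obtained, $\hat{\alpha}_{ijr}$ and $\hat{\lambda}_{ijr}$ are obtained in closed form as follows. The estimate $\hat{k}_{ijr}$ is used to obtain

$$
\hat{\alpha}_{ijr} = \frac{\hat{k}_{ijr}\,(x_{i:n} - x_{r:n})}{C_r^{\hat{k}_{ijr}} - C_i^{\hat{k}_{ijr}}}. \tag{9}
$$

The estimates $\hat{k}_{ijr}$ and $\hat{\alpha}_{ijr}$ are then used to obtain

$$
\hat{\lambda}_{ijr} = x_{i:n} - \hat{\alpha}_{ijr}\{1 - C_i^{\hat{k}_{ijr}}\}/\hat{k}_{ijr}. \tag{10}
$$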

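Now, we consider the case of $k = 0$. Estimates of $\alpha$ and $\lambda$ can be obtained, by solving (7), as

$$
\hat{\alpha}_{ij} = \frac{x_{i:n} - x_{j:n}}{\log C_j - \log C_i} \tag{11}
$$

and

$$
\hat{\lambda}_{ij} = x_{i:n} + \hat{\alpha}_{ij} \log C_i. \tag{12}
$$

Thus, for $k = 0$, the initial estimates of $\alpha$ and $\lambda$ depend only on $x_{i:n}$ and $x_{j:n}$ and are obtained in closed form.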
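For the Gumbel case, the two-point closed-form solution of (7) can be sketched as below. This is a minimal illustration only: the function name `gumbel_initial_estimates` and its argument order are our own assumptions, and the plotting positions `p_i` and `p_j` must be supplied by the caller.

```python
import math


def gumbel_initial_estimates(x_i, x_j, p_i, p_j):
    """Initial estimates of (alpha, lambda) for k = 0 from two order statistics.

    x_i, x_j are the observed order statistics and p_i, p_j (with p_i < p_j)
    their plotting positions; C_s = -log(p_s) as in the text.
    """
    c_i = -math.log(p_i)
    c_j = -math.log(p_j)
    alpha = (x_i - x_j) / (math.log(c_j) - math.log(c_i))
    lam = x_i + alpha * math.log(c_i)
    return alpha, lam
```

If the two inputs are exact Gumbel quantiles at `p_i` and `p_j`, the function returns the scale and location that generated them.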

2.2. The second stage: final estimates

The above initial estimates are based on only three order statistics (two for the case of the Gumbel distribution). More statistically efficient estimates are obtained using other order statistics as follows.

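1. Let $i = 1$ and $r = n$ and compute $\hat{k}_{1jn}$, $\hat{\alpha}_{1jn}$ and $\hat{\lambda}_{1jn}$, $j = 2, 3, \ldots, n - 1$.
2. Apply a robust function $R(\cdot)$ to each of the above sets of estimates and obtain the corresponding overall estimates of $k$, $\alpha$ and $\lambda$.

The reason for setting $i = 1$ and $r = n$ in Step 1 is that when $k \ne 0$, the range of the random variable depends on the parameters. We therefore have to ensure that $x_{n:n} < \hat{\lambda} + \hat{\alpha}/\hat{k}$ when $\hat{k} > 0$ and $x_{1:n} > \hat{\lambda} + \hat{\alpha}/\hat{k}$ when $\hat{k} < 0$. In this way we force the parameter estimates to be consistent with the observed data. Examples of the robust function $R(\cdot)$ in Step 2 include the median and the least median of squares (LMS).¹² Thus, final and overall estimates of $k$, $\alpha$ and $\lambda$ can be defined as

$$
\begin{aligned}
\hat{k}(\mathrm{MED}) &= \operatorname{median}(\hat{k}_{12n}, \hat{k}_{13n}, \ldots, \hat{k}_{1,n-1,n}), \\
\hat{\alpha}(\mathrm{MED}) &= \operatorname{median}(\hat{\alpha}_{12n}, \hat{\alpha}_{13n}, \ldots, \hat{\alpha}_{1,n-1,n}), \\
\hat{\lambda}(\mathrm{MED}) &= \operatorname{median}(\hat{\lambda}_{12n}, \hat{\lambda}_{13n}, \ldots, \hat{\lambda}_{1,n-1,n}),
\end{aligned}
\tag{13}
$$

or

$$
\begin{aligned}
\hat{k}(\mathrm{LMS}) &= \mathrm{LMS}(\hat{k}_{12n}, \hat{k}_{13n}, \ldots, \hat{k}_{1,n-1,n}), \\
\hat{\alpha}(\mathrm{LMS}) &= \mathrm{LMS}(\hat{\alpha}_{12n}, \hat{\alpha}_{13n}, \ldots, \hat{\alpha}_{1,n-1,n}), \\
\hat{\lambda}(\mathrm{LMS}) &= \mathrm{LMS}(\hat{\lambda}_{12n}, \hat{\lambda}_{13n}, \ldots, \hat{\lambda}_{1,n-1,n}),
\end{aligned}
\tag{14}
$$

where $\operatorname{median}(y_1, y_2, \ldots, y_n)$ is the median of $\{y_1, y_2, \ldots, y_n\}$, and $\mathrm{LMS}(y_1, y_2, \ldots, y_n)$ is the estimate obtained using the LMS method, which in this case is equal to the midpoint of the shortest interval containing half of the numbers $y_1, y_2, \ldots, y_n$ (see Rousseeuw and Leroy,¹³ p. 169).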
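The second stage can be sketched in Python as follows. The names `lms_location` and `combine` are hypothetical, and the size of a "half" is taken here as $\lfloor n/2 \rfloor + 1$, which is an assumption of this sketch rather than a definition stated in the text.

```python
import statistics


def lms_location(values):
    """Midpoint of the shortest interval containing about half of the values."""
    y = sorted(values)
    n = len(y)
    h = n // 2 + 1  # assumed size of a "half" sample
    # Slide a window of h consecutive sorted values and keep the shortest one.
    start = min(range(n - h + 1), key=lambda s: y[s + h - 1] - y[s])
    return 0.5 * (y[start] + y[start + h - 1])


def combine(elemental, robust=statistics.median):
    """Apply the robust function R(.) separately to the k, alpha, lambda estimates.

    `elemental` is a list of (k, alpha, lambda) triples, one per index j.
    """
    ks, alphas, lams = zip(*elemental)
    return robust(ks), robust(alphas), robust(lams)
```

Calling `combine(triples)` gives median-based estimates in the spirit of (13), and `combine(triples, robust=lms_location)` gives the LMS-based counterpart in the spirit of (14). Both are insensitive to a few wild elemental estimates, which is the motivation for using a robust function in Step 2.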

The estimators in (13) are analogous to the Theil¹⁴ estimator of the slope in the simple regression case, which is defined as the median of all possible slopes computed using only two points (see also Sen¹⁵). They are also analogous to the Hodges-Lehmann¹⁶ estimator of a location parameter, which is the median of all pairwise means of the sample.


The quantile estimates for any desired $p$ are then obtained by substituting the above parameter estimates in (2). Note that since the parameter and quantile estimates are well defined for all possible combinations of parameter and sample values, the variances of these estimates (hence, confidence intervals for the corresponding parameter or quantile values) can be obtained using sampling-based methods such as the jackknife and the bootstrap methods.¹⁷,¹⁸

3. SIMULATIONS

In this section we describe a simulation study carried out to:

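(i) study some of the properties of the proposed estimators introduced in Section 2;
(ii) evaluate the performance of these estimators; and
(iii) compare the proposed estimators with the PWM, which is the currently favoured method for estimating the parameters and quantiles of the GEVD.

The PWM estimators are given by

$$
\hat{k}_{PWM} = 7.859c + 2.9554c^{2}, \tag{15}
$$

$$
\hat{\alpha}_{PWM} = \frac{(2b_1 - \bar{x})\,\hat{k}_{PWM}}{\Gamma(1 + \hat{k}_{PWM})\,(1 - 2^{-\hat{k}_{PWM}})}, \tag{16}
$$

$$
\hat{\lambda}_{PWM} = \bar{x} + \hat{\alpha}_{PWM}\{\Gamma(1 + \hat{k}_{PWM}) - 1\}/\hat{k}_{PWM}, \tag{17}
$$

where $\bar{x}$ is the sample mean,

$$
c = \frac{2b_1 - \bar{x}}{3b_2 - \bar{x}} - \frac{\log 2}{\log 3},
$$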

Table I. Bias of parameter and quantile estimators for n = 15

| k | Method | $\hat{k}$ | $\hat{\alpha}$ | $\hat{\lambda}$ | $\hat{x}(0.90)$ | $\hat{x}(0.99)$ | $\hat{x}(0.999)$ |
|---|---|---|---|---|---|---|---|
| -1.00 | LMS | -0.19 | 0.09 | -0.03 | 0.12 | -0.95 | -11.92 |
| -1.00 | MED | -0.19 | 0.04 | -0.08 | 0.05 | -1.26 | -12.82 |
| -1.00 | PWM | * | * | * | * | * | * |
| -0.40 | LMS | -0.11 | 0.06 | -0.03 | 0.11 | -0.08 | -1.06 |
| -0.40 | MED | -0.11 | 0.06 | -0.05 | 0.10 | -0.10 | -1.16 |
| -0.40 | PWM | -0.11 | -0.03 | -0.05 | 0.06 | 0.07 | -0.10 |
| -0.20 | LMS | -0.09 | 0.05 | -0.02 | 0.11 | 0.06 | -0.23 |
| -0.20 | MED | -0.10 | 0.05 | -0.04 | 0.10 | 0.05 | -0.23 |
| -0.20 | PWM | -0.06 | -0.01 | -0.04 | 0.02 | -0.02 | -0.22 |
| 0.20 | LMS | -0.06 | 0.05 | -0.01 | 0.10 | 0.09 | 0.03 |
| 0.20 | MED | -0.06 | 0.05 | -0.01 | 0.10 | 0.10 | 0.05 |
| 0.20 | PWM | -0.02 | 0.01 | -0.02 | 0.01 | -0.04 | -0.14 |
| 0.40 | LMS | -0.05 | 0.04 | 0.01 | 0.09 | 0.08 | 0.03 |
| 0.40 | MED | -0.05 | 0.05 | 0.01 | 0.10 | 0.09 | 0.06 |
| 0.40 | PWM | -0.01 | 0.01 | -0.01 | 0.01 | -0.04 | -0.11 |
| 2.00 | LMS | 0.04 | -0.11 | 0.07 | -0.03 | -0.05 | -0.06 |
| 2.00 | MED | 0.03 | -0.10 | 0.09 | 0.03 | 0.01 | 0.01 |
| 2.00 | PWM | 0.29 | -0.09 | 0.10 | -0.13 | -0.18 | -0.18 |
| 5.00 | LMS | 0.46 | -2.12 | 0.54 | 0.21 | 0.21 | 0.21 |
| 5.00 | MED | 0.45 | -2.44 | 0.57 | 0.03 | 0.03 | 0.03 |
| 5.00 | PWM | 2.34 | -11.80 | 1.71 | -12.79 | -12.85 | -12.85 |

* Moments do not exist


Table II. Bias of parameter and quantile estimators for n = 25

| k | Method | $\hat{k}$ | $\hat{\alpha}$ | $\hat{\lambda}$ | $\hat{x}(0.90)$ | $\hat{x}(0.99)$ | $\hat{x}(0.999)$ |
|---|---|---|---|---|---|---|---|
| -1.00 | LMS | -0.14 | 0.06 | -0.03 | 0.04 | -1.29 | -15.24 |
| -1.00 | MED | -0.14 | 0.02 | -0.07 | -0.02 | -1.95 | -29.61 |
| -1.00 | PWM | * | * | * | * | * | * |
| -0.40 | LMS | -0.10 | 0.05 | -0.03 | 0.10 | 0.02 | -0.39 |
| -0.40 | MED | -0.10 | 0.05 | -0.04 | 0.09 | 0.01 | -0.45 |
| -0.40 | PWM | -0.08 | -0.02 | -0.04 | 0.04 | 0.06 | -0.06 |
| -0.20 | LMS | -0.08 | 0.04 | -0.01 | 0.10 | 0.08 | -0.05 |
| -0.20 | MED | -0.08 | 0.04 | -0.02 | 0.09 | 0.08 | -0.06 |
| -0.20 | PWM | -0.04 | 0.00 | -0.02 | 0.02 | -0.01 | -0.14 |
| 0.20 | LMS | -0.05 | 0.04 | 0.00 | 0.09 | 0.10 | 0.09 |
| 0.20 | MED | -0.05 | 0.05 | -0.01 | 0.09 | 0.11 | 0.10 |
| 0.20 | PWM | -0.01 | 0.01 | 0.00 | 0.01 | -0.02 | -0.07 |
| 0.40 | LMS | -0.04 | 0.03 | 0.00 | 0.07 | 0.07 | 0.07 |
| 0.40 | MED | -0.04 | 0.03 | 0.00 | 0.07 | 0.08 | 0.08 |
| 0.40 | PWM | -0.01 | 0.01 | -0.01 | 0.01 | -0.01 | -0.04 |
| 2.00 | LMS | 0.03 | -0.08 | 0.04 | -0.03 | -0.04 | -0.04 |
| 2.00 | MED | 0.03 | -0.09 | 0.06 | 0.02 | 0.01 | 0.01 |
| 2.00 | PWM | 0.26 | -0.11 | 0.09 | -0.12 | -0.14 | -0.14 |
| 5.00 | LMS | 0.36 | -1.54 | 0.34 | 0.02 | 0.02 | 0.02 |
| 5.00 | MED | 0.37 | -1.70 | 0.38 | 0.01 | 0.01 | 0.01 |
| 5.00 | PWM | 2.26 | -11.99 | 1.70 | -12.75 | -12.79 | -12.79 |

* Moments do not exist


Table III. Bias of parameter and quantile estimators for n = 50

| k | Method | $\hat{k}$ | $\hat{\alpha}$ | $\hat{\lambda}$ | $\hat{x}(0.90)$ | $\hat{x}(0.99)$ | $\hat{x}(0.999)$ |
|---|---|---|---|---|---|---|---|
| -1.00 | LMS | -0.10 | 0.05 | -0.02 | 0.02 | -0.85 | -7.62 |
| -1.00 | MED | -0.11 | 0.02 | -0.05 | -0.01 | -1.14 | -12.06 |
| -1.00 | PWM | * | * | * | * | * | * |
| -0.40 | LMS | -0.07 | 0.04 | -0.01 | 0.08 | 0.05 | -0.16 |
| -0.40 | MED | -0.07 | 0.04 | -0.02 | 0.08 | 0.04 | -0.18 |
| -0.40 | PWM | -0.04 | -0.01 | -0.02 | 0.03 | 0.03 | -0.06 |
| -0.20 | LMS | -0.05 | 0.03 | -0.02 | 0.07 | 0.08 | 0.04 |
| -0.20 | MED | -0.05 | 0.03 | -0.02 | 0.06 | 0.08 | 0.04 |
| -0.20 | PWM | -0.02 | -0.01 | -0.01 | 0.00 | -0.02 | -0.09 |
| 0.20 | LMS | -0.03 | 0.03 | 0.00 | 0.06 | 0.07 | 0.07 |
| 0.20 | MED | -0.03 | 0.03 | -0.01 | 0.06 | 0.08 | 0.08 |
| 0.20 | PWM | 0.00 | 0.00 | 0.00 | 0.00 | -0.01 | -0.04 |
| 0.40 | LMS | -0.02 | 0.03 | 0.01 | 0.05 | 0.05 | 0.05 |
| 0.40 | MED | -0.02 | 0.03 | 0.00 | 0.05 | 0.06 | 0.06 |
| 0.40 | PWM | 0.01 | 0.02 | 0.00 | 0.01 | -0.01 | -0.03 |
| 2.00 | LMS | 0.02 | -0.05 | 0.02 | -0.03 | -0.04 | -0.04 |
| 2.00 | MED | 0.02 | -0.05 | 0.03 | 0.01 | 0.00 | 0.00 |
| 2.00 | PWM | 0.23 | -0.10 | 0.07 | -0.10 | -0.12 | -0.12 |
| 5.00 | LMS | 0.20 | -0.97 | 0.21 | -0.01 | -0.01 | -0.01 |
| 5.00 | MED | 0.25 | -1.12 | 0.25 | 0.00 | 0.00 | 0.00 |
| 5.00 | PWM | 2.16 | -11.57 | 1.62 | -12.15 | -12.18 | -12.18 |

* Moments do not exist



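Table IV. Bias of parameter and quantile estimators for n = 100

| k | Method | $\hat{k}$ | $\hat{\alpha}$ | $\hat{\lambda}$ | $\hat{x}(0.90)$ | $\hat{x}(0.99)$ | $\hat{x}(0.999)$ |
|---|---|---|---|---|---|---|---|
| -1.00 | LMS | -0.08 | 0.05 | 0.00 | 0.04 | -0.46 | -3.14 |
| -1.00 | MED | -0.09 | 0.03 | -0.03 | 0.02 | -0.56 | -4.02 |
| -1.00 | PWM | * | * | * | * | * | * |
| -0.40 | LMS | -0.06 | 0.04 | -0.01 | 0.08 | 0.08 | 0.00 |
| -0.40 | MED | -0.06 | 0.04 | -0.02 | 0.07 | 0.08 | -0.01 |
| -0.40 | PWM | -0.03 | 0.00 | -0.01 | 0.02 | 0.02 | -0.02 |
| -0.20 | LMS | -0.04 | 0.03 | -0.01 | 0.06 | 0.08 | 0.07 |
| -0.20 | MED | -0.04 | 0.03 | -0.02 | 0.06 | 0.08 | 0.07 |
| -0.20 | PWM | -0.01 | 0.00 | 0.00 | 0.01 | -0.01 | -0.04 |
| 0.20 | LMS | -0.03 | 0.02 | -0.01 | 0.04 | 0.06 | 0.07 |
| 0.20 | MED | -0.03 | 0.02 | -0.01 | 0.04 | 0.06 | 0.07 |
| 0.20 | PWM | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | -0.01 |
| 0.40 | LMS | -0.01 | 0.02 | 0.00 | 0.03 | 0.04 | 0.04 |
| 0.40 | MED | -0.01 | 0.02 | -0.01 | 0.04 | 0.04 | 0.04 |
| 0.40 | PWM | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.01 |
| 2.00 | LMS | 0.02 | -0.03 | 0.01 | -0.01 | -0.02 | -0.02 |
| 2.00 | MED | 0.02 | -0.03 | 0.02 | 0.00 | 0.00 | 0.00 |
| 2.00 | PWM | 0.22 | -0.12 | 0.08 | -0.10 | -0.11 | -0.11 |
| 5.00 | LMS | 0.15 | -0.98 | 0.34 | 0.52 | 0.52 | 0.52 |
| 5.00 | MED | 0.17 | -1.11 | 0.26 | 0.01 | 0.00 | 0.00 |
| 5.00 | PWM | 2.10 | -11.82 | 1.60 | -12.43 | -12.46 | -12.46 |

* Moments do not exist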


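Table V. RMSE of parameter and quantile estimators for n = 15

| k | Method | $\hat{k}$ | $\hat{\alpha}$ | $\hat{\lambda}$ | $\hat{x}(0.90)$ | $\hat{x}(0.99)$ | $\hat{x}(0.999)$ |
|---|---|---|---|---|---|---|---|
| -1.00 | LMS | 0.45 | 0.45 | 0.38 | 0.90 | 6.97 | 119.80 |
| -1.00 | MED | 0.45 | 0.44 | 0.38 | 1.01 | 8.02 | 92.89 |
| -1.00 | PWM | * | * | * | * | * | * |
| -0.40 | LMS | 0.32 | 0.30 | 0.33 | 0.49 | 1.48 | 7.14 |
| -0.40 | MED | 0.31 | 0.29 | 0.33 | 0.48 | 1.58 | 8.10 |
| -0.40 | PWM | 0.27 | 0.33 | 0.32 | 0.40 | 0.67 | 1.38 |
| -0.20 | LMS | 0.27 | 0.27 | 0.33 | 0.39 | 0.77 | 2.27 |
| -0.20 | MED | 0.27 | 0.27 | 0.32 | 0.39 | 0.79 | 2.37 |
| -0.20 | PWM | 0.24 | 0.28 | 0.31 | 0.36 | 0.60 | 1.31 |
| 0.20 | LMS | 0.24 | 0.25 | 0.31 | 0.25 | 0.29 | 0.56 |
| 0.20 | MED | 0.23 | 0.25 | 0.29 | 0.25 | 0.29 | 0.47 |
| 0.20 | PWM | 0.23 | 0.22 | 0.29 | 0.23 | 0.33 | 0.60 |
| 0.40 | LMS | 0.24 | 0.24 | 0.31 | 0.21 | 0.23 | 0.48 |
| 0.40 | MED | 0.23 | 0.24 | 0.29 | 0.21 | 0.20 | 0.28 |
| 0.40 | PWM | 0.24 | 0.22 | 0.29 | 0.18 | 0.26 | 0.43 |
| 2.00 | LMS | 0.57 | 0.65 | 0.35 | 0.18 | 0.21 | 0.23 |
| 2.00 | MED | 0.57 | 0.63 | 0.35 | 0.08 | 0.03 | 0.03 |
| 2.00 | PWM | 0.56 | 0.61 | 0.37 | 0.18 | 0.25 | 0.26 |
| 5.00 | LMS | 1.22 | 6.80 | 1.77 | 3.20 | 3.21 | 3.21 |
| 5.00 | MED | 1.21 | 7.93 | 1.61 | 0.24 | 0.24 | 0.24 |
| 5.00 | PWM | 2.38 | 42.46 | 4.28 | 54.17 | 54.23 | 54.23 |

* Moments do not exist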



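Table VI. RMSE of parameter and quantile estimators for n = 25

| k | Method | $\hat{k}$ | $\hat{\alpha}$ | $\hat{\lambda}$ | $\hat{x}(0.90)$ | $\hat{x}(0.99)$ | $\hat{x}(0.999)$ |
|---|---|---|---|---|---|---|---|
| -1.00 | LMS | 0.39 | 0.33 | 0.28 | 1.00 | 10.69 | 164.62 |
| -1.00 | MED | 0.39 | 0.32 | 0.28 | 1.14 | 17.15 | 423.08 |
| -1.00 | PWM | * | * | * | * | * | * |
| -0.40 | LMS | 0.24 | 0.24 | 0.26 | 0.44 | 1.10 | 3.56 |
| -0.40 | MED | 0.24 | 0.23 | 0.25 | 0.43 | 1.16 | 3.99 |
| -0.40 | PWM | 0.21 | 0.26 | 0.24 | 0.33 | 0.59 | 1.23 |
| -0.20 | LMS | 0.20 | 0.21 | 0.25 | 0.33 | 0.63 | 1.40 |
| -0.20 | MED | 0.20 | 0.21 | 0.24 | 0.32 | 0.63 | 1.41 |
| -0.20 | PWM | 0.18 | 0.21 | 0.24 | 0.27 | 0.48 | 1.02 |
| 0.20 | LMS | 0.16 | 0.20 | 0.25 | 0.21 | 0.23 | 0.30 |
| 0.20 | MED | 0.16 | 0.20 | 0.23 | 0.20 | 0.23 | 0.30 |
| 0.20 | PWM | 0.16 | 0.17 | 0.23 | 0.18 | 0.24 | 0.39 |
| 0.40 | LMS | 0.16 | 0.20 | 0.24 | 0.16 | 0.15 | 0.19 |
| 0.40 | MED | 0.16 | 0.20 | 0.23 | 0.16 | 0.15 | 0.19 |
| 0.40 | PWM | 0.17 | 0.16 | 0.23 | 0.14 | 0.19 | 0.28 |
| 2.00 | LMS | 0.43 | 0.51 | 0.26 | 0.12 | 0.13 | 0.13 |
| 2.00 | MED | 0.43 | 0.50 | 0.25 | 0.04 | 0.02 | 0.02 |
| 2.00 | PWM | 0.45 | 0.48 | 0.27 | 0.15 | 0.18 | 0.18 |
| 5.00 | LMS | 1.02 | 4.31 | 0.94 | 2.16 | 2.16 | 2.16 |
| 5.00 | MED | 1.01 | 4.39 | 0.87 | 0.04 | 0.03 | 0.03 |
| 5.00 | PWM | 2.29 | 27.10 | 3.01 | 32.74 | 32.78 | 32.78 |

* Moments do not exist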


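Table VII. RMSE of parameter and quantile estimators for n = 50

| k | Method | $\hat{k}$ | $\hat{\alpha}$ | $\hat{\lambda}$ | $\hat{x}(0.90)$ | $\hat{x}(0.99)$ | $\hat{x}(0.999)$ |
|---|---|---|---|---|---|---|---|
| -1.00 | LMS | 0.32 | 0.26 | 0.21 | 0.81 | 6.04 | 58.78 |
| -1.00 | MED | 0.32 | 0.23 | 0.21 | 0.85 | 8.47 | 112.42 |
| -1.00 | PWM | * | * | * | * | * | * |
| -0.40 | LMS | 0.18 | 0.17 | 0.18 | 0.33 | 0.78 | 2.18 |
| -0.40 | MED | 0.18 | 0.16 | 0.18 | 0.31 | 0.79 | 2.28 |
| -0.40 | PWM | 0.15 | 0.18 | 0.17 | 0.23 | 0.44 | 0.91 |
| -0.20 | LMS | 0.15 | 0.15 | 0.18 | 0.25 | 0.41 | 0.73 |
| -0.20 | MED | 0.15 | 0.15 | 0.17 | 0.24 | 0.42 | 0.74 |
| -0.20 | PWM | 0.13 | 0.14 | 0.17 | 0.20 | 0.34 | 0.61 |
| 0.20 | LMS | 0.11 | 0.15 | 0.18 | 0.14 | 0.16 | 0.21 |
| 0.20 | MED | 0.11 | 0.15 | 0.17 | 0.14 | 0.16 | 0.21 |
| 0.20 | PWM | 0.11 | 0.11 | 0.16 | 0.12 | 0.17 | 0.26 |
| 0.40 | LMS | 0.10 | 0.15 | 0.17 | 0.12 | 0.11 | 0.12 |
| 0.40 | MED | 0.11 | 0.16 | 0.16 | 0.12 | 0.11 | 0.12 |
| 0.40 | PWM | 0.12 | 0.12 | 0.16 | 0.11 | 0.13 | 0.18 |
| 2.00 | LMS | 0.32 | 0.40 | 0.19 | 0.12 | 0.13 | 0.13 |
| 2.00 | MED | 0.33 | 0.37 | 0.17 | 0.02 | 0.00 | 0.00 |
| 2.00 | PWM | 0.36 | 0.33 | 0.19 | 0.11 | 0.13 | 0.14 |
| 5.00 | LMS | 0.73 | 2.63 | 0.52 | 0.33 | 0.33 | 0.33 |
| 5.00 | MED | 0.75 | 2.75 | 0.55 | 0.01 | 0.01 | 0.01 |
| 5.00 | PWM | 2.18 | 21.27 | 2.34 | 24.90 | 24.93 | 24.93 |

* Moments do not exist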



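Table VIII. RMSE of parameter and quantile estimators for n = 100

| k | Method | $\hat{k}$ | $\hat{\alpha}$ | $\hat{\lambda}$ | $\hat{x}(0.90)$ | $\hat{x}(0.99)$ | $\hat{x}(0.999)$ |
|---|---|---|---|---|---|---|---|
| -1.00 | LMS | 0.27 | 0.20 | 0.14 | 0.69 | 3.70 | 23.44 |
| -1.00 | MED | 0.28 | 0.17 | 0.14 | 0.67 | 4.08 | 28.55 |
| -1.00 | PWM | * | * | * | * | * | * |
| -0.40 | LMS | 0.14 | 0.14 | 0.14 | 0.27 | 0.54 | 1.06 |
| -0.40 | MED | 0.15 | 0.13 | 0.13 | 0.26 | 0.54 | 1.10 |
| -0.40 | PWM | 0.11 | 0.12 | 0.12 | 0.16 | 0.32 | 0.61 |
| -0.20 | LMS | 0.11 | 0.12 | 0.13 | 0.20 | 0.34 | 0.58 |
| -0.20 | MED | 0.12 | 0.12 | 0.12 | 0.19 | 0.34 | 0.58 |
| -0.20 | PWM | 0.09 | 0.10 | 0.11 | 0.14 | 0.25 | 0.43 |
| 0.20 | LMS | 0.08 | 0.12 | 0.13 | 0.11 | 0.13 | 0.16 |
| 0.20 | MED | 0.08 | 0.12 | 0.12 | 0.11 | 0.13 | 0.16 |
| 0.20 | PWM | 0.07 | 0.08 | 0.11 | 0.09 | 0.12 | 0.17 |
| 0.40 | LMS | 0.07 | 0.12 | 0.12 | 0.09 | 0.08 | 0.09 |
| 0.40 | MED | 0.07 | 0.12 | 0.11 | 0.09 | 0.08 | 0.09 |
| 0.40 | PWM | 0.08 | 0.08 | 0.11 | 0.07 | 0.09 | 0.12 |
| 2.00 | LMS | 0.25 | 0.29 | 0.13 | 0.06 | 0.06 | 0.06 |
| 2.00 | MED | 0.26 | 0.27 | 0.12 | 0.01 | 0.00 | 0.00 |
| 2.00 | PWM | 0.30 | 0.26 | 0.15 | 0.10 | 0.12 | 0.12 |
| 5.00 | LMS | 0.59 | 4.00 | 3.61 | 16.65 | 16.65 | 16.65 |
| 5.00 | MED | 0.60 | 4.10 | 1.40 | 0.08 | 0.08 | 0.08 |
| 5.00 | PWM | 2.11 | 17.55 | 1.99 | 20.27 | 20.29 | 20.29 |

* Moments do not exist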


and

$$
b_j = n^{-1} \sum_{i=1}^{n} \frac{(i-1)(i-2)\cdots(i-j)}{(n-1)(n-2)\cdots(n-j)}\, x_{i:n}, \quad j = 1, 2.
$$

Both the PWM and the proposed estimators depend on the plotting positions in (4). Hosking et al.² use $p_{i:n} = (i - 0.35)/n$ for the PWM and we use the same value for the proposed method.
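A minimal Python sketch of the PWM computation is given below. It uses the $b_j$ formula displayed above together with the standard PWM expressions of Hosking et al.² for the scale and location; the function name `pwm_gev` is hypothetical, and the sketch does not handle the degenerate case where the estimated shape is exactly zero.

```python
import math


def pwm_gev(sample):
    """PWM estimates (k, alpha, lambda) of the GEVD from a sample of size n >= 3."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n  # sample mean
    # b_1 and b_2 with the weights (i-1)...(i-j) / ((n-1)...(n-j)), i = 1..n
    b1 = sum((i - 1) / (n - 1) * x[i - 1] for i in range(1, n + 1)) / n
    b2 = sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x[i - 1]
             for i in range(1, n + 1)) / n
    c = (2 * b1 - b0) / (3 * b2 - b0) - math.log(2) / math.log(3)
    k = 7.859 * c + 2.9554 * c * c
    gamma_term = math.gamma(1 + k)  # requires k > -1
    alpha = (2 * b1 - b0) * k / (gamma_term * (1 - 2 ** (-k)))
    lam = b0 + alpha * (gamma_term - 1) / k
    return k, alpha, lam
```

The call to `math.gamma(1 + k)` fails or becomes unreliable as the estimated shape approaches $-1$, which mirrors the remark in the text that the moments do not exist in that region.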

Without loss of generality, we set the location parameter $\lambda = 0$ and the scale parameter $\alpha = 1$ (the results are invariant with respect to changes in location and/or scale). We consider the following values of $k$: $\{-1, -0.4, -0.2, 0.2, 0.4, 2, 5\}$. The value $k = -1$ represents a case where the moments of the GEVD do not exist, whereas the values $k > 1$ represent cases outside the range of values considered by Hosking et al.² We consider sample sizes $n = \{15, 25, 50, 100\}$. The results are based on 1000 simulation runs for each combination of $k$ and $n$. In each case, the parameters $k$, $\alpha$ and $\lambda$ as well as the quantiles $x(p)$ for $p = \{0.90, 0.99, 0.999\}$ are estimated by both methods; only upper quantiles are considered because these are usually the quantiles of interest for the GEVD.
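The structure of such a simulation can be sketched as follows. This is an illustrative outline, not the code used for the tables: the names `gev_sample` and `bias_and_rmse`, the seed, and the use of inverse-transform sampling through equation (2) are all assumptions of the sketch, and the normalization of quantile errors by their true values is left to the caller.

```python
import math
import random


def gev_sample(n, k, rng, lam=0.0, alpha=1.0):
    """Draw n GEVD variates by applying the quantile function (2) to uniforms."""
    out = []
    for _ in range(n):
        p = rng.random()
        while p == 0.0:  # guard against log(0)
            p = rng.random()
        c = -math.log(p)
        if k == 0:
            out.append(lam - alpha * math.log(c))
        else:
            out.append(lam + alpha * (1.0 - c ** k) / k)
    return out


def bias_and_rmse(estimator, true_value, n, k, runs=1000, seed=1):
    """Monte Carlo bias and RMSE of `estimator` (a function of a sample)."""
    rng = random.Random(seed)
    errors = [estimator(gev_sample(n, k, rng)) - true_value for _ in range(runs)]
    bias = sum(errors) / runs
    rmse = math.sqrt(sum(e * e for e in errors) / runs)
    return bias, rmse
```

Any estimator that maps a sample to a single number (for example, one component of the LMS, MED or PWM estimates) can be passed as `estimator`; fixing the seed makes the comparison between methods use the same simulated samples.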

The biases of the estimates of $k$, $\alpha$ and $\lambda$ and of the quantiles are given in Tables I-IV. The corresponding root mean squared errors (RMSE) are given in Tables V-VIII. Following Hosking et al.,² the biases and RMSEs of the quantile estimators have been normalized by their true values. The results are summarized as follows:

1. When k = -1, the moments of the GEVD do not exist, hence the LMS and the MED methods are the only viable options. Notwithstanding the non-existence of the moments, the LMS and the MED perform quite well in estimating the parameters of the GEVD.

2. As can be expected, estimation of extreme quantiles is generally difficult. The poor performance of the LMS and MED for the case $k = -1$ is due to the fact that the distribution function is almost flat at the true quantiles and also to small sample sizes.


Table IX. Wave-heights data

5.6 6.55 6.65 7.35 7.8 7.9 8.0 8.5
9.05 9.15 9.4 9.6 9.8 9.9 10.85 10.9
11.1 11.3 11.3 11.55 11.75 12.85 12.9 13.4

As pointed out by Hosking et al.,² a quantile $x(p)$ cannot be estimated reliably from a sample of size $n$ when $p > 1 - 1/n$. For $n = 15$, $1 - 1/n = 0.93$ and, accordingly, only $x(p)$ for $p < 0.93$ can be expected to be estimated accurately. For $n = 15$ and $k = -1$ (see Table V), the estimates of $x(0.90)$ have RMSE of 0.90 and 1.01 as compared to 119.8 and 92.89 for $x(0.999)$. Similarly, for $n = 100$, $1 - 1/n = 0.99$, hence $x(0.90)$ and $x(0.99)$ are estimated more reliably than $x(0.999)$, as can be seen in Table VIII.

Figure 1. Wave-height data: scatter plots of the estimated versus observed order statistics for the LMS, MED and PWM


3. For k = -0.4 and k = -0.2, the PWM has smaller bias and RMSE than both the LMS and the MED methods.

4. For k = 0.4 and k = 0.2, the three methods produce similar results for estimating the parameters, but the LMS and MED seem to be better for estimating extreme quantiles.

5. For $k = 2$ and $k = 5$, the LMS and the MED are clear winners. For estimating quantiles, the MED is the best and the LMS is a close second. The RMSE for the PWM are very large as compared to those of the MED and the LMS methods. For example, for $k = 5$ and $n = 50$ (see Table VII), the RMSE of the estimate of $x(0.999)$ is 0.01 for the MED and 0.33 for the LMS, compared to 24.93 for the PWM estimate.

4. TWO APPLICATIONS

In this section we apply the methods of Section 2 to two real-life data sets: the wave-heights data and the water-discharge data.

4.1. The wave-heights data

The yearly maximum significant wave heights in Nyken-Skomvaer (Norway) measured during the period 1949-1976 are taken from Houmb and Overvik¹⁹ and shown in Table IX. The data can be used for the design of sea structures. Figure 1 shows the scatter plot of the observed ($x_{i:n}$) versus the estimated ($\hat{x}_{i:n}$) order statistics. Table X shows the estimated parameters and quantiles obtained from fitting the GEVD to these data for the LMS, MED and PWM methods. The standard deviations, shown in parentheses in Table X, are obtained from 1000 bootstrap samples. The bootstrap sampling can be performed in two ways: the samples can be drawn directly from the data or they can be drawn parametrically from $F(x; \hat{k}, \hat{\alpha}, \hat{\lambda})$. We have used the first way because the three methods are then applied to the same samples, whereas samples drawn from $F(x; \hat{k}, \hat{\alpha}, \hat{\lambda})$ would differ from method to method, since each method yields its own fitted distribution. In practice, however, one may prefer using the parametric bootstrap to obtain the variance of the estimates of a particular method. We have also computed the standard deviations using the parametric bootstrap and the results (not shown) are essentially the same. The 1000 samples generated from the data have therefore been used to obtain the standard deviations of the three estimators.
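The first (non-parametric) bootstrap scheme can be sketched in Python as below. The function name `bootstrap_sd`, the default seed and the generic `estimator` argument are assumptions of this sketch; any function that maps a sample to a single number, such as one component of the LMS, MED or PWM estimates, could be plugged in.

```python
import random
import statistics


def bootstrap_sd(data, estimator, n_boot=1000, seed=0):
    """Standard deviation of `estimator` over resamples drawn from the data."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample n observations with replacement from the original data.
        resample = [rng.choice(data) for _ in range(n)]
        replicates.append(estimator(resample))
    return statistics.stdev(replicates)
```

Using the same seed for every method reproduces the design choice described above: all estimators are evaluated on identical bootstrap samples, so differences in the standard deviations reflect the methods rather than the resampling.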

It can be seen from Figure 1 and Table X that the three methods provide similar results for the parameter estimation, but for estimating the extreme quantile (p = 0.999) the standard deviation for the PWM is about twice the standard deviation for the other two methods.

Table X. The wave-height data: estimated parameters and quantiles and their standard deviations in parentheses

| Method | $\hat{k}$ | $\hat{\alpha}$ | $\hat{\lambda}$ | $\hat{x}(0.90)$ | $\hat{x}(0.99)$ | $\hat{x}(0.999)$ |
|---|---|---|---|---|---|---|
| LMS | 0.38 (0.14) | 2.07 (0.25) | 9.00 (0.55) | 12.15 (0.46) | 13.55 (0.47) | 14.13 (0.85) |
| MED | 0.37 (0.13) | 2.06 (0.24) | 8.97 (0.51) | 12.13 (0.43) | 13.56 (0.43) | 14.15 (0.75) |
| PWM | 0.34 (0.16) | 2.26 (0.29) | 9.00 (0.53) | 12.56 (0.46) | 14.26 (0.91) | 15.02 (1.63) |


It is interesting to see from Table X that the bootstrap standard deviation for the PWM estimate of $k$ (0.16) is very close to the asymptotic standard deviation, which is given by $(0.5633/n)^{1/2} = 0.15$. Using the bootstrap standard deviations to compute z-statistics for testing the null hypothesis that $k = 0$, that is, that the observations came from a Gumbel distribution, we obtain 2.7, 2.9 and 2.1 for the LMS, MED and PWM, respectively. Therefore, one may conclude that $k \ne 0$ and the null hypothesis is rejected, that is, the GEVD provides a better fit for these data than does the Gumbel distribution.
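The arithmetic behind these figures can be checked with a few lines of Python. The estimates and standard deviations are the rounded values from Table X, and the sample size $n = 24$ is taken as the number of observations listed in Table IX; because the inputs are rounded to two decimals, the ratios agree with the reported z-statistics only to within rounding.

```python
import math

# (estimate of k, bootstrap standard deviation), rounded values from Table X
estimates = {"LMS": (0.38, 0.14), "MED": (0.37, 0.13), "PWM": (0.34, 0.16)}

# z-statistic for testing k = 0: estimate divided by its standard deviation
z = {method: k_hat / sd for method, (k_hat, sd) in estimates.items()}

# Asymptotic standard deviation of the PWM estimate of k, (0.5633 / n)^(1/2)
n = 24  # assumed: number of values listed in Table IX
asymptotic_sd = math.sqrt(0.5633 / n)

for method, value in z.items():
    print(f"{method}: z = {value:.2f}")
print(f"asymptotic sd = {asymptotic_sd:.3f}")
```

All three ratios exceed 2, which is the basis for rejecting the Gumbel hypothesis for these data.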

Furthermore, the three methods give a narrow range in estimating k (0.34-0.38). It is then reasonable to expect that k is between 0 and 1/2. For this range of values, the simulation results of Section 3 seem to indicate that the three methods give similar parameter estimates, but the LMS and the MED give better quantile estimates than does the PWM method. As can be seen from Table X, this conclusion holds for this data set as well.

The above z-statistics are based on the assumption that if $k = 0$, then the distribution of $\hat{k}$ is approximately normal with mean zero and standard deviation equal to the bootstrap estimate. A referee has asked for a validation of this assumption. We have, therefore, performed a simulation for the case $k = 0$. The normal probability plots for $\hat{k}(\mathrm{LMS})$ obtained from 1000 simulation runs for $n = 15$ and $n = 100$ are shown in Figure 2. The plots clearly indicate that the distribution of $\hat{k}(\mathrm{LMS})$ is normal even for sample sizes as small as $n = 15$. All the plots (not shown) for $\hat{\alpha}(\mathrm{LMS})$ and $\hat{\lambda}(\mathrm{LMS})$, $\hat{k}(\mathrm{MED})$, $\hat{\alpha}(\mathrm{MED})$ and $\hat{\lambda}(\mathrm{MED})$ also exhibit the same structure. Thus the simulation results indicate that the proposed estimators are normal. Furthermore, one can argue that the estimators are also consistent. This argument is given in the Appendix.

Figure 2. Normal probability plot for $\hat{k}$ for n = 15 and 100 for the case k = 0 based on 1000 simulation runs


Table XI. Water-discharge data

4.8 7.3 7.9 8.5 10.7 14.2 14.3 16.9 19.0 19.1 19.6 21.0 22.7 24.0 25.4 28.3 28.3 28.8 31.0 31.0 32.6 33.3 33.9 37.0 40.4 44.8 44.8 47.1 47.8 50.2 51.0 51.6 64.4 65.3 66.2 72.5 73.4 73.4 78.6 84.0

4.2. The water-discharge data

The yearly maximum water discharges of the Ocmulgee river measured at Macon between 1910 and 1949, as reported in Gumbel,²⁰ are shown in Table XI. The data can be used for flood protection design. The estimated parameters and the quantiles obtained from fitting the GEVD to the data using the LMS, MED and PWM methods are shown in Table XII. The standard deviations are based on 1000 bootstrap samples. The same bootstrap samples are used for the three methods. Although the standard deviations of the parameter estimates for the three methods are comparable, the PWM has much higher standard deviations for the extreme quantile estimators.

Figure 3 shows the scatter plots of the observed versus the estimated order statistics. Although the scatter plots look similar, Table XII shows that the PWM has somewhat larger standard deviations than the LMS and MED estimators. Therefore, the MED and the LMS methods seem to provide a better fit than the PWM method.

As in the wave-heights data, the bootstrap standard deviation for the PWM estimate of $k$, which is 0.10, is close to the asymptotic standard deviation, which is $(0.5633/n)^{1/2} = 0.12$. The z-statistics for testing $k = 0$ are 0.6, 1.0 and 0.4 for the LMS, MED and PWM, respectively. Thus, the null hypothesis cannot be rejected in this case and one may conclude that the Gumbel distribution provides a simpler model for the water-discharge data than does the GEVD.

5. SUMMARY AND CONCLUDING REMARKS

In this paper we propose a method for estimating the parameters and the quantiles of the GEVD. The proposed estimators are defined for all possible combinations of parameter and sample values. Because the estimates are based on solving only one equation in one unknown and upper and lower bounds are given, they are easy to compute.

The methods are illustrated by two real-life data sets. In addition, a simulation study is carried out to evaluate the performance of the proposed estimators and to compare them with the PWM estimators. The results seem to indicate that the proposed method works well over a wide range of parameter values even in cases where the moments do not exist. They are comparable to the PWM estimators in the range $-1/2 < k < 1/2$, but they give a better performance outside this range.

Table XII. Water-discharge data: estimated parameters and quantiles and their standard deviations in parentheses

| Method | $\hat{k}$ | $\hat{\alpha}$ | $\hat{\lambda}$ | $\hat{x}(0.90)$ | $\hat{x}(0.99)$ | $\hat{x}(0.999)$ |
|---|---|---|---|---|---|---|
| LMS | 0.05 (0.09) | 13.87 (2.02) | 25.19 (4.17) | 54.78 (5.38) | 82.49 (5.08) | 106.85 (11.19) |
| MED | 0.08 (0.08) | 14.46 (1.65) | 26.40 (3.65) | 56.34 (4.54) | 82.63 (3.89) | 104.30 (8.87) |
| PWM | 0.04 (0.10) | 19.01 (2.62) | 26.82 (3.58) | 67.63 (5.24) | 106.27 (9.14) | 140.64 (23.45) |

Figure 3. Water-discharge data: scatter plots of the estimated versus observed order statistics for the LMS, MED and PWM estimates

APPENDIX: AN ALGORITHM FOR SOLVING EQUATION (8)

In this Appendix we give one method for solving equation (8) and provide a heuristic proof that the proposed estimators are consistent.

To solve equation (8) we need to find the zero, $k_0$, of the function

$$
h(k) = \frac{1 - A_{jr}^{k}}{1 - A_{ir}^{k}} - D_{ijr}, \tag{18}
$$

where

$$
D_{ijr} = \frac{x_{j:n} - x_{r:n}}{x_{i:n} - x_{r:n}}. \tag{19}
$$

Since $x_{i:n} \le x_{j:n} \le x_{r:n}$, we have $C_i > C_j > C_r$. Then it is easy to verify that the function $h(k)$ has the following properties:

1. $h(k)$ is a decreasing function of $k$;
2. $h(-\infty) = 1 - D_{ijr} \Rightarrow 0 \le h(-\infty) \le 1$;
3. $h(0) = \log(A_{jr})/\log(A_{ir}) - D_{ijr} \Rightarrow -1 \le h(0) \le 1$;
4. $h(\infty) = -D_{ijr} \Rightarrow -1 \le h(\infty) \le 0$;
5. $h(0) = 0$ if $D_{ijr} = \log(A_{jr})/\log(A_{ir})$;
6. if $D_{ijr} < \log(A_{jr})/\log(A_{ir})$ then $h(0) > 0$, which implies $k_0 > 0$;
7. if $D_{ijr} > \log(A_{jr})/\log(A_{ir})$ then $h(0) < 0$, which implies $k_0 < 0$.

In addition, for $k > 0$ we have

$$
h^{+}(k) = A_{ji}^{k} - D_{ijr} > h(k), \tag{20}
$$

where $A_{ji} = C_j/C_i$; that is, $h^{+}(k)$ bounds $h(k)$ from above, and then its zero

$$
k^{+} = \frac{\log D_{ijr}}{\log A_{ji}} \tag{21}
$$

is an upper bound for the zero of $h(k)$. Similarly, for $k < 0$ we have

$$
h^{-}(k) = 1 - A_{jr}^{k} - D_{ijr} < h(k), \tag{22}
$$

that is, $h^{-}(k)$ bounds $h(k)$ from below, and then its zero

$$
k^{-} = \frac{\log(1 - D_{ijr})}{\log A_{jr}} \tag{23}
$$

is a lower bound for the zero of $h(k)$. Thus, (21) and (23) suggest the following algorithm for computing $k$ which satisfies (8):

1. If $D_{ijr} < \log(A_{jr})/\log(A_{ir})$, then use the bisection method on the interval $(0, k^{+})$ to find $k$.
2. If $D_{ijr} > \log(A_{jr})/\log(A_{ir})$, then use the bisection method on the interval $(k^{-}, 0)$ to find $k$.
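The algorithm above can be sketched in Python as follows. This is one possible reading of the steps, not the authors' code: the function name `elemental_estimates`, the fixed number of bisection iterations, the overflow guard and the use of the Gumbel formulas when the root is numerically zero are all assumptions of the sketch. The scale and location are recovered from the first and third equations of (6).

```python
import math


def elemental_estimates(x_i, x_j, x_r, p_i, p_j, p_r, iterations=200):
    """Solve h(k) = 0 by bisection, then recover alpha and lambda in closed form."""
    if not (x_i < x_j < x_r and 0.0 < p_i < p_j < p_r < 1.0):
        raise ValueError("need x_i < x_j < x_r and 0 < p_i < p_j < p_r < 1")
    c_i, c_j, c_r = (-math.log(p) for p in (p_i, p_j, p_r))  # C_i > C_j > C_r
    log_a_ir = math.log(c_i / c_r)
    log_a_jr = math.log(c_j / c_r)
    d = (x_j - x_r) / (x_i - x_r)  # D_ijr, strictly between 0 and 1 here
    d0 = log_a_jr / log_a_ir       # limit of the ratio in h(k) as k -> 0

    def h(k):
        if abs(k) < 1e-12:
            return d0 - d
        u, v = k * log_a_jr, k * log_a_ir
        if v > 700.0:  # avoid overflow; the ratio behaves like exp(u - v)
            return math.exp(u - v) - d
        # (1 - A_jr^k) / (1 - A_ir^k) written with expm1 for accuracy
        return math.expm1(u) / math.expm1(v) - d

    if d < d0:    # h(0) > 0, so the root lies in (0, k_plus)
        lo, hi = 0.0, math.log(d) / math.log(c_j / c_i)
    elif d > d0:  # h(0) < 0, so the root lies in (k_minus, 0)
        lo, hi = math.log(1.0 - d) / log_a_jr, 0.0
    else:
        lo = hi = 0.0
    for _ in range(iterations):  # h is decreasing: h(lo) >= 0 >= h(hi)
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)

    if abs(k) < 1e-8:  # numerically the Gumbel case
        alpha = (x_i - x_r) / (math.log(c_r) - math.log(c_i))
        lam = x_i + alpha * math.log(c_i)
    else:
        alpha = k * (x_i - x_r) / (c_r ** k - c_i ** k)
        lam = x_i - alpha * (1.0 - c_i ** k) / k
    return k, alpha, lam
```

When the three inputs are exact GEVD quantiles at the given plotting positions, the function recovers the generating parameters up to the bisection tolerance. Ties among the three order statistics make $D_{ijr}$ equal to 0 or 1 and are rejected here for simplicity.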

We now argue for the consistency of the estimator $\hat{k}_{ijr}$ obtained above. We note the following:

(i) $h(k)$ in (18) is a continuous function of $k$ for all $n > 2$;
(ii) as $n \to \infty$, $p_{j:n} - F(x_{j:n}) \to 0$ or, equivalently, $x_{j:n} \to X_{j:n}$, where $X_{j:n}$ is the true population value; and
(iii) since $i < j < r$, the denominator of (19), $x_{i:n} - x_{r:n} \ne 0$ with probability 1, as $n \to \infty$.

We therefore conclude that, as $n \to \infty$, equation (18) holds exactly, which implies that $\hat{k}_{ijr}$ is consistent.

ACKNOWLEDGEMENTS

We thank an anonymous referee for the careful reading of an earlier version of this article and for the helpful suggestions which have given rise to a substantially improved manuscript. This paper was written while Ali S. Hadi was visiting the Department of Applied Mathematics and Computing Sciences of the University of Cantabria (Spain). The authors are also grateful to the Spanish Ministry of Education for partial support of this visit.

REFERENCES

1. Jenkinson, A. F. ‘The frequency distribution of the annual maximum (or minimum) of meteorological events’, Quarterly Journal of the Royal Meteorological Society, 81, 158-171 (1955).

2. Hosking, J. R. M., Wallis, J. R. and Wood, E. F. ‘Estimation of the generalized extreme-value distribution by the method of probability-weighted moments’, Technometrics, 27, 251-261 (1985).

3. Natural Environment Research Council. Flood Studies Report, Vol. 1, Natural Environment Research Council, London, 1975.

4. Castillo, E. ‘Extremes in engineering applications’, Proceedings of the Conference on Extreme Value Theory and Applications, Volume 1 (Janos Galambos, James Lechner and Emil Simiu (eds)), Gaithersburg, Maryland, 15-42, 1994.

5. Tiago de Oliveira, J. Statistical Extremes and Applications, NATO ASI Series, D. Reidel, Dordrecht, 1984.

6. Castillo, E. Extreme Value Theory in Engineering, Academic Press, Boston, 1988.

7. Jenkinson, A. F. ‘Statistics of extremes’, Technical Note 98, World Meteorological Office, Geneva, 1969.

8. Prescott, P. and Walden, A. T. ‘Maximum likelihood estimation of the parameters of the generalized extreme-value distribution’, Biometrika, 67, 723-724 (1980).

9. Prescott, P. and Walden, A. T. ‘Maximum likelihood estimation of the parameters of the three-parameter generalized extreme-value distribution’, Journal of Statistical Computing and Simulation, 16, 241-250 (1983).

10. Smith, R. L. ‘Maximum likelihood estimation in a class of nonregular cases’, Biometrika, 72, 67-90 (1985).

11. Greenwood, J. A., Landwehr, J. M., Matalas, N. C. and Wallis, J. R. ‘Probability weighted moments: definition and relation to parameters of several distributions expressible in inverse form’, Water Resources Research, 15, 1049-1054 (1979).

12. Rousseeuw, P. J. ‘Least median of squares regression’, Journal of the American Statistical Association, 79, 871-880 (1984).

13. Rousseeuw, P. J. and Leroy, A. M. Robust Regression and Outlier Detection, Wiley, New York, 1987.

14. Theil, H. ‘A rank invariant method of linear and polynomial regression analysis’ (Parts 1-3), Ned. Akad. Wetensch. Proc. Ser. A, 53, 386-392, 521-525, 1397-1412 (1950).

15. Sen, P. K. ‘Estimates of the regression coefficient based on Kendall’s tau’, Journal of the American Statistical Association, 63, 1379-1389 (1968).

16. Hodges, J. L., Jun., and Lehmann, E. L. ‘Estimates of location based on rank tests’, The Annals of Mathematical Statistics, 34, 598-611 (1963).

17. Efron, B. ‘Bootstrap methods: another look at the jackknife’, The Annals of Statistics, 7, 1-26 (1979).

18. Diaconis, P. and Efron, B. ‘Computer intensive methods in statistics’, Scientific American, 248, 116-130 (1983).

19. Houmb, O. G. and Overvik, J. ‘On the statistical properties of 155 wave records from the Norwegian continental shelf’, Division of Ports and Ocean Engineering, The University of Trondheim, Norway, 1974.

20. Gumbel, E. J. ‘Technische Anwendung der Statistischen Theorie der Extreme-Werte’, Schweizer Archiv, 30, 33-47 (1964).
