
www.srl-journal.org Statistics Research Letters (SRL) Volume 2 Issue 4, November 2013


A Note on Approximating the p-values of the k-sample Modified Baumgartener Statistic

Augustine Wong

Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario, Canada M3J 1P3

Abstract

Murakami et al. (2009) proposed a k-sample modified Baumgartner statistic (Vk) to test the equality of k independent distributions. In this paper, the Barndorff-Nielsen formula is proposed to approximate p-values from the limiting distribution of Vk. The main advantages of the proposed method are its computational efficiency and the ease with which it can be implemented in standard statistical software.

Keywords

Anderson-Darling Test; Barndorff-Nielsen Formula; Lugannani and Rice Formula; Saddlepoint Approximation; Singularity

Introduction

Let xjh be the hth observation from the jth population with cumulative distribution function Fj, where h = 1, …, nj and j = 1, …, k. Moreover, let Rjh denote the rank of xjh in the combined sample formed from the k random samples. Assume that the k samples are independent.

For testing H0: F1 = … = Fk vs Ha: not all equal,

Murakami et al. (2009) proposed the k-sample modified Baumgartner statistic

where

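and . They showed that the limiting distribution of Vk is a weighted chi-square distribution with k-1 degrees of freedom and the corresponding characteristic function of Vk is

$$c_k(t) = \prod_{j=1}^{\infty}\left(1 - \frac{2it}{j(j+1)}\right)^{-(k-1)/2} \tag{1}$$

$$\phantom{c_k(t)} = \left[\frac{-2\pi i t}{\cos\left(\frac{\pi}{2}\sqrt{1+8it}\right)}\right]^{(k-1)/2}. \tag{2}$$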

Note that c2(t) is the asymptotic characteristic function of the Anderson-Darling test statistic given in Anderson and Darling (1954).

Following the derivation of the limiting distribution of the Anderson-Darling test statistic in Anderson and Darling (1952), Murakami et al. (2009) derived the limiting distribution of V3. Moreover, they pointed out the problems associated with this methodology and recommended the saddlepoint method used in Giles (2001), with the characteristic function (1), to approximate the limiting distribution of Vk for k ≥ 4. More specifically, the moment generating function for the limiting distribution of Vk is written as

$$M(t) = \prod_{j=1}^{\infty}\left(1 - \frac{2t}{j(j+1)}\right)^{-(k-1)/2}, \qquad t < 1, \tag{3}$$

and the corresponding cumulant generating function is

$$K(t) = \log M(t) = -\frac{k-1}{2}\sum_{j=1}^{\infty}\log\left(1 - \frac{2t}{j(j+1)}\right). \tag{4}$$

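By applying the Lugannani and Rice (1980) formula, we have

$$P(V_k \le v) \approx \Phi(r) + \phi(r)\left(\frac{1}{r} - \frac{1}{u}\right), \tag{5}$$

where $\phi$ and $\Phi$ are the density and cumulative distribution functions of the standard normal distribution, respectively,

$$r = \operatorname{sgn}(\hat{t})\sqrt{2\left[\hat{t}v - K(\hat{t})\right]}, \tag{6}$$

$$u = \hat{t}\sqrt{K''(\hat{t})}, \tag{7}$$

K is the cumulant generating function, and $\hat{t}$ is the saddlepoint satisfying

$$K'(\hat{t}) = v.$$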

It is well known that the Lugannani and Rice (1980) method has third-order accuracy. Another well-known fact is that the Lugannani and Rice formula has a singularity at the mean of the limiting distribution, where the saddlepoint is zero and hence r = u = 0. Daniels (1987) provided a formula to calculate P(Vk ≤ v) at the singularity, which requires the third derivative of the cumulant generating function. Numerically, the calculations are unstable in the neighborhood of the singularity, which may result in an approximated P(Vk ≤ v) outside the interval [0, 1]. Fraser et al. (2003) give a bridging formula to smooth the approximated cumulative distribution function obtained by the Lugannani and Rice formula.
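The value at the singularity can be sketched numerically. The snippet below assumes the standard limiting form of the Lugannani and Rice formula at a zero saddlepoint, 1/2 + K'''(0) / (6 sqrt(2 pi) K''(0)^(3/2)), and takes the derivatives at zero from the product form (1), since the closed form is 0/0 at t = 0. The names `w`, `K2_0`, `K3_0` and the truncation point `J` are illustrative choices, not part of the paper.

```r
# Sketch: approximate P(Vk <= k - 1), the cdf at the singularity.
k <- 3
J <- 100000                      # truncation point (assumption)
j <- 1:J
w <- 1 / (j * (j + 1))           # weights in the product form (1)

# Derivatives of the cgf at t = 0 from the series form
K2_0 <- 2 * (k - 1) * sum(w^2)   # K''(0), variance of the limiting distribution
K3_0 <- 8 * (k - 1) * sum(w^3)   # K'''(0)

# Limiting value of the Lugannani and Rice formula at a zero saddlepoint
cdf_at_mean <- 0.5 + K3_0 / (6 * sqrt(2 * pi) * K2_0^1.5)
print(c(K2 = K2_0, K3 = K3_0, cdf_at_mean = cdf_at_mean))
```

Because the limiting distribution is right-skewed, the value exceeds 1/2, as expected when the median lies below the mean.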

Murakami et al. (2009) also considered using the characteristic function (2) and applied the Lugannani and Rice formula to approximate P(Vk ≤ v). In Section 2, we point out that Murakami et al. (2009) did not use the complete moment generating function of Vk. After correcting this mistake, an alternative third-order method is proposed to approximate P(Vk ≤ v). The proposed approximation still has the singularity at the mean of the limiting distribution, but for all other values of v it always lies within [0, 1]. Numerical results are presented in Section 3, and some concluding remarks are given in Section 4.

Alternate Method of Obtaining the Asymptotic Cumulative Distribution Function of Vk

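Since

$$\cos(ix) = \cosh(x) \quad \text{for real } x,$$

we have

$$\cos\left(\frac{\pi}{2}\sqrt{1+8t}\right) = \cosh\left(\frac{\pi}{2}\sqrt{-1-8t}\right) \quad \text{for } t < -\frac{1}{8}.$$

Hence the asymptotic mgf for Vk based on (2) can be written as

$$M(t) = \begin{cases} \left[\dfrac{-2\pi t}{\cosh\left(\frac{\pi}{2}\sqrt{-1-8t}\right)}\right]^{(k-1)/2}, & t < -\frac{1}{8}, \\[2ex] \left[\dfrac{-2\pi t}{\cos\left(\frac{\pi}{2}\sqrt{1+8t}\right)}\right]^{(k-1)/2}, & -\frac{1}{8} \le t < 1, \end{cases} \tag{8}$$

and the corresponding cumulant generating function is

$$K(t) = \begin{cases} \dfrac{k-1}{2}\log\left[\dfrac{-2\pi t}{\cosh\left(\frac{\pi}{2}\sqrt{-1-8t}\right)}\right], & t < -\frac{1}{8}, \\[2ex] \dfrac{k-1}{2}\log\left[\dfrac{-2\pi t}{\cos\left(\frac{\pi}{2}\sqrt{1+8t}\right)}\right], & -\frac{1}{8} \le t < 1. \end{cases} \tag{9}$$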

Note that for -1/8 < t < 1 the moment generating function of Vk given above is equation (2.2) of Murakami et al. (2009). In other words, Murakami et al. (2009) excluded the range t ≤ -1/8 and therefore did not consider the complete moment generating function.
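The two branches can be checked against a single complex-valued evaluation of the closed form. The following is a minimal sketch, assuming k = 3 for illustration; `mgf_complex`, `mgf_piecewise` and `t_grid` are hypothetical names. It evaluates the cos expression with a complex square root, so that the argument is purely imaginary for t < -1/8, and compares the result with the cosh/cos branches used in the Appendix code.

```r
k <- 3  # number of samples (illustrative choice)

# Closed form evaluated in complex arithmetic: valid on both sides of t = -1/8
mgf_complex <- function(t) {
  z <- sqrt(as.complex(1 + 8 * t))   # purely imaginary when t < -1/8
  (-2 * pi * t / cos(pi / 2 * z))^((k - 1) / 2)
}

# Real-valued piecewise form, matching the branches in the Appendix code
mgf_piecewise <- function(t) {
  if (t < -1/8) {
    (-2 * pi * t / cosh(pi / 2 * sqrt(-1 - 8 * t)))^((k - 1) / 2)
  } else {
    (-2 * pi * t / cos(pi / 2 * sqrt(1 + 8 * t)))^((k - 1) / 2)
  }
}

t_grid <- c(-5, -1, -0.2, -0.125, -0.05, 0.3, 0.9)   # t = 0 is excluded (0/0)
m_complex <- sapply(t_grid, mgf_complex)
m_piece <- sapply(t_grid, mgf_piecewise)
print(cbind(t = t_grid, complex = Re(m_complex), piecewise = m_piece))
```

The real parts agree with the piecewise values and the imaginary parts vanish, which is the identity cos(ix) = cosh(x) at work.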

The Lugannani and Rice formula can be applied to approximate P(Vk ≤ v). Computationally, using (1) has the advantage that the derivatives of the cumulant generating function can be obtained easily. However, it involves an infinite summation, which leads to two problems: slower computation, because a large upper limit of the summation has to be used, and difficulty in solving for the saddlepoint. Although the derivatives of the cumulant generating function obtained from (2) are more complicated, the saddlepoint computation is more straightforward and can be handled easily by simple numerical methods. In the Appendix, sample R code is given to illustrate the simplicity of the proposed calculations, without inputting the explicit forms of the derivatives of the cumulant generating function.
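The cost of the infinite summation can be illustrated with a short experiment. The sketch below assumes k = 3 and the evaluation point t = 0.5 (both arbitrary choices), and compares the cumulant generating function from the truncated series with the closed form; `cgf_trunc`, `cgf_closed` and `J_grid` are hypothetical names.

```r
k <- 3
t0 <- 0.5   # evaluation point in (0, 1), chosen for illustration

# Closed form of the cgf on -1/8 <= t < 1
cgf_closed <- (k - 1) / 2 * log(-2 * pi * t0 / cos(pi / 2 * sqrt(1 + 8 * t0)))

# Series form truncated after J terms
cgf_trunc <- function(t, J) {
  j <- 1:J
  -(k - 1) / 2 * sum(log(1 - 2 * t / (j * (j + 1))))
}

J_grid <- c(10, 100, 1000, 10000)
err <- sapply(J_grid, function(J) cgf_closed - cgf_trunc(t0, J))
print(cbind(J = J_grid, error = err))
```

The truncation error shrinks only at the rate 1/J, roughly (k - 1) t / (J + 1), so many terms are needed for a few correct digits, whereas the closed form costs one function evaluation.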

Since the Lugannani and Rice formula may give approximations of P(Vk ≤ v) outside the range [0, 1], an alternative approximation is preferred. The literature contains many other methods with third-order accuracy. Among them, the method developed in Barndorff-Nielsen (1986, 1991) is proposed because it requires the same inputs as the Lugannani and Rice formula. More specifically, the Barndorff-Nielsen formula takes the form

$$P(V_k \le v) \approx \Phi\left(r + \frac{1}{r}\log\frac{u}{r}\right), \tag{10}$$

where r and u are defined in (6) and (7) respectively.

Jensen (1992) showed that the Lugannani and Rice formula and the Barndorff-Nielsen formula are asymptotically equivalent up to third order of accuracy. Note that P(Vk ≤ v) approximated by the Barndorff-Nielsen formula always lies in [0, 1], because it is a standard normal probability.
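A short heuristic indicates why the two formulas agree away from the singularity. Write $r^{*} = r + r^{-1}\log(u/r)$ for the argument of the normal distribution function in the Barndorff-Nielsen formula, as coded in the Appendix (`pnorm(r + log(u/r)/r)`); the symbol $r^{*}$ is notation assumed here. Using $\log x \approx 1 - 1/x$ for $x$ near 1 and a first-order expansion of $\Phi$:

```latex
\begin{aligned}
r^{*} - r &= \frac{1}{r}\log\frac{u}{r}
  \approx \frac{1}{r}\left(1 - \frac{r}{u}\right)
  = \frac{1}{r} - \frac{1}{u}, \\
\Phi(r^{*}) &\approx \Phi(r) + \phi(r)\,(r^{*} - r)
  \approx \Phi(r) + \phi(r)\left(\frac{1}{r} - \frac{1}{u}\right).
\end{aligned}
```

The right-hand side is the Lugannani and Rice expression, while the left-hand side is a normal probability and so cannot leave [0, 1]. This first-order argument is only a sketch; the third-order equivalence is the result of Jensen (1992).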

Numerical Study

For k = 2, Lewis (1961) used extensive Monte Carlo simulation to provide critical points for the Anderson-Darling test. His results are generally treated as the benchmark for all other approximations. Table 1 records the results obtained from

1. Lewis: the method of Lewis (1961)

2. Giles: the saddlepoint approximation proposed in Giles (2001), which is the same as that obtained in Murakami et al. (2009) with k = 2

3. LR: the Lugannani and Rice formula using (8) as the mgf for V2

4. BN: the Barndorff-Nielsen formula using (8) as the mgf for V2.

The three approximations (Giles, LR, and BN) are almost identical to the results obtained by Lewis (1961).

As in Murakami et al. (2009), we compare the results obtained from

1. Limiting distribution: the method derived in Murakami et al. (2009)

2. Murakami: the saddlepoint approximation proposed in Murakami et al. (2009)

3. LR: the Lugannani and Rice formula using (8) as the mgf for V3

4. BN: the Barndorff-Nielsen formula using (8) as the mgf for V3.

Results are recorded in Table 2. As expected, all the methods give very similar results.

Table 3 records the critical value v obtained from the methods discussed in this paper for various k. Again the results are almost indistinguishable.
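An independent check of such values is possible by simulating the limiting distribution directly. The sketch below is not one of the methods compared in the tables. It assumes the weighted chi-square representation implied by the product form (1), truncates the sum at `J` terms, and replaces the discarded tail by its mean (k - 1)/(J + 1). The seed, `J` and `n_sim` are arbitrary choices, and k = 3, v = 3.39934 are taken from the Appendix.

```r
set.seed(2013)
k <- 3
v <- 3.39934
J <- 100         # number of series terms retained (assumption)
n_sim <- 50000   # number of Monte Carlo replicates (assumption)

j <- 1:J
w <- 1 / (j * (j + 1))            # weights of the chi-square terms

# Accumulate the weighted sum term by term to keep memory use small
sim <- numeric(n_sim)
for (i in j) sim <- sim + w[i] * rchisq(n_sim, df = k - 1)
sim <- sim + (k - 1) / (J + 1)    # mean of the discarded tail

p_mc <- mean(sim > v)
se_mc <- sqrt(p_mc * (1 - p_mc) / n_sim)
print(c(p_mc = p_mc, se = se_mc))
```

The Monte Carlo estimate can then be set beside the saddlepoint tail probabilities printed by the Appendix code, keeping in mind that its standard error is far larger than the differences among the analytic approximations.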

Although the numerical results obtained by the proposed method are almost identical to those obtained by Murakami et al. (2009), the proposed method is more efficient computationally because it does not require infinite summation, and it is simple to implement in standard statistical software such as R. Moreover, the Barndorff-Nielsen formula is preferred over the Lugannani and Rice formula because P(Vk ≤ v) calculated from the Barndorff-Nielsen formula always lies in [0, 1], whereas the Lugannani and Rice formula may produce results outside this range. Theoretically, both methods have the singularity, but the Daniels (1987) result can be applied to obtain P(Vk ≤ v) at the singularity.

Conclusion

In this paper, we corrected the mistake in Murakami et al. (2009) and proposed an alternative way to approximate the limiting distribution of the k-sample modified Baumgartner statistic for testing the equality of k independent distributions. The proposed method is more efficient computationally and can be implemented easily in commonly used statistical software such as R.

Appendix

This is the R code to approximate P(Vk > v) with k = 3 and v = 3.39934.

    # specify k and v
    k <- 3
    v <- 3.39934

    # cumulant generating function and its first two derivatives
    cgf <- function(s) {
      if (s < -0.125) (k-1)/2*log(-2*pi*s/cosh(pi/2*sqrt(-1-8*s)))
      else (k-1)/2*log(-2*pi*s/cos(pi/2*sqrt(1+8*s)))
    }
    dcgf <- function(s) {
      if (s < -0.125)
        eval(D(expression((k-1)/2*log(-2*pi*s/cosh(pi/2*sqrt(-1-8*s)))), "s"))
      else
        eval(D(expression((k-1)/2*log(-2*pi*s/cos(pi/2*sqrt(1+8*s)))), "s"))
    }
    d2cgf <- function(s) {
      if (s < -0.125)
        eval(D(D(expression((k-1)/2*log(-2*pi*s/cosh(pi/2*sqrt(-1-8*s)))), "s"), "s"))
      else
        eval(D(D(expression((k-1)/2*log(-2*pi*s/cos(pi/2*sqrt(1+8*s)))), "s"), "s"))
    }

    # solving for the saddlepoint; the limiting distribution has mean k - 1,
    # so the saddlepoint is negative for v below k - 1 and positive above it
    if (v <= k - 1) {
      # Newton iteration started on the negative side
      t1 <- -0.2
      f1 <- dcgf(t1) - v
      tol <- 0.000001
      error <- 1
      while (abs(error) > tol) {
        t0 <- t1
        f0 <- f1
        t1 <- t0 - f0/d2cgf(t0)
        f1 <- dcgf(t1) - v
        error <- t1 - t0
      }
      that <- t1
    }
    if (v > k - 1)
      that <- uniroot(function(t0) dcgf(t0) - v, lower=0.1, upper=0.99)$root

    # proposed methods: Lugannani and Rice (cdflr) and Barndorff-Nielsen (cdfbn)
    r <- sign(that)*sqrt(2*(that*v - cgf(that)))
    u <- that*sqrt(d2cgf(that))
    cdflr <- pnorm(r) - dnorm(r)*(1/u - 1/r)
    cdfbn <- pnorm(r + log(u/r)/r)
    print(cbind(that, v, 1-cdflr, 1-cdfbn))
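The Appendix computation can be wrapped into a function, which also makes it possible to invert the approximation for a critical value of the kind reported in Table 3. The following is a sketch only: `bn_pvalue` and `critical_value` are hypothetical helper names, the root brackets are assumptions chosen for illustration, and `uniroot` stops with an error when v lies very close to the singularity at k - 1 or far in the lower tail.

```r
# Tail probability P(Vk > v) from the Barndorff-Nielsen formula
bn_pvalue <- function(v, k) {
  neg <- quote((k - 1) / 2 * log(-2 * pi * s / cosh(pi / 2 * sqrt(-1 - 8 * s))))
  pos <- quote((k - 1) / 2 * log(-2 * pi * s / cos(pi / 2 * sqrt(1 + 8 * s))))
  branch <- function(s) if (s < -0.125) neg else pos
  K  <- function(s) eval(branch(s))                  # cgf
  K1 <- function(s) eval(D(branch(s), "s"))          # first derivative
  K2 <- function(s) eval(D(D(branch(s), "s"), "s"))  # second derivative

  # K'(0) = k - 1: the saddlepoint is negative below the mean, positive above
  bracket <- if (v < k - 1) c(-200, -1e-4) else c(1e-4, 1 - 1e-8)
  that <- uniroot(function(s) K1(s) - v, lower = bracket[1],
                  upper = bracket[2], tol = 1e-10)$root

  r <- sign(that) * sqrt(2 * (that * v - K(that)))
  u <- that * sqrt(K2(that))
  1 - pnorm(r + log(u / r) / r)
}

# Upper-tail critical value: solve bn_pvalue(v, k) = alpha for v
critical_value <- function(alpha, k) {
  uniroot(function(v) bn_pvalue(v, k) - alpha,
          lower = k - 1 + 0.5, upper = k - 1 + 20, tol = 1e-8)$root
}

p <- bn_pvalue(3.39934, k = 3)
cv <- critical_value(0.10, k = 3)
print(c(p_value = p, critical_value = cv))
```

Using symbolic differentiation through `D`, as in the Appendix, avoids typing the derivatives by hand, and bracketing the root on the correct side of zero avoids evaluating the closed form at t = 0, where it is 0/0.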

REFERENCES

Anderson, T.W. and D.A. Darling. “Asymptotic Theory of Certain Goodness-of-Fit Criteria Based on Stochastic Processes.” Annals of Mathematical Statistics 23 (1952): 193-212.

Anderson, T.W. and D.A. Darling. “A Test of Goodness of Fit.” Journal of the American Statistical Association 49 (1954): 765-769.

Barndorff-Nielsen, O.E. “Inference on Full and Partial Parameters, Based on the Standardized Signed Log-Likelihood Ratio.” Biometrika 73 (1986): 307-322.

Barndorff-Nielsen, O.E. “Modified Signed Log-Likelihood Ratio.” Biometrika 78 (1991): 557-563.

Daniels, H.E. “Tail Probability Approximations.” International Statistical Review 55 (1987): 37-48.

Fraser, D.A.S., N. Reid, R. Li and A. Wong. “P-Value Formulas from Likelihood Asymptotics: Bridging the Singularities.” Journal of Statistical Research 37 (2003): 1-15.

Giles, D.A.E. “A Saddlepoint Approximation to the Distribution Function of the Anderson-Darling Test Statistic.” Communications in Statistics: Simulation and Computation 30 (2001): 899-905.

Jensen, J.L. “The Modified Signed Log Likelihood Statistic and Saddlepoint Approximations.” Biometrika 79 (1992): 693-704.

Lewis, P.A.W. “Distribution of the Anderson-Darling Statistic.” Annals of Mathematical Statistics 32 (1961): 1118-1124.

Lugannani, R. and S.O. Rice. “Saddlepoint Approximation for the Distribution of the Sum of Independent Random Variables.” Advances in Applied Probability 12 (1980): 475-490.

Murakami, H., T. Kamakura and M. Taniguchi. “A Saddlepoint Approximation to the Limiting Distribution of a k-Sample Baumgartner Statistic.” Journal of the Japan Statistical Society 39 (2009): 133-141.

Augustine Wong is currently a professor in the Department of Mathematics and Statistics of York University, Toronto, Ontario, Canada.
