Estimation of the Shannon’s entropy of several shifted exponential populations



Statistics and Probability Letters 83 (2013) 1127–1135


Estimation of the Shannon’s entropy of several shifted exponential populations

Suchandan Kayal, Somesh Kumar ∗

Department of Mathematics, Indian Institute of Technology Kharagpur, Kharagpur - 721302, India

Article info

Article history:
Received 22 January 2011
Received in revised form 10 January 2013
Accepted 11 January 2013
Available online 20 January 2013

Keywords:
Entropy
Equivariant estimator
Inadmissibility
Monotone likelihood ratio
Brewster–Zidek technique

Abstract

Estimation of the entropy of several exponential distributions is considered. A general inadmissibility result for the scale equivariant estimators is proved. The results are extended to the case of unequal sample sizes. Risk functions of proposed estimators are compared numerically.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

The concept of entropy was introduced by Clausius, Boltzmann and Gibbs in thermodynamics and statistical mechanics in the nineteenth century as a measure of disorder of a physical system. A major boost to the concept was provided by Shannon (1948) who related it to the theory of communication as a measure of information. Suppose a random variable X has the probability density function fθ(x), θ ∈ Θ. Then the Shannon’s entropy of the random variable X is defined by

H(θ) = Eθ(− ln fθ(X)).

Presently the term entropy has applications in such diverse areas as molecular biology, hydrology, computer science and meteorology. For example, molecular biologists use the concept of Shannon’s entropy in the analysis of patterns in gene sequences. In dynamical systems, entropy is used to measure the exponential complexity of the system. In social studies, entropy is used as a measure of the decay of systems such as organizations, social orders or practices. For a detailed account of importance and applications of the principles of entropy in various disciplines one may refer to Cover and Thomas (1999), Adami (2004), Misra et al. (2005), Robinson (2011) and Liu et al. (2011).

There have been attempts by several authors for the parametric estimation of entropy. Lazo and Rathie (1978) obtained entropy expressions of various univariate continuous probability distributions. Ahmed and Gokhale (1989) derived the expressions of entropy of several multivariate distributions. In particular, they studied multivariate normal and exponential distributions and obtained the uniformly minimum variance unbiased estimator (UMVUE) of the entropy. The problem of estimating the entropy of a multivariate normal distribution with respect to the squared error loss function has been further investigated by Misra et al. (2005). They showed that the best affine equivariant estimator (BAEE) is unbiased and is also generalized Bayes. Further improved estimators were obtained dominating the BAEE.

∗ Corresponding author. Tel.: +91 3222283662; fax: +91 3222255303.

0167-7152/$ – see front matter © 2013 Elsevier B.V. All rights reserved.
doi:10.1016/j.spl.2013.01.012


The problem of estimating the Shannon’s entropy in exponential populations is considered here. The exponential distribution can be obtained as a distribution with the maximum entropy when a continuous random variable has a given mean and support on the positive real line. Cover and Thomas (1999) describe an application in atmospheric physics. They consider the distribution of the height of molecules in the atmosphere. Here the average potential energy of molecules is fixed and the gas tends to the distribution with the maximum entropy subject to the restriction that the average potential energy is constant. In fact the density of atmosphere is known to have an exponential distribution. If σ is the scale parameter of the exponential distribution then the expression for the entropy is 1 + ln σ. Therefore in an exponential population, the estimation of entropy is equivalent to estimation of the logarithm of the scale parameter.
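The expression 1 + ln σ can be checked numerically. The following is a minimal sketch, not part of the paper: it assumes the shifted exponential density (1/σ) exp(−(x − µ)/σ) for x > µ, and the function names (`shifted_exp_pdf`, `entropy_numeric`, `entropy_closed_form`) as well as the integration settings are illustrative choices.

```python
import math


def shifted_exp_pdf(x, mu, sigma):
    """Density of a shifted exponential with location mu and scale sigma."""
    return math.exp(-(x - mu) / sigma) / sigma if x > mu else 0.0


def entropy_numeric(mu, sigma, steps=100_000, span=50.0):
    """Midpoint-rule approximation of -integral(f * ln f) over (mu, mu + span * sigma)."""
    h = span * sigma / steps
    total = 0.0
    for i in range(steps):
        x = mu + (i + 0.5) * h
        f = shifted_exp_pdf(x, mu, sigma)
        if f > 0.0:
            total -= f * math.log(f) * h
    return total


def entropy_closed_form(sigma):
    """Closed-form entropy 1 + ln(sigma); it does not depend on the location."""
    return 1.0 + math.log(sigma)


if __name__ == "__main__":
    for mu, sigma in [(0.0, 1.0), (3.0, 2.0), (-1.0, 0.5)]:
        print(mu, sigma, entropy_numeric(mu, sigma), entropy_closed_form(sigma))
```

The numerical value should agree with the closed form for any location µ, which is why the estimation problem reduces to estimating ln σ.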

It was first observed by Stein (1964) that the BAEE of the normal variance is inadmissible. Brown (1968) gave general conditions under which Stein type results can be obtained for scale parameter families. However, his results are not applicable to many situations such as a shifted exponential distribution. Arnold (1970) proved that the BAEE of the scale parameter in a shifted exponential distribution is inadmissible with respect to a squared error loss. Zidek (1973) extended the result of Arnold to a larger class of bowl-shaped loss functions. The estimators of Arnold and Zidek are not smooth. Brewster (1974) derived a smooth improved estimator, however, it does not dominate the BAEE in the whole parameter space. An improvement over the BAEE of the reciprocal of the scale parameter was derived by Sharma (1977). Petropoulos and Kourouklis (2002) derived a class of improved estimators with respect to a scale invariant loss function. Recently Bobotas and Kourouklis (2009) have obtained a new class of improving estimators for the scale parameter in the presence of a nuisance parameter under a scale invariant loss. In particular the result yields a class of estimators improving upon the BAEE of the scale parameter in an exponential population.

Kayal and Kumar (2011a) considered the problem of estimating the entropy of an exponential distribution with respect to a linex loss function. For the negative exponential model they proved that the best scale equivariant estimator of the entropy is admissible and minimax. However, for the shifted exponential distribution, due to the presence of a nuisance parameter, the sufficient statistic changes and the BAEE of the entropy is shown to be inadmissible (Kayal and Kumar, 2011a). The estimation of the entropy of k (≥2) negative exponential populations was considered by Kayal and Kumar (2011b) with respect to the squared error and linex loss functions.

In this paper we consider the estimation of entropy of k (≥2) shifted exponential populations, when they have a common scale parameter σ and different location parameters µ1, . . . , µk. Note that this model is not covered by the work mentioned in the previous two paragraphs. The exponential distribution is one of the most widely used distributions in describing lifetimes of components, service times in queueing systems, time periods between two successive occurrences in a Poisson process, etc. Recently Pal et al. (2006) have demonstrated that real life data sets on stem sizes of male and female species of dioecious plants as obtained from Sakai and Burries (1985) are fitted by exponential distributions. Dragulescu and Yakovenko (2001) have shown that individual annual income data in the USA are fitted very well by an exponential distribution. Here one may consider the parameters µ1, . . . , µk to denote the income levels below which tax filing is not required in different states. However, the average income levels may be the same due to overall economic policies of the country which are applicable to all citizens. Similarly one may consider service times at check-in counters of k different airlines at different airports. Here due to different starting times, the parameters µ1, . . . , µk may be different but average service times (once the service has started) may be the same due to the similar nature of trained service persons and equipment used.

In Section 2, we obtain the BAEE for the Shannon’s entropy for our model. A general inadmissibility result for the scale equivariant estimators is proved. Consequently, a new estimator is obtained which dominates the BAEE under the squared error loss function. Further, problems of estimating the entropy are considered in restricted parameter spaces and improved estimators are derived. In Section 3 the results are extended to the case when sample sizes are unequal. A heuristic discussion is added in Section 4. A numerical comparison of the risk values of the proposed estimators is presented in Section 5.

2. The best affine equivariant estimator

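Let $(X_{i1}, \ldots, X_{in})$ be a random sample taken from the population $\Pi_i$, $i = 1, \ldots, k$ $(k \ge 2)$. We assume that the $k$ samples are taken independently. The probability density associated with the population $\Pi_i$ is given by

$$f_i(x) = \begin{cases} \dfrac{1}{\sigma} \exp\left(-\dfrac{x - \mu_i}{\sigma}\right), & \text{if } x > \mu_i, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$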

The expression of the Shannon’s entropy is $H(\sigma) = k(1 + \ln \sigma)$. We consider an equivalent problem of estimating $Q(\sigma) = \ln \sigma$ under the squared error loss

$$L(\sigma, \delta) = (\delta - \ln \sigma)^2. \tag{2}$$

On the basis of the $i$-th sample $\{X_{i1}, \ldots, X_{in}\}$, $(X_{i(1)}, Y_i)$ is a complete and sufficient statistic for $(\mu_i, \sigma)$, where $X_{i(1)} = \min_{1 \le j \le n} X_{ij}$ and $Y_i = \sum_{j=1}^{n} X_{ij}$. Further, we define $Z_i = Y_i - nX_{i(1)}$. Then $X_{i(1)}$ and $Z_i$ are independently distributed. Also $X_{i(1)}$ follows an exponential distribution with location parameter $\mu_i$ and scale parameter $\sigma/n$, whereas $2Z_i/\sigma$ follows a chi-square distribution with $2(n-1)$ degrees of freedom (see, for example, Lehmann and Casella, 1998, p. 43). Let $X_{(1)} = (X_{1(1)}, \ldots, X_{k(1)})$ and $T = \sum_{i=1}^{k} Z_i$. Then $(X_{(1)}, T)$ is complete and sufficient for $(\mu, \sigma)$, where $\mu = (\mu_1, \ldots, \mu_k)$.


It should be noted that $X_{(1)}$ and $T$ are independently distributed. Further, using the additive property of the chi-square distribution, it can be shown that $2T/\sigma$ follows a chi-square distribution with $2k(n-1)$ degrees of freedom. The maximum likelihood estimator (MLE) of $Q(\sigma)$ is $\delta_{ML} = \ln T - \ln(kn)$. We derive the UMVUE of $Q(\sigma)$ as $\delta_{MV} = \ln T - \psi(k(n-1))$, where $\psi$ denotes the Euler psi (digamma) function, defined as $\psi(x) = \frac{d}{dx}(\ln \Gamma(x))$.
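The statistics and the two estimators above can be sketched in a few lines. This is an illustrative sketch only, not the authors' code: the helper names are hypothetical, the digamma function is evaluated only at positive integers through ψ(m) = −γ + Σ_{j<m} 1/j (sufficient here because k(n − 1) is an integer), and the simulation settings are arbitrary assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(m):
    """psi(m) for a positive integer m, via -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def sufficient_statistics(samples):
    """Return (minima, T) for k samples of equal size n, with T = sum_i (Y_i - n X_i(1))."""
    minima = [min(s) for s in samples]
    t = sum(sum(s) - len(s) * m for s, m in zip(samples, minima))
    return minima, t


def delta_ml(t, k, n):
    """MLE of ln(sigma): ln T - ln(kn)."""
    return math.log(t) - math.log(k * n)


def delta_mv(t, k, n):
    """UMVUE of ln(sigma): ln T - psi(k(n-1))."""
    return math.log(t) - digamma_int(k * (n - 1))


def draw_samples(mus, sigma, n, rng):
    """Draw one sample of size n from each shifted exponential population."""
    # random.expovariate takes the rate, i.e. 1 / scale.
    return [[mu + rng.expovariate(1.0 / sigma) for _ in range(n)] for mu in mus]


if __name__ == "__main__":
    rng = random.Random(0)
    mus, sigma, n, reps = [-1.0, 2.0], 2.0, 5, 20_000
    mean_mv = 0.0
    for _ in range(reps):
        _, t = sufficient_statistics(draw_samples(mus, sigma, n, rng))
        mean_mv += delta_mv(t, len(mus), n) / reps
    print("average UMVUE:", mean_mv, "target ln(sigma):", math.log(sigma))
```

Averaged over many replications, the UMVUE should be close to ln σ regardless of the locations, while the MLE sits below it by ln(kn) − ψ(k(n − 1)).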

Consider the transformations $g_{a,b_i}(x_{ij}) = a x_{ij} + b_i$, $j = 1, \ldots, n$, $i = 1, \ldots, k$. Here $a$ is kept the same so that the common scale property is sustained after transformation. Writing $b = (b_1, \ldots, b_k)$ and $g_{a,b} = (g_{a,b_1}, \ldots, g_{a,b_k})$, we see that under the transformation $g_{a,b}$,

$$(\mu, \sigma) \to (a\mu + b, a\sigma), \quad (X_{(1)}, T) \to (aX_{(1)} + b, aT).$$

Consequently, we get $\ln \sigma \to \ln \sigma + \ln a$. The loss function (2) is invariant under the group $G_{a,b}$ of affine transformations $g_{a,b}$, $a > 0$, $b \in \mathbb{R}^k$, if $\delta \to \delta + \ln a$. The form of an affine equivariant estimator is obtained as

$$\delta_c(X_{(1)}, T) = \ln T - c \tag{3}$$

for any constant $c$.
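The equivariance property can be illustrated on a small data set. The sketch below is an assumption-laden illustration (the helper names and the numbers are hypothetical): it applies one common scale a and separate shifts b_i to each sample and checks that an estimator of the form ln T − c moves by exactly ln a.

```python
import math


def affine_transform(samples, a, shifts):
    """Apply x -> a * x + b_i to every observation of the i-th sample."""
    return [[a * x + b for x in s] for s, b in zip(samples, shifts)]


def statistic_t(samples):
    """T = sum over samples of (sum of observations - n * minimum)."""
    return sum(sum(s) - len(s) * min(s) for s in samples)


def delta_c(samples, c):
    """Affine equivariant estimator ln T - c."""
    return math.log(statistic_t(samples)) - c


if __name__ == "__main__":
    data = [[1.2, 0.7, 3.1], [-0.4, 2.2, 0.9]]
    a, shifts, c = 2.5, [-3.0, 10.0], 0.3
    moved = affine_transform(data, a, shifts)
    # The difference below should equal ln(a) up to rounding.
    print(delta_c(moved, c) - delta_c(data, c), math.log(a))
```

The shifts drop out because T depends on the data only through differences from each sample minimum.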

Theorem 1. Under the squared error loss function (2), the BAEE of $Q(\sigma)$ is $\delta_{c_0}(X_{(1)}, T)$, where $c_0 = \psi(k(n-1))$.

Proof. The risk of the estimators of the form (3) is

$$R(\sigma, \delta_c) = E(\ln T - c - \ln \sigma)^2,$$

which is minimized for

$$c = E(\ln(T/\sigma)) = \psi(k(n-1)) = c_0, \text{ say.}$$

Hence the result follows. □

Remark 1. The BAEE is also the UMVUE. Also, using Jensen’s inequality it can be shown that $\psi(k(n-1)) < \ln(kn)$, which means that the MLE underestimates $Q(\sigma)$ on average.
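Because T/σ has a gamma distribution with integer shape k(n − 1), the risk of any estimator ln T − c has a closed form: the variance of ln(T/σ), which is the trigamma value ψ′(k(n − 1)), plus the squared bias (ψ(k(n − 1)) − c)². The following sketch evaluates this for the MLE and the BAEE; the function names are hypothetical and the integer-argument formulas for ψ and ψ′ are used as a convenience, not taken from the paper.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(m):
    """psi(m) for a positive integer m."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def trigamma_int(m):
    """psi'(m) for a positive integer m: pi^2 / 6 - sum_{j=1}^{m-1} 1/j^2."""
    return math.pi ** 2 / 6.0 - sum(1.0 / j ** 2 for j in range(1, m))


def risk_delta_c(c, k, n):
    """Squared error risk of ln T - c: variance of ln(T/sigma) plus squared bias."""
    m = k * (n - 1)
    return trigamma_int(m) + (digamma_int(m) - c) ** 2


def risk_baee(k, n):
    """Risk of the BAEE, c = psi(k(n-1)), which has zero bias."""
    return risk_delta_c(digamma_int(k * (n - 1)), k, n)


def risk_mle(k, n):
    """Risk of the MLE, c = ln(kn)."""
    return risk_delta_c(math.log(k * n), k, n)


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, round(risk_mle(2, n), 6), round(risk_baee(2, n), 6))
```

For k = 2 and n = 4 this gives roughly 0.321 for the MLE and 0.181 for the BAEE, in line with the simulated values 0.320865 and 0.178992 reported in Section 5.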

2.1. Improving upon the best affine equivariant estimator

To get an improvement over the BAEE $\delta_{c_0}$, we consider a larger class of estimators. Consider the scale group of transformations $G_a = \{g_a : g_a(x) = ax, a > 0\}$. The problem of estimating $Q(\sigma)$ remains invariant with respect to the group $G_a$. Under the transformation $g_a$, we have

$$(\mu, \sigma) \to (a\mu, a\sigma), \quad (X_{(1)}, T) \to (aX_{(1)}, aT)$$

and therefore, $\ln \sigma \to \ln \sigma + \ln a$. It can also be shown that the loss function (2) is invariant under the group $G_a$ if $\delta \to \delta + \ln a$. Therefore we get the form of a scale equivariant estimator as

$$\delta_\phi(W, T) = \ln T + \phi(W), \tag{4}$$

where $W = (W_1, \ldots, W_k)$, $W_i = X_{i(1)}/T$ and $\phi$ is a real valued measurable function. A general inadmissibility result for the estimators of the form (4) is proved in the theorem below. Let $B_1 = \{w : w_{(1)} > 0\}$, $B_2 = \{w : u < \exp(\phi(w) + \psi(kn))\}$, $B_3 = \{w : w_{(k)} < 0\}$, $u = n\sum_{i=1}^{k} w_i + 1$, $w_{(1)} = \min\{w_1, \ldots, w_k\}$, $w_{(k)} = \max\{w_1, \ldots, w_k\}$ and $w_i = x_{i(1)}/t$ for $i = 1, \ldots, k$. Also define, for a function $\phi(w)$ as in (4),

$$\phi_0(w) = \begin{cases} \ln\left(n\sum_{i=1}^{k} w_i + 1\right) - \psi(kn), & \text{if } w \in (B_1 \cap B_2) \cup (B_3 \cap B_2^c), \\ \phi(w), & \text{otherwise.} \end{cases} \tag{5}$$

Theorem 2. Let $\delta_\phi$ be a scale equivariant estimator of the form (4) and $\phi_0(w)$ be as defined in (5). If there exists some $(\mu, \sigma)$ such that $P_{(\mu,\sigma)}(\phi_0(W) \ne \phi(W)) > 0$, then under the squared error loss function (2), the estimator $\delta_{\phi_0}$ dominates $\delta_\phi$.

Proof. The risk function of the estimators of the form $\delta_\phi$ given in (4) can be written as

$$R(\mu, \sigma, \delta_\phi) = E^W R_1(\mu, \sigma, W, \delta_\phi),$$

where $R_1(\mu, \sigma, w, \delta_\phi)$ denotes the conditional risk of $\delta_\phi$ given $W = w$, given by

$$R_1(\mu, \sigma, w, \delta_\phi) = E[(\delta_\phi - \ln \sigma)^2 \mid W = w] = E[(\ln(T/\sigma) + \phi(W))^2 \mid W = w]. \tag{6}$$

We notice that the conditional risk $R_1(\mu, \sigma, w, \delta_\phi)$ in (6) is only a function of the ratio $\mu/\sigma$. Therefore, without loss of generality we can take $\sigma = 1$. Again the conditional risk $R_1$ is a convex function of $\phi$, and the choice of $\phi$ minimizing $R_1$ can be obtained as

$$\phi(w, \mu) = -E(\ln T \mid W = w). \tag{7}$$

In order to evaluate the term in (7), we derive the conditional distribution of $T$ given $W = w$. The joint probability density of $X_{(1)}$ and $T$ is

$$f_{(X_{(1)},T)}(x_{(1)}, t) = \frac{n^k}{\Gamma(k(n-1))}\, e^{-\left[n\sum_{i=1}^{k}(x_{i(1)} - \mu_i) + t\right]}\, t^{k(n-1)-1}, \quad t \ge 0,\ x_{i(1)} \ge \mu_i,\ i = 1, \ldots, k. \tag{8}$$

Now using the transformations $w_1 = x_{1(1)}/t, \ldots, w_k = x_{k(1)}/t$ and $t = t$, we get the joint density of $W$ and $T$ as

$$f_{(W,T)}(w, t) = \frac{n^k}{\Gamma(k(n-1))}\, e^{-\left[n\sum_{i=1}^{k}(w_i t - \mu_i) + t\right]}\, t^{kn-1}, \quad t \ge 0,\ t w_i \ge \mu_i.$$

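To find the marginal density of $W$, we integrate $f_{(W,T)}(w, t)$ with respect to $t$.

Case (i) Suppose all $\mu_i$’s are non-negative, $i = 1, \ldots, k$:

In this case, $t$ varies from $\eta_1$ to $\infty$, where $\eta_1 = \max\{\mu_1/w_1, \ldots, \mu_k/w_k\}$. Therefore, the marginal density of $W$ is

$$f_W(w) = \frac{n^k}{\Gamma(k(n-1))} \int_{\eta_1}^{\infty} e^{-\left[n\sum_{i=1}^{k}(w_i t - \mu_i) + t\right]}\, t^{kn-1}\, dt, \quad w_i > 0.$$

Consequently, the conditional density of $T$ given $W = w$ is given by

$$f_{T|W}(t \mid w) = \frac{e^{-\left[n\sum_{i=1}^{k}(w_i t - \mu_i) + t\right]}\, t^{kn-1}}{\int_{\eta_1}^{\infty} e^{-\left[n\sum_{i=1}^{k}(w_i t - \mu_i) + t\right]}\, t^{kn-1}\, dt}, \quad t > \eta_1.$$

Since $n\sum_{i=1}^{k}(w_i t - \mu_i) + t = ut - n\sum_{i=1}^{k}\mu_i$, the factor involving $\mu$ cancels between numerator and denominator. Therefore, we get

$$E(\ln T \mid W = w) = \frac{\int_{\eta_1}^{\infty} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{\eta_1}^{\infty} e^{-ut}\, t^{kn-1}\, dt}.$$

Substituting the expression of $E(\ln T \mid W = w)$ in (7), we get

$$\phi(w, \mu) = \ln u - \frac{\int_{\eta_1'}^{\infty} \ln p\; e^{-p}\, p^{kn-1}\, dp}{\int_{\eta_1'}^{\infty} e^{-p}\, p^{kn-1}\, dp} = \ln u - h_1(\eta_1'), \text{ say,} \tag{9}$$

where $\eta_1' = \eta_1 u$. In order to apply the Brewster–Zidek technique (1974) we need to find the supremum and infimum of $\phi(w, \mu)$ given in (9). To this end, we show that the density function

$$\frac{e^{-p}\, p^{kn-1}}{\int_{\eta_1'}^{\infty} e^{-p}\, p^{kn-1}\, dp}, \quad \eta_1' < p < \infty,$$

has a monotone likelihood ratio property in $\eta_1'$ and then apply Lemma 3.4.2 in Lehmann and Romano (2009). Now it can be shown that $h_1(\eta_1')$ is a nondecreasing function of $\eta_1'$, and $\eta_1'$ lies between $0$ and $\infty$. Thus we get

$$\sup_{\eta_1'} h_1(\eta_1') = +\infty \quad \text{and} \quad \inf_{\eta_1'} h_1(\eta_1') = \psi(kn).$$

Therefore, from (9) we get

$$\sup_{\mu} \phi(w, \mu) = \ln u - \psi(kn) \quad \text{and} \quad \inf_{\mu} \phi(w, \mu) = -\infty.$$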
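The behaviour of h1 in Case (i) can be explored numerically. The sketch below is only an illustration under stated assumptions: `h1` is a hypothetical helper that approximates the ratio of integrals in (9) by a midpoint rule on a truncated range, with a = kn passed as the gamma shape, and the truncation point and step count are ad hoc choices.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(m):
    """psi(m) for a positive integer m."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def h1(eta, a, steps=50_000):
    """Approximate E[ln P | P > eta] for P ~ Gamma(a, 1) by a midpoint rule.

    The upper limit is a heuristic cut-off far in the right tail.
    """
    upper = max(eta, a) + 40.0 * math.sqrt(a) + 40.0
    h = (upper - eta) / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        p = eta + (i + 0.5) * h
        # Work on the log scale to avoid overflow in p ** (a - 1).
        weight = math.exp(-p + (a - 1.0) * math.log(p) - math.lgamma(a))
        num += math.log(p) * weight
        den += weight
    return num / den


if __name__ == "__main__":
    a = 8  # for example k = 2, n = 4
    print("psi(kn):", digamma_int(a))
    for eta in (0.0, 2.0, 5.0, 10.0, 20.0):
        print(eta, h1(eta, a))
```

The printed values should start near ψ(kn) at η = 0 and increase with η, consistent with the infimum ψ(kn) and the unbounded supremum stated above.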

Case (ii) Suppose all $\mu_i$’s are negative, $i = 1, \ldots, k$:

In this case several possibilities for the $w_i$’s may arise, which are (a) all $w_i$’s are non-negative, (b) all $w_i$’s are negative and (c) some $w_i$’s are non-negative and the remaining are negative. In the following discussion, we investigate all these cases in detail.

(a) When all $w_i$’s are non-negative, the range of $t$ is from $0$ to $\infty$. Therefore the marginal density of $W$ is

$$f_W(w) = \frac{n^k}{\Gamma(k(n-1))} \int_{0}^{\infty} e^{-\left[n\sum_{i=1}^{k}(w_i t - \mu_i) + t\right]}\, t^{kn-1}\, dt, \quad w_i > 0,\ i = 1, \ldots, k.$$

Consequently, the conditional density of $T$ given $W = w$ can be obtained as

$$f_{T|W}(t \mid w) = \frac{e^{-\left[n\sum_{i=1}^{k}(w_i t - \mu_i) + t\right]}\, t^{kn-1}}{\int_{0}^{\infty} e^{-\left[n\sum_{i=1}^{k}(w_i t - \mu_i) + t\right]}\, t^{kn-1}\, dt}, \quad t > 0.$$

Therefore, the conditional expectation of $\ln T$ given $W = w$ is given by

$$E(\ln T \mid W = w) = \frac{\int_{0}^{\infty} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{0}^{\infty} e^{-ut}\, t^{kn-1}\, dt}.$$

Substituting the expression of $E(\ln T \mid W = w)$ in (7) and integrating, we get

$$\phi(w, \mu) = \ln u - \psi(kn). \tag{10}$$

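(b) Now we consider the case when all $w_i$’s are negative:

In this case, $t$ varies from $0$ to $\eta_2$, where $\eta_2 = \min\{\mu_1/w_1, \ldots, \mu_k/w_k\}$. Similar to Case (a) we derive the conditional expectation of $\ln T$ given $W = w$ as

$$E(\ln T \mid W = w) = \frac{\int_{0}^{\eta_2} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{0}^{\eta_2} e^{-ut}\, t^{kn-1}\, dt}.$$

When $u > 0$, we have from (7)

$$\phi(w, \mu) = \ln u - \frac{\int_{0}^{\eta_2'} \ln p\; e^{-p}\, p^{kn-1}\, dp}{\int_{0}^{\eta_2'} e^{-p}\, p^{kn-1}\, dp} = \ln u - h_2(\eta_2'), \text{ say,}$$

where $\eta_2' = \eta_2 u$. Using the monotone likelihood ratio property as in Case (i), we can show that $h_2(\eta_2')$ is a nondecreasing function of $\eta_2'$. Thus we get

$$\sup_{\eta_2'} h_2(\eta_2') = \psi(kn) \quad \text{and} \quad \inf_{\eta_2'} h_2(\eta_2') = -\infty.$$

Therefore,

$$\sup_{\mu} \phi(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \phi(w, \mu) = \ln u - \psi(kn).$$

Similarly, when $u < 0$, we get

$$\sup_{\mu} \phi(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \phi(w, \mu) = -\infty.$$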

(c) For the case when some $w_i$’s are non-negative and the remaining are negative, we show that the results are permutation invariant. Let $(i_1, \ldots, i_k)$ be a permutation of $(1, \ldots, k)$. We assume $w_{i_j} \ge 0$ for $j = 1, \ldots, r$ and $w_{i_j} < 0$ for $j = r+1, \ldots, k$, $r = 1, \ldots, k-1$. Thus the range of $t$ is from $0$ to $\eta_3$, where $\eta_3 = \min\{\mu_{i_{r+1}}/w_{i_{r+1}}, \ldots, \mu_{i_k}/w_{i_k}\}$. In this case the conditional expectation of $\ln T$ given $W = w$ is obtained as

$$E(\ln T \mid W = w) = \frac{\int_{0}^{\eta_3} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{0}^{\eta_3} e^{-ut}\, t^{kn-1}\, dt}.$$

Using the arguments as in Part (b), we get the supremum and infimum of $\phi$ given in (7) as

$$\sup_{\mu} \phi(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \phi(w, \mu) = \ln u - \psi(kn),$$

when $u > 0$; and

$$\sup_{\mu} \phi(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \phi(w, \mu) = -\infty,$$

when $u < 0$.

Case (iii) Some of the $\mu_i$’s are non-negative and the remaining are negative:

In this case we show that finding the supremum and infimum of $\phi(w, \mu)$ is invariant under different permutations of the $\mu_i$’s. We consider the case that among all $\mu_i$’s any $r$ $(r = 1, \ldots, k-1)$ terms are non-negative and the remaining $(k-r)$ terms are negative. Let $(i_1, \ldots, i_k)$ be a permutation of $(1, \ldots, k)$ so that $\mu_{i_j} \ge 0$ for $j = 1, \ldots, r$, and $\mu_{i_j} < 0$ for $j = r+1, \ldots, k$. Therefore, when $\mu_{i_1}, \ldots, \mu_{i_r} \ge 0$, all corresponding $w_{i_j}$’s are also non-negative, for $j = 1, \ldots, r$, whereas when $\mu_{i_{r+1}}, \ldots, \mu_{i_k} < 0$ there are several possibilities: all $(k-r)$ $w_i$’s are non-negative, all $(k-r)$ $w_i$’s are negative, or some of the $w_i$’s are non-negative and the remaining are negative. To find the supremum and infimum of $\phi(w, \mu)$ given in (7) we use the technique used in Case (i).

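(a) Let us consider the case $w_{i_1}, \ldots, w_{i_r}, \ldots, w_{i_k} > 0$:

Under this case the range of $t$ is from $\eta_{11} = \max\{\mu_{i_1}/w_{i_1}, \ldots, \mu_{i_r}/w_{i_r}\}$ to $\infty$. The conditional expectation of $\ln T$ given $W = w$ is given by

$$E(\ln T \mid W = w) = \frac{\int_{\eta_{11}}^{\infty} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{\eta_{11}}^{\infty} e^{-ut}\, t^{kn-1}\, dt}.$$

Hence, we get

$$\sup_{\mu} \phi(w, \mu) = \ln u - \psi(kn) \quad \text{and} \quad \inf_{\mu} \phi(w, \mu) = -\infty.$$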

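(b) Suppose $w_{i_1}, \ldots, w_{i_r} > 0$ and $w_{i_{r+1}}, \ldots, w_{i_k} < 0$:

The range of $t$ is from $\eta_{12} = \max\{\mu_{i_1}/w_{i_1}, \ldots, \mu_{i_r}/w_{i_r}\}$ to $\eta_{13} = \min\{\mu_{i_{r+1}}/w_{i_{r+1}}, \ldots, \mu_{i_k}/w_{i_k}\}$. Therefore, the conditional expectation of $\ln T$ given $W = w$ can be obtained as

$$E(\ln T \mid W = w) = \frac{\int_{\eta_{12}}^{\eta_{13}} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{\eta_{12}}^{\eta_{13}} e^{-ut}\, t^{kn-1}\, dt}.$$

It can be shown as before that

$$\sup_{\mu} \phi(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \phi(w, \mu) = -\infty.$$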

(c) Let $w_{i_1}, \ldots, w_{i_r} > 0$ and, among the remaining $(k-r)$, some $w_i$’s be non-negative and the rest negative:

In this case we again show that the results are permutation invariant. Let $(j_1, \ldots, j_{k-r})$ be a permutation of $(i_{r+1}, \ldots, i_k)$. Suppose $w_{j_1}, \ldots, w_{j_m} \ge 0$ and $w_{j_{m+1}}, \ldots, w_{j_{k-r}} < 0$. The range of $t$ is from $\max\{\mu_{i_1}/w_{i_1}, \ldots, \mu_{i_r}/w_{i_r}, \mu_{j_1}/w_{j_1}, \ldots, \mu_{j_m}/w_{j_m}\}$ to $\min\{\mu_{j_{m+1}}/w_{j_{m+1}}, \ldots, \mu_{j_{k-r}}/w_{j_{k-r}}\}$. Arguing as earlier, it can be shown that

$$\sup_{\mu} \phi(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \phi(w, \mu) = -\infty.$$

An application of the Brewster–Zidek technique (1974) on the function $R_1(\mu, \sigma, w, \delta_\phi)$ then completes the proof of the theorem. □

As a consequence of this theorem we get the following corollary.

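Corollary 1. Let $C_1 = \{w : u < e^{d}\}$ and $d = \psi(kn) - \psi(k(n-1))$. The BAEE $\delta_{c_0}$ of $Q(\sigma)$ is inadmissible and dominated by the estimator given by

$$\delta_{IB} = \begin{cases} \ln(uT) - \psi(kn), & \text{if } w \in (B_1 \cap C_1) \cup (B_3 \cap C_1^c), \\ \ln T - \psi(k(n-1)), & \text{otherwise.} \end{cases}$$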
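A small Monte Carlo experiment can illustrate Corollary 1. The sketch below is an illustration under assumptions, not the authors' simulation code: the region for the improved estimator is read as (B1 ∩ C1) ∪ (B3 ∩ C1ᶜ), the helper names are hypothetical, and the seed, the number of replications and the parameter values are arbitrary. Both estimators are evaluated on the same simulated samples so that their risks are directly comparable.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(m):
    """psi(m) for a positive integer m."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def statistics(samples):
    """Return the sample minima and T = sum_i (Y_i - n X_i(1))."""
    minima = [min(s) for s in samples]
    t = sum(sum(s) - len(s) * m for s, m in zip(samples, minima))
    return minima, t


def delta_baee(minima, t, n):
    """BAEE: ln T - psi(k(n-1))."""
    k = len(minima)
    return math.log(t) - digamma_int(k * (n - 1))


def delta_ib(minima, t, n):
    """Improved estimator of Corollary 1 (region as assumed in the lead-in)."""
    k = len(minima)
    u = n * sum(minima) / t + 1.0  # u = n * sum(w_i) + 1 with w_i = x_i(1) / t
    d = digamma_int(k * n) - digamma_int(k * (n - 1))
    in_c1 = u < math.exp(d)
    in_b1 = min(minima) > 0.0  # all w_i > 0, since t > 0
    in_b3 = max(minima) < 0.0  # all w_i < 0
    if (in_b1 and in_c1) or (in_b3 and not in_c1):
        return math.log(u * t) - digamma_int(k * n)
    return delta_baee(minima, t, n)


def simulated_risks(mus, sigma, n, reps, seed=0):
    """Monte Carlo squared error risks of (BAEE, improved estimator)."""
    rng = random.Random(seed)
    target = math.log(sigma)
    loss_baee = 0.0
    loss_ib = 0.0
    for _ in range(reps):
        samples = [[mu + rng.expovariate(1.0 / sigma) for _ in range(n)] for mu in mus]
        minima, t = statistics(samples)
        loss_baee += (delta_baee(minima, t, n) - target) ** 2
        loss_ib += (delta_ib(minima, t, n) - target) ** 2
    return loss_baee / reps, loss_ib / reps


if __name__ == "__main__":
    for mus in ([0.0, 0.0], [0.5, 0.5], [5.0, 5.0]):
        print(mus, simulated_risks(mus, 1.0, 4, 20_000, seed=1))
```

Near µ1 = µ2 = 0 the simulated risk of the improved estimator should fall visibly below that of the BAEE, while for locations far from zero the two estimators mostly coincide.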

Remark 2. We also study the entropy estimation problem when it is known a priori that all $\mu_i$’s are bounded below. Such a situation may arise when the minimum guarantee time of components is known to be more than a pre-specified constant due to physical constraints. In this case one may take without loss of generality that $\mu_{(1)} \ge 0$, where $\mu_{(1)} = \min\{\mu_1, \ldots, \mu_k\}$. Here the MLE of $Q(\sigma)$ is the same as the MLE obtained for the unrestricted parameter space. The inadmissibility of the BAEE $\delta_{c_0}$ of $Q(\sigma)$ can be established using the steps of Case (i) of the proof of Theorem 2. The improved estimator is given by

$$\delta_{IB}^{+} = \begin{cases} \ln(uT) - \psi(kn), & \text{if } w \in C_1, \\ \ln T - \psi(k(n-1)), & \text{otherwise.} \end{cases}$$

Remark 3. We have also considered the entropy estimation when, contrary to the case in Remark 2, the guarantee times are known to be bounded from above. Here one may assume a priori that $\mu_{(k)} < 0$, where $\mu_{(k)} = \max\{\mu_1, \ldots, \mu_k\}$. In this case, the MLE of $Q(\sigma)$ gets modified as $\delta_{RM} = \ln T^0 - \ln(kn)$, where $T^0 = \sum_{i=1}^{k}(Y_i - nX^0_{i(1)})$, $X^0_{i(1)} = \min\{0, X_{i(1)}\}$, $i = 1, \ldots, k$. This is the restricted maximum likelihood estimator (RMLE) of the entropy. Further, the inadmissibility of the BAEE $\delta_{c_0}$ is proved using the steps used in Case (ii) of the proof of Theorem 2. Let $C_2 = \{w : w_{(r)} < 0\}$, $C_3 = \{w : w_{(r+1)} > 0\}$. The improved estimator is then

$$\delta_{IB}^{-} = \begin{cases} \ln(uT^0) - \psi(kn), & \text{if } w \in B_1 \cup (B_3 \cap C_1^c) \cup (C_2 \cap C_3 \cap C_1^c), \\ \ln T - \psi(k(n-1)), & \text{otherwise.} \end{cases}$$


3. Unequal sample sizes

The results of the previous section can be extended to the case when random samples with unequal sample sizes are drawn from $k$ exponential populations. The proofs, though somewhat more complicated, are similar to those of the results in Section 2, and hence are omitted. For the sake of completeness, the notation and results have been stated here in full detail. Suppose $(X_{i1}, \ldots, X_{in_i})$, $i = 1, \ldots, k$, are independent random samples drawn from the populations $\Pi_1, \ldots, \Pi_k$ respectively, with the pdf of the $i$-th population given by (1). On the basis of the $i$-th sample $\{X_{i1}, \ldots, X_{in_i}\}$, $(X^*_{i(1)}, Y^*_i)$ is a complete and sufficient statistic for $(\mu_i, \sigma)$, where $X^*_{i(1)} = \min_{1 \le j \le n_i} X_{ij}$ and $Y^*_i = \sum_{j=1}^{n_i} X_{ij}$. Let $X^*_{(1)} = (X^*_{1(1)}, \ldots, X^*_{k(1)})$, $T^* = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - X^*_{i(1)})$, and $N = \sum_{i=1}^{k} n_i$. Therefore, $(X^*_{(1)}, T^*)$ is a complete and sufficient statistic for $(\mu, \sigma)$. $X^*_{(1)}$ and $T^*$ are independently distributed. Also $X^*_{i(1)}$ follows an exponential distribution with location parameter $\mu_i$ and scale parameter $\sigma/n_i$, and $2T^*/\sigma$ follows a chi-square distribution with $2(N-k)$ degrees of freedom. The MLE and the UMVUE of $Q(\sigma)$ are $\delta^*_{ML} = \ln T^* - \ln N$ and $\delta^*_{MV} = \ln T^* - \psi(N-k)$ respectively. The problem under study is also invariant with respect to $G_{a,b}$, the group of affine transformations. The form of the affine equivariant estimator is $\delta^*_c(X^*_{(1)}, T^*) = \ln T^* - c$ for some real valued constant $c$. In the following theorem we get the BAEE.

Theorem 3. Under the squared error loss function (2), the BAEE of $Q(\sigma)$ is $\delta^*_{c^*_0}(X^*_{(1)}, T^*)$, where $c^*_0 = \psi(N-k)$.

As in Section 2, we can obtain the form of the scale equivariant estimator of $Q(\sigma)$ as

$$\delta^*_{\phi^*}(W^*, T^*) = \ln T^* + \phi^*(W^*), \tag{11}$$

where $W^* = (W^*_1, \ldots, W^*_k)$ and $W^*_i = X^*_{i(1)}/T^*$. Suppose $B^*_1 = \{w^* : w^*_{(1)} > 0\}$, $B^*_2 = \{w^* : u^* < \exp(\phi^*(w^*) + \psi(N))\}$, $B^*_3 = \{w^* : w^*_{(k)} < 0\}$, $u^* = \sum_{i=1}^{k} n_i w^*_i + 1$, $w^*_{(1)} = \min\{w^*_1, \ldots, w^*_k\}$, $w^*_{(k)} = \max\{w^*_1, \ldots, w^*_k\}$ and $w^*_i = x^*_{i(1)}/t^*$ for $i = 1, \ldots, k$. For a function $\phi^*$ in (11), define

$$\phi^*_0(w^*) = \begin{cases} \ln\left(\sum_{i=1}^{k} n_i w^*_i + 1\right) - \psi(N), & \text{if } w^* \in (B^*_1 \cap B^*_2) \cup (B^*_3 \cap B^{*c}_2), \\ \phi^*(w^*), & \text{otherwise.} \end{cases} \tag{12}$$

The following theorem proves a general inadmissibility result for the estimators of the form (11).

Theorem 4. Let $\delta^*_{\phi^*}$ be a scale equivariant estimator of the form (11) and $\phi^*_0(w^*)$ be as defined in (12). If there exists some $(\mu, \sigma)$ such that $P_{(\mu,\sigma)}(\phi^*_0(W^*) \ne \phi^*(W^*)) > 0$, then under the squared error loss function (2), the estimator $\delta^*_{\phi^*_0}$ dominates $\delta^*_{\phi^*}$.

In the following corollaries, the improved estimator of the BAEE is given for various cases.

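Corollary 2. The BAEE $\delta^*_{c^*_0}$ of $Q(\sigma)$ is inadmissible and dominated by the estimator given by

$$\delta^*_{IB} = \begin{cases} \ln(u^* T^*) - \psi(N), & \text{if } w^* \in (B^*_1 \cap C^*_1) \cup (B^*_3 \cap C^{*c}_1), \\ \ln T^* - \psi(N-k), & \text{otherwise,} \end{cases}$$

where $C^*_1 = \{w^* : u^* < e^{d^*}\}$ and $d^* = \psi(N) - \psi(N-k)$.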
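The estimator of Corollary 2 can be computed directly from samples of different sizes. The following is a hedged sketch: the function name `delta_ib_unequal` is hypothetical, the region is read as (B1* ∩ C1*) ∪ (B3* ∩ C1*ᶜ), the digamma function is again evaluated only at positive integers, and the two small data sets are made up for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(m):
    """psi(m) for a positive integer m."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def delta_ib_unequal(samples):
    """Improved estimator of ln(sigma) for k samples with sizes n_1, ..., n_k.

    Assumes N - k >= 1 and T* > 0, i.e. at least one sample has two or more
    distinct observations.
    """
    k = len(samples)
    sizes = [len(s) for s in samples]
    total_n = sum(sizes)
    minima = [min(s) for s in samples]
    t_star = sum(x - m for s, m in zip(samples, minima) for x in s)
    # u* = sum_i n_i w_i* + 1 with w_i* = x_i(1)* / t*
    u_star = sum(n_i * m for n_i, m in zip(sizes, minima)) / t_star + 1.0
    d_star = digamma_int(total_n) - digamma_int(total_n - k)
    in_c1 = u_star < math.exp(d_star)
    in_b1 = min(minima) > 0.0
    in_b3 = max(minima) < 0.0
    if (in_b1 and in_c1) or (in_b3 and not in_c1):
        return math.log(u_star * t_star) - digamma_int(total_n)
    return math.log(t_star) - digamma_int(total_n - k)


if __name__ == "__main__":
    print(delta_ib_unequal([[1.0, 2.0, 4.0], [3.0, 5.0]]))  # falls back to the BAEE
    print(delta_ib_unequal([[0.1, 2.0, 4.0], [0.2, 5.0]]))  # uses ln(u* T*) - psi(N)
```

With equal sample sizes n_i = n the quantities N, u* and d* reduce to kn, u and d, so the same function also covers the setting of Corollary 1.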

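Corollary 3. The BAEE $\delta^*_{c^*_0}$ of $Q(\sigma)$ is inadmissible when $\mu_{(1)} \ge 0$ and dominated by the estimator given by

$$\delta^{*+}_{IB} = \begin{cases} \ln(u^* T^*) - \psi(N), & \text{if } w^* \in C^*_1, \\ \ln T^* - \psi(N-k), & \text{otherwise.} \end{cases}$$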

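Corollary 4. The estimator

$$\delta^{*-}_{IB} = \begin{cases} \ln(u^* T^*) - \psi(N), & \text{if } w^* \in B^*_1 \cup (B^*_3 \cap C^{*c}_1) \cup (C^*_2 \cap C^*_3 \cap C^{*c}_1), \\ \ln T^* - \psi(N-k), & \text{otherwise,} \end{cases}$$

where $C^*_2 = \{w^* : w^*_{(r)} < 0\}$, $C^*_3 = \{w^* : w^*_{(r+1)} > 0\}$, $r = 1, \ldots, k-1$, dominates the BAEE $\delta^*_{c^*_0}$ of $Q(\sigma)$ when $\mu_{(k)} < 0$.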

Fig. 1. The risk plot of the estimators $\delta_{IB}$, $\delta_{IB}^{+}$, $\delta_{IB}^{-}$ and $\delta_{RM}$ for n = (4, 6, 8). Graphs (a, b, c) for $\delta_{IB}$, Graphs (d, e, f) for $\delta_{IB}^{+}$, Graphs (g, h, i) for $\delta_{IB}^{-}$ and Graphs (j, k, l) for $\delta_{RM}$ respectively. Each panel plots the risk R against µ1 and µ2. [Figure not reproduced in this transcript.]

4. Heuristic discussion

We have considered the problem of estimating entropy of k shifted exponential populations with a common scale but different locations. The entropy expression is related to the logarithm of the scale parameter. Stein (1964) first showed that the best equivariant estimator of normal variance is inadmissible. Later this phenomenon was observed for some other scale parameter families including the exponential distribution (see Brown, 1968 and Arnold, 1970). Misra et al. (2005) obtained Stein type and Brewster–Zidek type estimators for the entropy for a multivariate normal population. In this paper we derive dominating estimators over the BAEE for the entropy of k shifted exponential populations. The model is important as the structure of the sufficient statistics gets modified.

5. Numerical comparisons

In this section we compare numerically the risk performance of the improved estimators δIB, δ+

IB and δ−

IB with theBAEE δc0 . It is noticed that for all cases of µi’s the risk differences become small for large values of n. For n ≥ 100 therisk values are same up to six decimal places. For the purpose of presentation of the numerical study, we have takenn = 4, 6, 8, 10, 15, 20, 25, 30 and 50 and k = 2. The risk values of the proposed estimators are calculated using simulationsbased on 10000 samples of size n. Since the risk functions of the estimators are functions of (µ1/σ , . . . , µk/σ), we takeσ = 1without loss of generality. The results of the numerical study are presented through graphs. The graphs correspondingto values of n = 4, 6 and 8 are presented in Fig. 1 in this paper, whereas for values of n = 10, 15, 20, 25, 30 and 50, they areplaced on thewebsite: http://www.facweb.iitkgp.ernet.in/∼smsh/graph.pdf. The following observations aremade based onthe risk values.

(a) Under the squared error loss function the risk values of the MLE δML are 0.320865, 0.161249, and 0.104970 and thatof the BAEE δc0 are 0.178992, 0.104975, and 0.075129 for n = 4, 6, 8 respectively. Graphs (a), (b), (c) in the Fig. 1 representthe risk plot of the estimator δIB. We observe that for different values of n the improved regions of the estimator δIB over δc0are different. Keeping µ1 fixed, if we decrease the magnitude of µ2, then margin of improvement is more. It is also noticedthat we get considerable improvement when both µ1 and µ2 are close to zero. In this case, the region of improvement isapproximately |µ1| ≤ 0.5 and |µ2| ≤ 0.5. The maximum improvement observed is about 12%.

(b) When both µ1 and µ2 are non-negative, the risk values of the estimators are plotted in graphs (d), (e), (f) in Fig. 1. For large values of µ1 and µ2 (approximately ≥ 1), the risk of δ+IB is equal to R(δc0). As µ1 and µ2 approach zero, the risk of δ+IB decreases, but just before reaching zero it stops decreasing and starts increasing. The maximum improvement observed is about 12%.

(c) When both µ1 and µ2 are negative, graphs (g), (h), (i) and (j), (k), (l) in Fig. 1 show the risk plots of the estimators δ−IB and δRM, respectively. From the numerical risk values it is observed that the risk values of δRM and δ−IB decrease when both µ1 and µ2 increase. The performance of δRM is always better than that of δML. We also see that the estimator δ−IB always performs better than δRM. The maximum improvement observed is about 27%.

Acknowledgments

The authors thank the reviewers and a co-editor-in-chief for their valuable suggestions which have considerably improved the content and the presentation of the paper.

References

Adami, C., 2004. Information theory in molecular biology. Phys. Life Rev. 1, 3–22.
Ahmed, N.A., Gokhale, D.V., 1989. Entropy expressions and their estimators for multivariate distributions. IEEE Trans. Inf. Theory 35, 688–692.
Arnold, B.C., 1970. Inadmissibility of the usual scale estimate for a shifted exponential distribution. J. Amer. Statist. Assoc. 65, 1260–1264.
Bobotas, P., Kourouklis, S., 2009. Strawderman-type estimators for a scale parameter with application to the exponential distribution. J. Statist. Plann. Inference 139, 3001–3012.
Brewster, J.F., 1974. Alternative estimators for the scale parameter of the exponential distribution with unknown location. Ann. Statist. 2, 553–557.
Brewster, J.F., Zidek, J.V., 1974. Improving on equivariant estimators. Ann. Statist. 2, 21–38.
Brown, L.D., 1968. Inadmissibility of the usual estimators of scale parameters in problems with unknown location and scale parameters. Ann. Math. Statist. 39, 29–48.
Cover, T.M., Thomas, J.A., 1999. Elements of Information Theory. Wiley, New York.
Dragulescu, A., Yakovenko, V.M., 2001. Evidence for the exponential distribution of income in the USA. Eur. Phys. J. B 20, 585–589.
Kayal, S., Kumar, S., 2011a. Estimating entropy of an exponential population under linex loss function. J. Indian Statist. Assoc. 49, 91–112.
Kayal, S., Kumar, S., 2011b. On estimating the Shannon entropy of several exponential populations. Int. J. Stat. Econ. 7, 42–52.
Lazo, A.C.G., Rathie, P.N., 1978. On the entropy of continuous probability distributions. IEEE Trans. Inf. Theory 24, 120–122.
Lehmann, E.L., Casella, G., 1998. Theory of Point Estimation, second ed. Springer, New York.
Lehmann, E.L., Romano, J.P., 2009. Testing Statistical Hypotheses. Springer, New York.
Liu, Y., Liu, C., Wang, D., 2011. Understanding atmospheric behaviour in terms of entropy: a review of applications of the second law of thermodynamics to meteorology. Entropy 13, 211–240.
Misra, N., Singh, H., Demchuk, E., 2005. Estimation of the entropy of a multivariate normal distribution. J. Multivariate Anal. 92, 324–342.
Pal, N., Jin, C., Lim, W., 2006. Handbook of Exponential and Related Distributions for Engineers and Scientists. Chapman and Hall/CRC, Boca Raton.
Petropoulos, C., Kourouklis, S., 2002. A class of improved estimators for the scale parameter of an exponential distribution with unknown location. Comm. Statist. Theory Methods 31, 325–335.
Robinson, D.W., 2011. Entropy and uncertainty. Entropy 10, 493–506.
Sakai, A.K., Burries, T.A., 1985. Growth in male and female aspen clones: a twenty-five year longitudinal study. Ecology 66, 1921–1927.
Shannon, C., 1948. The mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423.
Sharma, D., 1977. Estimation of the reciprocal of the scale parameter in a shifted exponential distribution. Sankhya Ser. A 39, 203–205.
Stein, C., 1964. Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean. Ann. Inst. Statist. Math. 16, 155–160.
Zidek, J.V., 1973. Estimating the scale parameter of the exponential distribution with unknown location. Ann. Statist. 1, 264–278.