This article was downloaded by: [Universita di Trento], on 10 July 2013, at 02:17. Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, registered number 1072954; registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Communications in Statistics - Theory and Methods. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20

Bounds on the Tail Probability and Absolute Difference Between Two Distributions. M. N. Goria & A. Tagliani, Department of Computer and Management Sciences, Faculty of Economics, Trento University, Trento, Italy. Published online: 02 Sep 2006.

To cite this article: M. N. Goria & A. Tagliani (2003) Bounds on the Tail Probability and Absolute Difference Between Two Distributions, Communications in Statistics - Theory and Methods, 32:3, 519-532, DOI: 10.1081/STA-120018549. To link to this article: http://dx.doi.org/10.1081/STA-120018549

Taylor & Francis makes every effort to ensure the accuracy of all the information (the "Content") contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions



©2003 Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc.

MARCEL DEKKER, INC. • 270 MADISON AVENUE • NEW YORK, NY 10016

COMMUNICATIONS IN STATISTICS

Theory and Methods

Vol. 32, No. 3, pp. 519–532, 2003

Bounds on the Tail Probability and Absolute

Difference Between Two Distributions

M. N. Goria* and A. Tagliani

Department of Computer and Management Sciences, Faculty of Economics, Trento University, Trento, Italy

ABSTRACT

A fractional moment bound on the tail probabilities of a distribution is proposed and illustrated with examples. It is then combined with the moment bound of Philips–Nelson, resulting in an improvement over the latter. Two bounds on the absolute difference between two distributions, namely the fractional moment and entropy based bounds, are introduced. The first compares well with Lindsay–Basak's window function, while the second can be made as precise as desired, according to the number of common moments assumed.

*Correspondence: M. N. Goria, Department of Computer and Management Sciences, Faculty of Economics, Trento University, 38100 Trento, Italy; E-mail: [email protected].


DOI: 10.1081/STA-120018549 0361-0926 (Print); 1532-415X (Online)

Copyright © 2003 by Marcel Dekker, Inc. www.dekker.com


Key Words: Entropy; Fractional moments; Moments; Tail probability.

1. INTRODUCTION

Bounds on probabilities in the central part and tails of a distribution can be found in almost every text of probability and statistics. These are designed either to provide a rough estimate of the probability where the effective value is difficult to obtain or unavailable for lack of information on the distribution function, or are exploited fruitfully to establish theoretical results such as laws of large numbers and central limit theorems. Among all bounds, Chernoff's bound is used almost exclusively when a tight bound on the tail of a distribution is required.

Recently Philips and Nelson (1995) proposed a new bound on the tail probabilities based on the sequence of moments and showed that it is superior to Chernoff's bound. Naveau (1997) reached the same conclusion for discrete random variables using a sequence of factorial moments.

At first sight the result appears a bit surprising since, under restrictions, the moment generating function (Chernoff's bound) and the sequence of moments (Philips–Nelson bound) are equivalent. The only plausible explanation seems to be that the moment generating function captures the global behavior of a distribution, whereas the moments describe its local characteristics, e.g., concepts such as location, scale, symmetry and kurtosis, well familiar to the reader.

These bounds clearly exclude heavy tailed distributions such as the Pareto, the Cauchy or, more generally, stable distributions. For these distributions, however, the fractional moments do exist; see Goria (1992) for the Cauchy distribution.

In Sec. 2 we first develop a bound based on a sequence of fractional moments and compute it for the Cauchy and Pareto distributions in Example 1. Next we combine it with the Philips–Nelson moment bound, leading to a new and more precise bound compared to the above mentioned one, as discussed in Example 2.

In Sec. 3 we propose two bounds on the absolute difference between two distributions. The first is based on the fractional moments and moments, and compares well with the window bound of Lindsay and Basak (2001). The second is based on the entropy and can be made as precise as desired, according to the number of common moments shared by the two distributions.



2. FRACTIONAL MOMENTS BOUND ON THE TAIL PROBABILITY OF A DISTRIBUTION

Philips and Nelson (1995) consider a random variable X having an absolutely continuous distribution function $F_X$ with support on the real line, such that it has finite moments of all orders, and propose the following bound on the tail of the distribution:

$$P(X \ge t) \le \inf_{k \ge 0} E(X^+/t)^k = M(t), \quad t > 0,$$

where $X^+$ is the positive part of the r.v. X. They showed that this inequality is even sharper than Chernoff's bound.
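As a quick numerical illustration (ours, not from the paper): for the standard exponential distribution $E(X^k) = k!$, and the Chernoff bound works out to $t e^{1-t}$ for $t > 1$, so both bounds are available in closed form. A minimal sketch, with $t = 10$ chosen arbitrarily:

```python
import math

def moment_bound(t, kmax=60):
    # Philips-Nelson: M(t) = inf_{k>=0} E(X^k)/t^k; for Exp(1), E(X^k) = k!
    return min(math.factorial(k) / t**k for k in range(kmax + 1))

def chernoff_bound(t):
    # inf_{0<=s<1} e^{-st}/(1-s), attained at s = 1 - 1/t for t > 1
    return t * math.exp(1.0 - t) if t > 1 else 1.0

t = 10.0
exact = math.exp(-t)                 # P(X >= t) for Exp(1)
M, C = moment_bound(t), chernoff_bound(t)
print(M, C, exact)                   # the moment bound is the tighter of the two
```

Both bounds dominate the exact tail, and the moment bound is the smaller one, in line with the Philips–Nelson result.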

These bounds are not, however, applicable to heavy tailed distributions, e.g., the Cauchy and Pareto distributions, due to the nonexistence of even the first moment; but the fractional moments do exist for these distributions. Their exclusion from textbooks can probably be attributed to their unclear meaning, unlike the first four moments.

Lin (1992) showed that a sequence of fractional moments, under restrictions, uniquely determines the distribution; it follows then that the fractional moments, like the moments, also explain the various facets of a distribution. This result is particularly relevant to heavy tailed distributions, e.g., stable distributions. Consequently it seems of interest to develop a bound based on fractional moments and to further modify it appropriately in case a certain number of moments do exist.

Let $\mu_{1/k} = E(X^+)^{1/k}$, $k \ge 2$, denote the fractional moment of order $1/k$; then from Markov's inequality we have

$$P(X \ge t) \le \inf_{k \ge 2} \frac{\mu_{1/k}}{t^{1/k}} = M_f(t).$$

Note that the above bound is computable for any distribution, particularly for those not having even a finite mean. We illustrate it below with an example.

Example 1: Pareto and Cauchy Distributions

Consider the Pareto distribution with density

$$f(x) = \alpha x^{-\alpha - 1}, \quad x \ge 1, \ \alpha < 1.$$



If we let

$$g(\beta) = \frac{\mu_\beta}{t^\beta} = \frac{1}{t^\beta (1 - \beta/\alpha)}, \quad 0 \le \beta < \alpha < 1,$$

then we find that the function $g(\beta)$ attains its minimum at

$$\beta_0 = \frac{\alpha \log t - 1}{\log t}, \quad t > e^{1/\alpha}.$$

For $\beta_0 < 1/2$ we define $k_0 = [1/\beta_0] + 1$ if $1/\beta_0 - [1/\beta_0] \ge 0.5$ and $k_0 = [1/\beta_0]$ otherwise, while for $\beta_0 \ge 1/2$ we take $k_0 = 2$, and $k_0 = \infty$ for $t \le e^{1/\alpha}$. As a result the fractional moment bound is

$$M_f(t) = g(1/k_0) = \frac{1}{t^{1/k_0}\big(1 - 1/(\alpha k_0)\big)}.$$

We compared the exact probability with $M_f(t)$ for $\alpha = 2/3,\ 3/4,\ 9/10$. It turned out that the bound becomes sharper as $\alpha$ increases.
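The Pareto comparison can be reproduced directly; the sketch below (our check, with $t = 10$ fixed for illustration) searches the integer $k \ge 2$ exhaustively instead of rounding $\beta_0$, and confirms that the gap between bound and exact tail narrows as $\alpha$ grows:

```python
import math

def pareto_frac_bound(t, alpha, kmax=200):
    # mu_{1/k} = E(X^{1/k}) = 1/(1 - 1/(alpha*k)) for the Pareto density
    # f(x) = alpha*x^(-alpha-1), x >= 1; Markov gives g(1/k) = mu_{1/k}/t^(1/k)
    return min(1.0 / (t**(1.0 / k) * (1.0 - 1.0 / (alpha * k)))
               for k in range(2, kmax + 1))

t = 10.0
gaps = []
for alpha in (2/3, 3/4, 9/10):
    exact = t**(-alpha)                     # P(X >= t) = t^(-alpha)
    Mf = pareto_frac_bound(t, alpha)
    gaps.append(Mf - exact)
print(gaps)   # gap narrows as alpha increases
```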

For the Cauchy distribution, with

$$g(\beta) = \frac{1}{2\,t^\beta \cos(\pi\beta/2)}, \quad \beta \le 1/2,$$

we find $k_0 = \infty$ for $t \le 1$ and $k_0 = 2$ for $t \ge 4.81057$, and by a procedure analogous to the above, $\beta_0$ was found to be

$$\beta_0 = \frac{2}{\pi}\arccos\!\Big(\pi\,\big[\pi^2 + 4(\log t)^2\big]^{-1/2}\Big), \quad t \in (1,\ 4.81057),$$

from which $k_0$ was determined.

For both distributions, the bound $M_f(t)$ was quite wide compared to the exact probability, particularly for large t-values. This seems obvious, as $M_f(t)$ tends to zero at the rate $t^{-1/2}$, whereas the exact values converge at the rates $t^{-1}$ and $t^{-\alpha}$, respectively, for the Cauchy and Pareto



distributions. However, if we do not insist on an integer solution and consider the infimum over real $k > 1$, the bound becomes better.
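This last remark is easy to verify numerically. In the sketch below (our check; $t = 3$ is illustrative) the candidate grid for the continuous infimum contains the points $1/k$, so the continuous bound can only improve on the integer one, while Markov's inequality guarantees that both dominate the exact Cauchy tail:

```python
import math

def cauchy_g(beta, t):
    # mu_beta = E(X^+)^beta = 1/(2 cos(pi*beta/2)) for the standard Cauchy, 0 < beta < 1
    return 1.0 / (2.0 * t**beta * math.cos(math.pi * beta / 2.0))

t = 3.0                                         # illustrative value in (1, 4.81057)
exact = 0.5 - math.atan(t) / math.pi            # exact tail P(X >= t)
int_bound = min(cauchy_g(1.0 / k, t) for k in range(2, 100))
# continuous infimum over beta in (0, 1/2]; the grid includes the points 1/k
grid = [i / 1000.0 for i in range(1, 501)] + [1.0 / k for k in range(2, 100)]
cont_bound = min(cauchy_g(b, t) for b in grid)
# the paper's closed form for the minimizer
b0 = (2.0 / math.pi) * math.acos(math.pi / math.sqrt(math.pi**2 + 4.0 * math.log(t)**2))
print(exact, int_bound, cont_bound, b0)
```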

To rectify the above defect, especially for large t-values, we brought in the mean of the Pareto distribution, i.e., we assumed $1 < \alpha < 2$ and considered the bound

$$M_1(t) = \inf_{k \ge 1} \frac{\mu_{1/k}}{t^{1/k}} \le M_f(t).$$

We compared $M_1(t)$ with the exact probability for $\alpha = 1.5$. This indeed showed some improvement over $M_f(t)$ for large t-values.
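A quick numerical confirmation (ours; $t = 100$ is illustrative): for $\alpha = 1.5$ the mean $\mu_1 = \alpha/(\alpha - 1) = 3$ joins the candidate set, and at large t the exponent $s = 1$ wins:

```python
import math

def mu(s, alpha):
    # E(X^s) = alpha/(alpha - s) for the Pareto density f(x) = alpha*x^(-alpha-1),
    # x >= 1, valid for s < alpha
    return alpha / (alpha - s)

alpha, t = 1.5, 100.0
fracs = [1.0 / k for k in range(2, 100)]                     # fractional exponents only
Mf = min(mu(s, alpha) / t**s for s in fracs)
M1 = min(mu(s, alpha) / t**s for s in fracs + [1.0])         # adds the mean, mu_1 = 3
exact = t**(-alpha)
print(Mf, M1, exact)
```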

From the above discussion it is clear how to modify the fractional moment bound in case the distribution has a certain number of finite moments. Next we compared $M_f(t)$ with the very precise bounds of Chernoff and Philips–Nelson for the Normal and Gamma distributions. It turned out that the results are very similar in both cases; therefore we shall discuss the Gamma distribution in detail in the next example.

Example 2: Gamma Distribution

Consider the Gamma distribution with density

$$f(x) = \frac{x^{\alpha - 1} e^{-x}}{\Gamma(\alpha)}, \quad x \ge 0, \ \alpha > 0.$$

From Philips–Nelson, we have

$$C(t) = \begin{cases} 1, & t < \alpha, \\ (t/\alpha)^\alpha e^{\alpha - t}, & t \ge \alpha. \end{cases}$$

To evaluate $M_f(t)$ and $M(t)$, we consider

$$g(\beta) = t^{-\beta}\,\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)}, \quad \beta \ge 0.$$

The function $g(\beta)$ attains its minimum at $\beta_0$, given by the solution of the equation

$$\frac{1}{\Gamma(\alpha + \beta)}\frac{d\,\Gamma(\alpha + \beta)}{d\beta} - \log t = 0,$$

i.e., $\psi(\alpha + \beta_0) = \log t$, where $\psi$ denotes the digamma function.

The exact solution $\beta_0$ was determined numerically, using Mathematica, for each fixed value of t and $\alpha = 1/2,\ 1,\ 2,\ 3$. It turned out that $\beta_0$ lies between $t - t_0$ and $t - \alpha$, with $\beta_0 < 0$ for $t < t_0$. Further, the numerical solution can be approximated nicely as follows.



First, for each $\alpha$ we determined $t_p(\alpha)$, $p = 0,\ 0.5,\ 1$, the t-value corresponding to the solution $\beta_0 = p$; we then divided the t-values into the three disjoint intervals $[t_0, t_{0.5})$, $[t_{0.5}, t_1)$ and $t \ge t_1$, and finally found that $\beta_0$ can be very closely approximated by

$$\hat\beta = t - t_0 + c_i(\alpha)\,(t - t_0), \quad i = 0,\ 0.5,\ 1,$$

where $c_0$, $c_{0.5}$ were determined at the mid value of the corresponding interval and $c_1$ at $t_1$, such that $\hat\beta$ matches exactly with $\beta_0$. The quantities $c_i$ are of small magnitude; in fact $c_0 < c_{0.5} < c_1$ with $c_1 \approx 0.0094$.

The approximation has no impact on the value of $M(t)$ beyond a negligible effect for $t \ge t_1$, where t is large. Likewise it has no effect on $M_f(t)$ for $t \ge t_{0.5}$, but a little effect for small values of t, where $k_0$ is very large.

For the bound $M(t)$, $k_0 = 0$ for $t \le t_0$, and it is otherwise found from $\hat\beta$ as in the first example; whereas for $M_f(t)$, $k_0 = \infty$ for $t \le t_0$, and for other t-values it is found from $1/\hat\beta$.
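The numerical determination of $\beta_0$ can be sketched with the standard library alone (our implementation; the paper used Mathematica). Here the digamma function $\psi$ is approximated by a central difference of `math.lgamma`, the equation $\psi(\alpha + \beta) = \log t$ is solved by bisection, and $\alpha = 2$, $t = 10$ are illustrative values:

```python
import math

def psi(x, h=1e-5):
    # digamma via central difference of log-gamma (adequate for this sketch)
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

def beta0(alpha, t):
    # solve psi(alpha + beta) = log(t) by bisection
    target, lo, hi = math.log(t), 1e-6, 10.0 * t + 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if psi(alpha + mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha, t = 2.0, 10.0
b0 = beta0(alpha, t)
k0 = round(b0)                                            # nearest integer exponent
M = math.gamma(alpha + k0) / math.gamma(alpha) / t**k0    # Markov bound at k0
exact = math.exp(-t) * (1.0 + t)                          # P(X >= t) for Gamma(2)
print(b0, M, exact)
```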

On comparison of $M_f(t)$ with $M(t)$ and $C(t)$, it turned out that for moderate t-values, i.e., $t \ge 1.09\alpha$ and $t < 1.6\alpha$, the fractional moment bound $M_f(t)$ is better than $M(t)$ and $C(t)$ respectively, and thereafter both $C(t)$ and $M(t)$ override $M_f(t)$. This was to be expected, as $M_f(t)$ goes to zero at the rate $t^{-1/2}$.

From Example 1 it is clear that if we bring in the moments to modify $M_f(t)$, the new bound

$$P(X > t) \le \inf_s \frac{\mu_s}{t^s} = M_1(t),$$

where $s \in \{\ldots, 1/k, \ldots, 1/2, 0, 1, \ldots, k, \ldots\}$, will lead to an improvement, particularly for large t-values, as can be seen in the section of the graph below, where we have presented $M_1(t)$, $M(t)$ and $C(t)$ for $\alpha = 3$.

From Fig. 1 it is clear that $M_1(t)$ outperforms Chernoff's bound and has an edge over $M(t)$ for moderate values of t.

Figure 1. $C(t)$ (continuous), $M(t)$ (dashed), $M_1(t)$ (dots).
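The comparison behind Fig. 1 can be reproduced approximately as follows (our sketch for $\alpha = 3$; $t = 6$ is an arbitrary moderate value, and the minima are taken over finite candidate sets rather than true infima):

```python
import math

def gamma_moment(s, alpha):
    # E(X^s) = Gamma(alpha + s)/Gamma(alpha) for the Gamma(alpha) density
    return math.gamma(alpha + s) / math.gamma(alpha)

alpha, t = 3.0, 6.0
C = (t / alpha)**alpha * math.exp(alpha - t) if t >= alpha else 1.0   # Chernoff
ints = list(range(1, 40))
fracs = [1.0 / k for k in range(2, 40)]
M  = min(gamma_moment(s, alpha) / t**s for s in ints)           # Philips-Nelson
M1 = min(gamma_moment(s, alpha) / t**s for s in ints + fracs)   # combined bound
exact = math.exp(-t) * (1.0 + t + t * t / 2.0)   # P(X >= t) for Gamma(3)
print(C, M, M1, exact)
```

At this t the combined bound already beats Chernoff, consistent with the figure.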



Needless to say, the bound $M_1(t)$, besides being more precise for all $t > 0$ than the other bounds, has the clear advantage of wider applicability, i.e., it is usable even for distributions with no finite moment. Further, with $\hat\beta$ determined as above, it was found that some improvement can still be made over $M_1(t)$ if we do not insist on an integer solution.

3. BOUNDS ON THE ABSOLUTE DIFFERENCE BETWEEN TWO DISTRIBUTION FUNCTIONS

We shall propose two bounds, namely a fractional moment–moment bound and an entropy based bound, and discuss their relative performances.

3.1. Fractional Moment–Moment Bound

Consider any two absolutely continuous distributions F and G such that both have the same first k moments and fractional moments of the positive parts of the r.v.s, i.e.,

$$\mu_i(F)^+ = \mu_i(G)^+, \quad i = 1/k, \ldots, 1/2, 1, \ldots, k.$$

If we let

$$m_k(t) = \min_{i \in \{1/k, \ldots, 1/2, 0, 1, \ldots, k\}} \mu_i(F)^+/t^i,$$

then we have the following inequality:

$$|F(t) - G(t)| = |\bar F(t) - \bar G(t)| \le \max\big(\bar F(t), \bar G(t)\big) \le m_k(t), \quad t > 0. \tag{3.1}$$

We compared the fractional moment–moment bound with the window function of Lindsay and Basak (2001), i.e.,

$$|F(t) - G(t)| \le W_k(t) = \frac{1}{V_k(t)'\,M(k)^{-1}\,V_k(t)}, \quad t \in \mathbb{R}^+, \tag{3.2}$$

where $M(k)$ is the symmetric Hankel matrix defined by the first 2k moments and $V_k(t) = (1, t, t^2, \ldots, t^k)'$, for the Normal distribution with several values of k.



It turned out that the difference between the two bounds varies from a minimum of 1% to a maximum of 4% in favor of the window function. Needless to say, our bound has a computational advantage.
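For reference, the window function (3.2) is straightforward to evaluate for the standard Normal, whose moments are $m_n = (n-1)!!$ for even n and 0 for odd n. The sketch below (ours; the paper's 1–4% comparison is not reproduced here) solves the small linear system directly with Gaussian elimination, and cross-checks the $k = 3$ value against the exact rational answer $6/61$ at $t = 2$:

```python
def normal_moment(n):
    # m_n = (n-1)!! for even n, 0 for odd n; m_0 = 1
    if n % 2 == 1:
        return 0.0
    r = 1.0
    for j in range(1, n, 2):
        r *= j
    return r

def solve(A, b):
    # plain Gaussian elimination with partial pivoting (small systems only)
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= f * A[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][j] * x[j] for j in range(r + 1, n))) / A[r][r]
    return x

def window(t, k):
    # 1 / (V' M^{-1} V) with M the (k+1)x(k+1) Hankel matrix of moments m_0..m_2k
    M = [[normal_moment(i + j) for j in range(k + 1)] for i in range(k + 1)]
    v = [t**j for j in range(k + 1)]
    x = solve(M, v)                       # x = M^{-1} v
    return 1.0 / sum(vi * xi for vi, xi in zip(v, x))

print(window(2.0, 1), window(2.0, 3))     # the window narrows as k grows
```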

3.2. Entropy Bound

Consider a class $\mathcal F$ of absolutely continuous distributions such that its members have the same first k moments $\mu_i$, $i = 1, 2, \ldots, k$. We derive the entropy bound as follows.

We fix a member of $\mathcal F$, say F, and call it the target distribution. For the distribution G we choose the distribution $F_k$ having the same first k moments, i.e., a member of $\mathcal F$, but with maximum Shannon entropy, and show in the Appendix that the following inequality holds:

$$|F_k(x) - F(x)| \le 3\left[-1 + \left(1 + \frac{4}{9}\big(H[f_k] - H[f]\big)\right)^{1/2}\right]^{1/2} \tag{3.3}$$

Here $H[f_k]$ and $H[f]$ denote the entropies of $f_k(x)$ and $f(x)$, respectively. Using Eq. (3.3), we have, for any two distributions with k common moments,

$$d(F, G) = \sup_x |F(x) - G(x)| \le \sup_x |F(x) - F_k(x)| + \sup_x |F_k(x) - G(x)|$$

$$\le 3\left[-1 + \left(1 + \frac{4}{9}\big(H[f_k] - H[f]\big)\right)^{1/2}\right]^{1/2} + 3\left[-1 + \left(1 + \frac{4}{9}\big(H[f_k] - H[g]\big)\right)^{1/2}\right]^{1/2} \tag{3.4}$$
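The right-hand sides of the entropy bounds depend only on the entropy gap $I = H[f_k] - H[f]$, so they reduce to one-line functions; the sketch below (our helper names, with an arbitrary illustrative gap) also checks that the continuous bound inverts Kullback's inequality $I \ge V^2/2 + V^4/36$ exactly at equality:

```python
import math

def entropy_bound_continuous(I):
    # inverts I >= V^2/2 + V^4/36 at equality (the bracket in Eq. (3.3))
    return 3.0 * math.sqrt(-1.0 + math.sqrt(1.0 + 4.0 * I / 9.0))

def entropy_bound_discrete(I):
    # inverts the lower bound I >= V^2/(2 log 2) (as in Eq. (3.5))
    return math.sqrt(2.0 * math.log(2.0) * I)

I = 0.05   # an illustrative entropy gap H[f_k] - H[f]
Vc, Vd = entropy_bound_continuous(I), entropy_bound_discrete(I)
print(Vc, Vd)
```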

3.3. Discrete Case

Let us consider the probability mass functions $(x_i, f(x_i))$, the target, and $(x_i, f_k(x_i))$, the maximum entropy one, with the same support $\{x_i,\ i = 1, 2, \ldots, \infty\}$, such that they have the first k moments $\mu_i$, $i = 1, 2, \ldots, k$, in common. Then we show in the Appendix that the following inequality holds:

$$|F_k(t) - F(t)| \le V \le \big[2\log 2\,\big(H[f_k] - H[f]\big)\big]^{1/2}, \tag{3.5}$$



where F and $F_k$ respectively denote the distribution functions corresponding to f and $f_k$.

In analogy with the continuous case, for each pair of distributions we obtain the window estimate in the discrete case as

$$d(F, G) \le \big[2\log 2\,\big(H[f_k] - H[f]\big)\big]^{1/2} + \big[2\log 2\,\big(H[f_k] - H[g]\big)\big]^{1/2}. \tag{3.6}$$

Remark

The inequalities (3.3) and (3.5) remain unaltered for a nonnegative variable, and further in such a case we can consider distributions having an arbitrary number k of assigned moments (Tagliani, 2000).

Note that in the entropy bound the two distributions involved do not play an equal role, in that the target distribution F has to be specified in advance. This disadvantage is overshadowed by the fact that the above inequalities can be made as precise as we like by an appropriate choice of k, i.e., the number of common moments.

3.4. Choice of k

In our previous discussion we assumed that the first k common moments of the distributions F and G are preassigned. This may be justified in theoretical work, but in practice it is often necessary to decide how large k should be in a given situation. To this end we restrict our attention only to those distributions having a determinate moment problem and discuss the choice of k.

Now, for a class of distributions having a determinate moment problem, we have a well defined sequence of entropies $\{H[f_k],\ k = 1, 2, \ldots, \infty\}$. From the Appendix it is clear that convergence in entropy entails convergence in divergence measure, or equivalently in variation measure.

We know from a theorem of Frontini and Tagliani (1997) that the monotonically decreasing sequence $\{H[f_k],\ k = 1, 2, \ldots, \infty\}$ converges to $H[f]$. To obtain an estimate of the error for large values of k, we shall use Aitken's $\Delta^2$-method.

For every distribution other than the maximum entropy one, the information it contains is shared by the whole sequence of moments, and each further addition of a moment causes an entropy decrease. The size of the decrease is not the same for every moment. In fact, for most of the distributions met in practice, we find that the first



few moments generate a higher decrease than those later in the list. Consequently it seems reasonable to expect that the sequence of points $(j, H[f_j])$, $j = 1, 2, \ldots, \infty$, lies on a convex curve, i.e.,

$$D^2 H[f_j] = H[f_j] - 2H[f_{j-1}] + H[f_{j-2}] > 0, \quad j > 2.$$

Next, by Aitken's $\Delta^2$-method, we define a new accelerated sequence

$$H^{acc}[f_j] = H[f_j] - \frac{\big(H[f_j] - H[f_{j-1}]\big)^2}{D^2 H[f_j]}, \quad j > 2,$$

which converges faster to $H[f]$ than the initial sequence. If we take $H[f] \simeq H^{acc}[f_k]$ for some k, we obtain the following estimate of the error term:

$$H[f_k] - H[f] \simeq \frac{\big(H[f_k] - H[f_{k-1}]\big)^2}{D^2 H[f_k]} \tag{3.7}$$
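Aitken's $\Delta^2$ step is a three-term formula; the sketch below (our check on a synthetic geometric sequence, for which the acceleration recovers the limit exactly) mirrors the definition above:

```python
def aitken(H):
    # accelerated value from the last three terms H[f_{j-2}], H[f_{j-1}], H[f_j]
    d2 = H[-1] - 2.0 * H[-2] + H[-3]          # D^2 H[f_j]
    return H[-1] - (H[-1] - H[-2])**2 / d2

# synthetic test sequence H_j = limit + c*r^j (geometric convergence);
# Aitken's Delta^2 recovers the limit exactly for such sequences
limit, c, r = -0.1447, 0.3, 0.6
H = [limit + c * r**j for j in range(1, 8)]
acc = aitken(H)
print(acc)
```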

Using the k found above in Eqs. (3.4) and (3.6), we obtain an estimate of $d(F, G)$, which we denote by $\hat d(F, G)$. Note that for any prefixed error bound we can determine an optimal value of k. We illustrate these results with two examples, where in addition to the window estimate we also report $I(f_k, f)$.

In both examples the determination of the optimal value of k runs in two steps, as follows: since F is known here, we first compute

$$\Delta_k F := \|F_k(x) - F(x)\|_\infty$$

and $I(f_k, f)$ for each $k \ge 1$, needed to evaluate the window estimate. Next we find the minimum value of k such that $\Delta_k F \le \hat d(F, G)$ and declare it the optimal choice of k.

Example 1 (continuous case). To appreciate the size of the entropy bound, we choose F, for illustrative purposes, with the density

$$f(x) = \frac{\pi}{2}\sin(\pi x), \quad x \in [0, 1],$$

and zero elsewhere. Note that $f(x)$ has a determinate moment problem, with $H[f] \simeq -0.144729886$. The approximating density $f_k(x)$, $x \in [0, \infty)$, is given by Eq. (A.1). In Table 1 we report the optimal value of k along with the various quantities needed for its determination.
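The quoted entropy value can be checked by quadrature: analytically $H[f] = 1 - \log \pi \simeq -0.144729886$ for this density. A sketch using composite Simpson's rule (our check):

```python
import math

def h_target(n=4000):
    # H[f] = -int_0^1 f log f dx for f(x) = (pi/2) sin(pi x), composite Simpson
    def g(x):
        fx = 0.5 * math.pi * math.sin(math.pi * x)
        return -fx * math.log(fx) if fx > 0.0 else 0.0   # f log f -> 0 at endpoints
    h = 1.0 / n
    s = g(0.0) + g(1.0)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * g(i * h)
    return s * h / 3.0

H = h_target()
print(H)   # analytically H[f] = 1 - log(pi)
```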



Example 2 (Binomial distribution). Here we consider the binomial distribution with parameters $n = 15$, $p = 0.2$, and find $H[f] \simeq 1.8383844236$ (Table 2).
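This entropy is a finite sum, $H[f] = -\sum_k p_k \log p_k$ in nats, and is easily checked (our computation):

```python
import math

n, p = 15, 0.2                     # parameters from Example 2
pmf = [math.comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]
H = -sum(q * math.log(q) for q in pmf if q > 0.0)
print(H)                           # entropy in nats
```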

From both examples it clearly emerges that

$$I(f_k, f) < \Delta_k F < \hat d(F_k, F)$$

and that all three entities, $\Delta_k F$, $I(f_k, f)$, $\hat d(F_k, F)$, decrease quite rapidly with the increase of k.

APPENDIX

Kesavan and Kapur (1992) show that, for a class of absolutely continuous positive distributions $\mathcal F$ having the first k moments $\mu_i$, $i = 0, \ldots, k$, in common, its member $F_k$ with maximum Shannon entropy has the following density function

Table 1. Optimal k, absolute difference, divergence measure and estimated entropy difference.

  k    $\Delta_k F$    $I(f_k, f)$    $\hat d(F, F_k)$
  4    0.6463E-2       0.5051E-2      0.4604E-1
  6    0.2369E-2       0.1626E-2      0.3682E-1
  9    0.1001E-2       0.6382E-3      0.2832E-1

Table 2. Optimal k, absolute difference, divergence measure and estimated entropy difference.

  k    $\Delta_k F$    $I(f_k, f)$    $\hat d(F, F_k)$
  3    0.6575E-2       0.8915E-3      0.8337E-1
  4    0.2067E-2       0.1098E-3      0.1526E-1
  5    0.1017E-2       0.2290E-4      0.3885E-2
  6    0.8267E-3       0.5010E-5      0.2535E-2



$$f_k(x) = \exp\left(-\sum_{j=0}^{k} \lambda_j x^j\right), \quad x \in [0, \infty), \tag{A.1}$$

and $f_k(x) = 0$ elsewhere. The $\lambda_i$, $i = 0, 1, \ldots, k$, are Lagrange multipliers, obtainable through the $\mu_i$, and the Shannon entropy $H[f_k]$ is given by

$$H[f_k] = \sum_{j=0}^{k} \lambda_j \mu_j.$$
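Computing the $\lambda_j$ amounts to solving the moment-matching equations, for which Newton's method on the dual is standard: the Jacobian of the moments with respect to the $\lambda_j$ is minus the covariance matrix of $(x, \ldots, x^k)$ under $f_k$. The sketch below (ours) does this for $k = 2$ and the sine target of Example 1, on $[0, 1]$ rather than $[0, \infty)$ so that elementary quadrature suffices:

```python
import math

def simpson(g, n=2000):
    # composite Simpson's rule on [0, 1]
    h = 1.0 / n
    s = g(0.0) + g(1.0)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * g(i * h)
    return s * h / 3.0

target = lambda x: 0.5 * math.pi * math.sin(math.pi * x)
mu = [simpson(lambda x, i=i: x**i * target(x)) for i in (1, 2)]   # mu_1, mu_2

l1 = l2 = 0.0                       # lambda_1, lambda_2 (lambda_0 = log Z)
for _ in range(50):
    w = lambda x: math.exp(-l1 * x - l2 * x * x)
    Z = simpson(w)
    m = [simpson(lambda x, i=i: x**i * w(x)) / Z for i in (1, 2, 3, 4)]
    m1, m2, m3, m4 = m
    # Jacobian of (m1, m2) w.r.t. (l1, l2): J_ij = -(m_{i+j} - m_i * m_j)
    J11, J12 = -(m2 - m1 * m1), -(m3 - m1 * m2)
    J21, J22 = J12, -(m4 - m2 * m2)
    g1, g2 = m1 - mu[0], m2 - mu[1]
    det = J11 * J22 - J12 * J21
    l1 -= (J22 * g1 - J12 * g2) / det
    l2 -= (-J21 * g1 + J11 * g2) / det

l0 = math.log(simpson(lambda x: math.exp(-l1 * x - l2 * x * x)))
H2 = l0 + l1 * mu[0] + l2 * mu[1]   # H[f_2] = sum_j lambda_j mu_j
print(l1, l2, H2)
```

By the maximum entropy property, the fitted $H[f_2]$ must exceed the target entropy $1 - \log \pi$, with the excess $H[f_2] - H[f] = I(f, f_2)$ as in (A.3).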

The main result on the absolute difference between the two distributions is given in the theorem below.

Theorem. For any member $F \in \mathcal F$ with density f, the inequalities (3.3) and (3.5) hold.

Proof.

$$|F_k(x) - F(x)| \le \int_0^x |f_k(u) - f(u)|\,du \le \int_0^{+\infty} |f_k(u) - f(u)|\,du = V, \tag{A.2}$$

and the above inequality remains valid in the discrete case. It is straightforward to see that the Kullback–Leibler divergence measure between $f_k$ and $f$ is

$$I(f_k, f) = -H[f] + \sum_{j=0}^{k} \lambda_j \mu_j = H[f_k] - H[f], \tag{A.3}$$

and further, since any other member $G \in \mathcal F$ shares the same moments, the analogous identity holds:

$$I(g, f_k) = H[f_k] - H[g], \quad G \in \mathcal F.$$

From Kullback (1967) we have the lower bound

$$I \ge \frac{V^2}{2} + \frac{V^4}{36}.$$

Equivalently,



$$V \le 3\left[-1 + \left(1 + \frac{4}{9}I\right)^{1/2}\right]^{1/2}. \tag{A.4}$$

Substituting Eq. (A.4) and then Eq. (A.3) into Eq. (A.2) leads to Eq. (3.3). The proof of Eq. (3.5) is identical to that of Eq. (3.3), except that here we use the Cover and Thomas (1991) lower bound

$$I \ge \frac{V^2}{2\log 2}$$

in place of Kullback's bound.

Remark. Bound (3.3) may be sharpened further by taking into account the inequality

$$I \ge \frac{V^2}{2} + \frac{V^4}{36} + \frac{V^6}{288}, \tag{A.5}$$

as in Toussaint (1975), but this evidently leads to a more complicated inequality.

REFERENCES

Cover, T. M., Thomas, J. A. (1991). Elements of Information Theory. New York: John Wiley & Sons, Inc.

Frontini, M., Tagliani, A. (1997). Entropy-convergence in Stieltjes and Hamburger moment problem. Applied Mathematics and Computation 88:39–51.

Goria, M. N. (1992). Fractional absolute moments of the Cauchy distribution. Quaderni di Statistica e Matematica Applicata alle Scienze Economico-Sociali 1:3–9. Trento University.

Kesavan, H. K., Kapur, J. N. (1992). Entropy Optimization Principles with Applications. Academic Press.

Kullback, S. (1967). A lower bound for discrimination information in terms of variation. IEEE Transactions on Information Theory IT-13:126–127.

Lin, G. D. (1992). Characterizations of distributions via moments. Sankhyā: The Indian Journal of Statistics 54(Series A):128–132.

Lindsay, B. G., Basak, P. (2001, to appear). Moments determine the tail of a distribution (but not much else). The American Statistician.



Naveau, P. (1997). Comparison between the Chernoff and factorial moment bounds for discrete random variables. The American Statistician 51(1):40–41.

Philips, T. K., Nelson, R. (1995). The moment bound is tighter than Chernoff's bound for positive tail probabilities. The American Statistician 49(2):175–178.

Tagliani, A. (2000). Inverse Z transform and moment problem. Probability in the Engineering and Informational Sciences 14:393–404.

Toussaint, G. T. (1975). Sharper lower bounds for discrimination information in terms of variation. IEEE Transactions on Information Theory (Corresp.) IT-21:99–100.

