
Moment information and entropy evaluation for probability densities

Mariyan Milev (a), Pierluigi Novi Inverardi (b), Aldo Tagliani (b,⁎)

(a) Dept. of Informatics and Statistics, University of Food Technologies, 4002 Plovdiv, Bulgaria
(b) Dept. of Computer and Management Sciences, University of Trento, 38100 Trento, Italy

⁎ Corresponding author. E-mail addresses: [email protected] (M. Milev), [email protected] (P.N. Inverardi), [email protected] (A. Tagliani).

Keywords: Entropy; Entropy convergence; Hausdorff moment problem; Kullback–Leibler distance; Maximum entropy

Abstract

How much information does the sequence of integer moments carry about the corresponding unknown absolutely continuous distribution? We prove that a reliable evaluation of the corresponding Shannon entropy can be done by exploiting some known theoretical results on entropy convergence, involving only exact moments, without solving the underlying moment problem. The whole procedure essentially rests on the solution of linear systems with nearly singular matrices, and hence it requires both calculations in high precision and a preconditioning technique. Numerical examples are provided to support the theoretical results.


1. Introduction

Let $X$ be an absolutely continuous random variable with support $D$. The moment problem consists of recovering an unknown probability density function (pdf) $f(x)\in L^2_D$ from the knowledge of its associated sequence $\{\mu_j\}_{j=0}^{M}$ of integer moments, with $M$ finite or infinite, where $\mu_j = E(X^j) = \int_D x^j f(x)\,dx$, $j \ge 0$, $\mu_0 = 1$. In actual practice the problem consists of determining approximations to the unknown density $f(x)$ from a finite collection $\{\mu_j\}_{j=0}^{M}$ of its integer moments. In such a case the moment problem is called a "truncated moment problem"; if instead $M$ is infinite, the problem is called a "full moment problem". If $D$ is bounded, typically $D=[0,1]$, we speak of the "Hausdorff moment problem", whilst if $D=\mathbb{R}^+$ of the Stieltjes moment problem, and if $D=\mathbb{R}$ of the Hamburger moment problem. In the Hausdorff case the infinite sequence of moments determines a unique distribution (we say the moment problem is determinate), unlike the Stieltjes and Hamburger cases, where supplementary conditions are required to guarantee a unique distribution. The above terminology and general results about existence and determinacy are found in the classical book of Shohat and Tamarkin [1]. Once the first moments $\{\mu_j\}_{j=1}^{M}$ are assigned, we define the $M$th moment space $V_M \subset \mathbb{R}^M_+$ as the convex hull of the curve $\{x, x^2, \dots, x^M\}$, $x \in D$; in the Hausdorff case $V_M$ is closed, bounded and convex [2, Theorem 7.2] and its geometrical properties are well described in [2].

In the last decade several papers have been devoted to determining how the partial information contained in a finite sequence of integer moments can be used to describe various distributional characteristics and properties. Lindsay and Basak [3] show tail bounds for an unknown distribution in terms of moments by exploiting a classical bounding result due to Akhiezer [4] and Shohat and Tamarkin [1]. Goria and Tagliani [5] present a moment bound for the absolute difference between two distributions, based on the maximum entropy (MaxEnt) method; Gavriliadis and Athanassoulis [6] and Gavriliadis [7] obtain some results about the separation of the main-mass interval, the tail interval and the position of the mode. In some recent papers Mnatsakanov [8,9] provides a procedure to recover a probability density function $f(x)$ and the associated distribution function $F(x)$ directly from the associated moments, without resorting to an approximation $f^{(app)}(x)$ of $f(x)$ or


$F^{(app)}(x)$ of $F(x)$; in the framework of the Hausdorff moment problem, Mnatsakanov aims to construct a stable approximation $f^{(app)}(x)$ of $f(x)$ from its moment sequence, proves some asymptotic properties of the approximant, and finally derives the uniform and $L_1$-rates of convergence.

All the previous results stem from an obvious fact: in a determinate moment problem the information content about a distribution is spread over the infinite sequence of its moments. In the present paper we run along the guidelines of the above quoted papers. In more detail, we tackle the issue of determining the Shannon entropy of a distribution with support $D=[0,1]$ admitting a pdf $f(x)$, without solving the inverse moment problem, consequently avoiding all the instability drawbacks related to the solution of an inverse problem. A viable route to calculating the Shannon entropy $H[f] = -\int_D f(x)\ln f(x)\,dx$ consists in approximating $f(x)$ by a density $f^{(app)}(x)$ constrained by a finite set of moments. Then $\int_D x^j f^{(app)}(x)\,dx = \int_D x^j f(x)\,dx$, $j=0,\dots,M$, and $H[f]$ is replaced by $H[f^{(app)}] = -\int_D f^{(app)}(x)\ln f^{(app)}(x)\,dx$.

The procedure that we propose here requires an accurate recovery of $f(x)$ through $f^{(app)}(x)$, involving a high number $M$ of moments $\{\mu_j\}_{j=0}^{M}$. In general, high $M$ values raise ill-conditioning issues in the involved Hankel matrices. In the Hausdorff case, for instance, [10] proved that all Hankel matrices generated by moments of positive density functions are conditioned essentially like Hilbert matrices: each order-$M$ Hankel matrix has a condition number growing at the exponential rate $\simeq e^{3.525M}$ for large $M$. Condition numbers of Hankel matrices generated by moments arising from a distribution with support $\mathbb{R}^+$ or $\mathbb{R}$ may be found in [11,12]; all these matrices have condition numbers growing at an exponential rate. Then, to take ill-conditioning into account, only a few moments can be involved, with a consequently poor estimate of $H[f]$, unless a suitable preconditioner is available, as in the Hausdorff case [10]. Indeed, although the infinite sequence of moments $\{\mu_j\}_{j=0}^{\infty}$ completely determines the unknown distribution, the information content carried by each moment $\mu_j$ depends on the sequence itself; in general, it could be poor or even insignificant when the corresponding distribution exhibits fat tails, while a few moments may sometimes characterize the distribution completely: in such a case they coincide with the "characterizing moments". We recall that the characterizing moments of a probability distribution are the moment constraints subject to which the probability distribution is obtained as the MaxEnt probability distribution [13, p. 135]. They are able to synthesize all the relevant information on the distribution and to characterize it.

For instance, if $D=[0,1]$, when $M$ takes a high value one has, from a Taylor expansion, $x^M \simeq \sum_{n\ll M}\left(x-\frac12\right)^n$. Taking the expected value one gets

$$\mu_M = E(X^M) \simeq \sum_{n=0}^{n\ll M} E\left(X-\tfrac12\right)^{n} = \sum_{n=0}^{n\ll M}\ \sum_{j=0}^{n}\binom{n}{j}\left(-\tfrac12\right)^{n-j}\mu_j. \tag{1.1}$$

Equivalently, higher moments may be obtained by combining lower order moments. Heuristically speaking, different powers of $x$ differ very little from each other if $x$ is restricted to $0<x<1$ and the exponents are large. Then a high number of moments has to be involved in an accurate recovery, with the consequent risk of running into numerical instability.
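As a quick numerical check of the exponential ill-conditioning quoted above (a minimal Python sketch, not part of the paper; the asymptotic rate $e^{3.525}\simeq 33.97$ is the theoretical result of [10], which the computed ratios approach):

```python
import numpy as np
from scipy.linalg import hilbert

# Hilbert matrices are the prototype of Hankel matrices generated by
# moments of a positive density on [0,1]; their condition number grows
# roughly like e^{3.525 M}, i.e. Cond^(1/M) -> e^3.525 ~ 33.97.
for M in range(2, 12):
    c = np.linalg.cond(hilbert(M))
    print(f"M={M:2d}  cond={c:.3e}  cond^(1/M)={c**(1/M):6.2f}")
```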

An upper bound for the entropy $H[f]$ purely in terms of the first $M+1$ moments $\mu=(\mu_0,\dots,\mu_M)$ was provided in [14, Eq. (3.8)]. If the moment vector $\mu^U = \left\{\frac{1}{j+1}\right\}_{j=0}^{M}$ of the uniform distribution on $[0,1]$ is considered, it holds that

$$H[f] \le \inf_M\left[-\frac{\|\mu-\mu^U\|_2^2}{2\,\|\max\{\mu,\mu^U\}\|_1}\right]. \tag{1.2}$$

The bound (1.2) is tight whenever the moment vector $\mu$ is close enough to the moment vector $\mu^U$, or equivalently $H[f]\simeq 0$ (see [14] for details).

Exploiting previous convergence results characterizing the MaxEnt approximations of $f(x)$, we try to bypass this instability and illustrate a viable way to calculate the entropy of the underlying distribution purely in terms of moments. The procedure requires only the solution of linear systems and the evaluation of definite integrals ((2.9) and (2.10) below). An accurate evaluation of $H[f]$ requires both a high number of moments and a high-precision computational procedure, jointly with a suitable preconditioner for the matrix involved in the linear system.

The paper is organized as follows. In Section 2 the MaxEnt problem is formulated and convergence theorems (in entropy, in directed divergence and almost everywhere) are reviewed; based upon these theorems, an approximate procedure for entropy evaluation is formulated. In Section 3 a suitable preconditioner for solving the ill-conditioned linear systems is illustrated, together with numerical examples validating the approximate method. In Section 4 expected values alternative to integer moments in the MaxEnt setup are suggested, and an extension of the above procedure for approximate entropy evaluation to the Stieltjes or Hamburger moment problems is discussed.

2. Formulation of the problem

Let us consider a random variable $X$ with support $D=[0,1]$, admitting a pdf $f(x)$ with moments $\{\mu_j\}_{j=0}^{\infty}$ and an underlying determinate moment problem (the so-called Hausdorff moment problem). For practical purposes only a finite sequence $\{\mu_j\}_{j=0}^{M}$ can be considered. Then there exists an infinite set of densities $f^{(app)}(x)$ compatible with the information carried by the finite moment sequence, and hence a choice criterion is required. The MaxEnt approach is widely used in practice and provides the approximation of $f(x)$ [13]


$$f^{(app)}(x) =: f_M(x) = \exp\left(-\sum_{j=0}^{M}\lambda_j x^j\right), \tag{2.1}$$

constrained by

$$\int_D x^j f_M(x)\,dx = \mu_j, \quad j=0,\dots,M, \tag{2.2}$$

with Shannon entropy

$$H[f_M] = -\int_D f_M(x)\ln f_M(x)\,dx = \sum_{j=0}^{M}\lambda_j\mu_j \tag{2.3}$$

and $0 \ge H[f_1] \ge \cdots \ge H[f_M] \ge \cdots \ge H[f]$, as is known from the MaxEnt context. The Lagrange multipliers $(\lambda_0,\dots,\lambda_M)$ are evaluated by solving the minimization problem [13]

$$\{\lambda_j\}_{j=1}^{M}:\ \min_{\lambda_1,\dots,\lambda_M}\left[\ln\left(\int_D \exp\left(-\sum_{j=1}^{M}\lambda_j x^j\right)dx\right) + \sum_{j=1}^{M}\lambda_j\mu_j\right] \tag{2.4}$$

and

$$\lambda_0 = \ln\int_D \exp\left(-\sum_{j=1}^{M}\lambda_j x^j\right)dx.$$

Then a direct evaluation of $H[f_M]$ is obtained by $H[f_M]=\sum_{j=0}^{M}\lambda_j\mu_j$. Solving the moment problem produces instability issues when a high number $M$ of moments is involved. On the other hand, the entropy $H[f_M]$ of $f_M(x)$ represents one of the main indicators to test the reliability of the approximation. Combining entropy convergence (Theorem 1 below) and the obvious sequence of inequalities $0 \ge H[f_1] \ge \cdots \ge H[f_M] \ge \cdots \ge H[f]$, the criterion for the choice of the number $M$ of moments that yields the best approximation $f_M(x)$ before incurring numerical instability (equivalently, before the sequence $\{H[f_j]\}$ fails to be monotonically decreasing) should be

$$M \equiv j:\ \min_j H[f_j].$$

This fact is due to the conditioning of $H[f_M](\mu_M)$ (which we consider as depending on $\mu_M$ only, whilst the remaining moments $(\mu_0,\dots,\mu_{M-1})$ are held fixed), whose value is given by Tagliani [15]:

$$\mathrm{Cond}(H[f_M]) = \left|\frac{\left(H[f_M](\mu_M+\Delta\mu_M)-H[f_M](\mu_M)\right)/H[f_M]}{\Delta\mu_M/\mu_M}\right| \ge \mathrm{eps}\,\frac{(\mu_M)^2}{|H[f]|}\,2^{4M-3},$$

where $\mathrm{eps} = |\Delta\mu_M|/\mu_M$ may be identified with the machine error. Then the condition number of $H[f_M]$ grows exponentially, which justifies the use of a low number $M$ of integer moments in recovering $f_M(x)$ by (2.4). Such a drawback may be bypassed by resorting to some known theoretical results, which give rise to an alternative procedure to calculate the entropy.

In the context of MaxEnt the following convergence result holds [15].

Theorem 1. Whenever the underlying moment problem is determinate, the MaxEnt approximations converge in entropy, as $M$ increases, to $f(x)$, i.e.

$$\lim_{M\to\infty} H[f_M] = H[f]. \tag{2.5}$$

Denote by $I[f,f_M] = \int_D f(x)\ln\frac{f(x)}{f_M(x)}\,dx$ the Kullback–Leibler distance between $f(x)$ and $f_M(x)$; since $f(x)$ and $f_M(x)$ share the same first $M$ moments $\{\mu_j\}_{j=0}^{M}$, it follows [13] that

$$H[f_M]-H[f] = \int_D f(x)\ln\frac{f(x)}{f_M(x)}\,dx. \tag{2.6}$$

Combining (2.5) with (2.6) it follows that

$$\lim_{M\to\infty}\left(H[f_M]-H[f]\right) = \lim_{M\to\infty}\int_D f(x)\ln\frac{f(x)}{f_M(x)}\,dx = 0. \tag{2.7}$$

Invoking the following result,

Theorem 2 [16, p. 252]. $I[f,f_M] \ge 0$, with equality if and only if $f$ and $f_M$ coincide almost everywhere,


we conclude that $\lim_{M\to\infty} f_M(x) = f(x)$ almost everywhere. Combining entropy convergence and the Kullback–Leibler distance, the following approximate procedure to calculate the entropy arises. For high $M$ values we have

$$\mu_j = \mu_j(f) = \int_D x^j f(x)\,dx \simeq \mu_j(f^{(app)}) = \int_D x^j f^{(app)}(x)\,dx, \quad j \ge M. \tag{2.8}$$

Once $f_M(x) = \exp\left(-\sum_{j=0}^{M}\lambda_j x^j\right)$ is calculated, the higher moments $\mu_j(f_M)$, $j > M$, may be found in terms of $\{\mu_j\}_{j=1}^{M}$ and $\{\lambda_j\}_{j=1}^{M}$ by integrating (2.2) by parts, with the following result:

$$\sum_{k=1}^{M} k\,\lambda_k\left(\mu_k-\mu_{k+j}\right) = 1-(j+1)\mu_j, \quad j=1,\dots,M. \tag{2.9}$$

The linear system (2.9) admits a solution, being an identity relating $\lambda=(\lambda_1,\dots,\lambda_M)$ to $\{\mu_j(f_M)\}_{j=M+1}^{2M}$. From (2.2) (with $j=0$), $\lambda_0$ may be obtained by normalizing $f_M(x)$ to one:

$$\lambda_0 = \ln\int_0^1 \exp\left(-\sum_{j=1}^{M}\lambda_j x^j\right)dx. \tag{2.10}$$
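For reference, a minimal numerical sketch of this procedure (Python in double precision, whereas the paper works in high precision with Maple; the name entropy_approx is ours, not the paper's):

```python
import numpy as np
from scipy.integrate import quad

def entropy_approx(mu, M):
    """Approximate entropy H^(app)[f_M] from exact moments, via (2.9)-(2.10).

    mu : sequence with mu[j] = mu_j(f) for j = 0, ..., 2M (mu[0] = 1).
    Returns (lambda_0, (lambda_1, ..., lambda_M), H_app).
    """
    # Linear system (2.9): sum_k k lam_k (mu_k - mu_{k+j}) = 1 - (j+1) mu_j
    D = np.array([[k * (mu[k] - mu[k + j]) for k in range(1, M + 1)]
                  for j in range(1, M + 1)])
    b = np.array([1.0 - (j + 1) * mu[j] for j in range(1, M + 1)])
    lam = np.linalg.solve(D, b)                      # lambda_1 .. lambda_M

    # Normalization (2.10): lambda_0 = ln int_0^1 exp(-sum_j lam_j x^j) dx
    poly = lambda x: x * np.polyval(lam[::-1], x)    # = sum_j lam_j x^j
    lam0 = np.log(quad(lambda x: np.exp(-poly(x)), 0.0, 1.0)[0])

    # Entropy (2.3): H = lambda_0 + sum_j lambda_j mu_j
    return lam0, lam, lam0 + lam @ np.asarray(mu)[1:M + 1]
```

In double precision this works only for moderate $M$; beyond that, the extended precision and the preconditioning of Section 3.1 become necessary.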

Remark 2.1. Eq. (2.9) is not new in the literature; similar relationships were previously obtained by several authors. We restrict ourselves to recalling the system of moment equations obtained in [17,18] for the Hamburger case (here an even number $M$ of moments has to be considered and $D\equiv\mathbb{R}$):

$$Han_{2M}\cdot[\lambda_1,\,2\lambda_2,\dots,M\lambda_M]' = [0,\,\mu_0,\,2\mu_1,\dots,(M+1)\mu_M]', \tag{2.11}$$

with the rectangular matrix

$$Han_{2M} = \begin{bmatrix}\mu_0 & \cdots & \mu_{M-1}\\ \vdots & \ddots & \vdots\\ \mu_{M+1} & \cdots & \mu_{2M}\end{bmatrix}.$$

The authors of [17] observe that what is remarkable about (2.11) is that not only are these equations exact, but they exhibit a strikingly simple bilinear form, despite the strong non-linearity of the involved pdf $f_M$. From the third equation on, (2.11) involves moments of higher order than those which were constrained. Therefore these equations can be used to obtain successful approximations to the closure relations of interest in fluid dynamics, i.e. to express the higher order moments in terms of the lower order ones and of the Lagrange multipliers $(\lambda_1,\dots,\lambda_M)$.

Remark 2.2. Alternatively, $\lambda_0$ may be obtained directly from $\{\mu_j\}_{j=1}^{M}$ and $\{\lambda_j\}_{j=1}^{M}$ by integrating (2.2) by parts (with $j=0$):

$$\lambda_0 = -\ln\left[\frac{\sum_{j=1}^{M} j\lambda_j\mu_{j-1}}{1-\exp\left(-\sum_{j=1}^{M}\lambda_j\right)}\right] \tag{2.12}$$

or

$$\lambda_0 = -\ln\left[\left(1-\sum_{j=1}^{M} j\mu_j\lambda_j\right)\exp\left(\sum_{j=1}^{M}\lambda_j\right)\right]. \tag{2.13}$$

Since the procedure (2.9) for calculating $\lambda=(\lambda_1,\dots,\lambda_M)$ is only approximate, the use of (2.12) or (2.13) leads to inaccurate results, as clarified by their numerical analysis. The procedure (2.9) is approximate in the following sense: for a fixed $M$, (2.9) involves $\mu_j(f_M)$, which are unknown for $j=M+1,\dots,2M$, whilst $\mu_j(f_M)=\mu_j(f)$, $j=1,\dots,M$, are known. For practical purposes the $\mu_j(f_M)$, $j=M+1,\dots,2M$, are replaced by the assigned $\mu_j(f)$, with an error decreasing as $M$ increases, as shown next.

From Pinsker's inequality [19, p. 390] and (2.6),

$$H[f_M]-H[f] = \int_D f(x)\ln\frac{f(x)}{f_M(x)}\,dx \ge \frac12\left(\int_0^1 |f_M(x)-f(x)|\,dx\right)^2$$

and then

$$\int_0^1 |f_M(x)-f(x)|\,dx \le \sqrt{2\left(H[f_M]-H[f]\right)}.$$

Comparing $\mu_j(f)$ and $\mu_j(f_M)$, $j=M+1,\dots,2M$, calling $\Delta\mu_j = \mu_j(f_M)-\mu_j(f)$ and taking into account Theorem 1, one has

$$|\Delta\mu_j| = |\mu_j(f_M)-\mu_j(f)| = \left|\int_0^1 x^j\left(f_M(x)-f(x)\right)dx\right| \le \int_0^1 x^j|f_M(x)-f(x)|\,dx \le \int_0^1 |f_M(x)-f(x)|\,dx \le \sqrt{2\left(H[f_M]-H[f]\right)} \to 0. \tag{2.14}$$

Then $\mu_j(f_M)$ closely approximates $\mu_j(f)$, $j=M+1,\dots,2M$, as $M$ increases. For high $M$ values, after replacing $\mu_j(f_M)$ with $\mu_j(f)$, $j=M+1,\dots,2M$, the linear system (2.9) continues to admit solutions as a consequence of Theorem 2.

From a combined use of (2.9) and (2.10) we have the following. We call:

(i) $H[f_M] = \lambda_0+\sum_{j=1}^{M}\lambda_j\mu_j$ the exact entropy of $f_M(x)$, where the $\lambda_j$ are obtained from (2.9) involving $\{\mu_j \equiv \mu_j(f_M)\}_{j=M+1}^{2M}$;

(ii) $H^{(app)}[f_M] = (\lambda_0+\Delta\lambda_0)+\sum_{j=1}^{M}(\lambda_j+\Delta\lambda_j)\mu_j$ the approximate entropy of $f_M(x)$, where the $\lambda_j+\Delta\lambda_j$ are obtained from (2.9) involving $\{\mu_j \equiv \mu_j(f)\}_{j=M+1}^{2M}$, and $\lambda_0+\Delta\lambda_0$ from (2.10);

(iii) the error $\Delta H = H[f_M]-H^{(app)}[f_M]$.

From Theorem 2, as $M\to\infty$, $\{\mu_j(f_M) = \mu_j(f)\}_{j=M+1}^{2M}$ and then $H[f_M]=H^{(app)}[f_M]$. From Theorem 1, as $M\to\infty$, $H[f_M]=H[f]$, and then both $H[f_M]=H^{(app)}[f_M]$ and $H[f]=H^{(app)}[f_M]$. Equivalently, $H[f]$ is evaluated directly from $\{\mu_j(f)\}_{j=1}^{2M}$. The computational cost amounts to solving a linear system (2.9) and evaluating a definite integral (2.10).

Remark 2.3. Relationships (2.9) and (2.10) are well known but unfortunately have been used in an incorrect way by some authors. Such relationships have been considered as a shortcut to calculate the Lagrange multipliers $\{\lambda_j\}_{j=1}^{M}$ through the solution of a linear system, avoiding the computational burden represented by (2.4). With such an improper procedure one replaces the sequence $\{\mu_j(f_M)\}_{j=M+1}^{2M}$ (which is known only after calculating $\{\lambda_j\}_{j=1}^{M}$ through (2.4)) with $\{\mu_j(f)\}_{j=M+1}^{2M}$. Nevertheless, for high $M$ values (2.9) and (2.10) may be considered a valuable tool to calculate $\{\lambda_j\}_{j=1}^{M}$ by solving a linear system and a definite integral, without solving the nonlinear optimization problem (2.4). Indeed, replacing $\{\mu_j(f_M)\}_{j=M+1}^{2M}$ with $\{\mu_j(f)\}_{j=M+1}^{2M}$ becomes meaningful as $f_M(x)=f(x)$ a.e., in virtue of the entropy convergence theorem. In the end, the entropy $H[f] = H[f_M] = \lambda_0+\sum_{j=1}^{M}\lambda_j\mu_j$ can be found in terms of $\{\mu_j(f)\}_{j=1}^{2M}$ only.

2.1. $\Delta H$ estimate

In the special case when the convergence of $H[f_M]$ to $H[f]$ is fast, an estimate of $\Delta H = H[f_M]-H^{(app)}[f_M]$ may be obtained as follows.

Let us consider (2.9). If $\{\mu_j \equiv \mu_j(f_M)\}_{j=M+1}^{2M}$ are involved, then (2.9) may be written in matrix form as $D_{2M}\lambda = b$, with

$$b = \begin{bmatrix}1-2\mu_1\\ 1-3\mu_2\\ \vdots\\ 1-(M+1)\mu_M\end{bmatrix}, \qquad D_{2M} = \begin{bmatrix}\mu_1-\mu_2 & \cdots & \mu_M-\mu_{M+1}\\ \vdots & \ddots & \vdots\\ \mu_1-\mu_{M+1} & \cdots & \mu_M-\mu_{2M}\end{bmatrix}\,\mathrm{diag}(1,2,\dots,M).$$

If $\{\mu_j \equiv \mu_j(f)\}_{j=M+1}^{2M}$ are involved, then (2.9) may be written as $(D_{2M}+\delta)(\lambda+\Delta\lambda) = b$, where the matrix $\delta$ is obtained as the difference of the matrices whose entries are given by $\mu_j(f_M)$ and $\mu_j(f)$ respectively, $\delta = D_{2M}(\mu_j \equiv \mu_j(f_M)) - D_{2M}(\mu_j \equiv \mu_j(f))$, taking into account that $\mu_j(f_M)=\mu_j(f)$, $j=1,\dots,M$. Then $\delta$ is a Hankel-type matrix whose entries on and below the antidiagonal are generated by $\Delta\mu_j = \mu_j(f_M)-\mu_j(f)$, $j=M+1,\dots,2M$. From (2.14), and accounting for the factor $\mathrm{diag}(1,\dots,M)$, it follows that $\|\delta\|_1 \le M^2\sqrt{2(H[f_M]-H[f])}$, and from $\lambda = D_{2M}^{-1}b$ and $\lambda+\Delta\lambda = (D_{2M}+\delta)^{-1}b$ it follows that

$$\Delta H = H[f_M]-H^{(app)}[f_M] = \lambda_0+\sum_{j=1}^{M}\lambda_j\mu_j-(\lambda_0+\Delta\lambda_0)-\sum_{j=1}^{M}(\lambda_j+\Delta\lambda_j)\mu_j = -\Delta\lambda_0-\sum_{j=1}^{M}\Delta\lambda_j\mu_j$$

and

$$|\Delta H| = |H[f_M]-H^{(app)}[f_M]| \le |\Delta\lambda_0|+\sum_{j=1}^{M}|\Delta\lambda_j|\mu_j \le |\Delta\lambda_0|+\sum_{j=1}^{M}|\Delta\lambda_j| = |\Delta\lambda_0|+\|\Delta\lambda\|_1.$$

It remains to evaluate $|\Delta\lambda_0|$ and $\|\Delta\lambda\|_1$. If the convergence of $H[f_M]$ to $H[f]$ is fast, so that from (2.14)

$$\|D_{2M}^{-1}\delta\|_1 \le \|D_{2M}^{-1}\|_1\|\delta\|_1 = \|D_{2M}^{-1}\|_1\, M^2\sqrt{2(H[f_M]-H[f])} \ll 1$$

and $(I+D_{2M}^{-1}\delta)^{-1} \simeq I-D_{2M}^{-1}\delta$ may be assumed, then it holds that

$$\begin{aligned}\|\Delta\lambda\|_1 &= \|(\lambda+\Delta\lambda)-\lambda\|_1 = \|(D_{2M}+\delta)^{-1}b-D_{2M}^{-1}b\|_1 \le \|(D_{2M}+\delta)^{-1}-D_{2M}^{-1}\|_1\|b\|_1 = \|[D_{2M}(I+D_{2M}^{-1}\delta)]^{-1}-D_{2M}^{-1}\|_1\|b\|_1\\ &\le \|(I+D_{2M}^{-1}\delta)^{-1}-I\|_1\,\|D_{2M}^{-1}\|_1\|b\|_1 \simeq \|D_{2M}^{-1}\delta\|_1\|D_{2M}^{-1}\|_1\|b\|_1 \le \|\delta\|_1\|D_{2M}^{-1}\|_1^2\|b\|_1\\ &\le M^2\sqrt{2(H[f_M]-H[f])}\,\|D_{2M}^{-1}\|_1^2\|b\|_1\end{aligned} \tag{2.15}$$

(the last inequality takes into account both (2.14) and the bound on $\|\delta\|_1$ previously calculated). As far as $\Delta\lambda_0$ is concerned, from (2.10) we have

$$\begin{aligned}\Delta\lambda_0 &= (\lambda_0+\Delta\lambda_0)-\lambda_0 = \ln\frac{\int_D \exp\left(-\sum_{j=1}^{M}(\lambda_j+\Delta\lambda_j)x^j\right)dx}{\int_D \exp\left(-\sum_{j=1}^{M}\lambda_j x^j\right)dx} = \ln\int_D \exp\left(-\lambda_0-\sum_{j=1}^{M}(\lambda_j+\Delta\lambda_j)x^j\right)dx\\ &= \ln\int_D \exp\left(-\lambda_0-\sum_{j=1}^{M}\lambda_j x^j\right)\exp\left(-\sum_{j=1}^{M}\Delta\lambda_j x^j\right)dx \simeq \ln\int_D\left(1-\sum_{j=1}^{M}\Delta\lambda_j x^j\right)\exp\left(-\lambda_0-\sum_{j=1}^{M}\lambda_j x^j\right)dx\\ &= \ln\left(1-\sum_{j=1}^{M}\Delta\lambda_j\mu_j\right) \simeq -\sum_{j=1}^{M}\Delta\lambda_j\mu_j,\end{aligned}$$

from which

$$|\Delta\lambda_0| \le \sum_{j=1}^{M}|\Delta\lambda_j|\mu_j \le \|\Delta\lambda\|_1.$$

Then from (2.15) one obtains

$$|\Delta H| \le |\Delta\lambda_0|+\|\Delta\lambda\|_1 \le 2\|\Delta\lambda\|_1 \le 2M^2\sqrt{2(H[f_M]-H[f])}\,\|D_{2M}^{-1}\|_1^2\|b\|_1,$$

the wanted bound for $\Delta H$.

3. Numerical results

In order to investigate the numerical consistency of the proposed procedure we apply it to several cases in which the target pdf is known, so that any number of its moments, as well as its entropy, can be calculated. Once $\{\mu_j \equiv \mu_j(f)\}_{j=1}^{2M}$ are assigned, for increasing $M$ values, $H^{(app)}[f_M]$ is obtained from (2.9) and (2.10). From (2.5) and Theorem 2, $H[f]=H[f_M]=H^{(app)}[f_M]$ holds true for large $M$ values. As $H^{(app)}[f_M]$ is obtained through a simplified procedure, which becomes accurate for high $M$ values only, we expect that for some low $M$ values $H^{(app)}[f_M] < H[f]$, or that the sequence $\{H^{(app)}[f_M]\}$, $M=2,3,\dots$, fails to be monotonically decreasing, without contradicting the MaxEnt procedure. The asymptotic value of $H^{(app)}[f_M]$ then provides the wanted value $H[f]$. In each example the following quantities are compared:

1. $H^{(app)}[f_M] = \lambda_0+\sum_{j=1}^{M}\lambda_j\mu_j$, with $\{\lambda_j\}_{j=1}^{M}$ and $\lambda_0$ given by (2.9) and (2.10) respectively, using $\{\mu_j \equiv \mu_j(f)\}_{j=1}^{2M}$;
2. $H[f_M]$ obtained from $f_M(x)=\exp\left(-\sum_{j=0}^{M}\lambda_j x^j\right)$, whose Lagrange multipliers $\{\lambda_j\}_{j=0}^{M}$ are calculated by (2.4);
3. the exact entropy $H[f]$;
4. the relative errors $\left|\frac{H^{(app)}[f_M]-H[f]}{H[f]}\right|$ and $\left|\frac{H[f_M]-H[f]}{H[f]}\right|$.

All the computations have been done in high numerical precision, using Maple.
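In outline, the comparison amounts to a loop like the following (a hypothetical Python driver using the Beta(4,2) density of Example 2 below, whose closed-form moments and entropy make it self-contained; entropy_approx is the routine sketched after (2.10)):

```python
import numpy as np

# Beta(4,2) density of Example 2: mu_j = 20 / ((4+j)(5+j)),
# H[f] = -ln 20 + 79/30 (closed forms given in Section 3.2).
H_exact = -np.log(20.0) + 79.0 / 30.0
mu = np.array([20.0 / ((4 + j) * (5 + j)) for j in range(25)])

for M in range(2, 12):
    _, _, H_app = entropy_approx(mu, M)      # uses mu_1 .. mu_2M
    print(f"M={M:2d}  H_app={H_app:+.10f}  "
          f"rel.err={abs((H_app - H_exact) / H_exact):.2e}")
```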

3.1. Preconditioner of $D_{2M}$

As far as the numerical solution of (2.9) is concerned, let us consider the matrix

$$D_{2M} = \begin{bmatrix}\mu_1-\mu_2 & \cdots & \mu_M-\mu_{M+1}\\ \vdots & \ddots & \vdots\\ \mu_1-\mu_{M+1} & \cdots & \mu_M-\mu_{2M}\end{bmatrix} = \begin{bmatrix}1 & \cdots & 1\\ \vdots & \ddots & \vdots\\ 1 & \cdots & 1\end{bmatrix}\mathrm{diag}(\mu_1,\dots,\mu_M) - \begin{bmatrix}\mu_2 & \cdots & \mu_{M+1}\\ \vdots & \ddots & \vdots\\ \mu_{M+1} & \cdots & \mu_{2M}\end{bmatrix} = D^{(1)}_{2M}-D^{(2)}_{2M}, \tag{3.1}$$

split into the two matrices $D^{(1)}_{2M}$ and $D^{(2)}_{2M}$, the latter being a Hankel matrix (for brevity, $D_{2M}$ keeps the same name as the matrix involved in (2.9)). Now $D_{2M}$ is a moment matrix relating the Lagrange multipliers $\lambda$ to the moments $\{\mu_j(f_M)\}_{j=1}^{2M}$, and it is ill-conditioned, with a condition number comparable to that of $D^{(2)}_{2M}$. This severe ill-conditioning motivates the search for a preconditioner for $D_{2M}$ that can reduce it. A comparison between $\mathrm{Cond}(D_{2M})$ and $\mathrm{Cond}(D^{(2)}_{2M})$ is illustrated in Fig. 1, to support the conjecture that both matrices have comparable condition numbers, since both are moment matrices (here the involved moments come from the $f(x)$ used in Example 4 below; the ratio $\left(\mathrm{Cond}(D_{2M})/\mathrm{Cond}(D^{(2)}_{2M})\right)^{1/M}$ is reported, according to (3.2) below). Even if asymptotically $\left(\mathrm{Cond}(D_{2M})/\mathrm{Cond}(D^{(2)}_{2M})\right)^{1/M} \to 1$, a good preconditioner for $D^{(2)}_{2M}$ is not necessarily a good preconditioner for $D_{2M}$.

Fig. 1. Condition number ratio $\left(\mathrm{Cond}(D_{2M})/\mathrm{Cond}(D^{(2)}_{2M})\right)^{1/M}$ for increasing $M$, from (3.1).

It remains to estimate the condition number of $D^{(2)}_{2M}$. A result in [10, Theorem 3.2] states that Hankel matrices $D^{(2)}_{2M}$ generated by moments of a strictly positive weight function with support [0,1] are asymptotically conditioned in the same way as Hilbert matrices $H_M$ of equal size, i.e.

$$\lim_{M\to\infty}\left(\mathrm{Cond}(D^{(2)}_{2M})\right)^{1/M} = \lim_{M\to\infty}\left(\mathrm{Cond}(H_M)\right)^{1/M} \simeq e^{3.525}. \tag{3.2}$$

Then the Hilbert matrix $H_M$ is a good preconditioner for this class of matrices [10, Theorem 3.3], as the conditioning of the preconditioned matrix grows at most linearly with $M$. It must also be observed that the matrix $D^{(2)}_{2M}$, whose first entry is $\mu_2$, may be considered as a Hankel matrix generated by the moments of the positive weight function $x^2 f(x)$, with $f(x)$ a pdf. Then $D^{(2)}_{2M}$, up to a multiplicative constant, may be considered the moment matrix of a pdf, with a condition number comparable to that of a Hilbert matrix of the same size. It is therefore meaningful to use $H_M$ as a preconditioner of $D^{(2)}_{2M}$, and hence of $D_{2M}$ in (2.9). Thus we can indirectly solve (2.9), written as $D_{2M}\lambda = b$, by solving the preconditioned system $H_M^{-1}D_{2M}\lambda = H_M^{-1}b$ or, equivalently, $C_M D_{2M}C_M'\,y = C_M b$, where $C_M'C_M = H_M^{-1}$, $C_M'y = \lambda$, and $C'$ stands for the transpose of $C$. Here $C_M = [c_{ij}]$ is the inverse of the lower triangular Cholesky factor of the Hilbert matrix $H_M$, with entries

$$c_{ij} = (2i+1)^{1/2}(-1)^{j}\binom{i}{j}\binom{i+j}{j}, \quad i,j = 0,1,\dots,M-1$$

(see [21] for more details). By solving the preconditioned system, higher $M$ values may be taken into account, with a consequently improved estimate of $H[f]$.
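A minimal sketch of this preconditioned solve (Python; the closed form of $C_M$ is the one quoted above from [21], while the function names are ours):

```python
import numpy as np
from math import comb

def hilbert_inv_cholesky(M):
    """C_M: inverse of the lower triangular Cholesky factor of the Hilbert
    matrix H_M, in closed form: c_ij = sqrt(2i+1) (-1)^j C(i,j) C(i+j,j),
    so that C_M' C_M = H_M^{-1}."""
    C = np.zeros((M, M))
    for i in range(M):
        for j in range(i + 1):
            C[i, j] = np.sqrt(2 * i + 1) * (-1) ** j * comb(i, j) * comb(i + j, j)
    return C

def solve_preconditioned(D, b):
    """Solve D lam = b through the symmetrically preconditioned system
    (C D C') y = C b, lam = C' y."""
    C = hilbert_inv_cholesky(D.shape[0])
    y = np.linalg.solve(C @ D @ C.T, C @ b)
    return C.T @ y
```

In the entropy routine sketched earlier, np.linalg.solve(D, b) would simply be replaced by solve_preconditioned(D, b).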

A comparison between the condition number of $D_{2M}$ as given by (2.9) and that of the preconditioned matrix $C_M D_{2M} C_M'$, for increasing $M$, is illustrated in Fig. 2 (experiments with different $f(x)$ provide comparable results). Here the entries of $D_{2M}$ are the moments of the $f(x)$ used in Example 4. The improvement in conditioning is not trivial.

Fig. 2. Condition number of $D_{2M}$ (⁎) and of $C_M D_{2M} C_M'$ (+) for increasing $M$.

Remark 3.1. In [10, Theorem 3.2] strictly positive weight functions $f(x)$ with support [0,1] are considered, i.e. $f(x)\ge\xi>0$, $\forall x\in[0,1]$; Hankel matrices generated by such weight functions are conditioned like Hilbert matrices of the same order. On the other hand, in [15] it was proved that each Hankel matrix $D_{2M}$ generated by moments of the MaxEnt density $f_M(x)$ has a condition number satisfying $\mathrm{Cond}(D_{2M})\ge 2^{4M-2}$. Taking into account Theorems 1 and 2, we may conclude that each Hankel matrix generated by moments of a density $f(x)$, with $f(x)\ge 0$, $\forall x\in[0,1]$, has a condition number asymptotically satisfying $\mathrm{Cond}(D_{2M})\ge 2^{4M-2}$. Then, in some sense, the results in [10, Theorem 3.3] may be extended to non-negative weight functions $f(x)\ge 0$, so that $H_M$ may be used as a preconditioner in all the examples below.

3.2. Numerical examples

All the densities considered below have support [0,1], and the closed-form integrals used in the examples are borrowed from [20]. Each chosen pdf is such that its moments are available in closed form, involving some of the following special functions.


1. Gamma function: $\Gamma(\alpha)=\int_0^{\infty} t^{\alpha-1}e^{-t}\,dt$ [20, 8.310, p. 892].
2. Incomplete Gamma function: $\Gamma_{inc}(x;\alpha)=\frac{1}{\Gamma(\alpha)}\int_0^{x} t^{\alpha-1}e^{-t}\,dt$ [20, 8.350, p. 899].
3. Beta function: $B(\alpha,\beta)=\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ [20, 8.380, p. 908]; the change of variable $x=\sin^2 t$ yields the trigonometric representation of the Beta function [20, 3.621.5, p. 395], $\int_0^{\pi/2}\sin^{\alpha-1}t\,\cos^{\beta-1}t\,dt=\frac12 B(\alpha/2,\beta/2)$. With $\beta=1$ and $\alpha$ integer, $B(\alpha,\beta)$ takes special expressions.
4. Digamma function: $\psi(x)=\frac{d}{dx}\ln\Gamma(x)$ [20, 8.360, p. 902].

Example 1. Let us consider the pdf

$$f(x) = A e^{-x^n}, \qquad A = \frac{n}{\Gamma\left(\frac1n\right)\Gamma_{inc}\left(1;\frac1n\right)}, \quad n\in\mathbb{R}^+.$$

The change of variable $x^n = t$ allows us to calculate the above normalizing constant $A$, the moments and the entropy in terms of the Incomplete Gamma function:

$$\mu_j = \frac{\Gamma\left(\frac{j+1}{n}\right)\Gamma_{inc}\left(1;\frac{j+1}{n}\right)}{\Gamma\left(\frac1n\right)\Gamma_{inc}\left(1;\frac1n\right)}, \quad j=0,1,\dots$$

(see also [20, 8.326.10, p. 337]) and entropy

$$H[f] = -\ln\left[\frac{n}{\Gamma\left(\frac1n\right)\Gamma_{inc}\left(1;\frac1n\right)}\right] + \frac{\Gamma\left(\frac{n+1}{n}\right)\Gamma_{inc}\left(1;\frac{n+1}{n}\right)}{\Gamma\left(\frac1n\right)\Gamma_{inc}\left(1;\frac1n\right)}.$$

The density $f(x)$ is a MaxEnt density with characterizing moment $E(X^n)$. Then, whenever $n\in\mathbb{N}$, once $\{\mu_j\}_{j=0}^{n}$ is assigned, the MaxEnt density $f_n(x)=\exp\left(-\sum_{j=0}^{n}\lambda_j x^j\right)$ coincides with $f(x)$ and $H[f_n]=H[f]$.

In this example $n=15$ is assumed, so that $H[f]\simeq -0.00980278980538$; the goal is to verify experimentally (1.1), which states that higher moments may be well approximated by a sequence of lower moments. The information content is thus spread among the lower moments (see Fig. 3). Since $H[f]\simeq 0$ the bound (1.2) becomes meaningful; it provides the estimate $H[f]\le -0.001987$.

Fig. 3. (a) $H^{(app)}[f_M]$ (⁎), $H[f_M]$ (∘) and the exact entropy $H[f]$ (dashed line); (b) relative errors $\left|\frac{H^{(app)}[f_M]-H[f]}{H[f]}\right|$ (⁎) and $\left|\frac{H[f_M]-H[f]}{H[f]}\right|$ (∘), for Example 1.
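A sketch of how the exact moments of this example can be produced and fed to the procedure (Python with mpmath for extended precision; moments_example1 is our name, not the paper's):

```python
import numpy as np
from mpmath import mp, gamma, gammainc

mp.dps = 50                       # high working precision, as in the paper

def moments_example1(n, M):
    """Exact moments mu_0 .. mu_2M of f(x) = A exp(-x^n) on [0,1]."""
    def g(a):                     # Gamma(a) * Gamma_inc(1; a)
        return gamma(a) * gammainc(a, 0, 1, regularized=True)
    denom = g(mp.mpf(1) / n)
    return np.array([float(g(mp.mpf(j + 1) / n) / denom)
                     for j in range(2 * M + 1)])

mu = moments_example1(15, 6)      # then entropy_approx(mu, M), M = 2..6,
                                  # should decrease toward H[f] ~ -0.0098
```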

Example 2. Let us consider a Beta density

$$f(x) = \frac{1}{B(p,q)}\,x^{p-1}(1-x)^{q-1} \quad\text{with } p=4,\ q=2,$$


with moments [20, 3.199, p. 318] obtained directly in terms of the Beta function,

$$\mu_j = \frac{\Gamma(p+q)\,\Gamma(p+j)}{\Gamma(p+q+j)\,\Gamma(p)}, \quad j=0,1,\dots,$$

and Shannon entropy given in terms of the Beta and Digamma functions,

$$H[f] = \ln B(p,q) - (p-1)[\psi(p)-\psi(p+q)] - (q-1)[\psi(q)-\psi(p+q)] = -\ln 20 + 79/30 \simeq -0.3623989402206575.$$

$H^{(app)}[f_M]$ and $H[f_M]$ for increasing $M$, together with $H[f]$, are reported in Fig. 4 ($H[f_M]$ for $M=2,\dots,8$). Although $H^{(app)}[f_M]$ is calculated through an approximate procedure, its value is more accurate than $H[f_M]$. This seems reasonable: indeed $H^{(app)}[f_M]$ involves $2M$ moments, whilst $H[f_M]$ involves only the first $M$ moments.

Fig. 4. (a) $H^{(app)}[f_M]$ (⁎), $H[f_M]$ (∘) and the exact entropy $H[f]$ (dashed line); (b) relative errors $\left|\frac{H^{(app)}[f_M]-H[f]}{H[f]}\right|$ (⁎) and $\left|\frac{H[f_M]-H[f]}{H[f]}\right|$ (∘), for Example 2.

Example 3. Let us consider the density

$$f(x) = -\frac{2}{\pi\ln 2}\,\frac{\ln x}{\sqrt{1-x^2}}.$$

Start from $I(a)=\int_0^1 \frac{x^a}{\sqrt{1-x^2}}\,dx$, from which $\frac{d}{da}I(a)=\int_0^1 \frac{x^a\ln x}{\sqrt{1-x^2}}\,dx$. The change of variable $x=\sin t$ leads to $I(a)=\int_0^{\pi/2}\sin^a t\,dt$, so that the wanted integral is given in terms of derivatives of the Beta function. When $a$ takes integer values, after some algebra one obtains the moments (see also [20, 4.241-2, p. 538])

$$\mu_0 = 1,$$
$$\mu_{2n} = -\frac{2}{\pi\ln 2}\,\binom{2n}{n}\frac{\pi}{2^{2n+1}}\left(-\ln 2+\sum_{k=1}^{2n}\frac{(-1)^{k-1}}{k}\right), \quad n=1,2,\dots,$$
$$\mu_{2n+1} = -\frac{2}{\pi\ln 2}\,\frac{2^{2n}}{(n+1)\binom{2n+1}{n}}\left(\ln 2+\sum_{k=1}^{2n+1}\frac{(-1)^{k}}{k}\right), \quad n=0,1,2,\dots,$$

whilst the entropy, obtained numerically, is

$$H[f] \simeq -0.3179752927304.$$

Numerical results are reported in Fig. 5, and all the considerations are analogous to Example 2.

Fig. 5. (a) $H^{(app)}[f_M]$ (⁎), $H[f_M]$ (∘) and the exact entropy $H[f]$ (dashed line); (b) relative errors $\left|\frac{H^{(app)}[f_M]-H[f]}{H[f]}\right|$ (⁎) and $\left|\frac{H[f_M]-H[f]}{H[f]}\right|$ (∘), for Example 3.


Example 4. Let us consider the density

$$f(x) = -\frac{8}{\pi(1+2\ln 2)}\,\ln(x)\sqrt{1-x^2}.$$

Moments and entropy are obtained with a procedure similar to the previous example [20, 4.241.3-4, p. 538]:

$$\mu_0 = 1,$$
$$\mu_{2n} = \frac{8}{\pi(1+2\ln 2)}\,\binom{2n}{n}\frac{\pi}{2^{2n+2}(n+1)}\left(\ln 2+\frac{1}{2n+2}+\sum_{k=1}^{2n}\frac{(-1)^{k}}{k}\right), \quad n=1,2,\dots,$$
$$\mu_{2n+1} = -\frac{8}{\pi(1+2\ln 2)}\,\frac{2^{2n+1}}{(n+1)(n+2)\binom{2n+3}{n+1}}\left(\ln 2-\frac{1}{2n+3}+\sum_{k=1}^{2n+1}\frac{(-1)^{k}}{k}\right), \quad n=0,1,2,\dots,$$

and

$$H[f] \simeq -0.508931221953.$$

Numerical results are reported in Fig. 6, and all the considerations are analogous to Example 2.

Fig. 6. (a) $H^{(app)}[f_M]$ (⁎), $H[f_M]$ (∘) and the exact entropy $H[f]$ (dashed line); (b) relative errors $\left|\frac{H^{(app)}[f_M]-H[f]}{H[f]}\right|$ (⁎) and $\left|\frac{H[f_M]-H[f]}{H[f]}\right|$ (∘), for Example 4.

Example 5. Let us consider the truncated Cauchy density

$$f(x) = \frac{4}{\pi(1+x^2)}.$$

Here odd and even $\mu_j$ are calculated through the stable recursive relationship

$$\mu_j = \frac{4}{\pi}\,\frac{1}{j-1}-\mu_{j-2}, \quad j\ge 2, \qquad \mu_0 = 1, \quad \mu_1 = \frac{2}{\pi}\ln 2$$

(closed formulas are in [20, 2.146.1, p. 76]), whilst $H[f]\simeq -0.0215137302739$. $f(x)$ may be written as

$$f(x) = e^{\ln\frac{4}{\pi(1+x^2)}} \simeq e^{\ln\frac{4}{\pi}-x^2+x^4/2-x^6/3+\cdots}$$

and then $f(x)$ may be considered as a MaxEnt distribution constrained by integer moments. It is reasonable to expect that few integer moments are able to catch the information content of $f(x)$. Numerical results are reported


in Fig. 7; the entropy decrease is fast as $M$ increases, and the conjecture about the information content of $f(x)$ is confirmed. Here MaxEnt distributions are calculated up to $M=6$ before incurring numerical instability. Since $H[f]$ is close to zero the bound (1.2) becomes meaningful; it provides the estimate $H[f]\le -0.003338$.

Fig. 7. (a) $H^{(app)}[f_M]$ (⁎), $H[f_M]$ (∘) and the exact entropy $H[f]$ (dashed line); (b) relative errors $\left|\frac{H^{(app)}[f_M]-H[f]}{H[f]}\right|$ (⁎) and $\left|\frac{H[f_M]-H[f]}{H[f]}\right|$ (∘), for Example 5.
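The stable recursion above is straightforward to implement (a minimal sketch; cauchy_moments is our name):

```python
import numpy as np

def cauchy_moments(M):
    """Moments mu_0 .. mu_2M of f(x) = 4/(pi(1+x^2)) on [0,1] via the
    stable recursion mu_j = (4/pi)/(j-1) - mu_{j-2}."""
    mu = np.empty(2 * M + 1)
    mu[0], mu[1] = 1.0, 2.0 * np.log(2.0) / np.pi
    for j in range(2, 2 * M + 1):
        mu[j] = 4.0 / (np.pi * (j - 1)) - mu[j - 2]
    return mu
```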


Example 6. In this example we consider a pdf $f(x)$ for which the procedure to estimate $f(x)$ from $\{\mu_j\}_{j=0}^{\infty}$ completely fails. Let us consider the highly oscillating density

$$f(x) = A\sin^2(\pi\omega x), \quad \omega\gg 1,$$

with $A = \frac{2}{1-\frac{\sin 2\pi\omega}{2\pi\omega}}$. Taking into account the well known relationships

$$\sin^2(\pi\omega x) = \frac12\left(1-\cos(2\pi\omega x)\right) \quad\text{and}\quad \lim_{\omega\to\infty}\int_0^1 f(x)\cos(2\pi\omega x)\,dx = 0$$

for each piecewise continuous function $f(x)$, we have, as $\omega\to\infty$,

$$A = 2, \qquad \mu_n = A\int_0^1 x^n\sin^2(\pi\omega x)\,dx = \frac{A}{2}\int_0^1 x^n\,dx - \frac{A}{2}\int_0^1 x^n\cos(2\pi\omega x)\,dx = \frac{1}{n+1}$$

(which coincide with the moments of the uniform distribution) and, after some algebra,

$$H[f] = -\int_0^1 A\sin^2(\pi\omega x)\ln\left(A\sin^2(\pi\omega x)\right)dx = -1+\ln 2 \simeq -0.30685281944.$$

Then $H[f]\simeq -0.30685281944$, while $H^{(app)}[f_M] = H[f_M] = 0$ for all $M$, since $\mu_n = \frac{1}{n+1}$ implies that the MaxEnt density is $f_M(x)\equiv 1$, the uniform distribution, and from (2.9) and (2.10) $\lambda_1=\cdots=\lambda_M=0$, $\lambda_0=0$ holds. Then, if $f(x)$ differs from a "reasonable" density, the outlined procedure may completely fail in evaluating $H[f]$ from $\{\mu_j(f)\}_{j=0}^{\infty}$. If even many moments are unable to squeeze out the relevant part of the information about $f(x)$, it means that moments represent an unsuitable tool; as a consequence, we should turn our attention to more appropriate expected values.
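A quick numerical confirmation that the moments collapse onto the uniform ones (a sketch; osc_moment is our name, and moderate ω already shows the effect):

```python
import numpy as np
from scipy.integrate import quad

def osc_moment(n, omega):
    """n-th moment of f(x) = A sin^2(pi omega x) on [0,1]."""
    A = 2.0 / (1.0 - np.sin(2 * np.pi * omega) / (2 * np.pi * omega))
    val, _ = quad(lambda x: A * x**n * np.sin(np.pi * omega * x) ** 2,
                  0.0, 1.0, limit=500)
    return val

# Every moment approaches 1/(n+1), so (2.9)-(2.10) see only the uniform
# density and return H_app = 0, far from H[f] = ln 2 - 1.
for n in range(5):
    print(n, round(osc_moment(n, omega=50.5), 6), round(1 / (n + 1), 6))
```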

Remark 3.2. The numerical results highlight that:

1. $H^{(app)}[f_M] \ge H[f]$, $\forall M$;
2. the sequence $\{H^{(app)}[f_M]\}$ is monotonically decreasing.

These facts can be justified through a heuristic reasoning:

(a) as $M\to\infty$, $H^{(app)}[f_M]\simeq H[f_M]\ge H[f]$, being $\mu_j(f_M)\simeq\mu_j(f)$, $j=1,\dots,2M$;
(b) for each fixed $M$, $H^{(app)}[f_M]$ takes into account $\mu_j(f)$, $j=1,\dots,2M$, while $H^{(app)}[f_{M-1}]$ considers $\mu_j(f)$, $j=1,\dots,2M-2$; then, according to the physical meaning of $H^{(app)}[f_M]$ (each additional moment constraint can only lower the maximum entropy), $H^{(app)}[f_{M-1}] > H^{(app)}[f_M]$, so that the sequence $\{H^{(app)}[f_M]\}$ is monotonically decreasing.

Combining the previous two facts, we obtain the following criterion for the choice of the number $M$ of moments that yields the best approximation $H^{(app)}[f_M]$ before incurring numerical instability (equivalently, before the sequence $\{H^{(app)}[f_M]\}$ fails to be monotonically decreasing):

$$M \equiv j:\ \min_j H^{(app)}[f_j].$$

The failure comes from insufficient machine precision.

4. Conclusions and discussion

The problem of evaluating the Shannon entropy of a distribution of which only the sequence of integer moments is given has been considered. The Shannon entropy is given directly in terms of integer moments, without solving the underlying moment problem; known results about both the entropy convergence of the MaxEnt approximation and the Kullback–Leibler distance allowed us to reach this goal. The ill-conditioning issues affecting the linear system to be solved may be mitigated by a proper preconditioning technique, with the Hilbert matrix as preconditioner. The numerical results prove that the moments are a suitable tool to gather the information content of a distribution, even if an accurate entropy estimate requires involving a high number of moments. The common opinion, widespread in the literature, that the information content of a distribution is spread along the infinite sequence of its moments is now supported by numerical evidence. If, in the approximation framework, the discrepancy between the underlying density and its approximation is measured by the Kullback–Leibler distance, then the numerical tests allow the following conclusions:

(a) a high number of moments has to be involved, with consequent ill-conditioning issues;
(b) more suitable expected values, such as fractional moments $E(X^\alpha)=\int_D x^\alpha f(x)\,dx$, $\alpha\in\mathbb{R}$, have to be involved for an accurate recovery. Two of the authors have suggested the use of fractional moments in some past papers [22,23]; the numerical results illustrated above support that theoretical conjecture.


(c) as a further alternative to integer moments, the use of shifted Chebyshev polynomials $T_n^{*}(x) = T_n(2x-1)$, with $T_n(x)=\cos[n\cos^{-1}(x)]$, is suggested in [24]. The more accurate results obtained, compared with integer moments, can be explained by the superior minimax property of the Chebyshev polynomials and by the consequent analytical form $f_M(x)=\exp\left(-\sum_{j=0}^{M}\lambda_j T_j^{*}(x)\right)$ of the MaxEnt solution for Chebyshev moments. Indeed, the presence of the terms $\lambda_j x^j$ in the MaxEnt solution (2.1) for integer moments makes it very difficult to exploit information from the higher order moments on the interval [0,1]. Such an issue does not arise in the formulation via Chebyshev moments, owing to the bounded nature of the Chebyshev polynomials and their more regular sampling of the interval, which provides a way to incorporate systematically the information from the higher moments into the reconstruction process.

The procedure outlined above for the Hausdorff case might likewise be extended to the Stieltjes and Hamburger cases. Indeed, in the Stieltjes case (in the Hamburger case the considerations are similar; even values of $M$ are required) the higher moments $\mu_j(f_M)$, $j>M$, may be found in terms of $\{\mu_j\}_{j=1}^{M}$ and $\{\lambda_j\}_{j=1}^{M}$ by integrating (2.2) by parts (here $D\equiv\mathbb{R}^+$), with the following result:

$$\sum_{k=1}^{M} k\,\lambda_k\,\mu_{k+j} = (1+j)\,\mu_j, \quad j=0,\dots,M-1, \tag{4.1}$$

and

$$\lambda_0 = \ln\int_0^{\infty}\exp\left(-\sum_{k=1}^{M}\lambda_k x^k\right)dx. \tag{4.2}$$

In analogy with the Hausdorff case, the MaxEnt density $f_M$ converges in entropy to $f$ [25] and then in directed divergence, so that $\lim_{M\to\infty} f_M(x) = f(x)$ almost everywhere. Then, starting from high values of $M$, we may assume $\mu_{k+j}(f_M) = \mu_{k+j}(f)$ for each $j,k$ with $j+k \ge M$, where the $\mu_{k+j}(f)$ are known quantities. Thus (4.1) may be considered a linear system which provides $(\lambda_1,\dots,\lambda_M)$. The linear system (4.1) admits a solution, the involved matrix being a Hankel matrix. Nevertheless two main issues are still open.

1. A suitable preconditioner for Hankel matrices coming from the Stieltjes (or Hamburger) moment problem is lacking in the literature; hence high $M$ values cannot be used in (4.1).
2. How to guarantee $\lambda_M \ge 0$ in (4.1), so as to ensure integrability in (4.2).

The last drawback may be handled as follows. A usual practice amounts to truncating the support $\mathbb{R}^+$, so that the Stieltjes moment problem is solved within a proper, large enough interval $[0,b]$ ($b$ might be chosen according to the Chebyshev inequality $P(|X-\mu_1|\ge\epsilon)\le\frac{\mu_2-\mu_1^2}{\epsilon^2}$), changing the original Stieltjes problem into a Hausdorff moment problem (in such a case $\lambda_M$ may assume any real value). In the MaxEnt setup the latter admits a solution for each sequence $\{\mu_j\}_{j=1}^{M}$ interior to the moment space [26, Theorem 4]. On the contrary, in the Stieltjes case the moment space of the MaxEnt $f_M$ is a proper subset of the moment space, and then, for an arbitrary set $\{\mu_j\}_{j=1}^{M}$, the existence of $f_M$ rests only on numerical evidence [27, Theorem 3]. Then we might be led to the unpleasant fact that a MaxEnt solution always exists, although the original Stieltjes moment problem does not admit any solution. In conclusion, the extension to the Stieltjes (or Hamburger) case of the illustrated technique for calculating the entropy directly from the moment sequence is not obvious.
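A minimal sketch of the truncation step just described (Python; the tolerance tail_mass and the one-sided use of Chebyshev's inequality are our illustrative choices, not prescriptions of the paper):

```python
import numpy as np

def truncation_point(mu1, mu2, tail_mass=1e-6):
    """Choose b with P(X > b) <= tail_mass via Chebyshev's inequality
    P(|X - mu1| >= eps) <= (mu2 - mu1**2) / eps**2, so that the Stieltjes
    problem can be solved on [0, b] as a Hausdorff problem (x -> x/b)."""
    eps = np.sqrt((mu2 - mu1 ** 2) / tail_mass)
    return mu1 + eps
```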

References

[1] J.A. Shohat, J.D. Tamarkin, The Problem of Moments, Math. Surveys, vol. 1, American Mathematical Society, Providence, RI, 1970.
[2] S. Karlin, L.S. Shapley, Geometry of Moment Spaces, AMS Memoirs 12, American Mathematical Society, Providence, RI, 1953.
[3] B.G. Lindsay, P. Basak, Moments determine the tail of a distribution (but not much else), Am. Statist. 54 (4) (2000) 248–251.
[4] N.I. Akhiezer, The Classical Moment Problem and Some Related Questions in Analysis, Hafner, New York, 1965.
[5] M. Goria, A. Tagliani, Bounds on the tail probability and absolute difference between two distributions, Commun. Statist. Theory Methods 32 (2003) 519–532.
[6] P.N. Gavriliadis, G.A. Athanassoulis, Moment information for probability distributions, without solving the moment problem. II: Main-mass, tails and shape approximation, J. Comput. Appl. Math. 229 (2009) 7–15.
[7] P.N. Gavriliadis, Moment information for probability distributions, without solving the moment problem. I: Where is the mode?, Commun. Statist. Theory Methods 37 (2008) 671–681.
[8] R.M. Mnatsakanov, Hausdorff moment problem: reconstruction of probability density functions, Statist. Probab. Lett. 78 (2008) 1869–1877.
[9] R.M. Mnatsakanov, Hausdorff moment problem: reconstruction of distributions, Statist. Probab. Lett. 78 (2008) 1612–1618.
[10] D. Fasino, Spectral properties of Hankel matrices and numerical solutions of finite moment problems, J. Comput. Appl. Math. 65 (1995) 145–155.
[11] B. Beckermann, The condition number of real Vandermonde, Krylov and positive definite Hankel matrices, Numer. Math. 85 (2000) 553–577.
[12] E.E. Tyrtyshnikov, How bad are Hankel matrices?, Numer. Math. 67 (1994) 261–269.
[13] H.K. Kesavan, J.N. Kapur, Entropy Optimization Principles with Applications, Academic Press, London, 1992.
[14] A. Tagliani, Entropy estimate of probability densities having assigned moments: Hausdorff case, Appl. Math. Lett. 15 (2002) 309–314.
[15] A. Tagliani, Hausdorff moment problem and maximum entropy: a unified approach, Appl. Math. Comput. 105 (1999) 291–305.
[16] T.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley-Interscience, Hoboken, NJ, 2006.
[17] E.V. Volpe, D. Baganoff, Maximum entropy PDFs and the moment problem under near-Gaussian conditions, Probab. Eng. Mech. 18 (2003) 17–29.
[18] E.V. Volpe, Closure relations as determined by the maximum entropy method and near-equilibrium conditions, Ph.D. Thesis, Stanford University, 2000.
[19] S. Kullback, Information Theory and Statistics, Dover, New York, 1968.
[20] I.S. Gradshteyn, I.M. Ryzhik, Table of Integrals, Series, and Products, A. Jeffrey, D. Zwillinger (Eds.), Academic Press, New York, 2007.
[21] G. Talenti, Recovering a function from a finite number of moments, Inverse Probl. 3 (1987) 501–517.
[22] P.L. Novi Inverardi, A. Petri, G. Pontuale, A. Tagliani, Hausdorff moment problem via fractional moments, Appl. Math. Comput. 144 (1) (2003) 61–74.
[23] H. Gzyl, A. Tagliani, Hausdorff moment problem and fractional moments, Appl. Math. Comput. 216 (2010) 3319–3328.
[24] P. Biswas, H. Shimoyama, L.R. Mead, Lyapunov exponents and the natural invariant density determination of chaotic maps: an iterative maximum entropy ansatz, J. Phys. A: Math. Theor. 43 (2010) 1–12.
[25] M. Frontini, A. Tagliani, Entropy-convergence in Stieltjes and Hamburger moment problem, Appl. Math. Comput. 88 (1997) 39–51.
[26] M. Frontini, A. Tagliani, Hausdorff moment problem and maximum entropy: on the existence conditions, Appl. Math. Comput. 218 (2011) 430–433.
[27] A. Tagliani, Maximum entropy solutions in unbounded domains, Appl. Math. Lett. 16 (4) (2003) 519–552.
