Moment information and entropy evaluation for probability densities
Mariyan Milev (a), Pierluigi Novi Inverardi (b), Aldo Tagliani (b,*)
(a) Dept. of Informatics and Statistics, University of Food Technologies, 4002 Plovdiv, Bulgaria
(b) Dept. of Computer and Management Sciences, University of Trento, 38100 Trento, Italy
Keywords: Entropy; Entropy convergence; Hausdorff moment problem; Kullback–Leibler distance; Maximum entropy

Abstract
How much information does the sequence of integer moments carry about the corresponding unknown absolutely continuous distribution? We prove that a reliable evaluation of the corresponding Shannon entropy can be done by exploiting some known theoretical results on entropy convergence, involving exact moments only, without solving the underlying moment problem. The whole procedure essentially rests on the solution of linear systems with nearly singular matrices, and hence it requires both calculations in high precision and a preconditioning technique. Numerical examples are provided to support the theoretical results.
© 2011 Elsevier Inc. All rights reserved.
1. Introduction
Let X be an absolutely continuous random variable with support D. The moment problem consists of recovering an unknown probability density function (pdf) $f(x) \in L^2_D$ from the knowledge of its associated sequence $\{\mu_j\}_{j=0}^{M}$ of integer moments, with M finite or infinite, where $\mu_j = E(X^j) = \int_D x^j f(x)\,dx$, $j \ge 0$, $\mu_0 = 1$. In actual practice the problem consists of determining approximations to the unknown density function f(x) from a finite collection $\{\mu_j\}_{j=0}^{M}$ of its integer moments. In such a case the moment problem is called a "truncated moment problem"; if M is infinite, it is called a "full moment problem". If D is bounded, typically D = [0,1], we speak of the "Hausdorff moment problem", whilst if $D = \mathbb{R}^+$ of the Stieltjes moment problem, and if $D = \mathbb{R}$ of the Hamburger moment problem. In the Hausdorff case the infinite sequence of moments determines a unique distribution (we say the moment problem is determinate), unlike the Stieltjes and Hamburger cases, where supplementary conditions are required to guarantee a unique distribution. The above terminology and general results about existence and determinacy are found in the classical book of Shohat and Tamarkin [1]. Once the first moments $\{\mu_j\}_{j=1}^{M}$ are assigned, we define the Mth moment space $V_M \subset \mathbb{R}^M_+$ as the convex hull of the curve $\{x, x^2, \dots, x^M\}$, $x \in D$; in the Hausdorff case $V_M$ is closed, bounded and convex [2, Theorem 7.2], and its geometrical properties are well described in [2].

In the last decade several papers have been devoted to determining how the partial information contained in the finite sequence of integer moments can be used to describe various distributional characteristics and properties. Lindsay and Basak [3] show tail bounds for an unknown distribution in terms of moments by exploiting a classical bounding result due to Akhiezer [4] and Shohat and Tamarkin [1]. Goria and Tagliani [5] present a moment bound for the absolute difference between two distributions, based on the maximum entropy (MaxEnt) method; Gavriliadis and Athanassoulis [6] and Gavriliadis [7] obtain some results about the separation of the main mass interval, the tail interval and the position of the mode. In some recent papers Mnatsakanov [8,9] provides a procedure to recover a probability density function (pdf) f(x) and the associated distribution function F(x) directly from the associated moments, without resorting to an approximation $f^{(app)}(x)$ of f(x) or
doi:10.1016/j.amc.2011.11.093
* Corresponding author.
E-mail addresses: [email protected] (M. Milev), [email protected] (P.N. Inverardi), [email protected] (A. Tagliani).
Applied Mathematics and Computation 218 (2012) 5782–5795
$F^{(app)}(x)$ of F(x); in the framework of the Hausdorff moment problem, Mnatsakanov aims to construct a stable approximation $f^{(app)}(x)$ of f(x) from its moment sequence, proves some asymptotic properties of the approximant, and finally derives the uniform and $L_1$-rates of convergence.
All the previous results stem from an obvious fact: in a determinate moment problem the information content about a distribution is spread over the infinite sequence of its moments. In the present paper we follow the guidelines of the above quoted papers. In more detail, we tackle the issue of determining the Shannon entropy of a distribution with support D = [0,1] admitting a pdf f(x), without solving the inverse moment problem, consequently avoiding all the instability drawbacks related to the solution of the inverse problem. A viable route to calculating the Shannon entropy $H[f] = -\int_D f(x)\ln f(x)\,dx$ consists in approximating f(x) by a density $f^{(app)}(x)$ constrained by a finite set of moments. Then $\int_D x^j f^{(app)}(x)\,dx = \int_D x^j f(x)\,dx$, $j = 0,\dots,M$, and H[f] is replaced by $H[f^{(app)}] = -\int_D f^{(app)}(x)\ln f^{(app)}(x)\,dx$.

The procedure that we propose here requires an accurate recovery of f(x) through $f^{(app)}(x)$, involving a high number M of moments $\{\mu_j\}_{j=0}^{M}$. In general, high M values give rise to ill-conditioning issues in the involved Hankel matrices. In the Hausdorff case, for instance, it was proved in [10] that all Hankel matrices generated by moments of positive density functions are conditioned essentially in the same way as Hilbert matrices. Then each order-M Hankel matrix has a condition number which grows at the exponential rate $\simeq e^{3.525M}$ for large M. Condition numbers of Hankel matrices generated by moments arising from a distribution with support $\mathbb{R}^+$ or $\mathbb{R}$ may be found in [11,12]. All the latter matrices have condition numbers growing at an exponential rate. Then, to take ill-conditioning into account, only a few moments can be involved, with a consequent poor estimate of H[f], unless a suitable preconditioner is available, as in the Hausdorff case [10]. Indeed, although the infinite sequence of moments $\{\mu_j\}_{j=0}^{\infty}$ completely determines the unknown distribution, the information content carried by each moment $\mu_j$ depends on the sequence itself; in general, it could be poor or even insignificant when the corresponding distribution exhibits fat tails, while a few moments may sometimes characterize the distribution completely: in such a case they coincide with the "characterizing moments". We recall that the characterizing moments of a probability distribution are the moment constraints subject to which the probability distribution is obtained as the MaxEnt probability distribution [13, p. 135]. They are able to synthesize all the relevant information on the distribution and characterize it.
For instance, if D = [0,1], when M takes a high value, from a Taylor expansion one has $x^M \simeq \sum_{n=0}^{n^* \ll M} \left(x - \frac12\right)^n$. Taking the expected value one gets

$$\mu_M = E(X^M) \simeq \sum_{n=0}^{n^* \ll M} E\left(X - \tfrac12\right)^n = \sum_{n=0}^{n^* \ll M}\sum_{j=0}^{n}\binom{n}{j}\left(-\tfrac12\right)^{n-j}\mu_j. \qquad (1.1)$$
Equivalently, higher moments may be obtained by combining lower order moments. Heuristically speaking, different powers of x differ very little from each other if x is restricted to 0 < x < 1 and the exponents are large. Then a high number of moments has to be involved in an accurate recovery, with the consequent risk of running into numerical instability.
An upper bound for the entropy H[f] solely in terms of the first M + 1 moments $\mu = (\mu_0,\dots,\mu_M)$ was provided in [14, Eq. (3.8)]. If the moment vector $\mu^U = \left\{\frac{1}{j+1}\right\}_{j=0}^{M}$ of the uniform distribution on [0,1] is considered, it holds that

$$H[f] \le \inf_M\left[-\frac{\|\mu - \mu^U\|_2^2}{2\,\|\max\{\mu, \mu^U\}\|_1}\right]. \qquad (1.2)$$
The bound (1.2) is tight whenever the moment vector $\mu$ is close enough to the moment vector $\mu^U$, or equivalently $H[f] \simeq 0$ (see [14] for details).
Exploiting previous convergence results characterizing MaxEnt approximations of f(x), we try to bypass that instability and illustrate a viable way to calculate the entropy of the underlying distribution solely in terms of moments. The procedure requires only the solution of linear systems and the evaluation of definite integrals ((2.9) and (2.10) below). An accurate evaluation of H[f] requires both a high number of moments and a high precision computational procedure, jointly with a suitable preconditioner for the matrix involved in the linear system.
The paper is organized as follows. In Section 2 the MaxEnt problem is formulated and convergence theorems (in entropy, in directed divergence, and almost everywhere) are reviewed. Based upon these theorems, an approximate procedure for entropy evaluation is formulated. In Section 3 a suitable preconditioner for solving the ill-conditioned linear systems is illustrated, as well as numerical examples validating the above approximate method. In Section 4 expected values alternative to integer moments, to be used in the MaxEnt setup, are suggested, and an extension of the above procedure for approximate entropy evaluation to the Stieltjes and Hamburger moment problems is discussed.
2. Formulation of the problem
Let us consider a random variable X with support D = [0,1], admitting a pdf f(x) with moments $\{\mu_j\}_{j=0}^{\infty}$ and an underlying determinate moment problem (the so-called Hausdorff moment problem). For practical purposes, only a finite sequence $\{\mu_j\}_{j=0}^{M}$ can be considered. Then there exists an infinite set of densities $f^{(app)}(x)$ compatible with the information carried by the finite moment sequence, and hence a choice criterion is required. The MaxEnt approach is widely used in practice and provides the approximation of f(x) [13]
$$f^{(app)}(x) =: f_M(x) = \exp\left(-\sum_{j=0}^{M}\lambda_j x^j\right), \qquad (2.1)$$

constrained by

$$\int_D x^j f_M(x)\,dx = \mu_j, \quad j = 0,\dots,M, \qquad (2.2)$$

with Shannon entropy

$$H[f_M] = -\int_D f_M(x)\ln f_M(x)\,dx = \sum_{j=0}^{M}\lambda_j\mu_j \qquad (2.3)$$
and $0 \ge H[f_1] \ge \cdots \ge H[f_M] \ge \cdots \ge H[f]$, as is known from the MaxEnt context. The Lagrange multipliers $(\lambda_0,\dots,\lambda_M)$ are evaluated by solving the minimization problem [13]

$$\{\lambda_j\}_{j=1}^{M}:\quad \min_{\lambda_1,\dots,\lambda_M}\left[\ln\left(\int_D \exp\left(-\sum_{j=1}^{M}\lambda_j x^j\right)dx\right) + \sum_{j=1}^{M}\lambda_j\mu_j\right] \qquad (2.4)$$

and

$$\lambda_0 = \ln\int_D \exp\left(-\sum_{j=1}^{M}\lambda_j x^j\right)dx.$$
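The dual problem (2.4) is a smooth convex minimization and can be sketched numerically. The following fragment is our own illustration (not the authors' Maple code): it assumes SciPy is available and takes the density f(x) = 2x on [0,1], with moments $\mu_j = 2/(j+2)$, as an arbitrary target.

```python
# Sketch of the MaxEnt dual minimization (2.4) on D = [0, 1].
# Assumption: the target density f(x) = 2x (moments mu_j = 2/(j+2)) is our
# own illustrative choice; the paper's computations use Maple instead.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

M = 3
mu = np.array([2.0 / (j + 2) for j in range(1, M + 1)])  # mu_1, ..., mu_M

def Z(lam):
    # normalization integral of exp(-sum_{j=1..M} lam_j x^j)
    val, _ = quad(lambda x: np.exp(-sum(l * x ** (j + 1)
                                        for j, l in enumerate(lam))), 0.0, 1.0)
    return val

def dual(lam):
    # ln Z(lam) + sum_j lam_j mu_j: convex, minimized at the MaxEnt multipliers
    return np.log(Z(lam)) + lam @ mu

res = minimize(dual, np.zeros(M), method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 5000})
lam = res.x
lam0 = np.log(Z(lam))   # lambda_0: the normalizing multiplier
H_M = lam0 + lam @ mu   # entropy H[f_M] from (2.3)
```

At the minimum the constraints (2.2) are satisfied, and H[f_M] decreases toward H[f] = 1/2 - ln 2 as M grows.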
Then a direct evaluation of $H[f_M]$ is obtained by $H[f_M] = \sum_{j=0}^{M}\lambda_j\mu_j$. A moment problem solution produces instability issues when a high number M of moments is involved. On the other hand, the entropy $H[f_M]$ of $f_M(x)$ represents one of the main indicators to test the reliability of the approximation. Combining entropy convergence (Theorem 1 below) and the obvious sequence of inequalities $0 \ge H[f_1] \ge \cdots \ge H[f_M] \ge \cdots \ge H[f]$, the criterion for the choice of the number M of moments that yields the best approximation $f_M(x)$ before incurring numerical instability (equivalently, before the sequence $\{H[f_j]\}$ fails to be monotonically decreasing) should be

$$M \approx j:\ \min_j H[f_j].$$
This fact is due to the conditioning of $H[f_M](\mu_M)$ (which we consider as depending on $\mu_M$ only, whilst the remaining moments $(\mu_0,\dots,\mu_{M-1})$ are held fixed), whose value is given by Tagliani [15]:

$$\mathrm{Cond}(H[f_M]) = \left|\frac{\big(H[f_M](\mu_M + \Delta\mu_M) - H[f_M](\mu_M)\big)/H[f_M]}{\Delta\mu_M/\mu_M}\right| \ge eps\,\frac{(\mu_M)^2}{|H[f]|}\,2^{4M-3},$$

where $eps = |\Delta\mu_M|/\mu_M$ may be identified with the machine error. Then the condition number of $H[f_M]$ grows exponentially, which justifies the use of a low number M of integer moments in recovering $f_M(x)$ by (2.4). Such a drawback may be bypassed by resorting to some known theoretical results, which give rise to an alternative procedure to calculate the entropy.
In the context of MaxEnt the following convergence result holds [15].
Theorem 1. Whenever the underlying moment problem is determinate, MaxEnt approximations converge in entropy to f(x) as M increases, i.e.

$$\lim_{M\to\infty} H[f_M] = H[f]. \qquad (2.5)$$
Denote by $I[f, f_M] = \int_D f(x)\ln\frac{f(x)}{f_M(x)}\,dx$ the Kullback–Leibler distance between f(x) and $f_M(x)$; since f(x) and $f_M(x)$ share the same first M moments $\{\mu_j\}_{j=0}^{M}$, it follows [13] that

$$H[f_M] - H[f] = \int_D f(x)\ln\frac{f(x)}{f_M(x)}\,dx. \qquad (2.6)$$

Combining (2.5) with (2.6) it follows that

$$\lim_{M\to\infty}\big(H[f_M] - H[f]\big) = \lim_{M\to\infty}\int_D f(x)\ln\frac{f(x)}{f_M(x)}\,dx = 0. \qquad (2.7)$$
Invoking the following result:

Theorem 2 [16, p. 252].

$$I[f, f_M] \ge 0,$$

with equality if and only if f and $f_M$ coincide almost everywhere,
we conclude that $\lim_{M\to\infty} f_M(x) = f(x)$ almost everywhere. Combining entropy convergence and the Kullback–Leibler distance, the following approximate procedure to calculate the entropy arises. For high M values we have

$$\mu_j = \mu_j(f) = \int_D x^j f(x)\,dx = \mu_j(f^{(app)}) = \int_D x^j f^{(app)}(x)\,dx, \quad j \ge M. \qquad (2.8)$$

Once $f_M(x) = \exp\left(-\sum_{j=0}^{M}\lambda_j x^j\right)$ has been calculated, the higher moments $\mu_j(f_M)$, $j > M$, may be found in terms of $\{\mu_j\}_{j=1}^{M}$ and $\{\lambda_j\}_{j=1}^{M}$ by integrating (2.2) by parts, with the following result:

$$\sum_{k=1}^{M} k\lambda_k(\mu_k - \mu_{k+j}) = 1 - (j+1)\mu_j, \quad j = 1,\dots,M. \qquad (2.9)$$
The linear system (2.9) admits a solution, being an identity relating $\lambda = (\lambda_1,\dots,\lambda_M)$ to $\{\mu_j(f_M)\}_{j=M+1}^{2M}$. From (2.2) (with j = 0), $\lambda_0$ may be obtained by normalizing $f_M(x)$ to one:

$$\lambda_0 = \ln\int_0^1 \exp\left(-\sum_{j=1}^{M}\lambda_j x^j\right)dx. \qquad (2.10)$$
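The route through (2.9) and (2.10) can be sketched in a few lines (our own illustration, assuming NumPy/SciPy; the Beta(4,2) moments of Example 2 below are used as input, but any determinate density on [0,1] would do):

```python
# Sketch: approximate entropy via the linear system (2.9) and integral (2.10).
# Input: exact moments mu_j(f), j = 1..2M, here of the Beta(4,2) density
# (an assumption made for illustration only).
import numpy as np
from math import exp, lgamma
from scipy.integrate import quad

M = 6
p, q = 4, 2
mu = [exp(lgamma(p + q) + lgamma(p + j) - lgamma(p + q + j) - lgamma(p))
      for j in range(2 * M + 1)]                 # mu_0, ..., mu_2M

# Row j (j = 1..M) of (2.9): sum_k k*lam_k*(mu_k - mu_{k+j}) = 1 - (j+1)*mu_j
A = np.array([[k * (mu[k] - mu[k + j]) for k in range(1, M + 1)]
              for j in range(1, M + 1)])
b = np.array([1.0 - (j + 1) * mu[j] for j in range(1, M + 1)])
lam = np.linalg.solve(A, b)                      # (lam_1, ..., lam_M)

# lambda_0 from (2.10), then H^(app)[f_M] = lam_0 + sum_j lam_j mu_j
z, _ = quad(lambda x: np.exp(-sum(l * x ** (j + 1) for j, l in enumerate(lam))),
            0.0, 1.0)
lam0 = np.log(z)
H_app = lam0 + sum(l * mu[j + 1] for j, l in enumerate(lam))
```

For this density the exact entropy is H[f] ≈ -0.362399 (Example 2), which H^(app)[f_M] approaches as M grows; in plain double precision M cannot be pushed much further, which is what motivates the preconditioning of Section 3.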
Remark 2.1. Eq. (2.9) is not new in the literature; similar relationships had been previously obtained by several authors. We restrict ourselves to recalling the system of moment equations obtained in [17,18] for the Hamburger case (here an even number M of moments has to be considered and $D \equiv \mathbb{R}$):

$$Han_{2M}\cdot[\lambda_1, 2\lambda_2, \dots, M\lambda_M]' = [0,\ 1\mu_0,\ 2\mu_1,\ \dots,\ (M+1)\mu_M]', \qquad (2.11)$$

with rectangular matrix $Han_{2M}$

$$Han_{2M} = \begin{bmatrix} \mu_0 & \cdots & \mu_{M-1} \\ \vdots & & \vdots \\ \mu_{M+1} & \cdots & \mu_{2M} \end{bmatrix}.$$

Here the authors of [17] observe that what is remarkable about (2.11) is not only that the equations are exact, but that they exhibit a strikingly simple bilinear form, despite the strong non-linearity of the involved pdf $f_M$. From the third equation on, (2.11) involves moments of higher order than those which were constrained. Therefore, these equations can be used to obtain successful approximations to the closure relations of interest in Fluid Dynamics, i.e. to express the higher order moments in terms of the lower order ones and the Lagrange multipliers $(\lambda_1,\dots,\lambda_M)$.
Remark 2.2. Alternatively, $\lambda_0$ may be obtained directly from $\{\mu_j\}_{j=1}^{M}$ and $\{\lambda_j\}_{j=1}^{M}$ by integrating (2.2) by parts (with j = 0):

$$\lambda_0 = -\ln\left[\frac{\sum_{j=1}^{M} j\lambda_j\mu_{j-1}}{1 - \exp\left(-\sum_{j=1}^{M}\lambda_j\right)}\right] \qquad (2.12)$$

or

$$\lambda_0 = -\ln\left[\left(1 - \sum_{j=1}^{M} j\mu_j\lambda_j\right)\exp\left(\sum_{j=1}^{M}\lambda_j\right)\right]. \qquad (2.13)$$

Since the procedure (2.9) for calculating $\lambda = (\lambda_1,\dots,\lambda_M)$ is only approximate, the use of (2.12) or (2.13) leads to inaccurate results, as clarified by their numerical analysis. The procedure (2.9) is approximate in the following sense: for a fixed M, (2.9) involves $\mu_j(f_M)$, which are unknown for $j = M+1,\dots,2M$, whilst $\mu_j(f_M) = \mu_j(f)$, $j = 1,\dots,M$, are known. For practical purposes the $\mu_j(f_M)$, $j = M+1,\dots,2M$, are replaced by the assigned $\mu_j(f)$, with an error decreasing as M increases, quantified as follows.
From the Pinsker inequality [19, p. 390] and (2.6),

$$H[f_M] - H[f] = \int_D f(x)\ln\frac{f(x)}{f_M(x)}\,dx \ge \frac12\left(\int_0^1 |f_M(x) - f(x)|\,dx\right)^2$$

and then

$$\int_0^1 |f_M(x) - f(x)|\,dx \le \sqrt{2(H[f_M] - H[f])}.$$
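As a concrete check of this bound (our own sketch; the density f(x) = 2x and its one-moment MaxEnt approximation $f_1$ are assumptions made for the illustration only):

```python
# Numerical check of the Pinsker-type bound for f(x) = 2x on [0, 1] and its
# MaxEnt approximation f_1 matching mu_1 = 2/3 (an illustrative assumption).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

mu1 = 2.0 / 3.0

def mean_f1(lam1):
    # mean of the density proportional to exp(-lam1*x) on [0, 1]
    z, _ = quad(lambda x: np.exp(-lam1 * x), 0.0, 1.0)
    m, _ = quad(lambda x: x * np.exp(-lam1 * x), 0.0, 1.0)
    return m / z

lam1 = brentq(lambda l: mean_f1(l) - mu1, -50.0, 50.0)  # match E[X] = 2/3
z, _ = quad(lambda x: np.exp(-lam1 * x), 0.0, 1.0)
lam0 = np.log(z)

f = lambda x: 2.0 * x
f1 = lambda x: np.exp(-lam0 - lam1 * x)

H_f = -quad(lambda x: f(x) * np.log(f(x)), 1e-12, 1.0)[0]  # = 1/2 - ln 2
H_f1 = lam0 + lam1 * mu1                                   # entropy of f_1
L1 = quad(lambda x: abs(f1(x) - f(x)), 0.0, 1.0)[0]        # L1 distance
```

One finds numerically that the L1 distance indeed stays below sqrt(2(H[f_1] - H[f])).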
Comparing $\mu_j(f)$ and $\mu_j(f_M)$, $j = M+1,\dots,2M$, calling $\Delta\mu_j = \mu_j(f_M) - \mu_j(f)$ and taking into account Theorem 1, one has

$$|\Delta\mu_j| = |\mu_j(f_M) - \mu_j(f)| = \left|\int_0^1 x^j\big(f_M(x) - f(x)\big)\,dx\right| \le \int_0^1 x^j|f_M(x) - f(x)|\,dx \le \int_0^1 |f_M(x) - f(x)|\,dx \le \sqrt{2(H[f_M] - H[f])} \to 0. \qquad (2.14)$$

Then $\mu_j(f_M)$ closely approximates $\mu_j(f)$, $j = M+1,\dots,2M$, as M increases. For high M values, after replacing $\mu_j(f_M)$ with $\mu_j(f)$, $j = M+1,\dots,2M$, the linear system (2.9) continues to admit solutions as a consequence of Theorem 2.
From a combined use of (2.9) and (2.10) we proceed as follows. We call:

(i) $H[f_M] = \lambda_0 + \sum_{j=1}^{M}\lambda_j\mu_j$ the exact entropy of $f_M(x)$, where the $\lambda_j$ are obtained from (2.9) involving $\{\mu_j \equiv \mu_j(f_M)\}_{j=M+1}^{2M}$;
(ii) $H^{(app)}[f_M] = (\lambda_0 + \Delta\lambda_0) + \sum_{j=1}^{M}(\lambda_j + \Delta\lambda_j)\mu_j$ the approximate entropy of $f_M(x)$, where the $\lambda_j + \Delta\lambda_j$ are obtained from (2.9) involving $\{\mu_j \equiv \mu_j(f)\}_{j=M+1}^{2M}$ and $\lambda_0 + \Delta\lambda_0$ from (2.10);
(iii) the error $\Delta H = H[f_M] - H^{(app)}[f_M]$.

From Theorem 2, as $M \to \infty$, $\{\mu_j(f_M) = \mu_j(f)\}_{j=M+1}^{2M}$ and then $H[f_M] = H^{(app)}[f_M]$. From Theorem 1, as $M \to \infty$, $H[f_M] = H[f]$, and then both $H[f_M] = H^{(app)}[f_M]$ and $H[f] = H^{(app)}[f_M]$. Equivalently, H[f] is evaluated using directly $\{\mu_j(f)\}_{j=1}^{2M}$. The computational cost amounts to solving a linear system (2.9) and evaluating a definite integral (2.10).
Remark 2.3. Relationships (2.9) and (2.10) are well known but unfortunately have been used in an incorrect way by some authors. Such relationships have been considered as a shortcut to calculate the Lagrange multipliers $\{\lambda_j\}_{j=1}^{M}$ through the solution of a linear system, avoiding the computational burden represented by (2.4). With such an improper procedure one replaces the sequence $\{\mu_j(f_M)\}_{j=M+1}^{2M}$ (which is known only after calculating $\{\lambda_j\}_{j=1}^{M}$ through (2.4)) with $\{\mu_j(f)\}_{j=M+1}^{2M}$. Nevertheless, for high M values (2.9) and (2.10) may be considered a valuable tool to calculate $\{\lambda_j\}_{j=1}^{M}$ by solving a linear system and a definite integral, without solving the nonlinear optimization problem (2.4). Indeed, replacing $\{\mu_j(f_M)\}_{j=M+1}^{2M}$ with $\{\mu_j(f)\}_{j=M+1}^{2M}$ becomes meaningful as $f_M(x) = f(x)$ a.e., in virtue of the entropy convergence theorem. In the end, the entropy $H[f] = H[f_M] = \lambda_0 + \sum_{j=1}^{M}\lambda_j\mu_j$ can be found in terms of $\{\mu_j(f)\}_{j=1}^{2M}$ only.
2.1. $\Delta H$ estimate
In the special case when the convergence of $H[f_M]$ to H[f] is fast, an estimate of $\Delta H = H[f_M] - H^{(app)}[f_M]$ may be obtained as follows.
Let us consider (2.9). If $\{\mu_j \equiv \mu_j(f_M)\}_{j=M+1}^{2M}$ are involved, then (2.9) may be written in matrix form as $D_{2M}\lambda = b$, with

$$b = \begin{bmatrix} 1 - 2\mu_1 \\ 1 - 3\mu_2 \\ \vdots \\ 1 - (M+1)\mu_M \end{bmatrix}, \qquad D_{2M} = \begin{bmatrix} \mu_1 - \mu_2 & \cdots & \mu_M - \mu_{M+1} \\ \vdots & & \vdots \\ \mu_1 - \mu_{M+1} & \cdots & \mu_M - \mu_{2M} \end{bmatrix}\mathrm{diag}(1, 2, \dots, M).$$
If $\{\mu_j \equiv \mu_j(f)\}_{j=M+1}^{2M}$ are involved, then (2.9) may be written as $(D_{2M} + d)(\lambda + \Delta\lambda) = b$, where the matrix d is obtained as the difference of the matrices whose entries are given by $\mu_j(f_M)$ and $\mu_j(f)$ respectively, $d = D_{2M}(\mu_j(f_M)) - D_{2M}(\mu_j(f))$, $j = 1,\dots,2M$, taking into account that $\mu_j(f_M) = \mu_j(f)$, $j = 1,\dots,M$. Then d is a Hankel-type matrix whose nonzero entries, on and below the main antidiagonal, are built from the differences $\Delta\mu_j = \mu_j(f_M) - \mu_j(f)$, $j = M+1,\dots,2M$. From (2.14) it follows that $\|d\|_1 \le M\sqrt{2(H[f_M] - H[f])}$, and from $\lambda = D_{2M}^{-1}b$ and $\lambda + \Delta\lambda = (D_{2M} + d)^{-1}b$ it follows that
$$\Delta H = H[f_M] - H^{(app)}[f_M] = \lambda_0 + \sum_{j=1}^{M}\lambda_j\mu_j - (\lambda_0 + \Delta\lambda_0) - \sum_{j=1}^{M}(\lambda_j + \Delta\lambda_j)\mu_j = -\Delta\lambda_0 - \sum_{j=1}^{M}\Delta\lambda_j\mu_j$$

and

$$|\Delta H| = |H[f_M] - H^{(app)}[f_M]| \le |\Delta\lambda_0| + \sum_{j=1}^{M}|\Delta\lambda_j|\mu_j \le |\Delta\lambda_0| + \sum_{j=1}^{M}|\Delta\lambda_j| = |\Delta\lambda_0| + \|\Delta\lambda\|_1.$$
It remains to evaluate $|\Delta\lambda_0|$ and $\|\Delta\lambda\|_1$. If the convergence of $H[f_M]$ to H[f] is fast, so that from (2.14)

$$\|D_{2M}^{-1}d\|_1 \le \|D_{2M}^{-1}\|_1\|d\|_1 = \|D_{2M}^{-1}\|_1\, M\sqrt{2(H[f_M] - H[f])} \ll 1$$

and hence $(I + D_{2M}^{-1}d)^{-1} - I \simeq -D_{2M}^{-1}d$ may be assumed, then it holds that

$$\|\Delta\lambda\|_1 = \|(\lambda + \Delta\lambda) - \lambda\|_1 = \|(D_{2M} + d)^{-1}b - D_{2M}^{-1}b\|_1 \le \|(D_{2M} + d)^{-1} - D_{2M}^{-1}\|_1\|b\|_1 = \|[(I + D_{2M}^{-1}d)^{-1} - I]D_{2M}^{-1}\|_1\|b\|_1$$
$$\le \|(I + D_{2M}^{-1}d)^{-1} - I\|_1\|D_{2M}^{-1}\|_1\|b\|_1 \simeq \|D_{2M}^{-1}d\|_1\|D_{2M}^{-1}\|_1\|b\|_1 \le \|d\|_1\|D_{2M}^{-1}\|_1^2\|b\|_1 \le M\sqrt{2(H[f_M] - H[f])}\,\|D_{2M}^{-1}\|_1^2\|b\|_1 \qquad (2.15)$$
(the last inequality takes into account both (2.14) and the bound on $\|d\|_1$ previously calculated). As far as $\Delta\lambda_0$ is concerned, from (2.10) we have

$$\Delta\lambda_0 = (\lambda_0 + \Delta\lambda_0) - \lambda_0 = \ln\frac{\int_D \exp\left(-\sum_{j=1}^{M}(\lambda_j + \Delta\lambda_j)x^j\right)dx}{\int_D \exp\left(-\sum_{j=1}^{M}\lambda_j x^j\right)dx} = \ln\int_D \exp\left(-\lambda_0 - \sum_{j=1}^{M}(\lambda_j + \Delta\lambda_j)x^j\right)dx$$
$$= \ln\int_D \exp\left(-\lambda_0 - \sum_{j=1}^{M}\lambda_j x^j\right)\exp\left(-\sum_{j=1}^{M}\Delta\lambda_j x^j\right)dx \simeq \ln\int_D \left(1 - \sum_{j=1}^{M}\Delta\lambda_j x^j\right)\exp\left(-\lambda_0 - \sum_{j=1}^{M}\lambda_j x^j\right)dx$$
$$= \ln\left(1 - \sum_{j=1}^{M}\Delta\lambda_j\mu_j\right) \simeq -\sum_{j=1}^{M}\Delta\lambda_j\mu_j,$$

from which

$$|\Delta\lambda_0| \le \sum_{j=1}^{M}|\Delta\lambda_j|\mu_j \le \|\Delta\lambda\|_1.$$

Then from (2.15) one obtains

$$|\Delta H| \le |\Delta\lambda_0| + \|\Delta\lambda\|_1 \le 2\|\Delta\lambda\|_1 \le 2M\sqrt{2(H[f_M] - H[f])}\,\|D_{2M}^{-1}\|_1^2\|b\|_1,$$

the desired bound for $\Delta H$.
3. Numerical results
In order to investigate the numerical consistency of the proposed procedure, we apply it to several cases in which the target pdf is known, so that any number of its moments, as well as its entropy, can be calculated. Once $\{\mu_j \equiv \mu_j(f)\}_{j=1}^{2M}$ are assigned, for increasing M values, $H^{(app)}[f_M]$ is obtained from (2.9) and (2.10). From (2.5) and Theorem 2, $H[f] = H[f_M] = H^{(app)}[f_M]$ holds true for large M values. As $H^{(app)}[f_M]$ is obtained through a simplified procedure, which is accurate for high M values only, we expect that, for some low M values, $H^{(app)}[f_M] < H[f]$, or that the sequence $\{H^{(app)}[f_M]\}$, $M = 2, 3, \dots$, fails to be monotonically decreasing, without contradicting the MaxEnt procedure. Then the asymptotic value of $H^{(app)}[f_M]$ provides the wanted value H[f]. In each example we report:

1. $H^{(app)}[f_M] = \lambda_0 + \sum_{j=1}^{M}\lambda_j\mu_j$, with $\{\lambda_j\}_{j=1}^{M}$ and $\lambda_0$ given by (2.9) and (2.10) respectively, using $\{\mu_j \equiv \mu_j(f)\}_{j=1}^{2M}$;
2. $H[f_M]$ obtained from $f_M(x) = \exp\left(-\sum_{j=0}^{M}\lambda_j x^j\right)$, whose Lagrange multipliers $\{\lambda_j\}_{j=0}^{M}$ are calculated by (2.4);
3. the exact entropy H[f];
4. the relative errors $\left|\frac{H^{(app)}[f_M] - H[f]}{H[f]}\right|$ and $\left|\frac{H[f_M] - H[f]}{H[f]}\right|$.

All the computations have been done in high numerical precision, using Maple.
3.1. Preconditioner of $D_{2M}$
Concerning the numerical solution of (2.9), let us consider the matrix

$$D_{2M} = \begin{bmatrix} \mu_1 - \mu_2 & \cdots & \mu_M - \mu_{M+1} \\ \vdots & & \vdots \\ \mu_1 - \mu_{M+1} & \cdots & \mu_M - \mu_{2M} \end{bmatrix} = \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix}\mathrm{diag}(\mu_1,\dots,\mu_M) - \begin{bmatrix} \mu_2 & \cdots & \mu_{M+1} \\ \vdots & & \vdots \\ \mu_{M+1} & \cdots & \mu_{2M} \end{bmatrix} = D^{(1)}_{2M} - D^{(2)}_{2M}, \qquad (3.1)$$

split into the two matrices $D^{(1)}_{2M}$ and $D^{(2)}_{2M}$; the latter is a Hankel matrix (for brevity, $D_{2M}$ keeps the same name as the matrix involved in (2.9)). Now $D_{2M}$ is a moment matrix relating the Lagrange multipliers $\lambda$ to the moments $\{\mu_j(f_M)\}_{j=1}^{2M}$, and it is ill-conditioned, with condition number comparable to that of $D^{(2)}_{2M}$. This severe ill-conditioning motivates the search for a preconditioner for $D_{2M}$ that can reduce it. For instance, a comparison between $\mathrm{Cond}(D_{2M})$ and $\mathrm{Cond}(D^{(2)}_{2M})$ is illustrated in Fig. 1, supporting the conjecture that both matrices have a comparable condition number, since both are moment matrices (here the involved moments come from the f(x) used in Example 4 below; the ratio $\left(\mathrm{Cond}(D_{2M})/\mathrm{Cond}(D^{(2)}_{2M})\right)^{1/M}$ is reported, according to (3.2) below). Even if asymptotically $\left(\mathrm{Cond}(D_{2M})/\mathrm{Cond}(D^{(2)}_{2M})\right)^{1/M} \to 1$, a good preconditioner for $D^{(2)}_{2M}$ is not necessarily a good preconditioner for $D_{2M}$.
It remains to estimate the condition number of $D^{(2)}_{2M}$. A result in [10, Theorem 3.2] states that Hankel matrices $D^{(2)}_{2M}$ generated by moments of a strictly positive weight function with support [0,1] are asymptotically conditioned in the same way as Hilbert matrices $H_M$ of equal size, i.e.

$$\lim_{M\to\infty}\left(\mathrm{Cond}(D^{(2)}_{2M})\right)^{1/M} = \lim_{M\to\infty}\left(\mathrm{Cond}(H_M)\right)^{1/M} \simeq e^{3.525}. \qquad (3.2)$$
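The growth rate (3.2) is easy to observe numerically; the following sketch (ours, in plain double precision, which is precisely why M cannot be pushed far here) checks that successive Hilbert condition-number ratios approach $e^{3.525} \approx 33.97$:

```python
# The condition number of the Hilbert matrix H_M grows roughly like e^{3.525 M}:
# successive ratios Cond(H_{M+1})/Cond(H_M) approach e^{3.525} ~ 33.97.
import numpy as np
from scipy.linalg import hilbert

conds = [np.linalg.cond(hilbert(m)) for m in range(2, 11)]
ratios = [c2 / c1 for c1, c2 in zip(conds, conds[1:])]
# In double precision the estimate itself breaks down past M ~ 13,
# which is why the paper's computations are carried out in high precision.
```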
Then the Hilbert matrix $H_M$ is a good preconditioner for this class of matrices [10, Theorem 3.3], as the conditioning of the preconditioned matrix grows at most linearly with M. It must also be observed that the matrix $D^{(2)}_{2M}$, whose first entry is $\mu_2$, may be considered as a Hankel matrix generated by the moments of the positive weight function $x^2 f(x)$, with f(x) a pdf. Then $D^{(2)}_{2M}$, up to a multiplicative constant, may be considered a moment matrix of a pdf, with condition number comparable to that of a Hilbert matrix of the same size. It is therefore meaningful to use $H_M$ as a preconditioner of $D^{(2)}_{2M}$, and hence of $D_{2M}$, in (2.9). Thus we can indirectly solve (2.9), written as $D_{2M}\lambda = b$, by solving the preconditioned system $H_M^{-1}D_{2M}\lambda = H_M^{-1}b$ or, equivalently, $C_M D_{2M} C'_M y = C_M b$, where $C'_M C_M = H_M^{-1}$, $C'_M y = \lambda$, and $C'$ stands for the transpose of C. Here $C_M = [c_{ij}]$ is the inverse of the lower triangular Cholesky factor of the Hilbert matrix $H_M$, with entries $c_{ij} = (2i+1)^{1/2}(-1)^j\binom{i}{j}\binom{i+j}{j}$, $i, j = 0, 1, \dots, M-1$ (see [21] for more details). By solving the preconditioned system, higher M values can be handled, with a consequently improved estimate of H[f].
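The closed-form entries of $C_M$ quoted above can be checked directly; the following sketch (ours, at a size where double precision suffices) verifies that $C_M H_M C'_M = I$, i.e. $C'_M C_M = H_M^{-1}$:

```python
# Inverse Cholesky factor of the Hilbert matrix from its closed-form entries
# c_ij = (2i+1)^{1/2} (-1)^j C(i,j) C(i+j,j), i >= j, as quoted in the text.
import numpy as np
from math import comb, sqrt
from scipy.linalg import hilbert

M = 6
C = np.zeros((M, M))
for i in range(M):
    for j in range(i + 1):
        C[i, j] = sqrt(2 * i + 1) * (-1) ** j * comb(i, j) * comb(i + j, j)

H = hilbert(M)
residual = C @ H @ C.T - np.eye(M)   # should vanish, since C' C = H^{-1}
```

A preconditioned solve of (2.9) then amounts to solving $C_M D_{2M} C'_M y = C_M b$ and setting $\lambda = C'_M y$.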
A comparison between the condition number of $D_{2M}$ given by (2.9) and that of the preconditioned matrix $C_M D_{2M} C'_M$, for increasing M, is illustrated in Fig. 2 (experiments with different f(x) provide comparable results). Here the entries of $D_{2M}$ are the moments of the f(x) used in Example 4. The improvement in conditioning is considerable.
Remark 3.1. In [10, Theorem 3.2] strictly positive weight functions f(x) with support [0,1] are considered, i.e., $f(x) \ge \xi > 0$, $\forall x \in [0,1]$. Hankel matrices generated by such weight functions are conditioned like Hilbert matrices of the same order. On the other hand, in [15] it was proved that each Hankel matrix $D_{2M}$ generated by moments of a MaxEnt density $f_M(x)$ has condition number satisfying $\mathrm{Cond}(D_{2M}) \ge 2^{4M-2}$. Taking into account Theorems 1 and 2, we may conclude that each Hankel matrix generated by moments of a density f(x), with $f(x) \ge 0$, $\forall x \in [0,1]$, has condition number asymptotically satisfying $\mathrm{Cond}(D_{2M}) \ge 2^{4M-2}$. Then, in some sense, the results in [10, Theorem 3.3] may be extended to non-negative weight functions $f(x) \ge 0$, and $H_M$ may be used as a preconditioner in all the examples below.
3.2. Numerical examples
All the below considered densities have support [0,1] and all next examples are borrowed from [20]. Each chosen pdf issuch that its moments are available in closed form involving some special functions such that.
Fig. 1. Condition number ratio $\left(\mathrm{Cond}(D_{2M})/\mathrm{Cond}(D^{(2)}_{2M})\right)^{1/M}$ for increasing M, from (3.1).
Fig. 2. Condition number of $D_{2M}$ (*) and $C_M D_{2M} C'_M$ (+) for increasing M.
1. Gamma function: $\Gamma(a) = \int_0^{\infty} t^{a-1}e^{-t}\,dt$ [20, 8.310, p. 892].
2. Incomplete Gamma function: $\Gamma_{inc}(x; a) = \frac{1}{\Gamma(a)}\int_0^x t^{a-1}e^{-t}\,dt$ [20, 8.350, p. 899].
3. Beta function: $B(a,b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$ [20, 8.380, p. 908]; the change of variable $x = \sin^2 t$ yields the trigonometric representation of the Beta function [20, 3.621.5, p. 395]: $\int_0^{\pi/2}\sin^{a-1}t\cos^{b-1}t\,dt = \frac12 B(a/2, b/2)$. With b = 1 and a integer, B(a,b) takes special expressions.
4. Digamma function: $\psi(x) = \frac{d}{dx}\ln\Gamma(x)$ [20, 8.360, p. 902].
Example 1. Let us consider the pdf

$$f(x) = Ae^{-x^n}, \qquad A = \frac{n}{\Gamma\left(\frac1n\right)\Gamma_{inc}\left(1;\frac1n\right)}, \quad n \in \mathbb{R}^+.$$

The change of variable $x^n = t$ allows us to calculate the above normalizing constant A, the moments and the entropy in terms of the Incomplete Gamma function:

$$\mu_j = \frac{\Gamma\left(\frac{j+1}{n}\right)\Gamma_{inc}\left(1;\frac{j+1}{n}\right)}{\Gamma\left(\frac1n\right)\Gamma_{inc}\left(1;\frac1n\right)}, \quad j = 0, 1, \dots$$

(see also [20, 8.326.10, p. 337]) and the entropy

$$H[f] = -\ln\left[\frac{n}{\Gamma\left(\frac1n\right)\Gamma_{inc}\left(1;\frac1n\right)}\right] + \frac{\Gamma\left(\frac{n+1}{n}\right)\Gamma_{inc}\left(1;\frac{n+1}{n}\right)}{\Gamma\left(\frac1n\right)\Gamma_{inc}\left(1;\frac1n\right)}.$$

The density f(x) is a MaxEnt density with characterizing moment $E(X^n)$. Then, whenever $n \in \mathbb{N}$, once $\{\mu_j\}_{j=0}^{n}$ are assigned, the MaxEnt density $f_n(x) = \exp\left(-\sum_{j=0}^{n}\lambda_j x^j\right)$ coincides with f(x) and $H[f_n] = H[f]$.

In this example n = 15 is assumed, so that $H[f] \simeq -0.00980278980538$. The goal is to verify experimentally (1.1), which states that higher moments may be well approximated by a sequence of lower moments; the information content is then spread among the lower moments (see Fig. 3). Since $H[f] \simeq 0$, the bound (1.2) becomes meaningful; it provides the estimate $H[f] \le -0.001987$.
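The closed forms above can be cross-checked against direct quadrature (our own sketch; note that SciPy's `gammainc(a, x)` is already regularized, so it matches the definition of $\Gamma_{inc}(x; a)$ given earlier with the arguments swapped):

```python
# Example 1: f(x) = A exp(-x^n) on [0,1], n = 15; closed-form entropy
# versus direct numerical quadrature.
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, gammainc  # gammainc(a, x) is regularized

n = 15.0
G = lambda a: gamma(a) * gammainc(a, 1.0)  # Gamma(a) * Gamma_inc(1; a)
A = n / G(1.0 / n)
H_closed = -np.log(A) + G((n + 1.0) / n) / G(1.0 / n)

f = lambda x: A * np.exp(-x ** n)
H_quad = -quad(lambda x: f(x) * np.log(f(x)), 0.0, 1.0)[0]
```

Both routes reproduce H[f] ≈ -0.0098027898.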
Example 2. Let us consider a Beta density

$$f(x) = \frac{1}{B(p,q)}\,x^{p-1}(1-x)^{q-1} \quad\text{with}\quad p = 4,\ q = 2,$$
Fig. 3. (a) $H^{(app)}[f_M]$ (*), $H[f_M]$ (o) and exact entropy H[f] (dashed line); (b) relative errors $\left|\frac{H^{(app)}[f_M] - H[f]}{H[f]}\right|$ (*) and $\left|\frac{H[f_M] - H[f]}{H[f]}\right|$ (o), for Example 1.
with moments [20, 3.199, p. 318] obtained directly in terms of the Beta function,

$$\mu_j = \frac{\Gamma(p+q)\Gamma(p+j)}{\Gamma(p+q+j)\Gamma(p)}, \quad j = 0, 1, \dots$$

and Shannon entropy given in terms of the Beta and Digamma functions,

$$H[f] = \ln B(p,q) - (p-1)[\psi(p) - \psi(p+q)] - (q-1)[\psi(q) - \psi(p+q)] = -\ln 20 + \frac{79}{30} \simeq -0.3623989402206575.$$

$H^{(app)}[f_M]$ and $H[f_M]$ for increasing M, together with H[f], are reported in Fig. 4 ($H[f_M]$ for M = 2, ..., 8). Although $H^{(app)}[f_M]$ is calculated through an approximate procedure, its value is more accurate than $H[f_M]$. This seems reasonable: indeed $H^{(app)}[f_M]$ involves 2M moments, whilst $H[f_M]$ involves only the first M moments.
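The quoted value is reproduced directly by the closed form (a sketch of ours using SciPy's `betaln` and `digamma`):

```python
# Example 2: entropy of the Beta(4,2) density via the Beta/Digamma closed form.
import numpy as np
from scipy.special import betaln, digamma

p, q = 4.0, 2.0
H = (betaln(p, q)
     - (p - 1.0) * (digamma(p) - digamma(p + q))
     - (q - 1.0) * (digamma(q) - digamma(p + q)))
# betaln(4, 2) = ln(1/20) = -ln 20, and the psi terms sum to 79/30
```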
Example 3. Let us consider the density

$$f(x) = -\frac{2}{\pi\ln 2}\,\frac{\ln x}{\sqrt{1-x^2}}.$$

Start from $I(a) = \int_0^1 \frac{x^a}{\sqrt{1-x^2}}\,dx$, from which $\frac{d}{da}I(a) = \int_0^1 \frac{x^a\ln x}{\sqrt{1-x^2}}\,dx$. The change of variable $x = \sin t$ leads to $I(a) = \int_0^{\pi/2}\sin^a t\,dt$, so that the wanted integral is given in terms of derivatives of the Beta function. When a takes integer values, after some algebra one obtains the moments (see also [20, 4.241-2, p. 538])

$$\mu_0 = 1,$$

$$\mu_{2n} = -\frac{2}{\pi\ln 2}\binom{2n}{n}\frac{\pi}{2^{2n+1}}\left(-\ln 2 + \sum_{k=1}^{2n}\frac{(-1)^{k-1}}{k}\right), \quad n = 1, 2, \dots,$$

$$\mu_{2n+1} = -\frac{2}{\pi\ln 2}\,\frac{2^{2n}}{(n+1)\binom{2n+1}{n}}\left(\ln 2 + \sum_{k=1}^{2n+1}\frac{(-1)^{k}}{k}\right), \quad n = 0, 1, 2, \dots,$$

whilst the entropy, obtained numerically, is

$$H[f] \simeq -0.3179752927304.$$

Numerical results are reported in Fig. 5 and all the considerations are analogous to Example 2.
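The moment formulas can be sanity-checked against direct quadrature (our own sketch; quad copes with the integrable singularities at both endpoints):

```python
# Example 3: check the closed-form mu_2 and mu_3 against direct quadrature.
import numpy as np
from math import comb, log, pi
from scipy.integrate import quad

f = lambda x: -2.0 / (pi * log(2.0)) * np.log(x) / np.sqrt(1.0 - x * x)

def mu_even(n):   # mu_{2n}, n >= 1
    s = sum((-1.0) ** (k - 1) / k for k in range(1, 2 * n + 1))
    return (-2.0 / (pi * log(2.0)) * comb(2 * n, n)
            * pi / 2 ** (2 * n + 1) * (-log(2.0) + s))

def mu_odd(n):    # mu_{2n+1}, n >= 0
    s = sum((-1.0) ** k / k for k in range(1, 2 * n + 2))
    return (-2.0 / (pi * log(2.0)) * 2 ** (2 * n)
            / ((n + 1) * comb(2 * n + 1, n)) * (log(2.0) + s))

mu2_quad = quad(lambda x: x ** 2 * f(x), 0.0, 1.0)[0]
mu3_quad = quad(lambda x: x ** 3 * f(x), 0.0, 1.0)[0]
```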
Fig. 4. (a) $H^{(app)}[f_M]$ (*), $H[f_M]$ (o) and exact entropy H[f] (dashed line); (b) relative errors $\left|\frac{H^{(app)}[f_M] - H[f]}{H[f]}\right|$ (*) and $\left|\frac{H[f_M] - H[f]}{H[f]}\right|$ (o), for Example 2.
Example 4. Let us consider the density

$$f(x) = -\frac{8}{\pi(1+2\ln 2)}\,\ln(x)\sqrt{1-x^2}.$$

The moments and the entropy are obtained by a procedure similar to that of the previous example [20, 4.241.3-4, p. 538]:

$$\mu_0 = 1,$$

$$\mu_{2n} = \frac{8}{\pi(1+2\ln 2)}\binom{2n}{n}\frac{\pi}{2^{2n+2}(n+1)}\left(\ln 2 + \frac{1}{2n+2} + \sum_{k=1}^{2n}\frac{(-1)^{k}}{k}\right), \quad n = 1, 2, \dots,$$

$$\mu_{2n+1} = -\frac{8}{\pi(1+2\ln 2)}\,\frac{2^{2n+1}}{(n+1)(n+2)\binom{2n+3}{n+1}}\left(\ln 2 - \frac{1}{2n+3} + \sum_{k=1}^{2n+1}\frac{(-1)^{k}}{k}\right), \quad n = 0, 1, 2, \dots$$

and

$$H[f] \simeq -0.508931221953.$$

Numerical results are reported in Fig. 6 and all the considerations are analogous to Example 2.
Example 5. Let us consider the truncated Cauchy density

$$f(x) = \frac{4}{\pi(1+x^2)}.$$

Here the odd and even $\mu_j$ are calculated through the stable recursive relationship

$$\mu_j = \frac{4}{\pi}\,\frac{1}{j-1} - \mu_{j-2}, \quad j \ge 2, \qquad \mu_0 = 1, \quad \mu_1 = \frac{2}{\pi}\ln 2$$

(closed formulas are in [20, 2.146.1, p. 76]), whilst $H[f] \simeq -0.0215137302739$. f(x) may be written as

$$f(x) = e^{\ln\frac{4}{\pi(1+x^2)}} \simeq e^{\ln\frac{4}{\pi} - x^2 + \frac{x^4}{2} - \frac{x^6}{3} + \cdots}$$

and then f(x) may be considered as a MaxEnt distribution constrained by integer moments. It is reasonable to expect that a few integer moments are able to catch the information content of f(x). Numerical results are reported
Fig. 5. (a) $H^{(app)}[f_M]$ (*), $H[f_M]$ (o) and exact entropy H[f] (dashed line); (b) relative errors $\left|\frac{H^{(app)}[f_M] - H[f]}{H[f]}\right|$ (*) and $\left|\frac{H[f_M] - H[f]}{H[f]}\right|$ (o), for Example 3.
in Fig. 7; entropy decreasing is fast as M increases and the conjecture about the information content of f(x) is proved. HereMaxEnt distributions are calculated until M = 6 before incurring in numerical instability. Since H[f] is close to zero the bound(1.2) becomes meaningful. The latter provides the estimate H[f] 6 �0.003338.
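The stable recursion of Example 5 is straightforward to implement; a minimal sketch (the function name is ours) that generates the moment sequence is:

```python
import math

def cauchy_moments(J):
    # Moments mu_j of the truncated Cauchy density f(x) = 4/(pi*(1+x^2))
    # on [0, 1], via the stable recursion
    #   mu_j = (4/pi)/(j-1) - mu_{j-2},  j >= 2,
    # seeded with mu_0 = 1 and mu_1 = (2/pi)*ln 2.
    mu = [0.0] * (J + 1)
    mu[0] = 1.0
    if J >= 1:
        mu[1] = (2.0 / math.pi) * math.log(2.0)
    for j in range(2, J + 1):
        mu[j] = 4.0 / (math.pi * (j - 1)) - mu[j - 2]
    return mu

# e.g. mu_2 = 4/pi - 1 and mu_3 = (2/pi)(1 - ln 2) follow from the recursion
```

Since each step only subtracts the previous even (or odd) moment from a quantity of moderate size, rounding errors do not grow along the recursion, in line with the stability claimed in the text.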
[Figure] Fig. 7. (a) H(app)[fM] (*), H[fM] (o) and exact entropy H[f] (dashed line); (b) relative errors |(H(app)[fM] - H[f])/H[f]| (*) and |(H[fM] - H[f])/H[f]| (o) for Example 5.
[Figure] Fig. 6. (a) H(app)[fM] (*), H[fM] (o) and exact entropy H[f] (dashed line); (b) relative errors |(H(app)[fM] - H[f])/H[f]| (*) and |(H[fM] - H[f])/H[f]| (o) for Example 4.
Example 6. In this example we consider a pdf f(x) for which the procedure to estimate f(x) from \{\mu_j\}_{j=0}^{\infty} completely fails. Let us consider the highly oscillating density

f(x) = A\sin^2(\pi\omega x), \quad \omega \gg 1,

with A = 2\left(1 - \frac{\sin 2\pi\omega}{2\pi\omega}\right)^{-1}. Taking into account the well-known relationships

\sin^2(\pi\omega x) = \frac{1}{2}\left(1 - \cos(2\pi\omega x)\right) \quad \text{and} \quad \lim_{\omega\to\infty}\int_0^1 f(x)\cos(2\pi\omega x)\,dx = 0

for each piecewise continuous function f(x), we have, as \omega \to \infty,

A = 2,

\mu_n = A\int_0^1 x^n \sin^2(\pi\omega x)\,dx = \frac{A}{2}\int_0^1 x^n\,dx - \frac{A}{2}\int_0^1 x^n\cos(2\pi\omega x)\,dx = \frac{1}{n+1}

(which coincides with the moments of the uniform distribution) and, after some algebra,

H[f] = -\int_0^1 A\sin^2(\pi\omega x)\,\ln\!\left(A\sin^2(\pi\omega x)\right)dx = -1 + \ln 2 \simeq -0.30685281944.

Then H[f] \simeq -0.30685281944, while H(app)[fM] = H[fM] = 0 for all M, since \mu_n = 1/(n+1) implies that the MaxEnt density is fM(x) \equiv 1, the uniform distribution, and from (2.9) and (2.10) \lambda_1 = \cdots = \lambda_M = 0, \lambda_0 = 1 holds. Then, if f(x) differs from a "reasonable density", the outlined procedure may completely fail in evaluating H[f] from \{\mu_j(f)\}_{j=0}^{\infty}. If several moments are unable to squeeze out the relevant part of the information about f(x), it means that moments are an unsuitable tool; as a consequence, we should turn our attention to more appropriate expected values.
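The limiting behaviour claimed in Example 6 can be checked numerically for a finite but large \omega. The sketch below is ours (composite Simpson quadrature, \omega = 50); it recovers the uniform moments 1/(n+1) and the entropy \ln 2 - 1 up to O(1/\omega) corrections.

```python
import math

def moments_and_entropy(omega, N=100_000):
    # Composite Simpson quadrature on [0, 1] for the oscillating density
    # f(x) = A*sin^2(pi*omega*x), A = 2/(1 - sin(2*pi*omega)/(2*pi*omega)).
    A = 2.0 / (1.0 - math.sin(2.0 * math.pi * omega) / (2.0 * math.pi * omega))

    def simpson(g):
        h = 1.0 / N
        s = g(0.0) + g(1.0)
        for i in range(1, N):
            s += (4.0 if i % 2 else 2.0) * g(i * h)
        return s * h / 3.0

    def f(x):
        return A * math.sin(math.pi * omega * x) ** 2

    mu = [simpson(lambda x, n=n: x ** n * f(x)) for n in range(4)]
    # guard: f*ln(f) -> 0 where f vanishes
    H = -simpson(lambda x: f(x) * math.log(f(x)) if f(x) > 0.0 else 0.0)
    return mu, H

# for omega = 50 the moments are already close to 1/(n+1), and H to ln 2 - 1
mu, H = moments_and_entropy(50)
```

With roughly 2000 quadrature points per oscillation period, Simpson's rule resolves the integrand comfortably; the residual deviation from 1/(n+1) is the genuine O(1/\omega) term, not quadrature error.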
Remark 3.2. Numerical results highlight that

1. H(app)[fM] \ge H[f] for all M;
2. the sequence {H(app)[fM]} is monotonically decreasing.

We support these assertions through heuristic reasoning:

(a) as M \to \infty, H(app)[fM] \simeq H[fM] \ge H[f], since \mu_j(fM) \simeq \mu_j(f), j = 1, ..., 2M;
(b) for each fixed M, H(app)[fM] takes into account \mu_j(f), j = 1, ..., 2M, while H(app)[fM-1] considers \mu_j(f), j = 1, ..., 2M - 2.

Then, according to the physical meaning of H(app)[fM], it happens that H(app)[fM-1] > H(app)[fM], which shows that the sequence {H(app)[fM]} is monotonically decreasing. Combining the previous two facts, we obtain the following criterion for choosing the number M of moments that yields the best approximation H(app)[fM] before incurring numerical instability (equivalently, before the sequence {H(app)[fM]} fails to be monotonically decreasing):

M \simeq j : \min_j H^{(app)}[f_j].

The failure comes from insufficient machine precision.
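The criterion amounts to scanning the computed sequence {H(app)[fM]} for the first violation of monotonic decrease; a minimal sketch (function name and list convention are ours):

```python
def best_M(H_app):
    # H_app[i] holds H^(app)[f_M] for M = i + 1.
    # Return the M at which the sequence first fails to decrease,
    # i.e. the last reliable approximation before numerical instability.
    for i in range(1, len(H_app)):
        if H_app[i] >= H_app[i - 1]:
            return i
    return len(H_app)
```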
4. Conclusions and discussion

The problem of evaluating the Shannon entropy of a distribution, of which only the sequence of integer moments is given, has been considered. The Shannon entropy is given directly in terms of integer moments, without solving the underlying moment problem. Known results about both the entropy-convergence of the MaxEnt approximation and the Kullback-Leibler distance allowed us to reach this goal. Ill-conditioning issues affecting the linear system to be solved may be mitigated by a proper pre-conditioning technique with the Hilbert matrix as a preconditioner. Numerical results prove that the moments are a suitable tool for gathering the information content of a distribution, even if an accurate entropy estimate requires involving a high number of moments. The opinion, widespread in the literature, that the information content of a distribution is spread along the infinite sequence of moments is now supported by numerical evidence. If, in the approximation framework, the discrepancy between the underlying density and its approximation is measured by the Kullback-Leibler distance, then the numerical tests allow us the following conclusions:

(a) a high number of moments has to be involved, with consequent ill-conditioning issues;
(b) more suitable expected values, such as the fractional moments E(X^\alpha) = \int_D x^\alpha f(x)\,dx, \alpha \in \mathbb{R}, have to be involved for an accurate recovery. Two of the authors of this paper have suggested the use of fractional moments in some past papers [22,23]. The numerical results illustrated above support that theoretical conjecture.
(c) As a further alternative to integer moments, in [24] the use of shifted Chebyshev polynomials T^*_n(x) = T_n(2x - 1), with T_n(x) = \cos[n\cos^{-1}(x)], is suggested. The more accurate results obtained, compared with integer moments, can be explained by the superior minimax property of the Chebyshev polynomials and the consequent analytical form of the MaxEnt solution f_M(x) = \exp\left(-\sum_{j=0}^{M}\lambda_j T^*_j(x)\right) for Chebyshev moments. Indeed, the presence of the term \lambda_j x^j in the MaxEnt solution (2.1) for integer moments makes it very difficult to exploit information from the higher-order moments in the interval [0,1]. Such an issue does not appear in the formulation via Chebyshev moments, owing to the bounded nature of the Chebyshev polynomials and their more regular sampling of the interval; consequently, they provide a way to systematically incorporate the information from the higher moments into the reconstruction process.
The procedure outlined above for the Hausdorff case might likewise be extended to the Stieltjes and Hamburger cases. Indeed, in the Stieltjes case (in the Hamburger case the considerations are similar; even values of M are required) the higher moments \mu_j(f_M), j > M, may be found in terms of \{\mu_j\}_{j=1}^{M} and \{\lambda_j\}_{j=1}^{M} by integrating (2.2) by parts (here D \equiv \mathbb{R}^+), with the following result:

\sum_{k=1}^{M} k\lambda_k \mu_{k+j} = (1+j)\mu_j, \quad j = 0, \ldots, M-1 \qquad (4.1)

and

\lambda_0 = \ln\int_0^{\infty}\exp\left(-\sum_{k=1}^{M}\lambda_k x^k\right)dx. \qquad (4.2)

In analogy with the Hausdorff case, the MaxEnt density f_M converges in entropy to f [25] and hence in directed divergence, so that \lim_{M\to\infty} f_M(x) = f(x) almost everywhere. Then, starting from high values of M, we may assume \mu_{k+j}(f_M) = \mu_{k+j}(f) for each j, k with j + k \ge M, where the \mu_{k+j}(f) are known quantities. (4.1) may then be considered a linear system providing (\lambda_1, \ldots, \lambda_M). The linear system (4.1) admits a solution, the involved matrix being a Hankel matrix. Nevertheless, two main issues are still open.
1. A suitable preconditioner for Hankel matrices coming from the Stieltjes (or Hamburger) moment problem is lacking in the literature; hence high values of M cannot be used in (4.1).
2. How to guarantee \lambda_M \ge 0 in (4.1), so as to ensure integrability in (4.2).
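To make the system (4.1) concrete, the following sketch (ours; NumPy assumed) assembles and solves it. As a sanity check it uses the exponential density f(x) = e^{-x} on \mathbb{R}^+, which is itself a MaxEnt density with \lambda_1 = 1 and moments \mu_j = j!, so the solver should return \lambda_1 = 1 and \lambda_2 = \lambda_3 = 0.

```python
import math
import numpy as np

def solve_lambdas(mu, M):
    # Assemble the linear system (4.1):
    #   sum_{k=1}^{M} k * lambda_k * mu_{k+j} = (1 + j) * mu_j,  j = 0..M-1,
    # where mu[0..2M-1] are the (assumed exact) moments, and solve for
    # (lambda_1, ..., lambda_M).  The coefficient matrix is Hankel-like
    # and rapidly becomes ill-conditioned as M grows.
    A = np.array([[k * mu[k + j] for k in range(1, M + 1)] for j in range(M)],
                 dtype=float)
    b = np.array([(1 + j) * mu[j] for j in range(M)], dtype=float)
    return np.linalg.solve(A, b)

# sanity check: f(x) = exp(-x) on R+ has mu_j = j! and lambda_1 = 1
mu = [math.factorial(j) for j in range(8)]
lam = solve_lambdas(mu, 3)
```

Even in this benign example the condition number of A grows quickly with M, which is exactly the obstacle raised in point 1 above.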
The last drawback may be addressed as follows. A usual practice amounts to truncating the support \mathbb{R}^+, so that the Stieltjes moment problem is solved within a proper, large enough interval [0, b] (b might be chosen according to the Chebyshev inequality P(|X - \mu_1| \ge \epsilon) \le (\mu_2 - \mu_1^2)/\epsilon^2), changing the original Stieltjes problem into a Hausdorff moment problem (in which case \lambda_M may assume any real value). In the MaxEnt setup the latter admits a solution for each sequence \{\mu_j\}_{j=1}^{M} interior to the moment space [26, Theorem 4]. On the contrary, in the Stieltjes case the moment space of the MaxEnt f_M is a proper subset of the moment space, and then, for an arbitrary set \{\mu_j\}_{j=1}^{M}, the existence of f_M rests only on numerical evidence [27, Theorem 3]. We might then be faced with the unpleasant fact that a MaxEnt solution always exists even though the original Stieltjes moment problem does not admit any solution. In conclusion, the extension of the illustrated technique for calculating the entropy directly from the moment sequence to the Stieltjes (or Hamburger) case is not obvious.
References

[1] J.A. Shohat, J.D. Tamarkin, The Problem of Moments, Math. Surveys, vol. 1, American Mathematical Society, Providence, RI, 1970.
[2] S. Karlin, L.S. Shapley, Geometry of Moment Spaces, AMS Memoirs 12, American Mathematical Society, Providence, RI, 1953.
[3] B.G. Lindsay, P. Basak, Moments determine the tail of a distribution (but not much else), Am. Statist. 54 (4) (2000) 248-251.
[4] N.I. Akhieser, The Classical Moment Problem and Some Related Questions in Analysis, Hafner, New York, 1965.
[5] M. Goria, A. Tagliani, Bounds on the tail probability and absolute difference between two distributions, Commun. Statist. Theory Methods 32 (2003) 519-532.
[6] P.N. Gavriliadis, G.A. Athanassoulis, Moment information for probability distributions, without solving the moment problem. II: Main-mass, tails and shape approximation, J. Comput. Appl. Math. 229 (2009) 7-15.
[7] P.N. Gavriliadis, Moment information for probability distributions, without solving the moment problem. I: Where is the mode?, Commun. Statist. Theory Methods 37 (2008) 671-681.
[8] R.M. Mnatsakanov, Hausdorff moment problem: reconstruction of probability density functions, Statist. Probab. Lett. 78 (2008) 1869-1877.
[9] R.M. Mnatsakanov, Hausdorff moment problem: reconstruction of distributions, Statist. Probab. Lett. 78 (2008) 1612-1618.
[10] D. Fasino, Spectral properties of Hankel matrices and numerical solutions of finite moment problems, J. Comput. Appl. Math. 65 (1995) 145-155.
[11] B. Beckermann, The condition number of real Vandermonde, Krylov and positive definite Hankel matrices, Numer. Math. 85 (2000) 553-577.
[12] E.E. Tyrtyshnikov, How bad are Hankel matrices?, Numer. Math. 67 (1994) 261-269.
[13] H.K. Kesavan, J.N. Kapur, Entropy Optimization Principles with Applications, Academic Press, London, 1992.
[14] A. Tagliani, Entropy estimate of probability densities having assigned moments: Hausdorff case, Appl. Math. Lett. 15 (2002) 309-314.
[15] A. Tagliani, Hausdorff moment problem and maximum entropy: a unified approach, Appl. Math. Comput. 105 (1999) 291-305.
[16] T.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley-Interscience, Hoboken, NJ, 2006.
[17] E.V. Volpe, D. Baganoff, Maximum entropy PDFs and the moment problem under near-Gaussian conditions, Probab. Eng. Mech. 18 (2003) 17-29.
[18] E.V. Volpe, Closure relations as determined by the maximum entropy method and near-equilibrium conditions, Ph.D. Thesis, Stanford University, 2000.
[19] S. Kullback, Information Theory and Statistics, Dover, New York, 1968.
[20] I.S. Gradshteyn, I.M. Ryzhik, Table of Integrals, Series, and Products, A. Jeffrey, D. Zwillinger (Eds.), Academic Press, New York, 2007.
[21] G. Talenti, Recovering a function from a finite number of moments, Inverse Probl. 3 (1987) 501-517.
[22] P.L. Novi Inverardi, A. Petri, G. Pontuale, A. Tagliani, Hausdorff moment problem via fractional moments, Appl. Math. Comput. 144 (1) (2003) 61-74.
[23] H. Gzyl, A. Tagliani, Hausdorff moment problem and fractional moments, Appl. Math. Comput. 216 (2010) 3319-3328.
[24] P. Biswas, H. Shimoyama, L.R. Mead, Lyapunov exponents and the natural invariant density determination of chaotic maps: an iterative maximum entropy ansatz, J. Phys. A: Math. Theor. 43 (2010) 1-12.
[25] M. Frontini, A. Tagliani, Entropy-convergence in Stieltjes and Hamburger moment problem, Appl. Math. Comput. 88 (1997) 39-51.
[26] M. Frontini, A. Tagliani, Hausdorff moment problem and maximum entropy: on the existence conditions, Appl. Math. Comput. 218 (2011) 430-433.
[27] A. Tagliani, Maximum entropy solutions in unbounded domains, Appl. Math. Lett. 16 (4) (2003) 519-552.