Bayesian optimal designs for discriminating
between non-Normal models
Jesús López-Fidalgo, Chiara Tommasi
14th May 2007
Abstract
Designs are found for discriminating between two non-Normal models in
the presence of prior information. The KL-optimality criterion, where the
true model is assumed to be completely known, is extended to a criterion
where prior distributions of the parameters and a prior probability of each
model to be true are assumed. Concavity of this criterion is proved. Thus,
the results of optimal design theory apply in this context and optimal designs
can be constructed and checked by the General Equivalence Theorem. Some
illustrative examples are provided.
Keywords: Equivalence theorem, Gamma distribution, Weibull distribution, KL-
optimum design.
1 Introduction
In order to discriminate between two rival regression models, Atkinson and Fedorov
(1975a) propose the well-known T-optimality criterion. The same authors generalize
this criterion to the case of more than two rival regression models (Atkinson and
Fedorov, 1975b). The T-optimum design is only locally optimum since it depends on
the nominal values of the parameters of the true model. Ponce de Leon and Atkinson
(1991a) generalize the T-optimality criterion using the Bayesian approach. More
specifically, instead of assuming one of the models to be the true model, they consider
prior distributions for the parameters of the models and prior probabilities for
each model to be true. Ponce de Leon and Atkinson (1991b) extend these ideas to
designs for discriminating between binary data models.
The T-optimality criterion can be used only for regression models. Ponce de
Leon and Atkinson (1992) extend this criterion to the case of generalized linear
models. López-Fidalgo, Tommasi and Trandafir (2007) provide a new optimality
criterion based on the Kullback-Leibler distance, which is useful for discriminating
between any two statistical models. In a general context, a statistical model is a
parametric family of probability distributions of a response variable depending on
some explanatory variables. This new criterion is called KL-optimality; it matches
the T-criterion in the case of regression models and the generalization proposed by
Ponce de Leon and Atkinson (1992) in the case of generalized linear models. The
KL-optimality criterion is given by a concave design criterion function (Tommasi,
2007). Thus, standard results of optimum design theory apply in this context. In
general, the corresponding optimum design, i.e. the so called KL-optimum design,
is only locally optimum. In this paper this theory is extended to the case in which
there is a specified prior probability of each model to be true. Then, prior conditional
distributions for the parameters in the two models are also given. The corresponding
optimum design will be called Bayesian KL-optimum design. The motivation for the
Bayesian extension is that neither which model is true nor the parameters of the
assumed true model are known in practice. However, information is sometimes
available through a prior probability distribution.
The most important benefit of the Bayesian approach is that optimal designs
no longer depend on the nominal values of the parameters of the true model but
only on their prior distribution. After having taken expectations over this prior,
standard optimization techniques may be used to find the optimum designs. In this
paper an extension of the General Equivalence Theorem for KL-optimality given
by López-Fidalgo, Tommasi and Trandafir (2007) is provided in order to check the
Bayesian KL-optimality of a design. This is given in Section 2.1.
In Section 2 an introduction to the problem is provided and the Bayesian KL-
optimality criterion is defined. In Section 3 two typical and illustrative examples are
described, which may be useful in reliability analysis. Finally, in Section 4 a brief
discussion is given.
2 Bayesian KL-optimum design
Following the notation of Ponce de Leon and Atkinson (1991a), let X be the
design region, let H be the class of all discrete probability distributions on X
and let ξ ∈ H be a design measure. The statistical model is written as
fi(y, x, θi), x ∈ X , (i = 1, 2).
The true statistical model is one of the two known density functions f1(y, x, θ1)
and f2(y, x, θ2) with prior probabilities π1 and π2 = 1− π1, respectively. The set of
parameters θi ∈ Ωi ⊂ IR^mi has a prior probability distribution pi(θi) (i = 1, 2).
If f1(y, x, θ1) is the true model then the Kullback-Leibler distance between f1(y, x, θ1)
and f2(y, x, θ2) is
    I(f1, f2, x, θ2) = ∫ f1(y, x, θ1) log[ f1(y, x, θ1) / f2(y, x, θ2) ] dy,   ∀x ∈ X,   (1)
where log denotes the natural logarithm and the integral is computed over the sample
space of the possible observations. Exchanging the roles of f1(y, x, θ1) and
f2(y, x, θ2) in (1), the Kullback-Leibler distance between f2(y, x, θ2) and f1(y, x, θ1)
is I(f2, f1, x, θ1). The quantities
    I21(ξ, θ1) = min_{θ2∈Ω2} ∫_X I(f1, f2, x, θ2) ξ(dx),

    I12(ξ, θ2) = min_{θ1∈Ω1} ∫_X I(f2, f1, x, θ1) ξ(dx)
are the KL-criterion functions for the second model when the first is true and vice
versa. These KL-criterion functions depend on the design and on the unknown
parameters θ1 and θ2, respectively. Taking the expected value over the two models and
over the prior distributions of the parameters, the following Bayesian KL-optimality
criterion may be defined,
IB(ξ) = π1Eθ1 [I21(ξ, θ1)] + π2Eθ2 [I12(ξ, θ2)], (2)
where Eτ stands for the expectation with respect to the prior distribution of the
parameter τ. If it is assumed that the statistical model fi(y, x, θi) is true, but the only
information about the parameters θi is the prior distribution pi(θi), then a partially
Bayesian KL-optimality criterion may be defined,

    I^PB_i(ξ) = Eθi[Iīi(ξ, θi)],   i = 1, 2,   (3)

where ī denotes "not i". A design that maximizes IB(ξ) is called a Bayesian KL-
optimum design; a design that maximizes I^PB_i(ξ) is called a partially Bayesian
KL-optimum design. Hereafter a partially Bayesian KL-optimum design will be
denoted by ξ∗PB.
Remark 1. The criterion function IB(ξ) is appropriate only if the statistical
models are not nested. If a model is nested within the other then at least one of
the quantities I21(ξ, θ1) and I12(ξ, θ2) is zero. In this case the largest model must
be assumed as the true model and only partially Bayesian optimum designs, in the
sense of (3), can be computed.
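As a concrete numerical illustration of the distance (1) that underlies these criteria, the following sketch (not taken from the paper) uses a pair of exponential densities, for which the integral in (1) has a simple closed form, and compares that closed form with a direct numerical integration.

```python
import math

def kl_exponential(l1, l2):
    # Closed-form KL distance (1) between the true density
    # f1(y) = l1*exp(-l1*y) and the rival f2(y) = l2*exp(-l2*y):
    # log(l1/l2) + l2/l1 - 1
    return math.log(l1 / l2) + l2 / l1 - 1.0

def kl_numeric(l1, l2, n=100000, ymax=40.0):
    # Riemann-sum approximation of the integral of f1*log(f1/f2)
    # over [0, ymax], mimicking definition (1) directly
    h = ymax / n
    total = 0.0
    for k in range(1, n):
        y = k * h
        f1 = l1 * math.exp(-l1 * y)
        f2 = l2 * math.exp(-l2 * y)
        total += f1 * math.log(f1 / f2)
    return total * h

exact = kl_exponential(2.0, 0.5)
approx = kl_numeric(2.0, 0.5)
```

The distance is zero only when the two densities coincide and grows with their separation; in the design context it is evaluated at each x and then averaged over the design measure ξ before the inner minimisation.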
2.1 The Equivalence Theorem
López-Fidalgo, Tommasi and Trandafir (2007) extend the General Equivalence
Theorem of Kiefer and Wolfowitz (1960) to the case of local KL-optimality. For checking
Bayesian KL-optimality of a given design, it is necessary to extend the theorem. In
what follows some background is given in order to state this generalization.
Because of the linearity of the criterion function IB(ξ) with respect to I21(ξ, θ1)
and I12(ξ, θ2) and because of their concavity on H (Tommasi, 2007), IB(ξ) is a
concave function on H.
A design ξ for which the sets

    Ωi(ξ) = { θi(ξ) : θi(ξ) = arg min_{θi∈Ωi} ∫_X I[fī(y, x, θī), fi(y, x, θi)] ξ(dx) },   i = 1, 2,   (4)

are singletons is called a regular design; otherwise it is called a singular design.
If ξ̄ and ξ are any two designs then, due to the linearity of IB(ξ), the directional
derivative of IB(ξ) at ξ̄ in the direction of δξ = ξ − ξ̄ is

    ∂IB(ξ̄, ξ) = Σ_{i=1}^{2} πi Eθi[∂Iīi(ξ̄, ξ, θi)].   (5)
López-Fidalgo, Tommasi and Trandafir (2007) prove that if ξ̄ is a regular design
then

    ∂Iīi(ξ̄, ξ, θi) = lim_{α→0+} { Iīi[(1 − α)ξ̄ + αξ, θi] − Iīi(ξ̄, θi) } / α = ∫_X ψīi(x, ξ̄, θi) ξ(dx),   (6)

where the function

    ψīi(x, ξ̄, θi) = I[fi(y, x, θi), fī(y, x, θī(ξ̄))] − ∫_X I[fi(y, x, θi), fī(y, x, θī(ξ̄))] ξ̄(dx)

is the directional derivative of Iīi(ξ, θi) at ξ̄ in the direction of δξx = ξx − ξ̄, where
ξx is the design that puts the whole mass at the point x and θī(ξ̄) is the unique
element of Ωī(ξ̄) as defined by (4). From equation (6) the directional derivative (5)
may be written as

    ∂IB(ξ̄, ξ) = ∫_X ψB(x, ξ̄) ξ(dx),   (7)
where

    ψB(x, ξ̄) = Σ_{i=1}^{2} πi Eθi[ψīi(x, ξ̄, θi)]

is the directional derivative of IB(ξ) at ξ̄ in the direction of δξx.
Under the assumption that the Bayesian KL-optimum design ξ∗B is regular, the
following theorem may be proved.
Theorem 1. Let ξ∗B be a regular design.
(i) The design ξ∗B is Bayesian KL-optimum if and only if
ψB(x, ξ∗B) ≤ 0, x ∈ X .
(ii) The function ψB(x, ξ∗B) achieves its maximum value at the points of the optimal
design support.
Proof.

(i) Since IB(ξ) is a concave function of ξ, a necessary and sufficient condition for
the optimality of ξ∗B is ∂IB(ξ∗B, ξ) ≤ 0 for any design ξ. It may easily be proved
that

    max_ξ ∂IB(ξ∗B, ξ) = max_{x∈X} ψB(x, ξ∗B);

thus, a necessary and sufficient condition for the optimality of ξ∗B is

    ψB(x, ξ∗B) ≤ 0,   x ∈ X.

(ii) Let us assume there exist a subset X1 ⊂ supp(ξ∗B) and a scalar a such that

    ∫_{X1} ψB(x, ξ∗B) ξ∗B(dx) ≤ a < 0

and ψB(x, ξ∗B) = 0 for x ∈ X \ X1. This contradicts the fact that

    ∫_X ψB(x, ξ∗B) ξ∗B(dx) = 0,

which is an obvious consequence of (7).
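The practical use of a check of this kind can be sketched on the simplest classical instance of a General Equivalence Theorem, D-optimality for the line y = θ0 + θ1 x on [−1, 1] (a toy case chosen here for illustration, not the KL setting of the paper). The structure is exactly that of conditions (i) and (ii): scan the directional-derivative function over X, and verify that it is ≤ 0 everywhere with equality at the support points.

```python
# Equivalence-theorem check for a two-point design for the simple linear
# model with regressor f(x) = (1, x) on X = [-1, 1].
def check_design(support, weights, npts=201):
    # information matrix M = sum_j w_j f(x_j) f(x_j)^T
    m00 = sum(weights)
    m01 = sum(w * x for x, w in zip(support, weights))
    m11 = sum(w * x * x for x, w in zip(support, weights))
    det = m00 * m11 - m01 * m01
    i00, i01, i11 = m11 / det, -m01 / det, m00 / det  # entries of M^{-1}
    def psi(x):
        # directional derivative of log det M(xi): d(x, xi) - 2,
        # where d(x, xi) = f(x)^T M^{-1} f(x) and 2 is the number of parameters
        return (i00 + 2 * i01 * x + i11 * x * x) - 2.0
    xs = [-1.0 + 2.0 * k / (npts - 1) for k in range(npts)]
    return max(psi(x) for x in xs), [psi(x) for x in support]

# The design {-1, 1; 1/2, 1/2} is D-optimal here: psi <= 0 with equality at +-1
worst, at_support = check_design([-1.0, 1.0], [0.5, 0.5])
```

For a candidate ξ∗B in the KL setting, the same scan applies with ψB(x, ξ∗B) in place of the D-optimality derivative; a strictly positive maximum reveals a non-optimal design.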
3 Illustrative examples
In reliability analysis the same data set may very often be fitted either to a Weibull
or to a Gamma distribution with equal success, so problems of model specification
may arise. Both distributions are typically used for skewed data (Kundu and Manglick,
2004). Furthermore, in reliability and quality control contexts historical information
is frequently available. If this information can be incorporated into a
prior probability distribution then a Bayesian approach may be used in the analysis.
In this example, Bayesian and partially Bayesian KL-optimum designs are found to
compare the Weibull probability density function (pdf),
    fW(y; b, c) = (c y^(c−1) / b^c) exp[−(y/b)^c],   b > 0, c > 0,
with the Gamma pdf,
    fG(y; β, α) = y^(α−1) exp(−y/β) / [β^α Γ(α)],   β > 0, α > 0.
The parametrizations b = exp(λ1 x) and β = exp(γ1 x) will be considered; they
ensure b > 0 and β > 0 for any λ1, γ1 ∈ IR. With this notation
the Kullback-Leibler distances between the Gamma and the Weibull pdf's, and vice
versa, are
    I(fG, fW, x, λ1, c) = −α γ1 x − log Γ(α) + c λ1 x − log c + (α − c)[Γ′(α)/Γ(α) + γ1 x] − α
                          + exp[c(γ1 − λ1)x] Γ(α + c)/Γ(α)

and

    I(fW, fG, x, γ1, α) = α γ1 x + log Γ(α) − α λ1 x + log c − 1.5772 + (α/c) 0.5772
                          + exp[(λ1 − γ1)x] Γ((c + 1)/c),

respectively.
In the latter computation the following argument has been used. If Y is a random
variable with a Weibull distribution as set above, then X = log(Y/b) follows a Gumbel
distribution with parameters c^(−1) and 0. The expectation of this distribution is well
known and in this case becomes EW(X) = −γ/c, where γ ≈ 0.5772 is the
Euler-Mascheroni constant. Therefore EW(log Y) = log b − γ/c.
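The closed-form expression for I(fW, fG, x, γ1, α) above can be sanity-checked by simulating from the Weibull density and averaging the log-ratio of the two densities. The nominal values below (x = 1, c = 3, λ1 = 1, α = 3, γ1 = 1) are illustrative assumptions chosen for the check, not values taken from the paper.

```python
import math, random

EULER = 0.5772156649015329  # Euler-Mascheroni constant gamma

def kl_weibull_gamma(x, c, lam1, alpha, gam1):
    # Closed form I(f_W, f_G, x, gamma1, alpha) from the text,
    # with b = exp(lam1*x) and beta = exp(gam1*x)
    return (alpha * gam1 * x + math.lgamma(alpha) - alpha * lam1 * x
            + math.log(c) - 1.0 - EULER + (alpha / c) * EULER
            + math.exp((lam1 - gam1) * x) * math.gamma((c + 1.0) / c))

def kl_monte_carlo(x, c, lam1, alpha, gam1, n=200000, seed=1):
    # E_W[log f_W(Y) - log f_G(Y)] estimated from Weibull draws
    rng = random.Random(seed)
    b, beta = math.exp(lam1 * x), math.exp(gam1 * x)
    total = 0.0
    for _ in range(n):
        # inverse-transform Weibull(b, c) draw
        y = b * (-math.log(1.0 - rng.random())) ** (1.0 / c)
        log_fw = math.log(c) + (c - 1) * math.log(y) - c * math.log(b) - (y / b) ** c
        log_fg = ((alpha - 1) * math.log(y) - y / beta
                  - alpha * math.log(beta) - math.lgamma(alpha))
        total += log_fw - log_fg
    return total / n

exact = kl_weibull_gamma(1.0, 3.0, 1.0, 3.0, 1.0)
approx = kl_monte_carlo(1.0, 3.0, 1.0, 3.0, 1.0)
```

With 200,000 draws the two values should agree to roughly two decimal places, which is enough to catch a sign or constant error in the closed form.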
Example 1. The experimental region will be the interval X = [1, 2]. The Bayesian
KL-optimum design depends on the prior probability of each model to be true and
on the prior probability distributions of the parameters of the two statistical models,
which will be assumed to be discrete. More specifically, π1 and π2 = 1 − π1
denote the prior probabilities for the Gamma and the Weibull pdf's while p1 and p2
are independent discrete distributions of θ1 = (α, γ1) and θ2 = (c, λ1), respectively.
To study the effect of the prior information four different cases of uniform
distributions are considered (Table 1). The Bayesian KL-optimum design computed using
Table 1: Four different probability distributions for θ1 = (α, γ1) and θ2 = (c, λ1).
The prior probabilities for the true model are π1 = 0.4 and π2 = 0.6, respectively.

  Distribution (a)                    Distribution (c)
   α  γ1   p1   |  c  λ1   p2          α  γ1   p1   |  c  λ1   p2
   3   1  0.25  |  3   1  0.25         3   1  0.25  |  3   1  0.25
   3   4  0.25  |  3   4  0.25         4   2  0.25  |  4   2  0.25
   6   1  0.25  |  6   1  0.25         5   3  0.25  |  5   3  0.25
   6   4  0.25  |  6   4  0.25         6   4  0.25  |  6   4  0.25

  Distribution (b)                    Distribution (d)
   α  γ1   p1    |  c  λ1   p2         α  γ1   p1    |  c  λ1   p2
   3   1  0.125  |  3   1  0.125       3   1  0.125  |  3   1  0.125
   3   2  0.125  |  3   2  0.125       4   1  0.125  |  4   1  0.125
   3   3  0.125  |  3   3  0.125       5   1  0.125  |  5   1  0.125
   3   4  0.125  |  3   4  0.125       6   1  0.125  |  6   1  0.125
   6   1  0.125  |  6   1  0.125       3   4  0.125  |  3   4  0.125
   6   2  0.125  |  6   2  0.125       4   4  0.125  |  4   4  0.125
   6   3  0.125  |  6   3  0.125       5   4  0.125  |  5   4  0.125
   6   4  0.125  |  6   4  0.125       6   4  0.125  |  6   4  0.125
standard optimization techniques is the same for distributions (a) and (b) and for
distributions (c) and (d), respectively. More specifically, for distributions (a) and (b)
the optimum design is

    ξ∗B = (   1        2
            0.6154   0.3846 ),

while for the other two distributions it is

    ξ∗B = (   1        2
            0.6187   0.3813 ).
Thus, the Bayesian KL-optimum design seems to be quite robust with respect to
the parameters γ1 and λ1, while it seems to be sensitive to α and c, i.e. the shape
parameters of the two pdf's.
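A crude numerical sketch of how such a design can be computed: fix the two-point support {1, 2}, replace the continuous inner minimisation over the rival parameters by a grid search, and scan the weight on x = 1. The grid ranges and resolution below are assumptions made for illustration; the designs reported above were obtained with standard (continuous) optimisation techniques.

```python
import math

EULER = 0.5772156649015329

def digamma(a, h=1e-5):
    # numerical Gamma'(a)/Gamma(a) via a central difference of lgamma
    # (the standard library has no digamma function)
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2.0 * h)

def kl_gamma_weibull(x, alpha, gam1, c, lam1):
    # closed form I(f_G, f_W, x, lambda1, c) from the text
    return (-alpha * gam1 * x - math.lgamma(alpha) + c * lam1 * x - math.log(c)
            + (alpha - c) * (digamma(alpha) + gam1 * x) - alpha
            + math.exp(c * (gam1 - lam1) * x
                       + math.lgamma(alpha + c) - math.lgamma(alpha)))

def kl_weibull_gamma(x, c, lam1, alpha, gam1):
    # closed form I(f_W, f_G, x, gamma1, alpha) from the text
    return (alpha * gam1 * x + math.lgamma(alpha) - alpha * lam1 * x
            + math.log(c) - 1.0 - EULER + (alpha / c) * EULER
            + math.exp((lam1 - gam1) * x) * math.gamma((c + 1.0) / c))

# Distribution (a) of Table 1, used for both theta1 = (alpha, gamma1)
# and theta2 = (c, lambda1); prior model probabilities pi1, pi2
PRIOR = [((3.0, 1.0), 0.25), ((3.0, 4.0), 0.25), ((6.0, 1.0), 0.25), ((6.0, 4.0), 0.25)]
PI1, PI2 = 0.4, 0.6
X1, X2 = 1.0, 2.0  # the two support points of the candidate design

def make_criterion():
    # precompute, for every prior point, the KL values at x=1 and x=2 on a
    # rival-parameter grid; the inner min in I21 / I12 is taken over this grid
    grid = [(i / 4.0, j / 4.0) for i in range(2, 41) for j in range(-8, 33)]
    tables = []
    for (a, g), p in PRIOR:   # Gamma true, minimise over Weibull (c, lam1)
        tables.append((PI1 * p, [(kl_gamma_weibull(X1, a, g, c, l),
                                  kl_gamma_weibull(X2, a, g, c, l)) for c, l in grid]))
    for (c, l), p in PRIOR:   # Weibull true, minimise over Gamma (alpha, gam1)
        tables.append((PI2 * p, [(kl_weibull_gamma(X1, c, l, a, g),
                                  kl_weibull_gamma(X2, c, l, a, g)) for a, g in grid]))
    def crit(w):
        # Bayesian KL-criterion (2) for the design {1, 2; w, 1-w}
        return sum(p * min(w * k1 + (1.0 - w) * k2 for k1, k2 in rows)
                   for p, rows in tables)
    return crit

crit = make_criterion()
best_w = max((i / 100.0 for i in range(1, 100)), key=crit)
```

With this coarse grid, best_w only gives a rough approximation to the optimal weight on x = 1 and should not be expected to match the four-decimal values reported above; the concave shape of crit(w) and the location of its maximum can nevertheless be inspected directly.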
Example 2. As pointed out in Remark 1, when the models are nested the largest
model (i.e. the model which includes the other one as a special case) is taken as the
true model. In this case, and whenever one of the two models is assumed to be the
true one with parameters known only through a probability distribution, a partially
Bayesian KL-optimum design may be computed. This is an intermediate situation
between local KL-optimality (López-Fidalgo, Tommasi and Trandafir, 2007) and the
Bayesian KL-optimality described in Section 2. Partially Bayesian KL-optimality can
be seen as a special case of the problem discussed in Section 2. In order to compute
partially Bayesian KL-optimum designs, Theorem 1 applies by setting to zero the
prior probability that the false model is true.
Unlike in the previous example, the true statistical model is assumed to
be the Gamma distribution (i.e. π2 = 0). The design region considered is again
the interval X = [1, 2]. Table 2 lists three different probability distributions for the
parameter θ1 = (α, γ1).
Table 2: Three different distributions for the parameter θ1 = (α, γ1)

  Distribution (a)     Distribution (b)     Distribution (c)
   α  γ1   p1           α  γ1   p1           α  γ1   p1
   3   1  0.25          3   1  0.125         3   1  0.125
   3   4  0.25          3   2  0.125         4   1  0.125
   6   1  0.25          3   3  0.125         5   1  0.125
   6   4  0.25          3   4  0.125         6   1  0.125
                        6   1  0.125         3   4  0.125
                        6   2  0.125         4   4  0.125
                        6   3  0.125         5   4  0.125
                        6   4  0.125         6   4  0.125
For distributions (a) and (b) the partially Bayesian KL-optimum design is the same,

    ξ∗PB = (   1        2
             0.5454   0.4545 ),

while for distribution (c) it changes to

    ξ∗PB = (   1        2
             0.5537   0.4463 ).
Thus, in this case the behavior of the partially Bayesian KL-optimum design is
similar to that of the Bayesian KL-optimum design. The marginal distribution
of the parameter α is the same for distributions (a) and (b) and different for
distribution (c), while the marginal distribution of the parameter γ1 is the same for
distributions (a) and (c) and different for distribution (b). Therefore, the partially
Bayesian KL-optimum design seems to be robust with respect to the parameter γ1
and sensitive to the shape parameter α.
4 Discussion
In the literature several optimality criteria for discriminating between models have
been proposed. For instance, the D- and Ds-optimality criteria may be used for
discriminating purposes. A recent approach to this problem is the TE-criterion
proposed by Waterhouse, Woods, Eccleston and Lewis (2007). This criterion
is based on the expected likelihood ratio test statistic, which is a measure of the
reduction in deviance between two nested models. The expectation is computed through
a simulation study. The TE-criterion is compared with the D-, T- and Ds-optimality
criteria through simulation studies of test power. From these comparisons, the
TE-optimum design seems to perform similarly to the T-optimum design.
However, unlike the T-criterion, nominal values of the parameters of the
largest model are no longer required. Thus, in some sense the TE-optimality
criterion overcomes the dependence of the T-optimum design on unknown
parameters, although the TE-optimum design requires suitable prior distributions for
the response at each support point, obtained, for example, from scientific knowledge.
Other possibilities for overcoming the optimum design dependence on unknown
parameters could be the following pure Bayesian criteria,
1. E(θ1,θ2)[ ∫_X I(f1, f2, x, θ2) ξ(dx) ],

2. E(θ1,θ2)[ ∫_X I(f2, f1, x, θ1) ξ(dx) ],

3. π1 E(θ1,θ2)[ ∫_X I(f1, f2, x, θ2) ξ(dx) ] + π2 E(θ1,θ2)[ ∫_X I(f2, f1, x, θ1) ξ(dx) ].
These criteria are based on the joint prior distribution of the parameters of both
models.
In addition, the following minimax criteria could be considered as well,
1. min(θ1,θ2) ∫_X I(f1, f2, x, θ2) ξ(dx),

2. min(θ1,θ2) ∫_X I(f2, f1, x, θ1) ξ(dx),

3. π1 min(θ1,θ2) ∫_X I(f1, f2, x, θ2) ξ(dx) + π2 min(θ1,θ2) ∫_X I(f2, f1, x, θ1) ξ(dx).
In cases 1 and 2 one of the two models needs to be assumed as the true model.
This is overcome in case 3, where a prior distribution describing the practitioner's
certainty about the true model is given.
Thus, further research is necessary to solve the problem of dependence of the
optimum design on unknown parameters. This paper provides a possible solution.
References
[1] Atkinson, A.C., Fedorov, V.V.: The design of experiments for discriminating
between two rival models. Biometrika, 62, 57-70 (1975a).
[2] Atkinson, A.C., Fedorov, V.V.: Optimal design: experiments for discriminating
between several models. Biometrika, 62, 289-303 (1975b).
[3] Kiefer, J., Wolfowitz, J.: The equivalence of two extremum problems. Canad.
J. Math., 12, 363-366 (1960).
[4] Kundu, D., Manglick, A.: Discriminating between the Weibull and log-normal
distributions. Nav. Res. Log., 51(6), 893-905 (2004).
[5] López-Fidalgo, J., Tommasi, C., Trandafir, P.C.: An optimal experimental
design criterion for discriminating between non-Normal models. J. R. Statist. Soc.
B, 69(2), 231-242 (2007).
[6] Ponce de Leon, A.C., Atkinson, A.C.: Optimal experimental design for
discriminating between two rival models in the presence of prior information.
Biometrika, 78, 601-608 (1991a).
[7] Ponce de Leon, A.C., Atkinson, A.C.: Optimal design for discriminating between
two rival binary data models in the presence of prior information. Probastat '91
(eds. Pázman and Volaufová), 123-129. Bratislava: Comenius University (1991b).
[8] Ponce de Leon, A.C., Atkinson, A.C.: The design of experiments to discriminate
between two rival generalized linear models. Lecture Notes in Statistics -
Advances in GLM and Statistical Modelling. Springer-Verlag, New York, 159-164
(1992).
[9] Tommasi, C.: Optimal designs for discriminating among several non-Normal
models. In López-Fidalgo, J., Rodríguez-Díaz, J.M., Torsney, B. (eds) Advances
in Model-Oriented Design and Analysis mODa 8. Physica-Verlag, 213-220
(2007).
[10] Waterhouse, T.H., Woods, D.C., Eccleston, J.A., Lewis, S.M.: Design selection
criteria for discrimination/estimation for nested models and a binomial
response. J. Statist. Plann. Inference (2007), in press.