
Bayesian optimal designs for discriminating between non-Normal models

Jesús López-Fidalgo, Chiara Tommasi

14th May 2007

Abstract

Designs are found for discriminating between two non-Normal models in the presence of prior information. The KL-optimality criterion, where the true model is assumed to be completely known, is extended to a criterion where prior distributions of the parameters and a prior probability of each model being true are assumed. Concavity of this criterion is proved. Thus, the results of optimal design theory apply in this context and optimal designs can be constructed and checked by the General Equivalence Theorem. Some illustrative examples are provided.

Keywords: Equivalence theorem, Gamma distribution, Weibull distribution, KL-optimum design.

1 Introduction

In order to discriminate between two rival regression models, Atkinson and Fedorov (1975a) propose the well-known T-optimality criterion. The same authors generalize this criterion to the case of more than two rival regression models (Atkinson and Fedorov, 1975b). The T-optimum design is only locally optimum since it depends on the nominal values of the parameters of the true model. Ponce de Leon and Atkinson (1991a) generalize the T-optimality criterion using the Bayesian approach. More specifically, instead of assuming one of the models to be the true model, they consider prior distributions for the parameters of the models and prior probabilities for each model being true. Ponce de Leon and Atkinson (1991b) extend these ideas to designs for discriminating between binary data models.

The T-optimality criterion can be used only for regression models. Ponce de Leon and Atkinson (1992) extend this criterion to the case of generalized linear models. López-Fidalgo, Tommasi and Trandafir (2007) provide a new optimality criterion based on the Kullback-Leibler distance, which is useful for discriminating between any two statistical models. In a general context, a statistical model is a parametric family of probability distributions of a response variable depending on some explanatory variables. This new criterion is called KL-optimality; it matches the T-criterion in the case of regression models and the generalization proposed by Ponce de Leon and Atkinson (1992) in the case of generalized linear models. The KL-optimality criterion is given by a concave design criterion function (Tommasi, 2007), so standard results of optimum design theory apply in this context. In general, the corresponding optimum design, i.e. the so-called KL-optimum design, is only locally optimum. In this paper this theory is extended to the case in which there is a specified prior probability of each model being true. In addition, prior conditional distributions for the parameters of the two models are given. The corresponding optimum design will be called the Bayesian KL-optimum design. The motivation for the Bayesian extension is that, in practice, neither the true model nor the parameters of the assumed true model are known. However, information is sometimes available through a prior probability distribution.

The most important benefit of the Bayesian approach is that optimal designs no longer depend on the nominal values of the parameters of the true model but only on their prior distribution. After taking expectations over this prior, standard optimization techniques may be used to find the optimum designs. In this paper an extension of the General Equivalence Theorem for KL-optimality given by López-Fidalgo, Tommasi and Trandafir (2007) is provided in order to check the Bayesian KL-optimality of a design. This is given in Section 2.1.


In Section 2 an introduction to the problem is provided and the Bayesian KL-optimality criterion is defined. In Section 3 two typical illustrative examples, which may be useful in reliability analysis, are described. Finally, in Section 4 a brief discussion is given.

2 Bayesian KL-optimum design

Following the notation of Ponce de Leon and Atkinson (1991a), let X be the design region, let H be the class of all discrete probability distributions on X and let ξ ∈ H be a design measure. The statistical model is written as

fi(y, x, θi),   x ∈ X   (i = 1, 2).

The true statistical model is one of the two known density functions f1(y, x, θ1) and f2(y, x, θ2), with prior probabilities π1 and π2 = 1 − π1, respectively. The parameter vector θi ∈ Ωi ⊂ IR^mi has a prior probability distribution pi(θi) (i = 1, 2).
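In the code sketches that accompany this section, a discrete design ξ is represented simply as a pair of support points and weights; this representation is a convention of the sketches, not notation from the paper.

```python
# A discrete design measure xi on a design region X: support points
# together with masses that sum to one.
points  = (1.0, 2.0)   # support points in X
weights = (0.5, 0.5)   # xi({x_j}) for each support point
assert abs(sum(weights) - 1.0) < 1e-9
```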

If f1(y, x, θ1) is the true model, then the Kullback-Leibler distance between f1(y, x, θ1) and f2(y, x, θ2) is

I(f1, f2, x, θ2) = ∫ f1(y, x, θ1) log [ f1(y, x, θ1) / f2(y, x, θ2) ] dy,   ∀x ∈ X,   (1)

where log denotes the natural logarithm and the integral is computed over the sample space of the possible observations. Exchanging the roles of f1(y, x, θ1) and f2(y, x, θ2) in (1) gives the Kullback-Leibler distance between f2(y, x, θ2) and f1(y, x, θ1), namely I(f2, f1, x, θ1).
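As a quick illustration of (1), the following sketch (not part of the paper; the densities and the support are placeholders) evaluates the Kullback-Leibler distance between two fitted densities by numerical quadrature.

```python
# A minimal sketch, assuming densities supported on (0, inf):
# numerical evaluation of I(f1, f2, x, theta2) in (1) for a fixed x,
# with f1 and f2 passed as one-argument density functions of y.
import numpy as np
from scipy import integrate, stats

def kl_distance(f1, f2, support=(1e-12, np.inf)):
    """Approximate the integral of f1(y) * log(f1(y)/f2(y)) over y."""
    def integrand(y):
        p, q = f1(y), f2(y)
        return p * np.log(p / q) if p > 0 else 0.0
    value, _ = integrate.quad(integrand, *support)
    return value

# Example: a Gamma density against a Weibull density at one x
print(kl_distance(stats.gamma(a=3, scale=2).pdf,
                  stats.weibull_min(c=3, scale=2).pdf))
```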

The quantities

I21(ξ, θ1) = min_{θ2∈Ω2} ∫X I(f1, f2, x, θ2) ξ(dx),

I12(ξ, θ2) = min_{θ1∈Ω1} ∫X I(f2, f1, x, θ1) ξ(dx)

are the KL-criterion functions for the second model when the first is true and vice versa. These KL-criterion functions depend on the design and on the unknown parameters θ1 and θ2, respectively.
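For a discrete design the integral above is a weighted sum, and the inner minimisation can be done numerically. A sketch follows; the helper names are mine, and kl_at_x(x, theta) stands for the pointwise distance, e.g. I(f1, f2, x, θ2) at the fixed true θ1.

```python
# Sketch of a KL-criterion function such as I21(xi, theta1) for a
# discrete design xi = {(x_j, w_j)}: minimise the weighted sum of
# pointwise KL distances over the rival model's parameter.
import numpy as np
from scipy import optimize

def kl_criterion(points, weights, kl_at_x, theta_start):
    """min over theta of sum_j w_j * kl_at_x(x_j, theta)."""
    def avg_kl(theta):
        return sum(w * kl_at_x(x, theta) for x, w in zip(points, weights))
    res = optimize.minimize(avg_kl, x0=np.asarray(theta_start, dtype=float),
                            method="Nelder-Mead")
    return res.fun, res.x  # criterion value and best-fitting parameter
```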


Taking the expected value over the two models and over the prior distributions of the parameters, the following Bayesian KL-optimality criterion may be defined:

IB(ξ) = π1 Eθ1[I21(ξ, θ1)] + π2 Eθ2[I12(ξ, θ2)],   (2)

where Eτ stands for the expectation with respect to the prior distribution of the parameter τ. If the statistical model fi(y, x, θi) is assumed to be true, but the only information about the parameters θi is the prior distribution pi(θi), then a partially Bayesian KL-optimality criterion may be defined:

I^PB_ī(ξ) = Eθi[Iīi(ξ, θi)],   i = 1, 2,   (3)

where ī denotes "not i" (that is, ī = 2 when i = 1 and ī = 1 when i = 2). If a design maximizes IB(ξ) it is called a Bayesian KL-optimum design; if it maximizes I^PB_ī(ξ) it is called a partially Bayesian KL-optimum design. Hereafter a partially Bayesian KL-optimum design will be denoted by ξ∗PB.
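In the examples of Section 3 the priors are discrete, so the expectations in (2) and (3) are finite sums. A sketch, assuming the kl_criterion helper above has already been used to evaluate I21 and I12 for the design at hand:

```python
# Criterion (2) for discrete priors: prior_i is a list of
# (theta_i, probability) pairs; I21_fn(theta1) and I12_fn(theta2)
# return the minimised KL-criterion values for a fixed design.
def bayesian_kl(pi1, prior1, prior2, I21_fn, I12_fn):
    e1 = sum(p * I21_fn(th) for th, p in prior1)  # E_theta1[I21(xi, theta1)]
    e2 = sum(p * I12_fn(th) for th, p in prior2)  # E_theta2[I12(xi, theta2)]
    return pi1 * e1 + (1.0 - pi1) * e2

# Setting pi1 = 1 (or pi1 = 0) recovers the partially Bayesian
# criterion (3) with the first (or second) model assumed true.
```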

Remark 1. The criterion function IB(ξ) is appropriate only if the statistical models are not nested. If one model is nested within the other, then at least one of the quantities I21(ξ, θ1) and I12(ξ, θ2) is zero, since the larger model can reproduce the smaller one exactly and the minimized KL distance then vanishes for every design. In this case the largest model must be assumed to be the true model and only partially Bayesian optimum designs, in the sense of (3), can be computed.

2.1 The Equivalence Theorem

López-Fidalgo, Tommasi and Trandafir (2007) extend the General Equivalence Theorem of Kiefer and Wolfowitz (1960) to the case of local KL-optimality. For checking the Bayesian KL-optimality of a given design, it is necessary to extend this theorem. In what follows some background is given in order to state this generalization.

Because of the linearity of the criterion function IB(ξ) with respect to I21(ξ, θ1) and I12(ξ, θ2), and because of their concavity on H (Tommasi, 2007), IB(ξ) is a concave function on H.


A design ξ for which the sets

Ωi(ξ) = { θ̂i : θ̂i(ξ) = arg min_{θi∈Ωi} ∫X I[fī(y, x, θī), fi(y, x, θi)] ξ(dx) },   i = 1, 2,   (4)

are singletons is called a regular design; otherwise it is called a singular design.

If ξ and ξ̄ are any two designs, due to the linearity of IB(ξ) the directional derivative of IB(ξ) at ξ in the direction of δξ̄ = ξ̄ − ξ is

∂IB(ξ, ξ̄) = Σ_{i=1}^{2} πi Eθi[ ∂Iīi(ξ, ξ̄, θi) ].   (5)

López-Fidalgo, Tommasi and Trandafir (2007) prove that if ξ is a regular design, then

∂Iīi(ξ, ξ̄, θi) = lim_{α→0+} { Iīi[(1 − α)ξ + αξ̄, θi] − Iīi(ξ, θi) } / α = ∫X ψīi(x, ξ, θi) ξ̄(dx),   (6)

where the function

ψīi(x, ξ, θi) = I[fi(y, x, θi), fī(y, x, θ̂ī)] − ∫X I[fi(y, x, θi), fī(y, x, θ̂ī)] ξ(dx)

is the directional derivative of Iīi(ξ, θi) at ξ in the direction of δξx = ξx − ξ, with ξx the design that puts the whole mass at the point x, and where θ̂ī is the unique element of Ωī(ξ) as defined by (4). From equation (6) the directional derivative (5) may be written as

∂IB(ξ, ξ̄) = ∫X ψB(x, ξ) ξ̄(dx),   (7)

where

ψB(x, ξ) = Σ_{i=1}^{2} πi Eθi[ ψīi(x, ξ, θi) ]

is the directional derivative of IB(ξ) at ξ in the direction of δξx.

Under the assumption that the Bayesian KL-optimum design ξ∗B is regular, the following theorem may be proved.

Theorem 1. Let ξ∗B be a regular design.

(i) The design ξ∗B is Bayesian KL-optimum if and only if

ψB(x, ξ∗B) ≤ 0,   x ∈ X.

(ii) The function ψB(x, ξ∗B) achieves its maximum value at the support points of the optimum design.

Proof.

(i) Since IB(ξ) is a concave function of ξ, a necessary and sufficient condition for the optimality of ξ∗B is ∂IB(ξ∗B, ξ̄) ≤ 0 for any design ξ̄. It may be easily proved that

max_ξ̄ ∂IB(ξ∗B, ξ̄) = max_x ψB(x, ξ∗B);

thus, a necessary and sufficient condition for the optimality of ξ∗B is

ψB(x, ξ∗B) ≤ 0,   x ∈ X.

(ii) Let us assume there exist a subset X1 ⊂ supp(ξ∗B) and a scalar a such that

∫X1 ψB(x, ξ∗B) ξ∗B(dx) ≤ a < 0

and ψB(x, ξ∗B) = 0 for x ∈ X \ X1. This contradicts the fact that

∫X ψB(x, ξ∗B) ξ∗B(dx) = 0,

which is an obvious consequence of (7).
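Part (i) gives a practical check: evaluate ψB(x, ξ∗B) on a fine grid of X and verify that it never exceeds zero, touching zero at the support points by part (ii). A sketch, where the grid, the tolerance and the psi_B callable are my assumptions rather than the paper's:

```python
# Equivalence-theorem check on X = [1, 2]: the candidate design is
# Bayesian KL-optimum iff psi_B(x, design) <= 0 for all x in X.
import numpy as np

def check_optimality(psi_B, design, lo=1.0, hi=2.0, n=201, tol=1e-6):
    grid = np.linspace(lo, hi, n)
    values = np.array([psi_B(x, design) for x in grid])
    return bool(values.max() <= tol)  # True: no direction improves IB
```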

3 Illustrative examples

In reliability analysis it very often happens that the same data set may be fitted either by a Weibull or by a Gamma distribution with equal success, so problems of model specification may arise. Both distributions are typically used for skewed data (Kundu and Manglick, 2004). Furthermore, in the context of reliability and quality control this situation is frequent and historical information is often available. If this information can be incorporated into a prior probability distribution, then a Bayesian approach may be used in the analysis. In what follows, Bayesian and partially Bayesian KL-optimum designs are found to discriminate between the Weibull probability density function (pdf),

fW(y; b, c) = (c y^{c−1} / b^c) exp[−(y/b)^c],   b > 0, c > 0,

and the Gamma pdf,

fG(y; β, α) = y^{α−1} exp(−y/β) / [β^α Γ(α)],   β > 0, α > 0.

The parametrizations b = exp(λ1 x) and β = exp(γ1 x) will be considered; they ensure b > 0 and β > 0 for any λ1, γ1 ∈ IR. With this notation the Kullback-Leibler distances between the Gamma and the Weibull pdf's, and vice versa, are

I(fG, fW, x, λ1, c) = −α γ1 x − log Γ(α) + c λ1 x − log c + (α − c)[Γ′(α)/Γ(α) + γ1 x] − α + exp[c(γ1 − λ1)x] Γ(α + c)/Γ(α)

and

I(fW, fG, x, γ1, α) = α γ1 x + log Γ(α) − α λ1 x + log c − (1 + γ) + (α/c) γ + exp[(λ1 − γ1)x] Γ((c + 1)/c),

respectively.

In the latter computation the following argument has been used. If Y is a random variable with a Weibull distribution as set out above, then X = log(Y/b) follows a Gumbel (smallest extreme value) distribution with scale parameter 1/c and location parameter 0. The expectation of this distribution is well known and in this case becomes EW(X) = −γ/c, where γ ≈ 0.5772 is the Euler-Mascheroni constant. Therefore EW(log Y) = log b − γ/c.
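A sketch of these closed-form distances follows; the function names are mine, and scipy's gammaln and digamma supply log Γ and Γ′/Γ. A consistency check against the quadrature sketch of Section 2, applied to the corresponding stats.gamma and stats.weibull_min pdf's at a fixed x, should reproduce these values.

```python
# Closed-form KL distances between the Gamma and Weibull models with
# beta = exp(gamma1*x) and b = exp(lambda1*x), as derived above.
import numpy as np
from scipy.special import gammaln, digamma, gamma as gamma_fn

EULER_GAMMA = 0.57721566490153286  # the constant gamma used above

def kl_gamma_weibull(x, alpha, gamma1, c, lambda1):
    """I(fG, fW, x, lambda1, c): the Gamma model is true."""
    return (-alpha * gamma1 * x - gammaln(alpha) + c * lambda1 * x
            - np.log(c) + (alpha - c) * (digamma(alpha) + gamma1 * x)
            - alpha
            + np.exp(c * (gamma1 - lambda1) * x
                     + gammaln(alpha + c) - gammaln(alpha)))

def kl_weibull_gamma(x, c, lambda1, alpha, gamma1):
    """I(fW, fG, x, gamma1, alpha): the Weibull model is true."""
    return (alpha * gamma1 * x + gammaln(alpha) - alpha * lambda1 * x
            + np.log(c) - (1.0 + EULER_GAMMA) + (alpha / c) * EULER_GAMMA
            + np.exp((lambda1 - gamma1) * x) * gamma_fn((c + 1.0) / c))
```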

Example 1. The experimental region will be the interval X = [1, 2]. The Bayesian KL-optimum design depends on the prior probability of each model being true and on the prior probability distributions of the parameters of the two statistical models, which will be assumed to be discrete. More specifically, π1 and π2 = 1 − π1 denote the prior probabilities for the Gamma and the Weibull pdf's, while p1 and p2 are independent discrete distributions of θ1 = (α, γ1) and θ2 = (c, λ1), respectively.

To study the effect of the prior information, four different cases of uniform distributions are considered (Table 1).

Table 1: Four different probability distributions for θ1 = (α, γ1) and θ2 = (c, λ1). The prior probabilities of the models being true are π1 = 0.4 and π2 = 0.6, respectively.

Distribution (a)               Distribution (c)
 α  γ1  p1     c  λ1  p2        α  γ1  p1     c  λ1  p2
 3   1  0.25   3   1  0.25      3   1  0.25   3   1  0.25
 3   4  0.25   3   4  0.25      4   2  0.25   4   2  0.25
 6   1  0.25   6   1  0.25      5   3  0.25   5   3  0.25
 6   4  0.25   6   4  0.25      6   4  0.25   6   4  0.25

Distribution (b)               Distribution (d)
 α  γ1  p1      c  λ1  p2       α  γ1  p1      c  λ1  p2
 3   1  0.125   3   1  0.125    3   1  0.125   3   1  0.125
 3   2  0.125   3   2  0.125    4   1  0.125   4   1  0.125
 3   3  0.125   3   3  0.125    5   1  0.125   5   1  0.125
 3   4  0.125   3   4  0.125    6   1  0.125   6   1  0.125
 6   1  0.125   6   1  0.125    3   4  0.125   3   4  0.125
 6   2  0.125   6   2  0.125    4   4  0.125   4   4  0.125
 6   3  0.125   6   3  0.125    5   4  0.125   5   4  0.125
 6   4  0.125   6   4  0.125    6   4  0.125   6   4  0.125

The Bayesian KL-optimum design computed using standard optimization techniques is the same for distributions (a) and (b) and for distributions (c) and (d), respectively. More specifically, for distributions (a) and (b) the optimum design is

ξ∗B = ( 1        2
        0.6154   0.3846 ),

while for the other two distributions the design is

ξ∗B = ( 1        2
        0.6187   0.3813 ).

Thus, the Bayesian KL-optimum design seems to be quite robust with respect to the parameters γ1 and λ1, while it seems to be sensitive to α and c, i.e. the shape parameters of the two pdf's.
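Under distribution (a), a brute-force numerical search of this kind can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: the support is fixed at {1, 2}, only the weight w on x = 1 is optimised, and the inner minimisations run over log-parametrised shape parameters to keep them positive.

```python
# Hypothetical search for the Bayesian KL-optimum weight in Example 1,
# prior (a) of Table 1, reusing kl_gamma_weibull / kl_weibull_gamma.
import numpy as np
from scipy.optimize import minimize, minimize_scalar

POINTS = (1.0, 2.0)
PRIOR1 = [((a, g), 0.25) for a in (3, 6) for g in (1, 4)]  # theta1 = (alpha, gamma1)
PRIOR2 = [((c, l), 0.25) for c in (3, 6) for l in (1, 4)]  # theta2 = (c, lambda1)
PI1 = 0.4                                                  # P(Gamma model true)

def I21(w, alpha, gamma1):           # best Weibull fit to a true Gamma
    def obj(t):                      # t = (log c, lambda1)
        return sum(wx * kl_gamma_weibull(x, alpha, gamma1, np.exp(t[0]), t[1])
                   for x, wx in zip(POINTS, (w, 1 - w)))
    return minimize(obj, x0=[np.log(alpha), gamma1], method="Nelder-Mead").fun

def I12(w, c, lambda1):              # best Gamma fit to a true Weibull
    def obj(t):                      # t = (log alpha, gamma1)
        return sum(wx * kl_weibull_gamma(x, c, lambda1, np.exp(t[0]), t[1])
                   for x, wx in zip(POINTS, (w, 1 - w)))
    return minimize(obj, x0=[np.log(c), lambda1], method="Nelder-Mead").fun

def IB(w):                           # criterion (2) for the discrete priors
    return (PI1 * sum(p * I21(w, a, g) for (a, g), p in PRIOR1)
            + (1 - PI1) * sum(p * I12(w, c, l) for (c, l), p in PRIOR2))

best = minimize_scalar(lambda w: -IB(w), bounds=(0.0, 1.0), method="bounded")
print("weight on x = 1:", best.x)    # should come out near 0.6154
```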


Example 2. As pointed out in Remark 1, when the models are nested the largest model (i.e. the model which includes the other one as a special case) must be assumed to be the true model. In this case, and whenever one of the two models is assumed to be the true one with parameters known only through a probability distribution, a partially Bayesian KL-optimum design may be computed. This is an intermediate situation between local KL-optimality (López-Fidalgo, Tommasi and Trandafir, 2007) and the Bayesian KL-optimality described in Section 2. Partially Bayesian KL-optimality can be seen as a special case of the problem discussed in Section 2: in order to compute partially Bayesian KL-optimum designs, Theorem 1 applies after setting to zero the prior probability that the false model is true.

Differently from the previous example, the true statistical model is now assumed to be the Gamma distribution (i.e. π2 = 0). The design region is again the interval X = [1, 2]. Table 2 lists three different probability distributions for the parameter θ1 = (α, γ1).

Table 2: Three different distributions for the parameter θ1 = (α, γ1)

Distribution (a)    Distribution (b)    Distribution (c)
 α  γ1  p1           α  γ1  p1           α  γ1  p1
 3   1  0.25         3   1  0.125        3   1  0.125
 3   4  0.25         3   2  0.125        4   1  0.125
 6   1  0.25         3   3  0.125        5   1  0.125
 6   4  0.25         3   4  0.125        6   1  0.125
                     6   1  0.125        3   4  0.125
                     6   2  0.125        4   4  0.125
                     6   3  0.125        5   4  0.125
                     6   4  0.125        6   4  0.125

For distributions (a) and (b) the partially Bayesian KL-optimum design is the same,

ξ∗PB = ( 1        2
         0.5454   0.4545 ),

while for distribution (c) it changes to

ξ∗PB = ( 1        2
         0.5537   0.4463 ).

Thus, in this case the behavior of the partially Bayesian KL-optimum design is similar to that of the Bayesian KL-optimum design. The marginal distribution of the parameter α is the same for distributions (a) and (b) and different for distribution (c), while the marginal distribution of the parameter γ1 is the same for distributions (a) and (c) and different for distribution (b). Therefore, the partially Bayesian KL-optimum design seems to be robust with respect to the parameter γ1 and sensitive to the shape parameter α.
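In the numerical sketch of Example 1, this case needs only two changes, shown below; again hypothetical, and it simply reassigns the module-level names that IB reads.

```python
# Partially Bayesian case of Example 2: the Gamma model is taken as
# true (pi2 = 0) and prior (a) of Table 2 is used for theta1.
PRIOR1 = [((a, g), 0.25) for a in (3, 6) for g in (1, 4)]
PI1 = 1.0   # pi1 = 1 makes criterion (2) collapse to criterion (3)
best = minimize_scalar(lambda w: -IB(w), bounds=(0.0, 1.0), method="bounded")
print("weight on x = 1:", best.x)   # should come out near 0.5454
```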

4 Discussion

In the literature several optimality criteria for discriminating between models have been proposed. For instance, the D- and Ds-optimality criteria may be used for discriminating purposes. A recent approach to this problem is the TE-criterion proposed by Waterhouse, Woods, Eccleston and Lewis (2007). This criterion is based on the expected likelihood ratio test statistic, which is a measure of the reduction in deviance between two nested models; the expectation is computed through a simulation study. The TE-criterion is compared with the D-, T- and Ds-optimality criteria through simulation studies of test power. From these comparisons, the TE-optimum design seems to perform similarly to the T-optimum design. However, differently from the T-criterion, nominal values of the parameters of the largest model are no longer required. Thus, in some sense the TE-optimality criterion overcomes the dependence of the T-optimum design on unknown parameters, even if suitable prior distributions for the response at each support point are needed for the TE-optimum design, for example from scientific knowledge.

Other possibilities for overcoming the dependence of the optimum design on unknown parameters could be the following pure Bayesian criteria,

1. E(θ1,θ2)[∫X I(f1, f2, x, θ2) ξ(dx)],

2. E(θ1,θ2)[∫X I(f2, f1, x, θ1) ξ(dx)],

3. π1 E(θ1,θ2)[∫X I(f1, f2, x, θ2) ξ(dx)] + π2 E(θ1,θ2)[∫X I(f2, f1, x, θ1) ξ(dx)].

These criteria are based on the joint prior distribution of the parameters of both

models.
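For a discrete design and a discrete joint prior, the third of these criteria is again a finite sum, with no inner minimisation step; a sketch, with helper names that are assumptions of mine:

```python
# Third pure Bayesian criterion: expectation over a discrete joint
# prior on (theta1, theta2) of the two weighted KL distances.
def pure_bayes(points, weights, joint_prior, pi1, kl_12, kl_21):
    # kl_12(x, theta1, theta2) = I(f1, f2, x, theta2), and vice versa
    total = 0.0
    for (th1, th2), p in joint_prior:
        a = sum(w * kl_12(x, th1, th2) for x, w in zip(points, weights))
        b = sum(w * kl_21(x, th1, th2) for x, w in zip(points, weights))
        total += p * (pi1 * a + (1.0 - pi1) * b)
    return total
```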

In addition, the following minimax criteria could be considered as well,

1. min(θ1,θ2) ∫X I(f1, f2, x, θ2) ξ(dx),

2. min(θ1,θ2) ∫X I(f2, f1, x, θ1) ξ(dx),

3. π1 min(θ1,θ2) ∫X I(f1, f2, x, θ2) ξ(dx) + π2 min(θ1,θ2) ∫X I(f2, f1, x, θ1) ξ(dx).

In cases 1 and 2 one of the two models needs to be assumed as the true model. This is overcome by case 3, where a prior distribution reflecting the practitioner's certainty about the true model is given.

Thus, further research is necessary to solve the problem of dependence of the

optimum design on unknown parameters. This paper provides a possible solution.

References

[1] Atkinson, A.C., Fedorov, V.V.: The design of experiments for discriminating between two rival models. Biometrika, 62, 57-70 (1975a).

[2] Atkinson, A.C., Fedorov, V.V.: Optimal design: experiments for discriminating between several models. Biometrika, 62, 289-303 (1975b).

[3] Kiefer, J., Wolfowitz, J.: The equivalence of two extremum problems. Canad. J. Math., 12, 363-366 (1960).

[4] Kundu, D., Manglick, A.: Discriminating between the Weibull and log-normal distributions. Nav. Res. Log., 51(6), 893-905 (2004).

[5] López-Fidalgo, J., Tommasi, C., Trandafir, P.C.: An optimal experimental design criterion for discriminating between non-Normal models. J. R. Statist. Soc. B, 69(2), 231-242 (2007).

[6] Ponce de Leon, A.C., Atkinson, A.C.: Optimal experimental design for discriminating between two rival models in the presence of prior information. Biometrika, 78, 601-608 (1991a).

[7] Ponce de Leon, A.C., Atkinson, A.C.: Optimal design for discriminating between two rival binary data models in the presence of prior information. In: Pázman, A., Volaufová, J. (eds) Probastat '91, 123-129. Bratislava: Comenius University (1991b).

[8] Ponce de Leon, A.C., Atkinson, A.C.: The design of experiments to discriminate between two rival generalized linear models. In: Lecture Notes in Statistics - Advances in GLIM and Statistical Modelling, 159-164. Springer-Verlag, New York (1992).

[9] Tommasi, C.: Optimal designs for discriminating among several non-Normal models. In: López-Fidalgo, J., Rodríguez-Díaz, J.M., Torsney, B. (eds) Advances in Model-Oriented Design and Analysis - mODa 8, 213-220. Physica-Verlag (2007).

[10] Waterhouse, T.H., Woods, D.C., Eccleston, J.A., Lewis, S.M.: Design selection criteria for discrimination/estimation for nested models and a binomial response. J. Statist. Plann. Inference (2007), in press.