
Contents lists available at ScienceDirect

Journal of Statistical Planning and Inference

Journal of Statistical Planning and Inference 141 (2011) 56–64

journal homepage: www.elsevier.com/locate/jspi

Maximum likelihood estimators in finite mixture models with censored data

Yoichi Miyata

Faculty of Economics, Takasaki City University of Economics, 1300 Kaminamie, Takasaki, Gunma 370-0801, Japan

Article info

Article history:
Received 19 June 2009
Received in revised form 21 April 2010
Accepted 12 May 2010
Available online 21 May 2010

Keywords:
Censored data
Finite mixture
Maximum likelihood estimators
Strong consistency

0378-3758/$ - see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2010.05.006
E-mail address: [email protected]

Abstract

The consistency of estimators in finite mixture models has been discussed under the topology of the quotient space obtained by collapsing the true parameter set into a single point. In this paper, we extend the results of Cheng and Liu (2001) to give conditions under which the maximum likelihood estimator (MLE) is strongly consistent in this sense in finite mixture models with censored data. We also show that the fitted model tends to the true model under a weak condition as the sample size tends to infinity.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Finite mixture models have been studied extensively, both theoretically and in practice (Redner, 1981; Feng and McCulloch, 1996; McLachlan and Peel, 2000; Cheng and Liu, 2001). Feng and McCulloch (1996) proposed unrestricted MLEs that are consistent under somewhat complicated conditions. Cheng and Liu (2001) provided easily verified conditions under which the vector of MLEs converges to an arbitrary point in the subset representing the true model, allowing the estimators to approach a boundary point of the parameter space. On the other hand, censored and multimodal observations often appear in fields such as reliability engineering and education. Chauveau (1995) discussed a stochastic EM algorithm for the ML fitting of finite mixture models with censored data. However, little is known about the strong consistency of the MLE in such models. Moreover, Cheng and Liu's results cannot be applied directly to these models, because the components do not converge uniformly to zero as the norm of their parameters tends to infinity. This paper extends their approach to show the strong consistency of the MLEs for finite mixture models with censored data when the assumed number of components is greater than or equal to the true one.

Section 3 shows the strong consistency of the MLEs and of the fitted distributions in finite mixture models with fixed censoring regions. Section 4 provides parameter spaces under which the consistency results hold for a mixture of censored exponential distributions, a mixture of censored normal distributions, and a random censorship mixture model.

2. Definitions and assumptions

Let $L_1$ and $B_+$ denote the following spaces of integrable functions on $\mathbb{R}$:

$$L_1 = \Big\{ f(x) \;\Big|\; f(x) \text{ is measurable},\ \|f\| = \int_{\mathbb{R}} |f(x)|\,d\mu < \infty \Big\}, \tag{1}$$


$$B_+ = \{ f(x) \mid f \in L_1,\ \|f\| = 1,\ f(x) \ge 0 \}. \tag{2}$$

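Let $\Omega_1$, $\Omega_2$ be two closed sets in $\mathbb{R}^k$. We define the distance between the two sets as

$$\mathrm{dis}(\Omega_1,\Omega_2) = \mathrm{dis}(\Omega_2,\Omega_1) = \inf_{y \in \Omega_2}\, \inf_{x \in \Omega_1} |x - y|, \tag{3}$$

where $|\cdot|$ is the Euclidean norm. If $\Omega_1$ and $\Omega_2$ are singleton sets, this distance agrees with the classical Euclidean distance. (It is not a metric in the strict sense, since it vanishes whenever the two sets intersect.) The following property is used later.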

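Property 1. (i) $\mathrm{dis}(\Omega_1,\Omega_2) = 0$ if and only if there are sequences of points $\{x_n\} \subset \Omega_1$ and $\{y_n\} \subset \Omega_2$ such that $|x_n - y_n| \to 0$ as $n \to \infty$.

(ii) $\mathrm{dis}(x_n,\Omega) \to 0$ if and only if there is a sequence $\{y_n\}$ of points in $\Omega$ such that $|x_n - y_n| \to 0$ as $n \to \infty$.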
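For finite point sets, the double infimum in (3) becomes a minimum over all pairs of points, so the distance can be computed directly. The Python sketch below assumes the sets are given as finite, non-empty lists of coordinate tuples; this is only an illustration, since the parameter sets to which (3) is applied later are typically infinite and would have to be approximated by finite samples of their points. The function name `dis` is ours.

```python
import math
from itertools import product
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def dis(omega1: Sequence[Point], omega2: Sequence[Point]) -> float:
    """Distance (3) between two finite, non-empty point sets in R^k.

    Computes the minimum of the Euclidean norm |x - y| over x in omega1 and
    y in omega2. It is symmetric, and it is 0 whenever the sets share a point.
    """
    return min(math.dist(x, y) for x, y in product(omega1, omega2))


if __name__ == "__main__":
    # Singletons: reduces to the ordinary Euclidean distance.
    print(dis([(0.0, 0.0)], [(3.0, 4.0)]))
    # Overlapping but different sets: the distance is still 0.
    print(dis([(1.0, 1.0), (2.0, 2.0)], [(2.0, 2.0), (5.0, 5.0)]))
```

The second call illustrates Property 1(i): a zero distance only says that the sets come arbitrarily close, not that they coincide.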

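Let $\mathcal{F}^0 = \{ f^0(x \mid \theta) \mid \theta \in \Theta \}$ be a parametric family of one-dimensional probability density functions (pdfs) with respect to a $\sigma$-finite measure $\mu_0$ on $\mathcal{R} \subseteq \mathbb{R}$, from which mixtures are to be formed. Let $X^0$ be a random variable taking values in $\mathbb{R}$ with the pdf $f^0(x \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j f^0(x \mid \theta_j)$, where $f^0(x \mid \theta_j) \in \mathcal{F}^0$, $\Theta$ is a closed set in $\mathbb{R}^d$,

$$\pi \in \Pi \equiv \Big\{ (\pi_1,\pi_2,\ldots,\pi_g) \;\Big|\; \pi_j \ge 0,\ \sum_{j=1}^{g} \pi_j = 1 \Big\} \tag{4}$$

and

$$\theta \in \Theta^g \equiv \{ (\theta_1,\theta_2,\ldots,\theta_g) \mid \theta_j \in \Theta\ (j = 1,2,\ldots,g) \} \tag{5}$$

(i.e., $\Theta^g = \Theta \times \cdots \times \Theta$ is the Cartesian product of $g$ copies of $\Theta$). Let $\Gamma = \Pi \times \Theta^g$ be the parameter space. Then $\Gamma$ is a closed set.

Given a partition $R_0, R_1, \ldots, R_q$ of $\mathbb{R}$, we observe a random variable $X = X^0\,1[X^0 \in R_0] + \sum_{k=1}^{q} c_k\,1[X^0 \in R_k]$ taking values in $\mathcal{X} = R_0 \cup \{c_1,\ldots,c_q\}$, where $1[\cdot]$ is an indicator function and $c_k$ is just a code for the event $\{X^0 \in R_k\}$. Let $\mu$ be the measure on $\mathcal{X}$ that coincides with $\mu_0$ on $R_0$ and whose restriction to $\{c_1,\ldots,c_q\}$ is the counting measure on this set.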

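Then the pdf of $X$ with respect to $\mu$ is given by

$$f(x \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j f(x \mid \theta_j), \tag{6}$$

where $F_k(\theta_j) = \int_{R_k} f^0(t \mid \theta_j)\,dt$, and

$$f(x \mid \theta_j) = f^0(x \mid \theta_j)\,1[x \in R_0] + \sum_{k=1}^{q} F_k(\theta_j)\,1[x = c_k]. \tag{7}$$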

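We say that Eq. (6) is a finite mixture model with fixed censoring regions. For any given $(\pi^0,\theta^0) \in \Gamma$ such that $f(x \mid \pi^0,\theta^0) \in B_+$, we define the set

$$\Gamma(\pi^0,\theta^0) = \{ (\pi,\theta) \mid (\pi,\theta) \in \Gamma,\ f(x \mid \pi,\theta) = f(x \mid \pi^0,\theta^0) \}. \tag{8}$$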

As is well known for ordinary mixture models, $\Gamma(\pi^0,\theta^0)$ is not a singleton set, and hence the MLE is not consistent in the sense of converging to a unique point. Therefore we use the distance (3) between the MLE and $\Gamma(\pi^0,\theta^0)$ to discuss strong consistency. In this paper, we allow the true parameter $(\pi^0,\theta^0)$ to be a boundary point of $\Gamma$. In this case, $\Gamma(\pi^0,\theta^0)$ becomes a continuum of parameter values (e.g., see Cheng and Liu, 2001; McLachlan and Peel, 2000, p. 28). The true model then takes the reduced form $f(x \mid \pi^0,\theta^0) = \sum_{j=1}^{g_0} \pi^0_{l_j} f(x \mid \theta^0_{l_j})$ with $1 \le g_0 \le g-1$ and $\theta^0_{l_j} \in \Theta$. This means that the true number of components, $g_0$, is unknown, but an upper bound $g$ on it is known.
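To make the coding of censored observations in (6)–(7) concrete, here is a minimal Python sketch under illustrative assumptions that are not part of the general setup: exponential components, a single censoring region $R_1 = [c,\infty)$ (so $R_0 = (0,c)$ and $q = 1$), and the hypothetical string `"c1"` standing in for the code $c_1$. It evaluates $f(x \mid \pi,\theta)$ with respect to $\mu$ and checks numerically that the density integrates to one over $\mathcal{X}$ (Lebesgue part on $R_0$ plus the point mass at $c_1$), i.e., that it lies in $B_+$.

```python
import math
from typing import Sequence, Union

C1 = "c1"  # hypothetical code for the event {X0 in R_1}, with R_1 = [c, inf)
Obs = Union[float, str]


def component_density(x: Obs, rate: float, c: float) -> float:
    """f(x | theta_j) in (7) for an exponential component censored at c."""
    if x == C1:
        return math.exp(-rate * c)         # F_1(theta_j) = P(X0 >= c)
    if 0.0 < x < c:
        return rate * math.exp(-rate * x)  # f0(x | theta_j) on R_0 = (0, c)
    return 0.0                             # x outside the sample space


def mixture_density(x: Obs, weights: Sequence[float], rates: Sequence[float], c: float) -> float:
    """f(x | pi, theta) = sum_j pi_j f(x | theta_j), eq. (6)."""
    return sum(p * component_density(x, r, c) for p, r in zip(weights, rates))


def total_mass(weights: Sequence[float], rates: Sequence[float], c: float, n: int = 20000) -> float:
    """Midpoint-rule integral over R_0 plus the counting-measure mass at c_1."""
    h = c / n
    continuous = h * sum(mixture_density((i + 0.5) * h, weights, rates, c) for i in range(n))
    return continuous + mixture_density(C1, weights, rates, c)


if __name__ == "__main__":
    print(total_mass([0.3, 0.7], [0.5, 3.0], c=2.0))  # should be close to 1
```

Representing $c_1$ by a separate code rather than by the number $c$ mirrors the construction in the text: the censored outcome carries counting measure, not Lebesgue measure.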

2.1. Assumptions

We write expectations of $g(x)$ under $f(x \mid \theta_i^0)$ as $E_{\theta_i^0}[g(X)] = \int_{R_0} g(x) f(x \mid \theta_i^0)\,dx + \sum_{k=1}^{q} g(c_k) f(c_k \mid \theta_i^0) = \int_{\mathcal{X}} g(x) f(x \mid \theta_i^0)\,d\mu$. Here we give sufficient conditions (a)–(g) under which the main results hold for model (6).

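(a) For any $\theta_j \in \Theta$, $f(x \mid \theta_j) \in B_+$ $(j = 1,\ldots,g)$. Furthermore, $f(x \mid \theta_j^1) = f(x \mid \theta_j^2)$ in $B_+$ only if $\theta_j^1 = \theta_j^2$.

(b) The support of $f(x \mid \theta_j)$ does not depend on the parameter $\theta_j \in \Theta$.

(c) Let $i = 1,\ldots,g$ and $j = 1,\ldots,g$.

• $E_{\theta_i^0}[\log f(X \mid \theta_j)] > -\infty$ for any $\theta_j \in \Theta$,

• $E_{\theta_i^0}[\log \max\{f(X \mid \theta_j), 1\}] < \infty$ for any $\theta_j \in \Theta$,

• $E_{\theta_i^0}[\log \sup_{|\theta'_j - \theta_j| \le r} \max\{1, f(X \mid \theta'_j)\}] < \infty$ for small $r > 0$ and any $\theta_j \in \Theta$,

• $E_{\theta_i^0}[\log \sup_{|\theta_j| \ge r} \max\{1, f(X \mid \theta_j)\}] < \infty$ for large $r > 0$.

(d) For any fixed $x \in \mathcal{X}$, $f(x \mid \theta_j)$ is continuous with respect to $\theta_j \in \Theta$.

(e) $\lim_{|\theta_j| \to \infty} f(x \mid \theta_j) = 0$ for any fixed $x \in R_0$.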


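(f) For any infinite cluster point $\theta_j^*$ in $\Theta$, $\sum_{k=1}^{q} \lim_{i \to \infty} \sup_{\theta_j \in U_i(\theta_j^*)} F_k(\theta_j) \le 1$, where $\{U_i(\theta_j^*)\}$ $(i = 1,2,\ldots)$ is a sequence of decreasing neighbourhoods of the point $\theta_j^*$ such that $\bigcap_{i \ge 1} U_i(\theta_j^*) = \theta_j^*$.

(g) Suppose that $g_0$ is any number with $1 \le g_0 \le g-1$, $(\theta_1^*,\ldots,\theta_{g_0}^*)$ is any point in $\Theta^{g_0}$, and $(\theta'_1,\ldots,\theta'_g)$ is any point in $\Theta^g$. Then

$$\sum_{j=1}^{g_0} a_j f(x \mid \theta_j^*) = \sum_{j=1}^{g} b_j f(x \mid \theta'_j), \quad \text{a.e. } P^0 \text{ on } R_0 \tag{9}$$

implies $\sum_{j=1}^{g_0} a_j = 0$ or $\sum_{j=1}^{g_0} a_j = \sum_{j=1}^{g} b_j$, where $P^0$ is the probability measure corresponding to $f(x \mid \pi^0,\theta^0)$.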

Conditions (a) and (b) correspond to Cheng and Liu's assumption (a). Condition (c) corresponds to Cheng and Liu's assumption (b). Conditions (d) and (e) correspond to Cheng and Liu's assumption (c). Condition (d) means that, for any fixed $x \in R_0$, $f^0(x \mid \theta_j)$ is continuous with respect to $\theta_j$, and that $F_k(\theta_j)$ $(k = 1,\ldots,q,\ j = 1,\ldots,g)$ are continuous with respect to $\theta_j$. Condition (f) holds automatically if there is only one censoring region, i.e., $q = 1$, because $F_1(\theta_j) \le 1$. As described in Section 4, condition (g) can be verified by the same techniques that Teicher (1963) and Yakowitz and Spragins (1968) used to check whether a class of finite mixtures is identifiable. Note that the identifiability of finite mixture models is defined slightly differently from that of ordinary parametric models; see, e.g., McLachlan and Peel (2000, p. 27).

Remark 1. Cheng and Liu's approach cannot be adopted directly to show strong consistency. For example, consider a mixture of normal distributions under Type I censoring,

$$f(x \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j \Big\{ n(x \mid \mu_j,\sigma_j^2)\,1[x < c] + \int_c^{\infty} n(t \mid \mu_j,\sigma_j^2)\,dt\;1[x = c] \Big\}, \tag{10}$$

where $c$ is a known constant and $n(x \mid \mu_j,\sigma_j^2)$ is a Gaussian density with mean $\mu_j$ and variance $\sigma_j^2$. Let $\Phi(\cdot)$ be the standard normal distribution function. Since $\int_c^{\infty} n(t \mid \mu_j,\sigma_j^2)\,dt = 1 - \Phi((c-\mu_j)/\sigma_j)$, each of the components, $f(x \mid \theta_j) = n(x \mid \mu_j,\sigma_j^2)\,1[x < c] + \int_c^{\infty} n(t \mid \mu_j,\sigma_j^2)\,dt\;1[x = c]$, does not tend to 0 when $x = c$ and $\sigma_j \to \infty$; for fixed $\mu_j$ it tends to $1/2$. Thus model (10) does not fulfil assumption (c) of Cheng and Liu (2001).
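The failure of Cheng and Liu's assumption (c) is easy to see numerically. The sketch below (plain Python; helper names are ours) evaluates one component of (10) with $\mu_j = 0$ and $c = 1$, values chosen only for illustration, as $\sigma_j$ grows: the density part at a point $x < c$ vanishes, whereas the mass at the code $x = c$ approaches $1/2$.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian density n(x | mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def censored_mass(c: float, mu: float, sigma: float) -> float:
    """Integral of n(t | mu, sigma^2) over [c, inf), i.e. 1 - Phi((c - mu) / sigma)."""
    return 0.5 * math.erfc((c - mu) / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    c, mu = 1.0, 0.0
    for sigma in (1.0, 10.0, 100.0, 1e4):
        # density at x = 0.5 < c shrinks to 0; mass at x = c approaches 1/2
        print(f"sigma={sigma:g}  n(0.5)={normal_pdf(0.5, mu, sigma):.3e}  "
              f"F_1={censored_mass(c, mu, sigma):.6f}")
```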

Remark 2. The effect of assumption (c) in Cheng and Liu is to make the density $h$ uniquely defined at any infinite cluster point of the parameter space with $\int_{-\infty}^{\infty} h \le 1$, and to make the set $\Gamma(\alpha^0,\theta^0)$ a closed set. Our new conditions (e) and (f) play the same role. Although model (6) cannot be uniquely defined at some infinite cluster points of $\Gamma$, it follows from conditions (e) and (f), as shown in Lemma 7, that the limit superior of $f$ is uniquely defined at such a point and is bounded above by a function $\bar{f}$ with $\int_{-\infty}^{\infty} \bar{f} \le 1$. This fact is the key to the proof of Theorem 1.

3. Main results

We denote the MLEs by $(\pi^n,\theta^n)$, and expectations of $g(x)$ under $f(x \mid \pi^0,\theta^0)$ by $E_0[g(X)] = \int_{R_0} g(x) f(x \mid \pi^0,\theta^0)\,dx + \sum_{k=1}^{q} g(c_k) f(c_k \mid \pi^0,\theta^0) = \int_{\mathcal{X}} g(x) f(x \mid \pi^0,\theta^0)\,d\mu$, where $\mu$ is the measure given in Section 2. Note that, for any given $n$, the MLE $(\pi^n,\theta^n)$ is not necessarily unique, so it can be any one of a set of possible choices. The point, however, as the following theorem shows, is that any one of these choices is allowed.

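Theorem 1. Let $X = (X_1,X_2,\ldots,X_n)$ be iid observations with the probability distribution $f(x \mid \pi^0,\theta^0)$ and the parameter space $\Gamma$. Then under conditions (a)–(g),

$$P\Big( \lim_{n \to \infty} \mathrm{dis}\{(\pi^n,\theta^n),\,\Gamma(\pi^0,\theta^0)\} = 0 \Big) = 1. \tag{11}$$

Clearly (11) implies $\mathrm{dis}\{\Gamma(\pi^n,\theta^n),\,\Gamma(\pi^0,\theta^0)\} \to 0$ w.p.1, because $\{(\pi^n,\theta^n)\} \subseteq \Gamma(\pi^n,\theta^n)$.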

Proof. Without loss of generality, we can assume that $\Gamma$ is compact, because $\Gamma$ can be extended to include all its infinite cluster points so that it becomes a compact set under a suitable metric; see, for example, Kiefer and Wolfowitz (1956) and Hathaway (1985). Each point of $\Gamma$ is then either a finite point or one of these infinite cluster points.

We adopt the approach used in Wald (1949). Letting $\varepsilon > 0$ be arbitrary, we show that for any closed subset $S$ of $\Gamma$ such that $\mathrm{dis}\{S,\Gamma(\pi^0,\theta^0)\} \ge \varepsilon$,

$$P\left( \lim_{n \to \infty} \sup_{(\pi,\theta) \in S} \frac{\prod_{i=1}^{n} f(X_i \mid \pi,\theta)}{\prod_{i=1}^{n} f(X_i \mid \pi^0,\theta^0)} = 0 \right) = 1. \tag{12}$$

To show (12), we only need to confirm that for each finite or infinite cluster point $(\pi^*,\theta^*)$ in $S$, there is always a neighbourhood $U(\pi^*,\theta^*)$ of the point such that

$$E_0\Big[ \log \sup_{(\pi,\theta) \in U(\pi^*,\theta^*)} f(X \mid \pi,\theta) \Big] < E_0[\log f(X \mid \pi^0,\theta^0)]. \tag{13}$$

Note that $\sup_{(\pi,\theta) \in U(\pi^*,\theta^*)} f(x \mid \pi,\theta)$ is measurable by condition (d).


For any finite point $(\pi^*,\theta^*)$, Eq. (13) follows from Lemma 6 and the same argument as in the proof of Theorem 1 of Cheng and Liu (2001, p. 607). For any infinite cluster point $(\pi^*,\theta^*)$, Eq. (13) also holds by Lemma 7.

It remains to show that (13) implies (12). Using the Heine–Borel finite open cover theorem and the same technique as in the proof of Theorem 1 in Wald (1949), (12) follows. Therefore $\mathrm{dis}\{(\pi^n,\theta^n),\Gamma(\pi^0,\theta^0)\} \to 0$ w.p.1, by arguing as in the proof of Theorem 2 of Wald (1949) with $|\theta - \theta^0|$ replaced by $\mathrm{dis}\{(\pi,\theta),\Gamma(\pi^0,\theta^0)\}$. □

Among conditions (a)–(g), condition (c) is rather tedious to verify. The following gives a simpler condition that is sufficient for condition (c) to hold.

Corollary 1. Let $f(x \mid \theta_j)$ satisfy conditions (a), (b), (d)–(g) and $E_{\theta_i^0}[\log f(X \mid \theta_j)] > -\infty$. Suppose that there exists a function $S(x)$ such that

• $|f(x \mid \theta_j)| \le S(x)$ for any $\theta_j \in \Theta$, (14)

• $E_{\theta_i^0}[S(X)] < \infty$. (15)

Then the conclusion of Theorem 1 holds.

Proof. We only have to prove that conditions (14) and (15) imply condition (c). This can be proved by using the inequality $\log \max\{1,x\} \le x$ for $x \ge 0$. □
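The one-line proof can be expanded as follows; this is our own write-up of the argument under the assumptions of Corollary 1. Let $A \subseteq \Theta$ be any of the sets appearing in condition (c): a single point $\{\theta_j\}$, a ball $\{\theta'_j : |\theta'_j - \theta_j| \le r\}$, or a region $\{\theta'_j : |\theta'_j| \ge r\}$. Then

$$\begin{aligned}
\log\max\{1,x\} &\le x \quad (x \ge 0), \quad \text{since } \log x \le x - 1 \text{ for } x \ge 1,\\
\log \sup_{\theta'_j \in A} \max\{1, f(x \mid \theta'_j)\}
 &= \log\max\Big\{1, \sup_{\theta'_j \in A} f(x \mid \theta'_j)\Big\}
 \le \log\max\{1, S(x)\} \le S(x) \quad \text{by (14)},\\
E_{\theta_i^0}\Big[\log \sup_{\theta'_j \in A} \max\{1, f(X \mid \theta'_j)\}\Big]
 &\le E_{\theta_i^0}[S(X)] < \infty \quad \text{by (15)},
\end{aligned}$$

which gives the last three bullets of condition (c); the first bullet is assumed directly in Corollary 1.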

The following is useful in verifying condition (g).

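Corollary 2. Recall the family $\mathcal{F}^0 = \{ f^0(x \mid \theta) \mid \theta \in \Theta \}$ introduced in Section 2. Suppose that $f^0(x \mid \theta)\,1[x \in R_0]$ has transforms $\gamma(t \mid \theta)$ defined for $t$ belonging to some domain of definition $S_\gamma(\theta)$, and that the mapping $M: f^0(x \mid \theta)\,1[x \in R_0] \mapsto \gamma(t \mid \theta)$ is linear and one-to-one. Suppose further that there is a total ordering $\prec$ such that $\theta_1 \prec \theta_2$ implies:

(i) $S_\gamma(\theta_1) \subseteq S_\gamma(\theta_2)$;

(ii) there is some $t_1 \in S_\gamma(\theta_1)^c$ (the complement of $S_\gamma(\theta_1)$) such that

$$\lim_{t \to t_1} \frac{\gamma(t \mid \theta_2)}{\gamma(t \mid \theta_1)} = 0.$$

Then for $\theta_1 \prec \theta_2 \prec \cdots \prec \theta_J$ in $\Theta$, $a_1 = a_2 = \cdots = a_J = 0$ if and only if

$$\sum_{j=1}^{J} a_j f^0(x \mid \theta_j) = 0 \quad \text{for any fixed } x \in R_0. \tag{16}$$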

Conditions (i) and (ii) are equivalent to those of Teicher (1963), and are usually verified via integral transforms such as the Fourier or Laplace transform. However, it is often more convenient to argue in terms of densities. Hence, in Section 4.2, we will use the identity transform $M: f^0(x \mid \theta)\,1[x \in R_0] \mapsto f^0(x \mid \theta)\,1[x \in R_0]$ to verify condition (g).

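Proof. This is proved by arguing as in the proof of Theorem 2 of Teicher (1963). It follows from the transformed version of (16), $\sum_{j=1}^{J} a_j \gamma(t \mid \theta_j) = 0$, that

$$a_1 + \sum_{j=2}^{J} a_j \frac{\gamma(t \mid \theta_j)}{\gamma(t \mid \theta_1)} = 0 \quad \text{for } t \in S_\gamma(\theta_1) \cap \{ t \mid \gamma(t \mid \theta_1) \ne 0 \}.$$

Letting $t \to t_1$, we have $a_1 = 0$ by (ii), since $\theta_1 \prec \theta_j$ for every $j \ge 2$. Repeating the same argument for $\sum_{j=2}^{J} a_j \gamma(t \mid \theta_j) = 0$ yields $a_2 = \cdots = a_J = 0$. □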

Theorem 1 shows that when the true model is an indeterminate case, the MLE converges in the distance (3) towards the indeterminate set of points $\Gamma(\pi^0,\theta^0)$ defining the model. This does not in itself guarantee that the fitted model, which is what is often used in practice, converges to the true model. The following theorem guarantees that the fitted model tends to the true model as $n \to \infty$. If $\Gamma(\pi^0,\theta^0)$ is a singleton set, the limit point $\lim_{n \to \infty}(\pi^n,\theta^n) = (\pi^0,\theta^0)$ is finite. In contrast, if $\Gamma(\pi^0,\theta^0)$ is a continuum of parameter values, as pointed out in Cheng and Liu (2001), the limit point might be an infinite cluster point. Thus, to show the convergence of the fitted model, we must take into account the case in which the MLE tends to an infinite cluster point as $n \to \infty$.

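Theorem 2. Assume that conditions (d) and (e) hold, and

(A1) for any sequences $\{\theta_j^n\}$ and $\{\theta_j^{0n}\}$ such that $\lim_{n \to \infty} |\theta_j^n| = \infty$ and $\lim_{n \to \infty} |\theta_j^n - \theta_j^{0n}| = 0$, it holds that, for $j = 1,\ldots,g$ and $k = 1,\ldots,q$,

$$\lim_{n \to \infty} |F_k(\theta_j^n) - F_k(\theta_j^{0n})| = 0. \tag{17}$$

If $\mathrm{dis}\{(\pi^n,\theta^n),\Gamma(\pi^0,\theta^0)\} \to 0$ w.p.1, then for any fixed $x \in \mathcal{X}$,

$$\lim_{n \to \infty} f(x \mid \pi^n,\theta^n) = f(x \mid \pi^0,\theta^0) \quad \text{w.p.1}. \tag{18}$$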


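Proof. When $\mathrm{dis}\{(\pi^n,\theta^n),\Gamma(\pi^0,\theta^0)\} \to 0$ as $n \to \infty$, by Property 1 there is a sequence $(\pi^{0n},\theta^{0n})$ in $\Gamma(\pi^0,\theta^0)$ such that $|\pi^{0n} - \pi^n| + |\theta^{0n} - \theta^n| \to 0$ w.p.1. For any $x \in R_0$, $f(x \mid \pi^n,\theta^n) - f(x \mid \pi^0,\theta^0) = f(x \mid \pi^n,\theta^n) - f(x \mid \pi^{0n},\theta^{0n}) \to 0$ w.p.1 by conditions (d) and (e). Note that even if $\lim_{n \to \infty} |\theta_j^n| = \infty$, the above holds by condition (e). Combining this with Eq. (17) for the censored points $x = c_k$ completes the proof. □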

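Note that $F_k(\theta_j^n)$ might not have a limit as $|\theta_j^n| \to \infty$. For example, consider one piece of the probability of $X = c$ in model (10), $F_1(\mu_1^n,\sigma_1^n) = 1 - \Phi((c - \mu_1^n)/\sigma_1^n)$. Then $\lim_{n \to \infty} F_1(\mu_1^n,\sigma_1^n)$ need not exist when $(\mu_1^n,\sigma_1^n) \to (\infty,\infty)$, because the ratio $(c - \mu_1^n)/\sigma_1^n$ may converge to different values, or oscillate, depending on the path. However, condition (A1) is mild, and holds if the $F_k(\theta_j)$ $(j = 1,\ldots,g,\ k = 1,\ldots,q)$ satisfy the Lipschitz condition, i.e., there exists a constant $L_1 \ge 0$ such that $|F_k(\theta_{j1}) - F_k(\theta_{j2})| \le L_1 |\theta_{j1} - \theta_{j2}|$ for all $\theta_{j1}, \theta_{j2} \in \Theta$. If condition (A1) fails, $f(x \mid \pi^n,\theta^n)$ may fail to converge to $f(x \mid \pi^0,\theta^0)$ at some censored point $x = c_k$. Even in such a case, however, Eq. (18) still holds for any $x \in R_0$.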
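A small numerical check of this example, with $c = 0$ chosen only for concreteness (helper names are ours): along $(\mu_1^n,\sigma_1^n) = (n,n)$ the ratio $(c-\mu_1^n)/\sigma_1^n$ equals $-1$, so $F_1 = \Phi(1) \approx 0.8413$; along $(n^2,n)$ it tends to $-\infty$, so $F_1 \to 1$. Interleaving the two paths gives a sequence tending to $(\infty,\infty)$ along which $F_1$ has no limit.

```python
import math
from typing import Callable, Iterable, List, Tuple


def upper_tail(c: float, mu: float, sigma: float) -> float:
    """F_1(mu, sigma) = 1 - Phi((c - mu) / sigma), the mass at x = c in model (10)."""
    return 0.5 * math.erfc((c - mu) / (sigma * math.sqrt(2.0)))


def alternating(n: int) -> Tuple[float, float]:
    """Hypothetical path to (inf, inf): (n, n) for even n, (n^2, n) for odd n."""
    return (float(n), float(n)) if n % 2 == 0 else (float(n) ** 2, float(n))


def values_along(c: float, path: Callable[[int], Tuple[float, float]],
                 ns: Iterable[int]) -> List[float]:
    """F_1 evaluated along the parameter sequence path(n), n in ns."""
    return [upper_tail(c, *path(n)) for n in ns]


if __name__ == "__main__":
    # Even n give Phi(1), odd n give values near 1, so the sequence has no limit.
    print([round(v, 4) for v in values_along(0.0, alternating, range(10, 16))])
```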

4. Examples

In this section, we give parameter spaces for mixtures of censored distributions under which strong consistency holds.

4.1. A mixture of censored exponential distributions

We consider a mixture of Weibull distributions with a common known shape parameter $\alpha$ (exponential distributions when $\alpha = 1$) under Type I right-censoring,

$$f(x \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j \Big\{ \alpha\,\theta_j x^{\alpha-1} \exp(-\theta_j x^{\alpha})\,1[0 < x < c] + \exp(-\theta_j c^{\alpha})\,1[x = c] \Big\}, \tag{19}$$

where $c > 0$ and $\alpha > 0$ are known constants, $\theta_j = \exp(\tau_j)$ is regarded as a function of $\tau_j$, and $\Gamma = \{ (\pi_1,\ldots,\pi_g,\tau_1,\ldots,\tau_g) \mid \sum_{j=1}^{g} \pi_j = 1,\ \pi_j \ge 0,\ -\infty < \tau_j < \infty \}$ is the parameter space. This model appears in failure analysis, where failure often occurs for more than one reason (e.g., see Mendenhall and Hader, 1958). For simplicity of exposition, we treat only the case $g = 2$ and $\alpha = 1$, but the result holds for an arbitrary number $g$ and constant $\alpha$. We therefore verify condition (g) for model (19) with $g = 2$ and $\alpha = 1$, i.e., a two-component mixture of censored exponential distributions. Then Eq. (9) in condition (g) becomes

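$$a_1 \theta_1^* e^{-\theta_1^* x} = b_1 \theta'_1 e^{-\theta'_1 x} + b_2 \theta'_2 e^{-\theta'_2 x} \quad \text{for } 0 < x < c, \tag{20}$$

where $0 < \theta_1^* < \infty$, $0 < \theta'_1 < \infty$, and $0 < \theta'_2 < \infty$. If $\theta_1^* \ne \theta'_1$, $\theta'_1 \ne \theta'_2$, and $\theta'_2 \ne \theta_1^*$, we substitute $x = \varepsilon_1, 2\varepsilon_1, 3\varepsilon_1$ into (20) with a small number $\varepsilon_1 > 0$. Without loss of generality we can assume that $c > 3$ and $\varepsilon_1 = 1$. Then we have

$$\begin{pmatrix}
-\theta_1^* e^{-\theta_1^*} & \theta'_1 e^{-\theta'_1} & \theta'_2 e^{-\theta'_2} \\
-\theta_1^* e^{-2\theta_1^*} & \theta'_1 e^{-2\theta'_1} & \theta'_2 e^{-2\theta'_2} \\
-\theta_1^* e^{-3\theta_1^*} & \theta'_1 e^{-3\theta'_1} & \theta'_2 e^{-3\theta'_2}
\end{pmatrix}
\begin{pmatrix} a_1 \\ b_1 \\ b_2 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \tag{21}$$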

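By a property of the Vandermonde matrix, the determinant of the matrix on the left-hand side of (21), denoted by $\det(A)$, becomes

$$\det(A) = -\theta_1^* \theta'_1 \theta'_2 \exp(-\theta_1^* - \theta'_1 - \theta'_2)\,
\det\begin{pmatrix}
1 & 1 & 1 \\
e^{-\theta_1^*} & e^{-\theta'_1} & e^{-\theta'_2} \\
e^{-2\theta_1^*} & e^{-2\theta'_1} & e^{-2\theta'_2}
\end{pmatrix} \ne 0,$$

which is nonzero because $e^{-\theta_1^*}$, $e^{-\theta'_1}$, and $e^{-\theta'_2}$ are distinct.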

Therefore we have $a_1 = b_1 = b_2 = 0$. If $\theta_1^* = \theta'_1 \ne \theta'_2$, Eq. (9) in condition (g) becomes

$$0 = (b_1 - a_1)\,\theta'_1 e^{-\theta'_1 x} + b_2 \theta'_2 e^{-\theta'_2 x} \quad \text{for } 0 < x < c. \tag{22}$$

Substituting $x = 1, 2$ into (22) and arguing as above leads to $b_1 - a_1 = 0$ and $b_2 = 0$. Therefore $a_1 = b_1 + b_2$. For the other cases, we have $a_1 = 0$ or $a_1 = b_1 + b_2$ by the same argument. Hence model (19) satisfies condition (g). Because the other conditions (a)–(f) are easily verified, Theorem 1 holds. Note that $\Gamma$ equals the parameter space of the model that does not adjust for censoring, i.e., $c = \infty$.
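The factorization of $\det(A)$ can be checked numerically. The Python sketch below (helper names are ours) builds the matrix in (21) for given rates $\theta_1^*, \theta'_1, \theta'_2$, computes its determinant by cofactor expansion, and compares it with the prefactor times the Vandermonde determinant $\prod_{i<j}(z_j - z_i)$, where $z = (e^{-\theta_1^*}, e^{-\theta'_1}, e^{-\theta'_2})$. The determinant vanishes when two rates coincide, which is why that case is handled separately via (22).

```python
import math
from typing import List


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def matrix_21(t_star: float, t1: float, t2: float) -> List[List[float]]:
    """Coefficient matrix of (a1, b1, b2) in (21): eq. (20) evaluated at x = 1, 2, 3."""
    signs_rates = [(-1.0, t_star), (1.0, t1), (1.0, t2)]
    return [[s * t * math.exp(-x * t) for s, t in signs_rates] for x in (1, 2, 3)]


def det_factorized(t_star: float, t1: float, t2: float) -> float:
    """Prefactor -t* t1 t2 exp(-(t* + t1 + t2)) times the Vandermonde determinant."""
    z = [math.exp(-t) for t in (t_star, t1, t2)]
    vandermonde = (z[1] - z[0]) * (z[2] - z[0]) * (z[2] - z[1])
    return -t_star * t1 * t2 * math.exp(-(t_star + t1 + t2)) * vandermonde


if __name__ == "__main__":
    print(det3(matrix_21(0.5, 1.0, 2.0)), det_factorized(0.5, 1.0, 2.0))
    print(det3(matrix_21(1.0, 1.0, 2.0)))  # theta_1^* = theta'_1: (numerically) zero
```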

4.2. A mixture of censored normal distributions

We consider model (10) with the parameter space

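$$\Gamma = \Big\{ (\pi_1,\ldots,\pi_g,\mu_1,\ldots,\mu_g,\sigma_1,\ldots,\sigma_g) \;\Big|\; \sum_{j=1}^{g} \pi_j = 1,\ \pi_j \ge 0,\ \mu_j \in \mathbb{R},\ \sigma_j \in \mathbb{R}_\zeta\ (j = 1,\ldots,g) \Big\}, \tag{23}$$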


where $\mathbb{R}_\zeta = \{ x \mid 0 < \zeta \le x < \infty \}$. It is not unreasonable to assume that the $\sigma_j$ $(j = 1,\ldots,g)$ have a positive lower bound, since zero variances are degenerate cases. Without such a lower bound, the likelihood is unbounded when $\mu_j$ is set equal to any observed value and $\sigma_j$ tends to zero, which violates condition (c). For simplicity of exposition, we treat only the case $g = 2$, but the result can be extended to an arbitrary number $g$. First, we verify condition (g). Suppose that

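$$a_1 n(x \mid \mu_1^*, \sigma_1^{*2}) = b_1 n(x \mid \mu'_1, (\sigma'_1)^2) + b_2 n(x \mid \mu'_2, (\sigma'_2)^2) \quad \text{for } x < c, \tag{24}$$

where $(\mu_1^*,\sigma_1^*) \in \mathbb{R} \times \mathbb{R}_\zeta$, $(\mu'_1,\sigma'_1) \in \mathbb{R} \times \mathbb{R}_\zeta$, and $(\mu'_2,\sigma'_2) \in \mathbb{R} \times \mathbb{R}_\zeta$. A total ordering, called a "lexicographical" ordering, is defined by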

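$$(\mu_1,\sigma_1) \prec (\mu_2,\sigma_2) \quad \text{if } \sigma_2 < \sigma_1, \text{ or if } \sigma_1 = \sigma_2 \text{ and } \mu_1 < \mu_2.$$

Then $(\mu_1,\sigma_1) \prec (\mu_2,\sigma_2)$ implies

$$\lim_{x \to -\infty} \frac{n(x \mid \mu_2,\sigma_2^2)\,1[x < c]}{n(x \mid \mu_1,\sigma_1^2)\,1[x < c]} = 0, \tag{25}$$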

which satisfies the conditions of Corollary 2, and condition (a). Before applying Corollary 2 to Eq. (24), we need to combine identical functions among $n(x \mid \mu_1^*,\sigma_1^{*2})$, $n(x \mid \mu'_1,(\sigma'_1)^2)$, and $n(x \mid \mu'_2,(\sigma'_2)^2)$. Let $\theta_1^* = (\mu_1^*,\sigma_1^*)$, $\theta'_1 = (\mu'_1,\sigma'_1)$, and $\theta'_2 = (\mu'_2,\sigma'_2)$. If $\theta_1^* = \theta'_1 \ne \theta'_2$, Eq. (24) becomes

$$(b_1 - a_1)\,n(x \mid \mu'_1,(\sigma'_1)^2) + b_2 n(x \mid \mu'_2,(\sigma'_2)^2) = 0 \quad \text{for } x < c. \tag{26}$$

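By Corollary 2, $b_1 - a_1 = 0$ and $b_2 = 0$, and hence $a_1 = b_1 + b_2$. Furthermore, if $\theta_1^* \ne \theta'_1$, $\theta'_1 \ne \theta'_2$, and $\theta'_2 \ne \theta_1^*$, then Eq. (24) becomes

$$-a_1 n(x \mid \mu_1^*,\sigma_1^{*2}) + b_1 n(x \mid \mu'_1,(\sigma'_1)^2) + b_2 n(x \mid \mu'_2,(\sigma'_2)^2) = 0 \quad \text{for } x < c. \tag{27}$$

Therefore, by Corollary 2, $a_1 = 0$, $b_1 = 0$, and $b_2 = 0$. For the other cases, we have $a_1 = 0$ or $a_1 = b_1 + b_2$ by the same argument. The other conditions (b)–(f) can be verified from Corollary 1, and hence Theorem 1 holds.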
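The limit (25) can be checked through the log-ratio of the two Gaussian densities, which is a quadratic in $x$; for $x < c$ the indicators cancel. When $\sigma_2 < \sigma_1$ the log-ratio behaves like $-x^2(1/\sigma_2^2 - 1/\sigma_1^2)/2$, and when $\sigma_1 = \sigma_2 = \sigma$ and $\mu_1 < \mu_2$ it equals $(\mu_2 - \mu_1)(2x - \mu_1 - \mu_2)/(2\sigma^2)$, so in both cases it tends to $-\infty$ as $x \to -\infty$. A minimal Python sketch (helper names are ours):

```python
import math
from typing import Tuple

Theta = Tuple[float, float]  # (mu, sigma)


def log_normal_pdf(x: float, mu: float, sigma: float) -> float:
    """log n(x | mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def precedes(a: Theta, b: Theta) -> bool:
    """Lexicographical ordering: (mu1, s1) < (mu2, s2) iff s2 < s1, or s1 == s2 and mu1 < mu2."""
    (mu1, s1), (mu2, s2) = a, b
    return s2 < s1 or (s1 == s2 and mu1 < mu2)


def log_ratio(x: float, a: Theta, b: Theta) -> float:
    """log[n(x | mu2, s2^2) / n(x | mu1, s1^2)] for x < c, where the indicators cancel."""
    return log_normal_pdf(x, *b) - log_normal_pdf(x, *a)


if __name__ == "__main__":
    a, b = (0.0, 2.0), (5.0, 1.0)   # the smaller sigma comes later in the ordering
    print(precedes(a, b), [round(log_ratio(x, a, b), 1) for x in (-10.0, -100.0, -1000.0)])
```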

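Subsequently, we verify that the mixture model satisfies condition (A1) under the parameter space (23). Consider the sequences $\{(\mu_j^n,\sigma_j^n)\}$ and $\{(\mu_j^{0n},\sigma_j^{0n})\}$ in Theorem 2, i.e., $\lim_{n \to \infty} |(\mu_j^n,\sigma_j^n)| = \infty$ and $\lim_{n \to \infty} |(\mu_j^n,\sigma_j^n) - (\mu_j^{0n},\sigma_j^{0n})| = 0$. Writing $\mu_j^n = \mu_j^{0n} + o(1)$ and $\sigma_j^n = \sigma_j^{0n} + o(1)$, we have

$$\frac{c - \mu_j^n}{\sigma_j^n} - \frac{c - \mu_j^{0n}}{\sigma_j^{0n}} = \frac{-o(1)}{\sigma_j^{0n} + o(1)} \left( 1 + \frac{c - \mu_j^{0n}}{\sigma_j^{0n}} \right). \tag{28}$$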

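If $\lim_{n \to \infty} |(c - \mu_j^{0n})/\sigma_j^{0n}| < \infty$, it follows from Eq. (28) that

$$\lim_{n \to \infty} \left| \int_c^{\infty} n(t \mid \mu_j^n,(\sigma_j^n)^2)\,dt - \int_c^{\infty} n(t \mid \mu_j^{0n},(\sigma_j^{0n})^2)\,dt \right| = 0. \tag{29}$$

On the other hand, because $\lim_{n \to \infty} (c - \mu_j^n)/\sigma_j^n = \pm\infty$ when $\lim_{n \to \infty} (c - \mu_j^{0n})/\sigma_j^{0n} = \pm\infty$, (29) also holds in that case. Therefore Theorem 2 holds.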
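A numerical illustration of (29) in the bounded-ratio case, with $c = 0$ and two hypothetical sequences $(\mu_j^n,\sigma_j^n) = (n,n)$ and $(\mu_j^{0n},\sigma_j^{0n}) = (n + 1/n, n)$, which both diverge while their distance $1/n$ tends to zero (helper names are ours). The ratios are $-1$ and $-1 - 1/n^2$, so the gap between the censored probabilities shrinks roughly like $\phi(1)/n^2$, where $\phi$ is the standard normal density.

```python
import math
from typing import Tuple


def upper_tail(c: float, mu: float, sigma: float) -> float:
    """Integral of n(t | mu, sigma^2) over [c, inf), i.e. 1 - Phi((c - mu) / sigma)."""
    return 0.5 * math.erfc((c - mu) / (sigma * math.sqrt(2.0)))


def tail_gap(c: float, theta: Tuple[float, float], theta0: Tuple[float, float]) -> float:
    """|F_1(theta) - F_1(theta0)|, the quantity controlled by (17) and (29)."""
    return abs(upper_tail(c, *theta) - upper_tail(c, *theta0))


if __name__ == "__main__":
    for n in (10.0, 100.0, 1000.0):
        theta_n, theta_0n = (n, n), (n + 1.0 / n, n)  # norms -> inf, distance 1/n -> 0
        print(n, tail_gap(0.0, theta_n, theta_0n))
```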

4.3. A random censorship mixture model

This section applies the result of Theorem 1 to a random censorship mixture model. We assume that the observations consist of $n$ pairs $(X_1,\delta_1),\ldots,(X_n,\delta_n)$, where $X_i = \min(X_i^0, Y_i)$ is either an observed random variable $X_i^0$ or an observed censoring variable $Y_i$ independent of $X_i^0$, and $\delta_i = 1[X_i^0 > Y_i]$. Furthermore, we assume that $X_1^0,\ldots,X_n^0$ are iid with density $f^0(x \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j f^0(x \mid \theta_j)$, and $Y_1,\ldots,Y_n$ are iid with an arbitrary unknown density $q(y)$. As in the usual random censorship models, the density of $(X_i,\delta_i)$ is given by

$$f(x,\delta \mid \pi,\theta) = f^0(x \mid \pi,\theta)\,Q(x)\,1[\delta = 0] + q(x)\,\bar{F}(x \mid \pi,\theta)\,1[\delta = 1] = \sum_{j=1}^{g} \pi_j f(x,\delta \mid \theta_j), \tag{30}$$

where $Q(x) = \int_x^{\infty} q(t)\,dt$, $\bar{F}(x \mid \pi,\theta) = \int_x^{\infty} f^0(t \mid \pi,\theta)\,dt$, $\bar{F}(x \mid \theta_j) = \int_x^{\infty} f^0(t \mid \theta_j)\,dt$, and $f(x,\delta \mid \theta_j) = f^0(x \mid \theta_j)\,Q(x)\,1[\delta = 0] + q(x)\,\bar{F}(x \mid \theta_j)\,1[\delta = 1]$. We then obtain the following result by the same argument as in the proof of Theorem 1, because (30) also takes the form of a mixture model.

Let (a)′–(d)′ be conditions (a)–(d) with $f(x \mid \theta_j)$ replaced by $f(x,\delta \mid \theta_j)$, and let (e)′ be condition (e) with "$f(x \mid \theta_j)$ for any $x \in R_0$" replaced by $f(x,0 \mid \theta_j)$. Let (g)′ be condition (g) with Eq. (9) replaced by

$$\sum_{j=1}^{g_0} a_j f^0(x \mid \theta_j^*)\,Q(x) = \sum_{j=1}^{g} b_j f^0(x \mid \theta'_j)\,Q(x), \quad \text{a.e. } P^0. \tag{31}$$

Corollary 3. Let $(X_1,\delta_1),\ldots,(X_n,\delta_n)$ be iid observations with the probability distribution (30) satisfying conditions (a)′–(e)′ and (g)′. Then the MLE $(\pi^n,\theta^n)$ is strongly consistent in the sense of (11).


This result can also be extended to the model in which $Y_i$ has a density with unknown parameters, but we do not discuss it here to save space, because the representation is rather tedious.

Proof. The condition corresponding to condition (f) is

(f)′ For any infinite cluster point $\theta_j^*$ in $\Theta$, $E_q[\lim_{i \to \infty} \sup_{\theta_j \in U_i(\theta_j^*)} \bar{F}(X \mid \theta_j)] \le 1$, where $E_q[\,\cdot\,] = \int_{\mathcal{X}} \cdot\; q(x)\,dx$.

Condition (f)′ obviously holds because $\bar{F}(x \mid \theta_j) \le 1$ for any $x$ and $\theta_j \in \Theta$. In addition, conditions (a)′–(e)′ are essentially the same as (a)–(e), and hence the results corresponding to Lemmas 3–6 follow. Therefore the result corresponding to Lemma 7 follows, which completes the proof. □

For example, if the density of $X^0$ in (30) is $f^0(x \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j n(x \mid \mu_j,\sigma_j^2)$, the probability distribution of $(X,\delta)$ is given by

$$f(x,\delta \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j \left\{ n(x \mid \mu_j,\sigma_j^2)\,Q(x)\,1[\delta = 0] + q(x)\,\bar{\Phi}\Big(\frac{x - \mu_j}{\sigma_j}\Big)\,1[\delta = 1] \right\},$$

where $\bar{\Phi}(\cdot) = 1 - \Phi(\cdot)$. For simplicity of exposition, suppose that the density $q(x)$ satisfies $E_{\theta_i^0}[\log f(X,\delta \mid \theta_j)] > -\infty$ and $E_q[\log \max\{q(X),1\}] < \infty$. Then, by Corollaries 1 and 2, we can verify conditions (a)′–(g)′ under the parameter space (23), and hence Corollary 3 holds.
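As a sanity check on (30), the two pieces of the density should integrate to $P(X^0 \le Y) + P(Y < X^0) = 1$. The Python sketch below assumes, purely for illustration, a two-component normal mixture for $X^0$ and an exponential censoring density $q(y) = \rho e^{-\rho y}$ on $y \ge 0$ (the paper leaves $q$ arbitrary); the parameter values and helper names are hypothetical. It evaluates $f(x,\delta \mid \pi,\theta)$ and integrates it with the composite Simpson rule, splitting at $0$, where $q$ has a jump.

```python
import math
from typing import Callable, Sequence


def npdf(x: float, mu: float, s: float) -> float:
    """Gaussian density n(x | mu, s^2)."""
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def nsurv(x: float, mu: float, s: float) -> float:
    """bar-Phi((x - mu) / s) = 1 - Phi((x - mu) / s)."""
    return 0.5 * math.erfc((x - mu) / (s * math.sqrt(2.0)))


def joint_density(x: float, d: int, weights: Sequence[float], mus: Sequence[float],
                  sigmas: Sequence[float], rho: float) -> float:
    """f(x, d | pi, theta) in (30) with a hypothetical exponential censoring density."""
    q = rho * math.exp(-rho * x) if x >= 0.0 else 0.0   # q(x)
    big_q = math.exp(-rho * x) if x >= 0.0 else 1.0     # Q(x) = P(Y > x)
    comps = list(zip(weights, mus, sigmas))
    if d == 0:  # X0 observed: f0(x) Q(x)
        return big_q * sum(p * npdf(x, m, s) for p, m, s in comps)
    return q * sum(p * nsurv(x, m, s) for p, m, s in comps)  # censored: q(x) bar-F(x)


def simpson(f: Callable[[float], float], a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    inner = sum((4.0 if i % 2 else 2.0) * f(a + i * h) for i in range(1, n))
    return (f(a) + f(b) + inner) * h / 3.0


if __name__ == "__main__":
    args = ([0.4, 0.6], [1.0, 4.0], [1.0, 0.5], 0.3)
    observed = (simpson(lambda x: joint_density(x, 0, *args), -60.0, 0.0)
                + simpson(lambda x: joint_density(x, 0, *args), 0.0, 60.0))
    censored = simpson(lambda x: joint_density(x, 1, *args), 0.0, 60.0)
    print(observed, censored, observed + censored)  # the sum should be close to 1
```

The truncation at $\pm 60$ is a practical choice for these illustrative parameters, where the neglected tail mass is far below the integration error.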

5. Concluding remarks

We have extended the consistency proof of Cheng and Liu (2001) to mixture models with censored data. Thus strong consistency still holds even if the estimated model has more components than the true model, and it also holds in uncensored cases if conditions (a)–(e) are fulfilled. Moreover, this result can easily be extended to censored multivariate mixture models by a small modification of the conditions, although we do not discuss this here to save space.

Hathaway (1985) used the device in Section 6 of Kiefer and Wolfowitz (1956) to prove the strong consistency of MLEs in mixtures of normals under the parameter space (23) with $\sigma_j \in \mathbb{R}_\zeta$ replaced by the inequality constraint $\min_{i,j}(\sigma_i/\sigma_j) \ge c > 0$. This device could be applied to the censored normal mixture model (10), but a rather tedious evaluation would be needed.

Acknowledgements

The author is grateful to the associate editor and anonymous referees for helpful suggestions that led to improvement of the paper. This research was partially supported by a grant from the Mathematics Education Society of Waseda University, a grant from Takasaki City University of Economics, and Grant-in-Aid for Scientific Research (A) 19204009.

Appendix A. Some lemmas

This section gives some lemmas for the proof in Section 3. Let $\{U_i(\pi^*,\theta^*)\}$ $(i = 1,2,\ldots)$ be a sequence of decreasing neighbourhoods of the point $(\pi^*,\theta^*)$ such that $\bigcap_{i \ge 1} U_i(\pi^*,\theta^*) = (\pi^*,\theta^*)$. Recall the set $S$ introduced in the proof of Theorem 1.

Lemma 3. Under conditions (a)–(c), the following results hold.

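• $E_0[\log f(X \mid \pi,\theta)] > -\infty$ for any $(\pi,\theta) \in \Gamma$,

• $E_0[\log \max\{1, f(X \mid \pi,\theta)\}] < \infty$ for any $(\pi,\theta) \in \Gamma$,

• $E_0[\log \sup_{|(\pi',\theta') - (\pi,\theta)| \le r} \max\{1, f(X \mid \pi',\theta')\}] < \infty$ for small $r > 0$ and any $(\pi,\theta) \in \Gamma$,

• $E_0[\log \sup_{|(\pi,\theta)| \ge r} \max\{1, f(X \mid \pi,\theta)\}] < \infty$ for large $r > 0$.

These results imply $E_0[|\log f(X \mid \pi^0,\theta^0)|] < \infty$ and $E_0[\log \sup_{(\pi,\theta) \in U_i(\pi^*,\theta^*)} f(X \mid \pi,\theta)] < \infty$ for any finite or infinite cluster point $(\pi^*,\theta^*)$ in $\Gamma$.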

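Proof. The desired results follow from the following inequalities:

$$\log \sum_{j=1}^{g} \pi_j f(x \mid \theta_j) \ge \sum_{j=1}^{g} \pi_j \log f(x \mid \theta_j) \quad \text{(Jensen's inequality)}, \tag{A.1}$$

$$\log \sum_{j=1}^{g} \pi_j f(x \mid \theta_j) \le \sum_{j=1}^{g} \log f(x \mid \theta_j) \quad \text{for } f(x \mid \theta_j) \ge 1\ (j = 1,\ldots,g). \tag{A.2}$$

□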

Lemma 4. Let $C = \{ f \in L_1 \mid \|f\| < 1,\ f \ge 0 \}$ and let $E_g[h(X)] = \int_{\mathcal{X}} h(x) g(x)\,d\mu$. Then for any $f \in C$ and $g \in B_+$, $E_g[\log\{f(X)/g(X)\}] < 0$.

Proof. The inequality follows from Jensen's inequality. □
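The Jensen step can be spelled out as follows (our write-up, with the convention $\log 0 = -\infty$). Since $\log$ is concave, $f \ge 0$, and $g$ is a density with respect to $\mu$,

$$E_g\Big[\log\frac{f(X)}{g(X)}\Big]
 \le \log E_g\Big[\frac{f(X)}{g(X)}\Big]
 = \log \int_{\{g > 0\}} f(x)\,d\mu
 \le \log \|f\| < 0,$$

where the last inequality uses $\|f\| < 1$ for $f \in C$.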


Lemma 5. Suppose that $(\pi^*,\theta^*)$ is an infinite cluster point satisfying either $\lim_{(\pi,\theta) \to (\pi^*,\theta^*)} f(x \mid \pi,\theta) = 0$ for any $x \in R_0$ or $\lim_{i \to \infty} \sup_{(\pi,\theta) \in U_i(\pi^*,\theta^*)} f(c_k \mid \pi,\theta) = 0$ for some $c_k$. Then under conditions (a)–(c),

$$\lim_{i \to \infty} E_0\Big[ \log \sup_{(\pi,\theta) \in U_i(\pi^*,\theta^*)} f(X \mid \pi,\theta) \Big] = -\infty. \tag{A.3}$$

Proof. Eq. (A.3) follows from Lemma 3 and the same argument as in Lemma 3 of Wald (1949). □

Lemma 6. Let $(\pi^*,\theta^*)$ be a finite or infinite cluster point in $S$. Then under conditions (a)–(c),

$$\lim_{i \to \infty} E_0\Big[ \log \sup_{(\pi,\theta) \in U_i(\pi^*,\theta^*)} f(X \mid \pi,\theta) \Big] \le E_0[\log f(X \mid \pi^*,\theta^*)], \tag{A.4}$$

where $f(x \mid \pi^*,\theta^*) = \lim_{i \to \infty} \sup_{(\pi,\theta) \in U_i(\pi^*,\theta^*)} f(x \mid \pi,\theta)$.

Proof. The result is proved by arguing as in the proof of Theorem 1 of Cheng and Liu (2001) and using Fatou's lemma and Lemma 3. The detailed proof is available from the author. □

Lemma 7. If conditions (a)–(g) hold, then for each infinite cluster point $(\pi^*,\theta^*) \in S$ there exists a neighbourhood $U(\pi^*,\theta^*)$ satisfying Eq. (13).

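Proof. We show that (13) is true when $U_i(\pi^*,\theta^*)$ degenerates into the single point $(\pi^*,\theta^*)$. Let $\theta_j^*$ be any infinite cluster point, and let $\bar{F}_k(\theta_j^*) = \lim_{i \to \infty} \sup_{\theta_j \in U_i(\theta_j^*)} F_k(\theta_j)$. Because $\lim_{\theta_j \to \theta_j^*} f(x \mid \theta_j) = 0$ for any $x \in R_0$ by condition (e), $f(x \mid \pi^*,\theta^*)$ is bounded above by a function having one of the following forms:

(I) $\bar{f}(x \mid \pi^*,\theta^*) = \sum_{j=1}^{A} \pi_{a_j}^* \sum_{k=1}^{q} \bar{F}_k(\theta_{a_j}^*)\,1[x = c_k]$, where $0 \le A \le g$ and $\pi_{a_j}^* \sum_{k=1}^{q} \bar{F}_k(\theta_{a_j}^*) > 0$ $(j = 1,\ldots,A)$.

(II) $\bar{f}(x \mid \pi^*,\theta^*) = \sum_{j=1}^{B} \pi_{b_j}^* \big\{ f^0(x \mid \theta_{b_j}^*)\,1[x \in R_0] + \sum_{k=1}^{q} F_k(\theta_{b_j}^*)\,1[x = c_k] \big\}$, where $1 \le B \le g-1$ and $\pi_{b_j}^* f^0(x \mid \theta_{b_j}^*) > 0$ $(j = 1,\ldots,B)$.

(III)

$$\bar{f}(x \mid \pi^*,\theta^*) = \sum_{j=1}^{M_1} \pi_{m_j}^* \Big\{ f^0(x \mid \theta_{m_j}^*)\,1[x \in R_0] + \sum_{k=1}^{q} F_k(\theta_{m_j}^*)\,1[x = c_k] \Big\} \tag{A.5}$$

$$\qquad + \sum_{j=M_1+1}^{M_2} \pi_{m_j}^* \sum_{k=1}^{q} \bar{F}_k(\theta_{m_j}^*)\,1[x = c_k], \tag{A.6}$$

where $1 \le M_1 \le g-1$, $M_1 + 1 \le M_2 \le g$, $\pi_{m_j}^* f^0(x \mid \theta_{m_j}^*) > 0$ $(j = 1,\ldots,M_1)$, and $\pi_{m_j}^* \sum_{k=1}^{q} \bar{F}_k(\theta_{m_j}^*) > 0$ $(j = M_1+1,\ldots,M_2)$.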

In case (I), (13) follows from Lemma 5. In case (II), it follows from the same argument as in the proof of Theorem 1 of Cheng and Liu (2001, p. 608) that $\bar{f}(x \mid \pi^*,\theta^*)$ does not equal $f(x \mid \pi^0,\theta^0)$ w.p.1 under $P^0$ (henceforth denoted by $\bar{f}(x \mid \pi^*,\theta^*) \ne f(x \mid \pi^0,\theta^0)$ a.e. $P^0$). Thus it follows from Lemma 6 and Jensen's inequality that

$$\lim_{i \to \infty} E_0\Big[ \log \sup_{(\pi,\theta) \in U_i(\pi^*,\theta^*)} f(X \mid \pi,\theta) \Big] \le E_0[\log \bar{f}(X \mid \pi^*,\theta^*)] \tag{A.7}$$

$$< E_0[\log f(X \mid \pi^0,\theta^0)]. \tag{A.8}$$

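In case (III), (A.7) holds by Lemma 6. To show inequality (A.8), we only have to prove $\bar{f}(x \mid \pi^*,\theta^*) \ne f(x \mid \pi^0,\theta^0)$ a.e. $P^0$. If there exists $j_0 \in \{M_1+1,\ldots,M_2\}$ such that $\sum_{k=1}^{q} \bar{F}_k(\theta_{m_{j_0}}^*) < 1$, then $\|\bar{f}\| < 1$. Hence inequality (A.8) holds by Lemma 4.

Next, when $\sum_{k=1}^{q} \bar{F}_k(\theta_{m_j}^*) = 1$ for every $j = M_1+1,\ldots,M_2$ in (III), we prove by contradiction that $\bar{f}(x \mid \pi^*,\theta^*) \ne f(x \mid \pi^0,\theta^0)$ a.e. $P^0$. Assuming that $\bar{f}(x \mid \pi^*,\theta^*) = f(x \mid \pi^0,\theta^0)$ a.e. $P^0$, we have

$$\sum_{j=1}^{M_1} \pi_{m_j}^* f(x \mid \theta_{m_j}^*) = \sum_{j=1}^{g} \pi_j^0 f(x \mid \theta_j^0), \quad \text{a.e. } P^0 \text{ on } R_0, \tag{A.9}$$

where $1 \le M_1 \le g-1$. Then it follows from condition (g) that $\sum_{j=1}^{M_1} \pi_{m_j}^* = 0$ or $\sum_{j=1}^{M_1} \pi_{m_j}^* = \sum_{j=1}^{g} \pi_j^0 = 1$. This contradicts the assumption of (III), under which $\pi_{m_j}^* > 0$ for all $j = 1,\ldots,M_2$ with $M_2 > M_1$, so that $0 < \sum_{j=1}^{M_1} \pi_{m_j}^* < 1$. This completes the proof. □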

References

Chauveau, D., 1995. A stochastic EM algorithm for mixtures with censored data. J. Statist. Plann. Inference 46, 1–25.

Cheng, R.C.H., Liu, W.B., 2001. The consistency of estimators in finite mixture models. Scand. J. Statist. 28, 603–616.

Feng, Z., McCulloch, C.E., 1996. Using bootstrap likelihood ratio in finite mixture models. J. Roy. Statist. Soc. Ser. B 58, 609–617.

Hathaway, R.J., 1985. A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann. Statist. 13, 795–800.


Kiefer, J., Wolfowitz, J., 1956. Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27, 888–906.

McLachlan, G., Peel, D., 2000. Finite Mixture Models. Wiley, New York.

Mendenhall, W., Hader, R.J., 1958. Estimation of parameters of mixed exponentially distributed failure time distributions from censored life test data. Biometrika 45, 504–520.

Redner, A.R., 1981. Note on the consistency of the maximum likelihood estimation for nonidentifiable distribution. Ann. Statist. 9, 225–228.

Teicher, H., 1963. Identifiability of finite mixtures. Ann. Math. Statist. 34, 1265–1269.

Wald, A., 1949. Note on the consistency of the maximum likelihood estimates. Ann. Math. Statist. 20, 595–601.

Yakowitz, S.J., Spragins, J.D., 1968. On the identifiability of finite mixtures. Ann. Math. Statist. 39 (1), 209–214.