Journal of Statistical Planning and Inference 141 (2011) 56–64
doi:10.1016/j.jspi.2010.05.006
journal homepage: www.elsevier.com/locate/jspi
Maximum likelihood estimators in finite mixture models with censored data
Yoichi Miyata
Faculty of Economics, Takasaki City University of Economics, 1300 Kaminamie, Takasaki, Gunma 370-0801, Japan
Article info
Article history:
Received 19 June 2009
Received in revised form 21 April 2010
Accepted 12 May 2010
Available online 21 May 2010
Keywords:
Censored data
Finite mixture
Maximum likelihood estimators
Strong consistency
E-mail address: [email protected]
Abstract
The consistency of estimators in finite mixture models has been discussed under the
topology of the quotient space obtained by collapsing the true parameter set into a
single point. In this paper, we extend the results of Cheng and Liu (2001) to give
conditions under which the maximum likelihood estimator (MLE) is strongly consistent
in such a sense in finite mixture models with censored data. We also show that the
fitted model tends to the true model under a weak condition as the sample size tends to
infinity.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Finite mixture models have been studied extensively, both theoretically and practically, by several authors (Redner, 1981; Feng and McCulloch, 1996; McLachlan and Peel, 2000; Cheng and Liu, 2001). Feng and McCulloch (1996) have proposed unrestricted MLEs which are consistent under some complicated conditions. Cheng and Liu (2001) have provided easily verified conditions under which the vector of MLEs will converge to an arbitrary point in the subset representing the true model, allowing the estimators to approach a boundary point of the parameter space. On the other hand, censored and multimodal observations often appear in fields such as reliability engineering and education. Chauveau (1995) has discussed a stochastic EM algorithm for the ML fitting of finite mixture models with censored data. However, little is known about the strong consistency. In addition, Cheng and Liu's results cannot be directly applied to these models because each of the components does not uniformly converge to zero as the norm of its partial parameters tends to infinity. This paper extends their approach to show the strong consistency of the MLEs for finite mixture models with censored data when the number of components assumed is larger than or equal to the true one.
Section 3 shows the strong consistency of the MLEs and the fitted distributions in finite mixture models with fixed censoring regions. Section 4 provides parameter spaces under which the consistency results hold in a mixture of censored exponential distributions, a mixture of censored normal distributions, and a random censorship mixture model.
2. Definitions and assumptions
Let $L_1$ and $B^+$ denote the following spaces of integrable functions on $\mathbb{R}$:
\[ L_1 = \left\{ f(x) \;\middle|\; f(x) \text{ is measurable},\ \|f\| = \int_{\mathbb{R}} |f(x)|\,d\mu < \infty \right\}, \tag{1} \]
\[ B^+ = \{ f(x) \mid f \in L_1,\ \|f\| = 1,\ f(x) \ge 0 \}. \tag{2} \]
Let $\Omega_1$, $\Omega_2$ be two closed sets in $\mathbb{R}^k$. We denote a metric between the two sets as follows:
\[ \mathrm{dis}(\Omega_1,\Omega_2) = \mathrm{dis}(\Omega_2,\Omega_1) = \inf_{y \in \Omega_2}\, \inf_{x \in \Omega_1} |x - y|, \tag{3} \]
where $|\cdot|$ is the Euclidean norm. If $\Omega_1$ and $\Omega_2$ are singleton sets, then this metric agrees with the classic Euclidean distance. The following is used later.
Property 1. (i) $\mathrm{dis}(\Omega_1,\Omega_2) = 0$ if and only if there are sequences of points $\{x_n\}$ in $\Omega_1$ and $\{y_n\}$ in $\Omega_2$ such that $|x_n - y_n| \to 0$ as $n \to \infty$.
(ii) $\mathrm{dis}(x_n,\Omega) \to 0$ if and only if there is a sequence $\{y_n\}$ of points in $\Omega$ such that $|x_n - y_n| \to 0$ as $n \to \infty$.
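As a small illustration of the distance (3), the following Python sketch evaluates it for finite point sets, where the infimum reduces to a minimum over all pairs. The function name `dis` and the example points are illustrative assumptions and do not appear in the paper.

```python
import math
from itertools import product


def dis(omega1, omega2):
    """Distance (3) for two finite point sets: the smallest Euclidean
    distance |x - y| over all x in omega1 and y in omega2."""
    return min(math.dist(x, y) for x, y in product(omega1, omega2))


# Two illustrative finite sets in R^2 (hypothetical values).
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 4.0), (1.0, 3.0)]

print(dis(a, b))                          # closest pair across the two sets
print(dis([(0.0, 0.0)], [(3.0, 4.0)]))    # singleton sets: plain Euclidean distance
```

For singleton sets the value agrees with the ordinary Euclidean distance, as noted in the text; for general closed sets the infimum need not be attained and this finite-set sketch does not apply directly.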
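Let $\mathcal{F}^0 = \{ f^0(x \mid \theta) \mid \theta \in \Theta \}$ be a parametric family of one-dimensional probability density functions (pdfs) with respect to a $\sigma$-finite measure $\mu_0$ on $\mathcal{R} \subseteq \mathbb{R}$ from which mixtures are to be formed. Let $X^0$ be a random variable taking values in $\mathcal{R}$ with the pdf $f^0(x \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j f^0(x \mid \theta_j)$, where $f^0(x \mid \theta_j) \in \mathcal{F}^0$, $\Theta$ is a closed set belonging to $\mathbb{R}^d$,
\[ \pi \in \Pi \equiv \left\{ (\pi_1,\pi_2,\dots,\pi_g) \;\middle|\; \pi_j \ge 0,\ \sum_{j=1}^{g} \pi_j = 1 \right\} \quad\text{and} \tag{4} \]
\[ \theta \in \Theta^g \equiv \{ (\theta_1,\theta_2,\dots,\theta_g) \mid \theta_j \in \Theta\ (j=1,2,\dots,g) \} \tag{5} \]
(i.e., $\Theta^g = \Theta \times \cdots \times \Theta$ is the Cartesian product of $g$ copies of $\Theta$). Let $\Gamma = \Pi \times \Theta^g$ be a parameter space. Then $\Gamma$ is a closed set.
Given a partition $R_0, R_1, \dots, R_q$ of $\mathcal{R}$, we observe a random variable $X = X^0 1_{[X^0 \in R_0]} + \sum_{k=1}^{q} c_k 1_{[X^0 \in R_k]}$ taking values in $\mathcal{X} = R_0 \cup \{c_1,\dots,c_q\}$, where $1_{[\cdot]}$ is an indicator function, and $c_k$ is just a code for the event $\{X^0 \in R_k\}$. Let $\mu$ be the measure on $\mathcal{X}$ which coincides with $\mu_0$ on $R_0$ and whose restriction to $\{c_1,\dots,c_q\}$ is the counting measure on this set.
Then the pdf of $X$ with respect to $\mu$ is given by
\[ f(x \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j f(x \mid \theta_j), \tag{6} \]
where $F_k(\theta_j) = \int_{R_k} f^0(t \mid \theta_j)\,dt$, and
\[ f(x \mid \theta_j) = f^0(x \mid \theta_j) 1_{[x \in R_0]} + \sum_{k=1}^{q} F_k(\theta_j) 1_{[x = c_k]}. \tag{7} \]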
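To make Eqs. (6) and (7) concrete, the sketch below evaluates the censored mixture density for one specific and assumed setting: normal components, a single censoring region $R_1 = [c, \infty)$, and the code $c_1 = c$. All function names and parameter values are illustrative assumptions. It also checks numerically that the density on $R_0$ plus the point mass at $c$ has total mass one, which is what membership in $B^+$ requires.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def censored_component(x, mu, sigma, c):
    """f(x | theta_j) of Eq. (7) with R_0 = (-inf, c), R_1 = [c, inf), c_1 = c."""
    if x < c:
        return normal_pdf(x, mu, sigma)          # uncensored part on R_0
    if x == c:
        return 1.0 - normal_cdf(c, mu, sigma)    # F_1(theta_j), mass at the code c
    return 0.0                                   # x > c is never observed


def censored_mixture(x, pis, thetas, c):
    """f(x | pi, theta) of Eq. (6)."""
    return sum(p * censored_component(x, mu, s, c)
               for p, (mu, s) in zip(pis, thetas))


def total_mass(pis, thetas, c, lo=-30.0, steps=20000):
    """Midpoint-rule integral over (lo, c) plus the point mass at x = c."""
    h = (c - lo) / steps
    integral = h * sum(censored_mixture(lo + (i + 0.5) * h, pis, thetas, c)
                       for i in range(steps))
    return integral + censored_mixture(c, pis, thetas, c)


print(total_mass([0.4, 0.6], [(0.0, 1.0), (1.0, 2.0)], c=1.5))
```

The lower integration limit `lo` is a numerical convenience; it only needs to be far enough in the left tail for the chosen parameters.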
We say that Eq. (6) is a finite mixture model with fixed censoring region. For any given $(\pi^0,\theta^0) \in \Gamma$ such that $f(x \mid \pi^0,\theta^0) \in B^+$, we define the set
\[ \Gamma(\pi^0,\theta^0) = \{ (\pi,\theta) \mid (\pi,\theta) \in \Gamma \text{ and } f(x \mid \pi,\theta) = f(x \mid \pi^0,\theta^0) \}. \tag{8} \]
As is well known for ordinary mixture models, $\Gamma(\pi^0,\theta^0)$ is not a singleton set, and hence the MLE is not consistent in the sense of converging to a unique point. Therefore we shall use the distance (3) between the MLE and $\Gamma(\pi^0,\theta^0)$ to discuss the strong consistency.
In this paper, we allow the true parameter $(\pi^0,\theta^0)$ to be a boundary point of $\Gamma$. In this case, $\Gamma(\pi^0,\theta^0)$ becomes a continuum of parameter values (e.g., see Cheng and Liu, 2001; McLachlan and Peel, 2000, p. 28). Then the true model takes the reduced form $f(x \mid \pi^0,\theta^0) = \sum_{j=1}^{g_0} \pi^0_{l_j} f(x \mid \theta^0_{l_j})$ with $1 \le g_0 \le g-1$ and $\theta^0_{l_j} \in \Theta$. This means that the true number of components, $g_0$, is unknown, but an upper bound on $g_0$ is known.
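The following sketch illustrates why the set in (8) is a continuum when the fitted model has more components than the true one. It uses a censored exponential component purely as an assumed example, with one true component and a two-component fit; the helper names and numerical values are hypothetical.

```python
import math


def comp(x, theta, c):
    """Censored exponential component: density on (0, c), point mass at x = c."""
    if 0.0 < x < c:
        return theta * math.exp(-theta * x)
    if x == c:
        return math.exp(-theta * c)
    return 0.0


def mixture(x, pis, thetas, c):
    return sum(p * comp(x, t, c) for p, t in zip(pis, thetas))


def max_gap(pis, thetas, ref_pis, ref_thetas, c, xs):
    """Largest pointwise difference between two mixtures on the grid xs."""
    return max(abs(mixture(x, pis, thetas, c) - mixture(x, ref_pis, ref_thetas, c))
               for x in xs)


c = 2.0
true_pis, true_thetas = [1.0], [1.5]      # a single true component
candidates = [
    ([1.0, 0.0], [1.5, 0.3]),     # zero weight: the second rate is arbitrary
    ([1.0, 0.0], [1.5, 40.0]),    # ... and may even drift off to infinity
    ([0.25, 0.75], [1.5, 1.5]),   # equal rates: the weight split is arbitrary
]
xs = [0.1, 0.7, 1.9, c]
for pis, thetas in candidates:
    print(pis, thetas, max_gap(pis, thetas, true_pis, true_thetas, c, xs))
```

Every candidate gives the same density as the true model on the grid, so each belongs to the set (8); this is the reason consistency is stated through the distance (3) to that set rather than through convergence to a single point.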
2.1. Assumptions
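We write expectations of $g(x)$ under $f(x \mid \theta_i^0)$ by
\[ E_{\theta_i^0}[g(X)] = \int_{R_0} g(x) f(x \mid \theta_i^0)\,dx + \sum_{k=1}^{q} g(c_k) f(c_k \mid \theta_i^0) = \int_{\mathcal{X}} g(x) f(x \mid \theta_i^0)\,d\mu. \]
Here we shall give sufficient conditions (a)–(g) under which the main results hold in the model (6).
(a) For any $\theta_j \in \Theta$, $f(x \mid \theta_j) \in B^+$ $(j=1,\dots,g)$. Furthermore, $f(x \mid \theta_j^1) = f(x \mid \theta_j^2)$ in $B^+$ only if $\theta_j^1 = \theta_j^2$.
(b) The support of $f(x \mid \theta_j)$ does not depend on the parameter $\theta_j \in \Theta$.
(c) Let $i=1,\dots,g$ and $j=1,\dots,g$.
• $E_{\theta_i^0}[\log f(X \mid \theta_j)] > -\infty$ for any $\theta_j \in \Theta$,
• $E_{\theta_i^0}[\log \max\{f(X \mid \theta_j),1\}] < \infty$ for any $\theta_j \in \Theta$,
• $E_{\theta_i^0}[\log \sup_{|\theta_j' - \theta_j| \le r} \max\{1, f(X \mid \theta_j')\}] < \infty$ for small $r > 0$ and any $\theta_j \in \Theta$,
• $E_{\theta_i^0}[\log \sup_{|\theta_j| \ge r} \max\{1, f(X \mid \theta_j)\}] < \infty$ for large $r > 0$.
(d) For any fixed $x \in \mathcal{X}$, $f(x \mid \theta_j)$ are continuous with respect to $\theta_j \in \Theta$.
(e) $\lim_{|\theta_j| \to \infty} f(x \mid \theta_j) = 0$ for any fixed $x \in R_0$.
(f) For any infinite cluster point $\theta_j^*$ in $\Theta$, $\sum_{k=1}^{q} \lim_{i \to \infty} \sup_{\theta_j \in U_i(\theta_j^*)} F_k(\theta_j) \le 1$, where $\{U_i(\theta_j^*)\}$ $(i=1,2,\dots)$ is a sequence of decreasing neighbourhoods of the point $\theta_j^*$ such that $\bigcap_{i \ge 1} U_i(\theta_j^*) = \theta_j^*$.
(g) Suppose that $g'$ is any number with $1 \le g' \le g-1$, $(\theta_1^*,\dots,\theta_{g'}^*)$ is any point in $\Theta^{g'}$, and $(\theta_1^0,\dots,\theta_g^0)$ is any point in $\Theta^g$. Then
\[ \sum_{j=1}^{g'} a_j f(x \mid \theta_j^*) = \sum_{j=1}^{g} b_j f(x \mid \theta_j^0), \quad \text{a.e. } P_0 \text{ on } R_0 \tag{9} \]
implies $\sum_{j=1}^{g'} a_j = 0$ or $\sum_{j=1}^{g'} a_j = \sum_{j=1}^{g} b_j$, where $P_0$ is the probability measure corresponding to $f(x \mid \pi^0,\theta^0)$.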
Conditions (a) and (b) correspond to Cheng and Liu's assumption (a). Condition (c) corresponds to Cheng and Liu's assumption (b). Conditions (d) and (e) correspond to Cheng and Liu's assumption (c). Condition (d) means that for any fixed $x \in R_0$, $f^0(x \mid \theta_j)$ are continuous with respect to $\theta_j$, and $F_k(\theta_j)$ $(k=1,\dots,q,\ j=1,\dots,g)$ are continuous with respect to $\theta_j$. Condition (f) holds automatically if there is only one censoring region, i.e., $q=1$, because $F_1(\theta_j) \le 1$. As described in Section 4, condition (g) is verified by the same techniques that Teicher (1963) and/or Yakowitz and Spragins (1968) used to check whether the class of finite mixtures is identifiable. Note that the identifiability of finite mixture models is defined slightly differently from that of ordinary parametric models; e.g., see McLachlan and Peel (2000, p. 27).
Remark 1. Cheng and Liu's approach cannot be adopted directly to show the strong consistency. For example, we consider a mixture of normal distributions under Type I censoring,
\[ f(x \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j \left\{ n(x \mid \mu_j,\sigma_j^2) 1_{[x < c]} + \int_c^{\infty} n(t \mid \mu_j,\sigma_j^2)\,dt\; 1_{[x = c]} \right\}, \tag{10} \]
where $c$ is a known constant, and $n(x \mid \mu_j,\sigma_j^2)$ is a Gaussian density with mean $\mu_j$ and variance $\sigma_j^2$. Let $\Phi(\cdot)$ be the standard normal distribution function. Then we can see from $\int_c^{\infty} n(t \mid \mu_j,\sigma_j^2)\,dt = 1 - \Phi((c-\mu_j)/\sigma_j)$ that each of the components, $f(x \mid \theta_j) = n(x \mid \mu_j,\sigma_j^2) 1_{[x < c]} + \int_c^{\infty} n(t \mid \mu_j,\sigma_j^2)\,dt\; 1_{[x = c]}$, does not tend to 0 when $x = c$ and $\sigma_j \to \infty$. Thus model (10) does not fulfill assumption (c) of Cheng and Liu (2001).
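The point of Remark 1 can be checked numerically. The sketch below, with an assumed function name and illustrative values of the mean and of $c$, evaluates the mass that a single component of model (10) places at $x = c$ and shows that it approaches 1/2, not 0, as the standard deviation grows.

```python
import math


def censored_mass(mu, sigma, c):
    """1 - Phi((c - mu) / sigma): mass a normal component puts at the code x = c."""
    return 0.5 * math.erfc((c - mu) / (sigma * math.sqrt(2.0)))


# Illustrative values: mu = 0 and c = 2 are assumptions made for this sketch.
for sigma in (1.0, 10.0, 1e3, 1e6):
    print(sigma, censored_mass(0.0, sigma, c=2.0))
```

Since $(c-\mu_j)/\sigma_j \to 0$ as $\sigma_j \to \infty$ for fixed $\mu_j$, the limit is $1 - \Phi(0) = 1/2$, which is why the component does not vanish at the censored point.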
Remark 2. The effect of assumption (c) in Cheng and Liu is to make the density $h$ uniquely defined at any infinite cluster point of a parameter space with $\int_{-\infty}^{\infty} h \le 1$, and to make the set $\Gamma(\alpha^0,\theta^0)$ a closed set. Our new conditions (e) and (f) play the same role. Although model (6) cannot be uniquely defined at some infinite cluster points of $\Gamma$, as shown in Lemma 7, it follows from conditions (e) and (f) that the limit superior of $f$ is uniquely defined at such a point, and bounded above by a function $\bar{f}$ with $\int_{-\infty}^{\infty} \bar{f} \le 1$. This fact is the key to the proof of Theorem 1.
3. Main results
We denote the MLEs by $(\pi^n,\theta^n)$, and expectations of $g(x)$ under $f(x \mid \pi^0,\theta^0)$ by
\[ E_0[g(X)] = \int_{R_0} g(x) f(x \mid \pi^0,\theta^0)\,dx + \sum_{k=1}^{q} g(c_k) f(c_k \mid \pi^0,\theta^0) = \int_{\mathcal{X}} g(x) f(x \mid \pi^0,\theta^0)\,d\mu, \]
where $\mu$ is the measure given in Section 2. As a note, for any given $n$, the MLE $(\pi^n,\theta^n)$ is not necessarily unique, so that it can actually be any one of a set of possible choices. The point, however, as the following theorem shows, is that any one of these values is allowed.
Theorem 1. Let $X = (X_1,X_2,\dots,X_n)$ be iid observations with the probability distribution $f(x \mid \pi^0,\theta^0)$ and the parameter space $\Gamma$. Then under conditions (a)–(g),
\[ P\left( \lim_{n \to \infty} \mathrm{dis}\{(\pi^n,\theta^n), \Gamma(\pi^0,\theta^0)\} = 0 \right) = 1. \tag{11} \]
Clearly (11) implies $\mathrm{dis}\{\Gamma(\pi^n,\theta^n), \Gamma(\pi^0,\theta^0)\} \to 0$ w.p.1 because $\{(\pi^n,\theta^n)\} \subseteq \Gamma(\pi^n,\theta^n)$.
Proof. Without loss of generality, we can assume that $\Gamma$ is compact, because we can extend $\Gamma$ to include all its infinite cluster points so that it is a compact set under a certain metric. See, for example, Kiefer and Wolfowitz (1956) and/or Hathaway (1985). Each set in $\Gamma$ then consists of finite points and these infinite cluster points.
We adopt the approach used in Wald (1949). Letting $\varepsilon > 0$ be an arbitrary strictly positive value, we show that for any closed subset $S$ of $\Gamma$ such that $\mathrm{dis}\{S, \Gamma(\pi^0,\theta^0)\} \ge \varepsilon$,
\[ P\left( \lim_{n \to \infty} \sup_{(\pi,\theta) \in S} \frac{\prod_{i=1}^{n} f(X_i \mid \pi,\theta)}{\prod_{i=1}^{n} f(X_i \mid \pi^0,\theta^0)} = 0 \right) = 1. \tag{12} \]
To show (12), we only need to confirm that for each finite or infinite cluster point $(\pi^*,\theta^*)$ in $S$, there is always a neighbourhood $U(\pi^*,\theta^*)$ of the point such that
\[ E_0\left[ \log \sup_{(\pi,\theta) \in U(\pi^*,\theta^*)} f(X \mid \pi,\theta) \right] < E_0[\log f(X \mid \pi^0,\theta^0)]. \tag{13} \]
Note that $\sup_{(\pi,\theta) \in U(\pi^*,\theta^*)} f(x \mid \pi,\theta)$ is measurable by condition (d).
For any finite point $(\pi^*,\theta^*)$, Eq. (13) follows from Lemma 6 and the same argument as in the proof of Theorem 1 of Cheng and Liu (2001, p. 607). For any infinite cluster point $(\pi^*,\theta^*)$, Eq. (13) also holds from Lemma 7.
Finally we shall prove that (13) implies (12). By using the Heine–Borel finite open cover theorem and the same technique as used in the proof of Theorem 1 in Wald (1949), (12) follows. Therefore we can show that $\mathrm{dis}\{(\pi^n,\theta^n), \Gamma(\pi^0,\theta^0)\} \to 0$ w.p.1 by arguing as in the proof of Theorem 2 of Wald (1949) with $|\theta - \theta_0|$ replaced by $\mathrm{dis}\{(\pi,\theta), \Gamma(\pi^0,\theta^0)\}$. □
Among conditions (a)–(g), (c) is rather tedious to verify. The following gives a simpler condition that is sufficient for condition (c) to hold.
Corollary 1. Let $f(x \mid \theta_j)$ satisfy conditions (a), (b), (d)–(g) and $E_{\theta_i^0}[\log f(X \mid \theta_j)] > -\infty$. Suppose that there exists a function $S(x)$ such that
• $|f(x \mid \theta_j)| \le S(x)$ for any $\theta_j \in \Theta$, (14)
• $E_{\theta_i^0}[S(X)] < \infty$. (15)
Then the conclusion of Theorem 1 holds.
Proof. We have only to prove that conditions (14) and (15) imply condition (c). This can be proved by using the inequality $\log \max\{1,x\} \le x$ for $x \ge 0$. □
The following is useful in verifying condition (g).
Corollary 2. Recall the family $\mathcal{F}^0 = \{ f^0(x \mid \theta) \mid \theta \in \Theta \}$ stated in Section 2. Suppose that $f^0(x \mid \theta) 1_{[x \in R_0]}$ has a transform $g(t \mid \theta)$ defined for $t$ belonging to some domain of definition, $S_g(\theta)$. It is assumed that the mapping $M : f^0(x \mid \theta) 1_{[x \in R_0]} \to g(t \mid \theta)$ is linear and one-to-one. Suppose that there is a total ordering $\preceq$ such that $\theta_1 \prec \theta_2$ implies:
(i) $S_g(\theta_1) \subseteq S_g(\theta_2)$,
(ii) There is some $t_1 \in S_g(\theta_1)^c$ (the complement of $S_g(\theta_1)$) such that
\[ \lim_{t \to t_1} \frac{g(t \mid \theta_2)}{g(t \mid \theta_1)} = 0. \]
Then for $\theta_1 \prec \theta_2 \prec \cdots \prec \theta_J$ in $\Theta$, $a_1 = a_2 = \cdots = a_J = 0$ if and only if
\[ \sum_{j=1}^{J} a_j f^0(x \mid \theta_j) = 0 \quad \text{for any fixed } x \in R_0. \tag{16} \]
Conditions (i) and (ii) are equivalent to those of Teicher (1963), and are usually verified via integral transforms such as the Fourier or Laplace transform. However, it is often more convenient to argue in terms of densities. Hence, in Section 4.2, we will use the identity transform $M : f^0(x \mid \theta) 1_{[x \in R_0]} \to f^0(x \mid \theta) 1_{[x \in R_0]}$ to verify condition (g).
Proof. This is proved by arguing as in the proof of Theorem 2 of Teicher (1963). It follows from the transformed version of (16), $\sum_{j=1}^{J} a_j g(t \mid \theta_j) = 0$, that
\[ a_1 + \sum_{j=2}^{J} a_j \frac{g(t \mid \theta_j)}{g(t \mid \theta_1)} = 0 \quad \text{for } t \in S_g(\theta_1) \cap \{ t \mid g(t \mid \theta_1) \ne 0 \}. \]
Letting $t \to t_1$, we have $a_1 = 0$ by (ii). Thus, repeating the same argument for $\sum_{j=2}^{J} a_j g(t \mid \theta_j) = 0$ yields $a_2 = \cdots = a_J = 0$. □
Theorem 1 shows that when the true model is an indeterminate case, the MLE converges in the distance (3) towards the indeterminate set of points $\Gamma(\pi^0,\theta^0)$ defining the model. This does not in itself guarantee that the fitted model, which is what is often used in practice, converges to the true model. The following theorem guarantees that the fitted model tends to the true model as $n \to \infty$. If $\Gamma(\pi^0,\theta^0)$ is a singleton set, the limit point $\lim_{n \to \infty} (\pi^n,\theta^n) = (\pi^0,\theta^0)$ is finite. In contrast, if $\Gamma(\pi^0,\theta^0)$ is a continuum of parameter values, as pointed out in Cheng and Liu (2001), the limit point might be an infinite cluster point. Thus, to show the convergence of the fitted model, we shall take account of the case that the MLE tends to an infinite cluster point as $n \to \infty$.
Theorem 2. Assume that conditions (d) and (e) hold, and
(A1) For any sequences $\{\theta_j^n\}$ and $\{\theta_j^{0n}\}$ such that $\lim_{n \to \infty} |\theta_j^n| = \infty$ and $\lim_{n \to \infty} |\theta_j^n - \theta_j^{0n}| = 0$, it holds that for $j=1,\dots,g$ and $k=1,\dots,q$,
\[ \lim_{n \to \infty} |F_k(\theta_j^n) - F_k(\theta_j^{0n})| = 0. \tag{17} \]
If $\mathrm{dis}\{(\pi^n,\theta^n), \Gamma(\pi^0,\theta^0)\} \to 0$ w.p.1, then for any fixed $x \in \mathcal{X}$,
\[ \lim_{n \to \infty} f(x \mid \pi^n,\theta^n) = f(x \mid \pi^0,\theta^0) \quad \text{w.p.1.} \tag{18} \]
Proof. When $\mathrm{dis}\{(\pi^n,\theta^n), \Gamma(\pi^0,\theta^0)\} \to 0$ as $n \to \infty$, from Property 1, there is a sequence $(\pi^{0n},\theta^{0n})$ in $\Gamma(\pi^0,\theta^0)$ such that $|\pi^{0n} - \pi^n| + |\theta^{0n} - \theta^n| \to 0$ w.p.1. For any $x \in R_0$, $f(x \mid \pi^n,\theta^n) - f(x \mid \pi^0,\theta^0) = f(x \mid \pi^n,\theta^n) - f(x \mid \pi^{0n},\theta^{0n}) \to 0$ w.p.1 from conditions (d) and (e). Note that even if $\lim_{n \to \infty} |\theta_j^n| = \infty$, the above holds from condition (e). Thus combining this with Eq. (17) completes the proof. □
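Note that $F_k(\theta_j^n)$ might not have a limit as $|\theta_j^n| \to \infty$. For example, we consider one piece of the probability of $X = c$ in model (10), $F_1(\mu_1^n,\sigma_1^n) = 1 - \Phi((c-\mu_1^n)/\sigma_1^n)$. Then $\lim_{n \to \infty} F_1(\mu_1^n,\sigma_1^n)$ need not exist if $(\mu_1^n,\sigma_1^n) \to (\infty,\infty)$, because its value depends on the behaviour of the ratio $(c-\mu_1^n)/\sigma_1^n$. However, condition (A1) is mild, and holds if $F_k(\theta_j)$ $(j=1,\dots,g,\ k=1,\dots,q)$ satisfy the Lipschitz condition, i.e., there exists a constant $L_1 \ge 0$ such that $|F_k(\theta_{j1}) - F_k(\theta_{j2})| \le L_1 |\theta_{j1} - \theta_{j2}|$ for all $\theta_{j1}, \theta_{j2} \in \Theta$. If condition (A1) fails, $f(x \mid \pi^n,\theta^n)$ does not converge to $f(x \mid \pi^0,\theta^0)$ at some censored point $x = c_k$. However, in such a case, Eq. (18) still holds for any $x \in R_0$.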
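The dependence on the path to infinity can be seen numerically. In the sketch below, the function name `F1` and the three paths are illustrative assumptions; each path sends both the mean and the standard deviation to infinity, yet the censored mass settles at a different value.

```python
import math


def F1(mu, sigma, c):
    """Censored mass 1 - Phi((c - mu) / sigma) of one normal component in model (10)."""
    return 0.5 * math.erfc((c - mu) / (sigma * math.sqrt(2.0)))


c = 2.0
paths = {
    "mu = n,  sigma = n":   lambda n: (n, n),        # ratio (c - mu)/sigma -> -1
    "mu = 2n, sigma = n":   lambda n: (2.0 * n, n),  # ratio -> -2
    "mu = n,  sigma = n^2": lambda n: (n, n * n),    # ratio -> 0
}
for name, path in paths.items():
    values = [F1(*path(n), c) for n in (1e2, 1e4, 1e8)]
    print(name, values)
```

Along these paths the values approach $\Phi(1)$, $\Phi(2)$ and $1/2$ respectively, so a sequence alternating between such paths has no limit. Condition (A1) sidesteps this by comparing two nearby sequences rather than requiring a limit.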
4. Examples
In this section, we will give parameter spaces for mixtures of censored distributions under which the strong consistency holds.
4.1. A mixture of censored exponential distributions
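We consider a mixture of Weibull exponential distributions under Type I right-censoring,
\[ f(x \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j \left\{ \alpha\theta_j x^{\alpha-1} \exp(-\theta_j x^{\alpha}) 1_{[0 < x < c]} + \exp(-\theta_j c^{\alpha}) 1_{[x = c]} \right\}, \tag{19} \]
where $c > 0$ and $\alpha > 0$ are known constants, $\theta_j = \exp(t_j)$ is regarded as a function of $t_j$, and $\Gamma = \{ (\pi_1,\dots,\pi_g,t_1,\dots,t_g) \mid \sum_{j=1}^{g} \pi_j = 1,\ \pi_j \ge 0,\ -\infty < t_j < \infty \}$ is a parameter space. This model appears in failure analysis where failure often occurs for more than one reason (e.g., see Mendenhall and Hader, 1958). For simplicity of exposition, we treat only the case of $g = 2$ and $\alpha = 1$, but the result holds for an arbitrary number $g$ and constant $\alpha$. Consequently, we will verify condition (g) for model (19) with $g = 2$ and $\alpha = 1$, i.e., a two-component mixture of censored exponential distributions. Then Eq. (9) in condition (g) becomes
\[ a_1 \theta_1^* e^{-\theta_1^* x} = b_1 \theta_1^0 e^{-\theta_1^0 x} + b_2 \theta_2^0 e^{-\theta_2^0 x} \quad \text{for } 0 < x < c, \tag{20} \]
where $0 < \theta_1^* < \infty$, $0 < \theta_1^0 < \infty$, and $0 < \theta_2^0 < \infty$. If $\theta_1^* \ne \theta_1^0$, $\theta_1^0 \ne \theta_2^0$, and $\theta_2^0 \ne \theta_1^*$, we substitute $x = \varepsilon_1, 2\varepsilon_1, 3\varepsilon_1$ into (20) with a small number $\varepsilon_1 > 0$. Without loss of generality we can assume that $c > 3$ and $\varepsilon_1 = 1$. Then we have
\[ \begin{pmatrix} -\theta_1^* e^{-\theta_1^*} & \theta_1^0 e^{-\theta_1^0} & \theta_2^0 e^{-\theta_2^0} \\ -\theta_1^* e^{-2\theta_1^*} & \theta_1^0 e^{-2\theta_1^0} & \theta_2^0 e^{-2\theta_2^0} \\ -\theta_1^* e^{-3\theta_1^*} & \theta_1^0 e^{-3\theta_1^0} & \theta_2^0 e^{-3\theta_2^0} \end{pmatrix} \begin{pmatrix} a_1 \\ b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \tag{21} \]
By a property of the Vandermonde matrix, the determinant of the matrix on the left-hand side of (21), denoted by $\det(A)$, becomes
\[ \det(A) = -\theta_1^* \theta_1^0 \theta_2^0 \exp(-\theta_1^* - \theta_1^0 - \theta_2^0) \det \begin{pmatrix} 1 & 1 & 1 \\ e^{-\theta_1^*} & e^{-\theta_1^0} & e^{-\theta_2^0} \\ e^{-2\theta_1^*} & e^{-2\theta_1^0} & e^{-2\theta_2^0} \end{pmatrix} \ne 0. \]
Therefore we have $a_1 = b_1 = b_2 = 0$.
If $\theta_1^* = \theta_1^0 \ne \theta_2^0$, Eq. (9) in condition (g) becomes
\[ 0 = (b_1 - a_1) \theta_1^0 e^{-\theta_1^0 x} + b_2 \theta_2^0 e^{-\theta_2^0 x} \quad \text{for } 0 < x < c. \tag{22} \]
Substituting $x = 1, 2$ into (22), and arguing as above, leads to $b_1 - a_1 = 0$, $b_2 = 0$. Therefore $a_1 = b_1 + b_2$. For the other cases, we have $a_1 = 0$ or $a_1 = b_1 + b_2$ by the same argument. Hence model (19) satisfies condition (g). Because we can easily verify the other conditions (a)–(f), Theorem 1 holds. Note that $\Gamma$ equals the parameter space of the model that does not adjust for censoring, i.e., $c = \infty$.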
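The determinant argument behind (21) can be checked numerically. The sketch below builds the coefficient matrix for three distinct rates, computes its determinant directly, and compares it with the closed form obtained from the Vandermonde determinant with nodes $e^{-\theta}$. The function names and the particular rates are illustrative assumptions.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def matrix_21(t_star, t1, t2):
    """Coefficient matrix of system (21); rows correspond to x = 1, 2, 3."""
    return [[-t_star * math.exp(-x * t_star),
             t1 * math.exp(-x * t1),
             t2 * math.exp(-x * t2)] for x in (1, 2, 3)]


def det_closed_form(t_star, t1, t2):
    """Prefactor times the Vandermonde determinant with nodes u, v, w = exp(-theta)."""
    u, v, w = (math.exp(-t) for t in (t_star, t1, t2))
    vandermonde = (v - u) * (w - u) * (w - v)
    return -t_star * t1 * t2 * math.exp(-(t_star + t1 + t2)) * vandermonde


rates = (0.5, 1.0, 2.0)   # three distinct rates, chosen only for illustration
print(det3(matrix_21(*rates)), det_closed_form(*rates))
```

Because the Vandermonde factor is a product of pairwise differences of the nodes, the determinant vanishes exactly when two of the rates coincide, which matches the case distinction made in the text.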
4.2. A mixture of censored normal distributions
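We consider model (10) with the parameter space
\[ \Gamma = \left\{ (\pi_1,\dots,\pi_g,\mu_1,\dots,\mu_g,\sigma_1,\dots,\sigma_g) \;\middle|\; \sum_{j=1}^{g} \pi_j = 1,\ \pi_j \ge 0,\ \mu_j \in \mathbb{R},\ \sigma_j \in R_\eta\ (j=1,\dots,g) \right\}, \tag{23} \]
where $R_\eta = \{ x \mid 0 < \eta \le x < \infty \}$. It is not unreasonable to assume that $\sigma_j$ $(j=1,\dots,g)$ have positive lower bounds, as the cases of zero variances are degenerate. Without these lower bounds, the likelihood will be unbounded if $\mu_j$ is set equal to any observed value and $\sigma_j$ tends to zero, and hence this violates condition (c). For simplicity of exposition, we treat only the case of $g = 2$, but the result can be extended to an arbitrary number $g$. First, we verify condition (g). Suppose that
\[ a_1 n(x \mid \mu_1^*, \sigma_1^{*2}) = b_1 n(x \mid \mu_1^0, (\sigma_1^0)^2) + b_2 n(x \mid \mu_2^0, (\sigma_2^0)^2) \quad \text{for } x < c, \tag{24} \]
where $(\mu_1^*,\sigma_1^*) \in \mathbb{R} \times R_\eta$, $(\mu_1^0,\sigma_1^0) \in \mathbb{R} \times R_\eta$, and $(\mu_2^0,\sigma_2^0) \in \mathbb{R} \times R_\eta$.
A total ordering, which is called a "lexicographical" ordering, is defined by
\[ (\mu_1,\sigma_1) \prec (\mu_2,\sigma_2) \quad \text{if } \sigma_2 < \sigma_1 \text{ or if } \sigma_1 = \sigma_2 \text{ and } \mu_1 < \mu_2. \]
Then $(\mu_1,\sigma_1) \prec (\mu_2,\sigma_2)$ implies
\[ \lim_{x \to -\infty} \frac{n(x \mid \mu_2,\sigma_2^2) 1_{[x < c]}}{n(x \mid \mu_1,\sigma_1^2) 1_{[x < c]}} = 0, \tag{25} \]
which satisfies the conditions of Corollary 2, and condition (a). Before applying Corollary 2 to Eq. (24), we need to combine the same functions among $n(x \mid \mu_1^*, \sigma_1^{*2})$, $n(x \mid \mu_1^0, (\sigma_1^0)^2)$, and $n(x \mid \mu_2^0, (\sigma_2^0)^2)$. Let $\theta_1^* = (\mu_1^*,\sigma_1^*)$, $\theta_1^0 = (\mu_1^0,\sigma_1^0)$, and $\theta_2^0 = (\mu_2^0,\sigma_2^0)$. If $\theta_1^* = \theta_1^0 \ne \theta_2^0$, Eq. (24) becomes
\[ (b_1 - a_1) n(x \mid \mu_1^0, (\sigma_1^0)^2) + b_2 n(x \mid \mu_2^0, (\sigma_2^0)^2) = 0 \quad \text{for } x < c. \tag{26} \]
By Corollary 2, $b_1 - a_1 = 0$ and $b_2 = 0$, and hence $a_1 = b_1 + b_2$.
Furthermore, if $\theta_1^* \ne \theta_1^0$, $\theta_1^0 \ne \theta_2^0$, and $\theta_2^0 \ne \theta_1^*$, then Eq. (24) becomes
\[ -a_1 n(x \mid \mu_1^*, \sigma_1^{*2}) + b_1 n(x \mid \mu_1^0, (\sigma_1^0)^2) + b_2 n(x \mid \mu_2^0, (\sigma_2^0)^2) = 0 \quad \text{for } x < c. \tag{27} \]
Therefore $a_1 = 0$, $b_1 = 0$, and $b_2 = 0$. For the other cases, we have $a_1 = 0$ or $a_1 = b_1 + b_2$ by the same argument. The other conditions (b)–(f) can be verified from Corollary 1, and hence Theorem 1 holds.
Subsequently, we verify that the mixture model satisfies condition (A1) under the parameter space (23). We consider the sequences $\{(\mu_j^n,\sigma_j^n)\}$ and $\{(\mu_j^{0n},\sigma_j^{0n})\}$ in Theorem 2, i.e., $\lim_{n \to \infty} |(\mu_j^n,\sigma_j^n)| = \infty$ and $\lim_{n \to \infty} |(\mu_j^n,\sigma_j^n) - (\mu_j^{0n},\sigma_j^{0n})| = 0$. Writing $\mu_j^n = \mu_j^{0n} + o(1)$ and $\sigma_j^n = \sigma_j^{0n} + o(1)$, we have
\[ \frac{c - \mu_j^n}{\sigma_j^n} - \frac{c - \mu_j^{0n}}{\sigma_j^{0n}} = \frac{-o(1)}{\sigma_j^{0n} + o(1)} \left( 1 + \frac{c - \mu_j^{0n}}{\sigma_j^{0n}} \right). \tag{28} \]
If $\lim_{n \to \infty} |(c - \mu_j^{0n})/\sigma_j^{0n}| < \infty$, it follows from Eq. (28) that
\[ \lim_{n \to \infty} \left| \int_c^{\infty} n(t \mid \mu_j^n, (\sigma_j^n)^2)\,dt - \int_c^{\infty} n(t \mid \mu_j^{0n}, (\sigma_j^{0n})^2)\,dt \right| = 0. \tag{29} \]
On the other hand, because $\lim_{n \to \infty} (c - \mu_j^n)/\sigma_j^n = \pm\infty$ when $\lim_{n \to \infty} (c - \mu_j^{0n})/\sigma_j^{0n} = \pm\infty$, (29) holds. Therefore Theorem 2 holds.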
4.3. A random censorship mixture model
This section applies the result of Theorem 1 to a random censorship mixture model. We assume that the observations consist of $n$ pairs $(X_1,\delta_1),\dots,(X_n,\delta_n)$ where $X_i = \min(X_i^0, Y_i)$ is either an observed random variable $X_i^0$ or an observed censoring variable $Y_i$ independent of $X_i^0$, and $\delta_i = 1_{[X_i^0 > Y_i]}$. Furthermore, we assume that $X_1^0,\dots,X_n^0$ are iid with density $f^0(x \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j f^0(x \mid \theta_j)$, and $Y_1,\dots,Y_n$ are iid with an arbitrary unknown density $q(y)$. As in the usual random censorship models, the density of $(X_i,\delta_i)$ is given by
\[ f(x,\delta \mid \pi,\theta) = f^0(x \mid \pi,\theta) \bar{Q}(x) 1_{[\delta = 0]} + q(x) \bar{F}(x \mid \pi,\theta) 1_{[\delta = 1]} = \sum_{j=1}^{g} \pi_j f(x,\delta \mid \theta_j), \tag{30} \]
where $\bar{Q}(x) = \int_x^{\infty} q(t)\,dt$, $\bar{F}(x \mid \pi,\theta) = \int_x^{\infty} f^0(t \mid \pi,\theta)\,dt$, $\bar{F}(x \mid \theta_j) = \int_x^{\infty} f^0(t \mid \theta_j)\,dt$, and $f(x,\delta \mid \theta_j) = f^0(x \mid \theta_j) \bar{Q}(x) 1_{[\delta = 0]} + q(x) \bar{F}(x \mid \theta_j) 1_{[\delta = 1]}$.
Then we obtain the following result by the same argument as in the proof of Theorem 1 because (30) also takes the form of a mixture model.
Let (a)′–(d)′ be conditions (a)–(d) with $f(x \mid \theta_j)$ replaced by $f(x,\delta \mid \theta_j)$, and let (e)′ be condition (e) with "$f(x \mid \theta_j)$ for any $x \in R_0$" replaced by $f(x,0 \mid \theta_j)$. Let (g)′ be condition (g) with Eq. (9) replaced by
\[ \sum_{j=1}^{g'} a_j f^0(x \mid \theta_j^*) \bar{Q}(x) = \sum_{j=1}^{g} b_j f^0(x \mid \theta_j^0) \bar{Q}(x), \quad \text{a.e. } P_0. \tag{31} \]
Corollary 3. Let $(X_1,\delta_1),\dots,(X_n,\delta_n)$ be iid observations with the probability distribution (30) satisfying conditions (a)′–(e)′ and (g)′. Then the MLE $(\pi^n,\theta^n)$ is strongly consistent in the sense of (11).
This result can also be extended to the model in which $Y_i$ has a density with unknown parameters, but we do not discuss this here to save space because the representation is rather tedious.
Proof. The condition corresponding to condition (f) is given by
(f)′ For any infinite cluster point $\theta_j^*$ in $\Theta$, $E_q[\lim_{i \to \infty} \sup_{\theta_j \in U_i(\theta_j^*)} \bar{F}(X \mid \theta_j)] \le 1$, where $E_q[\cdot] = \int_{\mathcal{X}} \cdot\, q(x)\,dx$.
Condition (f)′ obviously holds because $\bar{F}(x \mid \theta_j) \le 1$ for any $x$ and $\theta_j \in \Theta$. In addition, conditions (a)′–(e)′ are essentially the same as (a)–(e), and hence the results corresponding to Lemmas 3–6 follow. Therefore the result corresponding to Lemma 7 follows, which completes the proof. □
For example, if the density of $X^0$ in (30) is $f^0(x \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j n(x \mid \mu_j,\sigma_j^2)$, the probability distribution of $(X,\delta)$ is given by
\[ f(x,\delta \mid \pi,\theta) = \sum_{j=1}^{g} \pi_j \left\{ n(x \mid \mu_j,\sigma_j^2) \bar{Q}(x) 1_{[\delta = 0]} + q(x) \bar{\Phi}\!\left( \frac{x - \mu_j}{\sigma_j} \right) 1_{[\delta = 1]} \right\}, \]
where $\bar{\Phi}(\cdot) = 1 - \Phi(\cdot)$. For simplicity of exposition, suppose that the density $q(x)$ satisfies $E_{\theta_i^0}[\log f(X,\delta \mid \theta_j)] > -\infty$ and $E_q[\log \max\{q(X),1\}] < \infty$. Then, by Corollaries 1 and 2, we can verify conditions (a)′–(g)′ under the parameter space (23), and hence Corollary 3 holds.
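To illustrate the data structure of the random censorship model, the sketch below simulates pairs $(X_i,\delta_i)$ from a two-component normal mixture and evaluates the part of the log-likelihood of (30) that depends on the mixture parameters. The normal censoring law, all parameter values, and the function names are assumptions made for this sketch only; the paper leaves $q$ arbitrary and unknown.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_sf(x, mu, sigma):
    """Survival function 1 - Phi((x - mu) / sigma)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def draw_pair(rng, pis, thetas, censor_mu, censor_sigma):
    """Draw (X, delta) with X = min(X0, Y) and delta = 1[X0 > Y]."""
    mu, sigma = rng.choices(thetas, weights=pis, k=1)[0]   # latent component
    x0 = rng.gauss(mu, sigma)
    y = rng.gauss(censor_mu, censor_sigma)   # assumed censoring law q (hypothetical)
    return min(x0, y), int(x0 > y)


def partial_loglik(data, pis, thetas):
    """Sum of the (pi, theta)-dependent terms of log f(x, delta | pi, theta) in (30).
    The factors Q-bar(x) and q(x) are dropped because they do not involve (pi, theta)."""
    total = 0.0
    for x, delta in data:
        term = normal_pdf if delta == 0 else normal_sf
        total += math.log(sum(p * term(x, m, s) for p, (m, s) in zip(pis, thetas)))
    return total


rng = random.Random(0)
pis, thetas = [0.3, 0.7], [(-1.0, 1.0), (2.0, 0.5)]
data = [draw_pair(rng, pis, thetas, censor_mu=2.5, censor_sigma=1.0)
        for _ in range(500)]
print(sum(d for _, d in data) / len(data))   # observed censoring fraction
print(partial_loglik(data, pis, thetas))
```

Dropping the factors that involve only $q$ does not change the maximizer over $(\pi,\theta)$, which is why the unknown censoring density does not need to be estimated in order to compute the MLE in this sketch.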
5. Concluding remarks
We have extended the consistency proof of Cheng and Liu (2001) to mixture models with censored data. Therefore the strong consistency still holds even if there are more components in the estimated model than in the true model, and it also holds in noncensored cases if conditions (a)–(e) are fulfilled. In addition, this result can be easily extended to censored multivariate mixture models by a small modification of the conditions, although we do not discuss this here to save space.
Hathaway (1985) has used the device in Section 6 of Kiefer and Wolfowitz (1956) to prove the strong consistency of MLEs in mixtures of normals under the parameter space (23) with $\sigma_j \in R_\eta$ replaced by the inequality constraint $\min_{i,j}(\sigma_i/\sigma_j) \ge c > 0$. This device could be applied to the censored normal mixture models (10), but a rather tedious evaluation will be needed.
Acknowledgements
The author is grateful to the associate editor and anonymous referees for helpful suggestions that led to improvement of the paper. This research was partially supported by a grant from Mathematics Education Society of Waseda University, a grant from Takasaki City University of Economics, and Grant-in-Aid for Scientific Research (A) 19204009.
Appendix A. Some lemmas
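This section gives some lemmas for the proof in Section 3. Let $\{U_i(\pi^*,\theta^*)\}$ $(i=1,2,\dots)$ be a sequence of decreasing neighbourhoods of the point $(\pi^*,\theta^*)$ such that $\bigcap_{i \ge 1} U_i(\pi^*,\theta^*) = (\pi^*,\theta^*)$. Recall the set $S$ stated in the proof of Theorem 1.
Lemma 3. Under conditions (a)–(c), the following results hold.
• $E_0[\log f(X \mid \pi,\theta)] > -\infty$ for any $(\pi,\theta) \in \Gamma$,
• $E_0[\log \max\{1, f(X \mid \pi,\theta)\}] < \infty$ for any $(\pi,\theta) \in \Gamma$,
• $E_0[\log \sup_{|(\pi',\theta') - (\pi,\theta)| \le r} \max\{1, f(X \mid \pi',\theta')\}] < \infty$ for small $r > 0$ and any $(\pi,\theta) \in \Gamma$,
• $E_0[\log \sup_{|(\pi,\theta)| \ge r} \max\{1, f(X \mid \pi,\theta)\}] < \infty$ for large $r > 0$.
These results imply $E_0[|\log f(X \mid \pi^0,\theta^0)|] < \infty$, and $E_0[\log \sup_{(\pi,\theta) \in U_i(\pi^*,\theta^*)} f(X \mid \pi,\theta)] < \infty$ for any finite or infinite cluster point $(\pi^*,\theta^*)$ in $\Gamma$.
Proof. The desired results follow from the following inequalities:
\[ \log \sum_{j=1}^{g} \pi_j f(x \mid \theta_j) \ge \sum_{j=1}^{g} \pi_j \log f(x \mid \theta_j) \quad \text{(Jensen's inequality)}, \tag{A.1} \]
\[ \log \sum_{j=1}^{g} \pi_j f(x \mid \theta_j) \le \sum_{j=1}^{g} \log f(x \mid \theta_j) \quad \text{for } f(x \mid \theta_j) \ge 1\ (j=1,\dots,g). \tag{A.2} \]
□
Lemma 4. Let $C = \{ f \in L_1 \mid \|f\| < 1,\ f \ge 0 \}$ and let $E_g[h(X)] = \int_{\mathcal{X}} h(x) g(x)\,d\mu$. Then for any $f \in C$ and $g \in B^+$, $E_g[\log\{f(X)/g(X)\}] < 0$.
Proof. The inequality follows from Jensen's inequality. □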
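A discrete analogue may help in reading Lemma 4. In the sketch below, with hypothetical values and an assumed function name, `g` is a probability vector and `f` is a nonnegative vector with total mass below one; Jensen's inequality gives the bound by the logarithm of the total mass of `f`, which is negative.

```python
import math


def expected_log_ratio(f, g):
    """E_g[log(f/g)] for discrete f and g on a common finite support (g > 0)."""
    return sum(gi * math.log(fi / gi) for fi, gi in zip(f, g))


g = [0.2, 0.5, 0.3]      # probability mass function, total mass 1
f = [0.1, 0.4, 0.3]      # sub-probability, total mass 0.8 < 1

value = expected_log_ratio(f, g)
bound = math.log(sum(f))  # Jensen: E_g[log(f/g)] <= log E_g[f/g] = log ||f||
print(value, bound)
```

The sketch assumes strictly positive entries of `f`; if `f` vanishes where `g` is positive, the expectation is minus infinity and the inequality holds trivially.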
Lemma 5. Suppose that $(\pi^*,\theta^*)$ is an infinite cluster point satisfying either $\lim_{(\pi,\theta) \to (\pi^*,\theta^*)} f(x \mid \pi,\theta) = 0$ for any $x \in R_0$ or $\lim_{i \to \infty} \sup_{(\pi,\theta) \in U_i(\pi^*,\theta^*)} f(c_k \mid \pi,\theta) = 0$ for some $c_k$. Then under conditions (a)–(c), it follows that
\[ \lim_{i \to \infty} E_0\left[ \log \sup_{(\pi,\theta) \in U_i(\pi^*,\theta^*)} f(X \mid \pi,\theta) \right] = -\infty. \tag{A.3} \]
Proof. Eq. (A.3) follows from Lemma 3 and the same argument as in Lemma 3 of Wald (1949). □
Lemma 6. Let $(\pi^*,\theta^*)$ be a finite or infinite cluster point in $S$. Then under conditions (a)–(c),
\[ \lim_{i \to \infty} E_0\left[ \log \sup_{(\pi,\theta) \in U_i(\pi^*,\theta^*)} f(X \mid \pi,\theta) \right] \le E_0[\log f(X \mid \pi^*,\theta^*)], \tag{A.4} \]
where $f(x \mid \pi^*,\theta^*) = \lim_{i \to \infty} \sup_{(\pi,\theta) \in U_i(\pi^*,\theta^*)} f(x \mid \pi,\theta)$.
Proof. By arguing as in the proof of Theorem 1 of Cheng and Liu (2001), and using Fatou's lemma and Lemma 3, the result is proved. The detailed proof is available from the author. □
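Lemma 7. If conditions (a)–(g) hold, for each infinite cluster point $(\pi^*,\theta^*) \in S$, there exists a neighbourhood $U(\pi^*,\theta^*)$ satisfying Eq. (13).
Proof. We show that (13) is true when $U_i(\pi^*,\theta^*)$ degenerate into the single point $(\pi^*,\theta^*)$. Let $\theta_j^*$ be any infinite cluster point, and let $\bar{F}_k(\theta_j^*) = \lim_{i \to \infty} \sup_{\theta_j \in U_i(\theta_j^*)} F_k(\theta_j)$. Because $\lim_{\theta_j \to \theta_j^*} f(x \mid \theta_j) = 0$ for any $x \in R_0$ from condition (e), $f(x \mid \pi^*,\theta^*)$ is bounded above by a function having one of the following forms:
(I) $\bar{f}(x \mid \pi^*,\theta^*) = \sum_{j=1}^{A} \pi_{a_j}^* \sum_{k=1}^{q} \bar{F}_k(\theta_{a_j}^*) 1_{[x = c_k]}$, where $0 \le A \le g$, and $\pi_{a_j}^* \sum_{k=1}^{q} \bar{F}_k(\theta_{a_j}^*) > 0$ $(j=1,\dots,A)$.
(II) $\bar{f}(x \mid \pi^*,\theta^*) = \sum_{j=1}^{B} \pi_{b_j}^* \left\{ f^0(x \mid \theta_{b_j}^*) 1_{[x \in R_0]} + \sum_{k=1}^{q} F_k(\theta_{b_j}^*) 1_{[x = c_k]} \right\}$, where $1 \le B \le g-1$, and $\pi_{b_j}^* f^0(x \mid \theta_{b_j}^*) > 0$ $(j=1,\dots,B)$.
(III)
\[ \bar{f}(x \mid \pi^*,\theta^*) = \sum_{j=1}^{M_1} \pi_{m_j}^* \left\{ f^0(x \mid \theta_{m_j}^*) 1_{[x \in R_0]} + \sum_{k=1}^{q} F_k(\theta_{m_j}^*) 1_{[x = c_k]} \right\} \tag{A.5} \]
\[ \qquad + \sum_{j=M_1+1}^{M_2} \pi_{m_j}^* \sum_{k=1}^{q} \bar{F}_k(\theta_{m_j}^*) 1_{[x = c_k]}, \tag{A.6} \]
where $1 \le M_1 \le g-1$, $M_1+1 \le M_2 \le g$, $\pi_{m_j}^* f^0(x \mid \theta_{m_j}^*) > 0$ $(j=1,\dots,M_1)$, and $\pi_{m_j}^* \sum_{k=1}^{q} \bar{F}_k(\theta_{m_j}^*) > 0$ $(j=M_1+1,\dots,M_2)$.
In (I), (13) follows from Lemma 5. In (II), it follows from the same argument as in the proof of Theorem 1 of Cheng and Liu (2001, p. 608) that $\bar{f}(x \mid \pi^*,\theta^*)$ does not equal $f(x \mid \pi^0,\theta^0)$ w.p.1 under $P_0$ (henceforth denoted by $\bar{f}(x \mid \pi^*,\theta^*) \ne f(x \mid \pi^0,\theta^0)$ a.e. $P_0$). Thus it follows from Lemma 6 and Jensen's inequality that
\[ \lim_{i \to \infty} E_0\left[ \log \sup_{(\pi,\theta) \in U_i(\pi^*,\theta^*)} f(X \mid \pi,\theta) \right] \le E_0[\log \bar{f}(X \mid \pi^*,\theta^*)] \tag{A.7} \]
\[ \qquad < E_0[\log f(X \mid \pi^0,\theta^0)]. \tag{A.8} \]
In (III), (A.7) holds from Lemma 6. To show inequality (A.8), we have only to prove $\bar{f}(x \mid \pi^*,\theta^*) \ne f(x \mid \pi^0,\theta^0)$ a.e. $P_0$. If there exists $j'$ $(j' = M_1+1,\dots,M_2)$ such that $\sum_{k=1}^{q} \bar{F}_k(\theta_{m_{j'}}^*) < 1$, then $\|\bar{f}\| < 1$. Hence inequality (A.8) holds from Lemma 4.
Next, when $\sum_{k=1}^{q} \bar{F}_k(\theta_{m_j}^*) = 1$ for any $j$ $(j = M_1+1,\dots,M_2)$ in (III), we prove by contradiction that $\bar{f}(x \mid \pi^*,\theta^*) \ne f(x \mid \pi^0,\theta^0)$ a.e. $P_0$. Assuming that $\bar{f}(x \mid \pi^*,\theta^*) = f(x \mid \pi^0,\theta^0)$ a.e. $P_0$, we have
\[ \sum_{j=1}^{M_1} \pi_{m_j}^* f(x \mid \theta_{m_j}^*) = \sum_{j=1}^{g} \pi_j^0 f(x \mid \theta_j^0), \quad \text{a.e. } P_0 \text{ on } R_0, \tag{A.9} \]
where $1 \le M_1 \le g-1$. Then it follows from condition (g) that $\sum_{j=1}^{M_1} \pi_{m_j}^* = 0$ or $\sum_{j=1}^{M_1} \pi_{m_j}^* = \sum_{j=1}^{g} \pi_j^0 = 1$. This contradicts the assumption of (III), which completes the proof. □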
References
Chauveau, D., 1995. A stochastic EM algorithm for mixtures with censored data. J. Statist. Plann. Inference 46, 1–25.
Cheng, R.C.H., Liu, W.B., 2001. The consistency of estimators in finite mixture models. Scand. J. Statist. 28, 603–616.
Feng, Z., McCulloch, C.E., 1996. Using bootstrap likelihood ratio in finite mixture models. J. Roy. Statist. Soc. Ser. B 58, 609–617.
Hathaway, R.J., 1985. A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann. Statist. 13, 795–800.
Kiefer, J., Wolfowitz, J., 1956. Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27, 888–906.
McLachlan, G., Peel, D., 2000. Finite Mixture Models. Wiley, New York.
Mendenhall, W., Hader, R.J., 1958. Estimation of parameters of mixed exponentially distributed failure time distributions from censored life test data. Biometrika 45, 504–520.
Redner, A.R., 1981. Note on the consistency of the maximum likelihood estimation for nonidentifiable distribution. Ann. Statist. 9, 225–228.
Teicher, H., 1963. Identifiability of finite mixtures. Ann. Math. Statist. 34, 1265–1269.
Wald, A., 1949. Note on the consistency of the maximum likelihood estimates. Ann. Math. Statist. 20, 595–601.
Yakowitz, S.J., Spragins, J.D., 1968. On the identifiability of finite mixtures. Ann. Math. Statist. 39 (1), 209–214.