Statistics & Probability Letters 74 (2005) 205–211

www.elsevier.com/locate/stapro

Testing independent sparse heterogeneous mixtures☆

Johan Lim*

Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA

Received 12 November 2003; received in revised form 13 December 2004; accepted 1 April 2005

Available online 24 June 2005

Abstract

We study the sequential discernibility between the independent fair coin tossing $\mu_0$ and the sparse heterogeneous mixtures $HM(\alpha,\gamma) \equiv (1-\epsilon_n)\cdot\mathrm{Bernoulli}(1/2) + \epsilon_n\cdot\mathrm{Bernoulli}((1+\theta_n)/2)$ with $\epsilon_n \asymp n^{-\gamma}$ and $\theta_n \asymp n^{-\alpha}$. Extending the result in Lim (2003), we show that $HM(\alpha,\gamma)$ and $\mu_0$ are sequentially discernible when $0 < \alpha+\gamma \le 0.5$, but are not so when $0.5 < \gamma+\alpha < 1$. It will be shown that the three different modes of discernibility in Lim (2003), namely (1) discernibility with an entire sample path, (2) uniform discernibility with an entire sample path, and (3) sequential discernibility, are equivalent to one another under this setting; furthermore, it follows that sequential decision procedures are equivalent to the decisions based on an entire sequence. Finally, we show that the coin tossing with finitely many trials of biased coins is not sequentially discernible from that with infinitely many trials.

© 2005 Elsevier B.V. All rights reserved.

MSC: primary 62M07; 62G10, secondary 62F03

Keywords: Discernibility; Heterogeneous mixtures

1. Introduction

Mixture models are useful in describing a wide variety of random phenomena and, because of their flexibility in modeling, they have continued to receive increasing attention over the years from both practical and theoretical points of view. Fields in which mixture models are successfully applied include astronomy, biology, genetics, medicine, engineering, and many other fields in the physical and social sciences. In those disciplines, testing the number of mixture components or testing homogeneity in mixture distributions is often the main research objective or the first step toward it.

0167-7152/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.spl.2005.04.049

☆ This work is supported in part by NSF grant #DMS-0072331.
* Tel.: +979 8621 576; fax: +979 8453 144.
E-mail address: [email protected].

Hypothesis testing in statistics is a procedure for deciding between two exclusive sets of measures $H_0$ and $H_1$ using observations $\{X_i\}_{i=1}^{\infty}$. For deciding between $H_0$ and $H_1$, at each $n$, we construct a testing function $f_n(\{X_i\}_{i=1}^{n})$ which assigns the observations to 0 ($H_0$) or 1 ($H_1$). Then, one of the most fundamental questions is when there exists a sequence of testing functions $\{f_n\}_{n=1}^{\infty}$ that makes finitely many errors with probability 1. In this paper, we are particularly interested in testing "independent sparse heterogeneous departures" (ISHM) against the "homogeneous" counterpart. To be specific, the problem is on testing

$$HM(\alpha,\gamma) \equiv (1-\epsilon_n)\cdot\mathrm{Bernoulli}\left(\frac{1}{2}\right) + \epsilon_n\cdot\mathrm{Bernoulli}\left(\frac{1+\theta_n}{2}\right) \tag{1.1}$$

against the fair coin tossing, when $\epsilon_n \asymp n^{-\gamma}$ and $\theta_n \asymp n^{-\alpha}$ for $\gamma$ and $\alpha \in [0,1]$. Here $u_n \asymp n^{-\gamma}$ means that $0 < \liminf_{n\to\infty} u_n n^{\gamma} \le \limsup_{n\to\infty} u_n n^{\gamma} < \infty$. The sparsity implies that the average proportion of the biased coin in the first $n$ draws is of order $n^{-\gamma}$; it thus follows that a finite-dimensional empirical measure does not deviate from that of the fair coin tossing as $n \to \infty$. Also, it should be pointed out that, though our discussion in this paper is limited to model (1.1) of binary observations, the results of this paper can be extended to more general settings without difficulty, as in Section 3.2 of Lim (2003).

Testing ISHM is closely related to the phase transition in random coin tossing. The phase transition in random coin tossing studies the minimal amount of bias that is detectable against a fair coin tossing using an infinite sequence of observations $\{X_n\}_{n=1}^{\infty}$. One interesting result on the issue is shown by Kakutani (1948), which states that $\mu_b$ and $\mu_0$ are mutually singular if and only if $\sum_{n=1}^{\infty} b_n^2 = \infty$; here $\mu_b$ is the distribution of independent coin tosses, where $b = (b_1, b_2, \ldots)$ and $b_n$ is the bias of the $n$th coin, and $\mu_0$ is the distribution of independent fair coin tosses. The mutual singularity implies that the bias in $\mu_b$ is detectable from the fair coin tossing when the entire sequence of observations $\{X_n\}_{n=1}^{\infty}$ is given. However, from the point of view of statistical hypothesis testing, it is more interesting to assume the sequential observability of $\{X_n\}_{n=1}^{\infty}$. In the remainder of the paper, partially motivated by the results of Kakutani (1948), we answer two questions on testing the ISHM. First, we determine whether the sequential decisions differ from those based on an entire sample path. Second, we provide a necessary and sufficient condition for the minimal class of heterogeneity that is sequentially discernible from the homogeneous counterpart.

Before we proceed, it should be stated that this paper uses the following three different modes of discernibility from Lim (2003).

Definition 1.1. Two classes of probability measures $H_0$ and $H_1$ on $(\mathcal{B}^{\infty}, \mathbb{R}^{\infty})$ are Discernible with an Entire Sample path (DES) if, for each $P \in H_0$ and $Q \in H_1$, there exists a measurable function $f_{P,Q} : \mathbb{R}^{\infty} \to \{0,1\}$ such that $f_{P,Q}(X_1^{\infty}) = 1$, $Q$-a.s. and $f_{P,Q}(X_1^{\infty}) = 0$, $P$-a.s. When the discerning function $f_{P,Q}$ does not depend on $(P,Q)$, $H_0$ and $H_1$ are called Uniformly Discernible with an Entire Sample path (UDES).

Definition 1.2. Two classes of probability measures $H_0$ and $H_1$ are Sequentially Discernible (SD) if there exists a sequence of measurable functions $f_n : \mathbb{R}^n \to \{0,1\}$ such that $\lim_{n\to\infty} f_n(x_1, x_2, \ldots, x_n) = 1$ $Q$-a.s. and $\lim_{n\to\infty} f_n(x_1, x_2, \ldots, x_n) = 0$ $P$-a.s., for every $P \in H_0$ and $Q \in H_1$.

Under the IID setting, UDES is equivalent to DES and they are trivial in the sense that any disjoint families of probability measures are UDES (DES), because an infinite number of samples provides exact knowledge of the true distribution. In contrast, several different notions of SD have been proposed in the previous literature. Hoeffding and Wolfowitz (1958) proposed five different classes of tests relying on the sample size function $N$ and the decision function $f(X_1, \ldots, X_n)$. Here, $N$ is a stopping time with respect to $\{\sigma(X_1, \ldots, X_n)\}_{n=1}^{\infty}$, and $f \in \sigma(X_1, \ldots, X_n)$ is the testing function in $\{0,1\}$. Among their five classes, the class of tests with $P(N < \infty) = 1$ has been widely used in the previous literature and is known as SD (Fisher and Van Ness, 1969; Cover, 1973; Dembo and Peres, 1994; Kulkarni and Zeitouni, 1996; Lim, 2003), whereas recently Nobel (2003) considered tests with continuous functions $f_n$ with range $[0,1]$.

Suppose $\{D_n\}_{n=1}^{\infty}$ is an unobserved binary sequence and an independent coin with bias $\theta_n > 0$ is tossed when $D_n = 1$; a fair coin is tossed otherwise. A coin with bias $\theta$ means that $P(X = 1) = (1+\theta)/2$ and $P(X = 0) = (1-\theta)/2$. Let $\mu(\Theta, \lambda)$ with $\Theta = (\theta_1, \ldots, \theta_n, \ldots)$ and $\lambda = (t_1, t_2, \ldots, t_n, \ldots)$ be the probability measure for the record of independent coin tosses, where $D_n = 1$ if $n \in \{t_1, t_2, \ldots\}$, and 0 otherwise. We say that $\lambda$ has 0-density when $\lim_{n\to\infty} |\lambda \cap \{1, 2, \ldots, n\}|/n = 0$, where $|A|$ is the number of elements in $A$. For every $\lambda$, in particular, $\mu(0, \lambda)$ is the measure of fair coin tossing, denoted by $\mu_0$. Finally, let $INF(\theta)$ be the collection of $\mu(\Theta, \lambda)$ with $\lim_{n\to\infty} |\lambda \cap \{1, 2, \ldots, n\}|/n = 0$ and $|\lambda| = \infty$, and let $FIN(\theta)$ be the collection of $\mu(\Theta, \lambda)$ with $|\lambda| < \infty$, where in both cases $\theta_n$ (in $\Theta$) equals $\theta$ for every $n$.

In the time series setting, Lim (2003) shows that sequential procedures are not equivalent to those with an entire sample path; Lim provides an example that is UDES but not SD by using the ergodic process constructed by the cut-and-stack procedure (Shields, 1991). In this paper, we show that SD = UDES = DES in the above IID setting; hence the sequential procedures are equivalent to the procedures based on an entire sequence. In addition, Lemma 2.3 provides an example of non-equivalence under general settings, which is simpler than that presented in Lim (2003).

In the next section, we prove the following results. First, we show that $HM(\alpha,\gamma)$ and $\mu_0$ are not DES when $0.5 < \alpha+\gamma < 1$ by using Kakutani's dichotomy (Theorem 2.A). Second, $HM(\alpha,\gamma)$ and $\mu_0$ are shown to be SD when $0 < \alpha+\gamma < 0.5$ (Theorem 2.1). Theorem 2.2 proves that $HM(\alpha,\gamma)$ and $\mu_0$ are SD when $\alpha+\gamma = 0.5$. In addition, it can be shown that the boundary for UDES and DES is $\alpha+\gamma = 0.5$; thus, SD, UDES, and DES are equivalent under this setting. Finally, we prove that $INF(\theta)$ and $FIN(\theta)$ are not SD for every $\theta \in (0,1]$ (Theorem 2.4).

2. Main results

2.1. $\alpha+\gamma = 0.5$ is the boundary for SD, UDES, and DES

First, Theorem 2.A shows that $HM(\alpha,\gamma)$ is not DES from $\mu_0$ when $\alpha+\gamma \in (0.5, 1)$. Second, Theorem 2.1 shows that they are SD when $\alpha+\gamma \in (0, 0.5)$. Finally, Theorem 2.2 shows that they are SD when $\alpha+\gamma = 0.5$.

Theorem 2.A. $HM(\alpha,\gamma)$ and $\mu_0$ are not DES when $\alpha+\gamma \in (0.5, 1)$.


Proof. Let $\{X_n\}_{n=1}^{\infty}$ be the observations from $HM(\alpha,\gamma)$. Then it has the same law as the independent random coin tossing $Y_n$, where $Y_n \sim \mathrm{Bernoulli}((1+\epsilon_n\theta_n)/2)$. The bias of the $n$th coin (denoted by $b_n$) has the order of $n^{-(\alpha+\gamma)}$ and, since $2(\alpha+\gamma) > 1$, $\sum_{n=1}^{\infty} b_n^2 < \infty$. It then follows that $HM(\alpha,\gamma)$ is not DES from $\mu_0$ by Kakutani's dichotomy, which states that $\mu_b$ is mutually singular to $\mu_0$ when $\sum_{n=1}^{\infty} b_n^2 = \infty$; otherwise, it is absolutely continuous with respect to $\mu_0$ (Kakutani, 1948). □
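The summability condition in the proof can be checked numerically. The following is a minimal sketch, assuming $b_n = n^{-(\alpha+\gamma)}$ exactly (all constants set to 1); the function name `partial_sum_sq_bias` and the parameter values are illustrative and not part of the paper. The partial sums stay bounded when $\alpha+\gamma > 0.5$ and keep growing when $\alpha+\gamma \le 0.5$.

```python
def partial_sum_sq_bias(alpha, gamma, n_terms):
    """Partial sum of b_k^2 for k = 1..n_terms, assuming b_k = k^{-(alpha+gamma)}."""
    exponent = 2.0 * (alpha + gamma)
    return sum(k ** (-exponent) for k in range(1, n_terms + 1))


if __name__ == "__main__":
    # alpha + gamma = 0.75 (not DES), 0.5 (boundary), 0.25 (SD by Theorem 2.1)
    for alpha, gamma in [(0.5, 0.25), (0.25, 0.25), (0.1, 0.15)]:
        sums = [partial_sum_sq_bias(alpha, gamma, n) for n in (10**2, 10**3, 10**4)]
        print(alpha + gamma, [round(s, 3) for s in sums])
```

A finite number of terms can only suggest convergence or divergence; the dichotomy itself rests on the behavior of the infinite series.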

Theorem 2.1. $HM(\alpha,\gamma)$ and $\mu_0$ are SD when $\alpha+\gamma \in (0, 0.5)$.

Proof. Let $S_n = \sum_{i=1}^{n} X_i$. In fair coin tossing, it can be shown that $S_n$ is eventually smaller than $(1+\epsilon)\sqrt{2n\log\log n}$, $\mu_0$-a.s., by the law of the iterated logarithm. Thus it suffices to show that, for every $\mu_1 \in HM(\alpha,\gamma)$,

$$\mu_1\left\{S_n > (1+\epsilon)\sqrt{2n\log\log n} \ \text{eventually}\right\} = 1. \tag{2.2}$$

To show (2.2), under $\mu_1$,

$$P\left\{S_n < (1+\epsilon)\sqrt{2n\log\log n}\right\} = P\left\{\{a_n - S_n\} > a_n - (1+\epsilon)\sqrt{2n\log\log n}\right\} \tag{2.3}$$

$$\le \mathrm{Const.}\cdot\exp\left\{-2\left(a_n - (1+\epsilon)\sqrt{2n\log\log n}\right)^2 \Big/ n\right\} \tag{2.4}$$

$$= \mathrm{Const.}\cdot\exp\left\{-2\cdot n^{1-2(\alpha+\gamma)}\right\}, \tag{2.5}$$

where $a_n = E(S_n) = \sum_{k=1}^{n} k^{-(\alpha+\gamma)} \asymp n^{1-(\alpha+\gamma)}$. Eq. (2.4) is from the Hoeffding inequality for the sum of independent bounded random variables (Hoeffding, 1963). Since (2.5) has a finite sum for $\alpha+\gamma \in (0, 0.5)$, $S_n$ is eventually larger than $(1+\epsilon)\sqrt{2n\log\log n}$ by the Borel–Cantelli lemma.

Finally, the function

$$f_n(X_1, \ldots, X_n) = I\left\{S_n > (1+\epsilon)\sqrt{2n\log\log n}\right\}$$

sequentially discerns $HM(\alpha,\gamma)$ from $\mu_0$. □
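The threshold rule at the end of the proof can be tried on simulated data. The sketch below is illustrative only and rests on stated assumptions: observations are coded in $\{-1, +1\}$ (so that $E(S_n) = \sum_k \epsilon_k\theta_k$, as in the proof), $\epsilon_k = k^{-\gamma}$ and $\theta_k = k^{-\alpha}$ hold exactly with constants equal to 1, and the names `sample_hm` and `lil_test` are hypothetical helpers, not the author's.

```python
import math
import random


def sample_hm(n, alpha, gamma, rng):
    """Draw X_1..X_n in {-1, +1} from HM(alpha, gamma), assuming
    eps_k = k^-gamma and theta_k = k^-alpha exactly."""
    xs = []
    for k in range(1, n + 1):
        biased = rng.random() < k ** (-gamma)  # sparse biased component selected
        p_plus = (1.0 + k ** (-alpha)) / 2.0 if biased else 0.5
        xs.append(1 if rng.random() < p_plus else -1)
    return xs


def lil_test(xs, eps=0.1):
    """Return 1 if S_n > (1 + eps) * sqrt(2 n log log n), else 0."""
    n = len(xs)
    if n < 3:  # log log n is not positive for n < 3
        return 0
    threshold = (1.0 + eps) * math.sqrt(2.0 * n * math.log(math.log(n)))
    return 1 if sum(xs) > threshold else 0


if __name__ == "__main__":
    rng = random.Random(0)
    # alpha + gamma = 0.2 < 0.5, the regime covered by Theorem 2.1
    print(lil_test(sample_hm(20000, 0.1, 0.1, rng)))
```

A single finite sample only illustrates the rule; the theorem concerns the almost sure limit of $f_n$ as $n \to \infty$.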

Theorem 2.2. $HM(\alpha,\gamma)$ and $\mu_0$ are SD when $\alpha+\gamma = 0.5$.

Proof. Let $\{X_n\}_{n=1}^{\infty}$ be sequential binary observations. Define the sequence $\{t(n)\}_{n=1}^{\infty}$ such that $t(n)$ is the smallest integer satisfying $t(n) > t(n-1)$ and

$$\delta_1 < \frac{1}{\sqrt{t(n)-t(n-1)}} \sum_{j=1}^{t(n)-t(n-1)} \frac{1}{\sqrt{t(n-1)+j}} < \delta_2$$

for positive constants $\delta_1$ and $\delta_2$.

Let $m_n = \max\{k : t(k) \le n\}$ and the statistic $Z_n$ be

$$Z_n = \frac{1}{m_n}\left\{ \frac{1}{\sqrt{b_n}} \sum_{k=1}^{m_n} S_{t(k-1)+1,\,t(k)} + \frac{1}{\sqrt{n - t(m_n)}}\, S_{t(m_n)+1,\,n} \right\},$$

where $S_{a,b} = X_a + X_{a+1} + \cdots + X_b$. Now we will show that

(i) $\lim_{n\to\infty} Z_n = 0$, $\mu_0$-a.s.
(ii) For every $\epsilon > 0$ and $\mu_1 \in HM(\alpha,\gamma)$, $|Z_n| > \epsilon$ eventually, $\mu_1$-a.s.


First, it can be shown that $m_n \asymp \sqrt{n}$ from

$$\delta_1 m_n < \sum_{k=1}^{n} \frac{1}{\sqrt{k}} \asymp \sqrt{n} < \delta_2 m_n.$$

To show (i), letting $Y_k = S_{t(k-1)+1,\,t(k)}/\sqrt{b_k}$, we find $EY_k = 0$ and the $(2p)$th moment

$$E|Y_k|^{2p} = \frac{1}{b_k^{p}} \cdot E\{X_{t(k-1)+1} + \cdots + X_{t(k)}\}^{2p} = O(1). \tag{2.6}$$

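It follows

$$P\{|Z_n| > \epsilon\} = P\left\{ \left| \frac{1}{m_n}\sum_{k=1}^{m_n} Y_k + \frac{1}{m_n}\frac{1}{\sqrt{n - t(m_n)}}\, S_{t(m_n)+1,\,n} \right| > \epsilon \right\}$$

$$\le P\left\{ \left| \frac{1}{m_n}\sum_{k=1}^{m_n} Y_k \right| > \frac{\epsilon}{2} \right\} + P\left\{ \left| \frac{1}{m_n}\frac{1}{\sqrt{n - t(m_n)}}\, S_{t(m_n)+1,\,n} \right| > \frac{\epsilon}{2} \right\}$$

$$\le 64 \cdot E\left\{ \left| \sum_{k=1}^{m_n} Y_k \right|^{6} \right\} \Big/ \{m_n^{6}\epsilon^{6}\} + 64 \cdot E\{|S_{t(m_n)+1,\,n}|^{6}\} \Big/ \{(n - t(m_n))^{3} m_n^{6} \epsilon^{6}\} \tag{2.7}$$

$$= O(1/m_n^{3}), \tag{2.8}$$

where the inequality in (2.7) follows from Markov's inequality applied to the sixth moments together with (2.6). Then (2.8) has a finite sum and, by the Borel–Cantelli lemma,

$$\mu_0\{|Z_n| < \epsilon, \ \text{eventually}\} = 1.$$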

To show (ii), let $\mu_1$ be a measure in $HM(\alpha,\gamma)$ with $\alpha+\gamma = 0.5$. Under $\mu_1$,

$$\frac{1}{m_n}\sum_{k=1}^{m_n} Y_k \overset{d}{=} \frac{1}{m_n}\sum_{k=1}^{m_n} \frac{1}{\sqrt{b_n}} \sum_{j=1}^{b_n} X^{*}_{(k-1)b_n+j} + \frac{1}{m_n}\sum_{k=1}^{m_n} \frac{1}{\sqrt{b_n}} \sum_{j=1}^{b_n} \frac{1}{\sqrt{(k-1)b_n+j}}$$

$$> \frac{1}{m_n}\sum_{k=1}^{m_n} \frac{1}{\sqrt{b_n}} \sum_{j=1}^{b_n} X^{*}_{(k-1)b_n+j} + \delta_1, \tag{2.9}$$

where $X^{*}_i$ is the result of fair coin tossing and $A \overset{d}{=} B$ in (2.9) means that $A$ and $B$ have the same distribution. Therefore, it can be shown that $|Z_n|$ is larger than $\epsilon$, eventually $\mu_1$-a.s. □

Theorems 2.1 and 2.2 show that $HM(\alpha,\gamma)$ and $\mu_0$ are SD when $\alpha+\gamma \le 0.5$, so they are UDES and DES. On the contrary, Theorem 2.A shows that they are not DES when $\alpha+\gamma > 0.5$. Therefore, the boundary for SD is the same as those of UDES and DES.

Similar problems in Gaussian mixtures have been recently addressed by Jin and Donoho (2003), where they obtain the boundary of SD for $\gamma \in (0, 0.5)$ in the model $(1-\epsilon_n)\cdot N(0,1) + \epsilon_n\cdot N(\mu_n, 1)$ with $\epsilon_n \asymp n^{-\gamma}$ and $\mu_n \asymp \sqrt{2q\log n}$. In their work, the diverging mean of the heterogeneous components allows the positive result, although $\gamma$ is larger than 0.5. In contrast, Lim (2003) provides a positive result for regular mixture components with $0 < \gamma < 0.5$ by introducing a specific type of dependence structure induced by a hidden renewal process; he assumes $\lambda$ is a random set of times where a renewal process $\{D_n\}_{n=1}^{\infty}$ visits a distinguished state.

2.2. $INF(\theta)$ and $FIN(\theta)$ are not SD for every $\theta \in (0,1]$

We first show that $INF(\theta)$ and $FIN(\theta)$ are not SD when $\theta = 1$. Then we prove that $INF(\theta)$ and $FIN(\theta)$ are not SD for every $\theta \in (0,1]$ by using the monotonicity between the models in Theorem 2.4.

Lemma 2.3. Let $A$ be the set of infinite binary sequences (of 0 and 1) with a finite number of 1s, and let $B$ be the set of infinite binary sequences (of 0 and 1) with an infinite number of 1s such that the density of 1s is 0, in the sense that $\lim_{n\to\infty} |\lambda \cap \{1, 2, \ldots, n\}|/n = 0$, where $\lambda$ is the set of positions of the 1s. Then, there is no sequence of functions $f_n : \{0,1\}^n \to \{0,1\}$ satisfying $\lim_{n\to\infty} f_n(a) = 0$ for all $a \in A$ and $\lim_{n\to\infty} f_n(b) = 1$ for all $b \in B$. Here $f_n(a)$ is defined as $f_n(a_1, a_2, \ldots, a_n)$, where $(a_1, a_2, \ldots, a_n)$ are the first $n$ components of $a$.

Proof. Suppose that there exist sequential discerning functions $\{f_n\}_{n=1}^{\infty}$ satisfying the condition in the Lemma. Then it suffices to find an element $\lambda \in B$ and a subsequence of $\{f_n\}_{n=1}^{\infty}$, say $\{f_{n_k}\}$, that satisfy $f_{n_k}(\lambda) = 0$ for all $k$. Take any sequence $b$ in $B$ and choose any $f_{n_1}$ satisfying $f_{n_1}(0, 0, 0, \ldots) = 0$. Replace the first $n_1$ digits of $b$ with 0 and denote the resulting sequence by $b_1$, which is clearly in $B$. Find the first nonzero entry in $b_1$, denoted by $m_1$, and let $a_1$ be the sequence obtained by truncating $b_1$ after the first nonzero entry and replacing those entries with 0. Since $a_1$ is in $A$, there is $n_2$ greater than $\max(n_1, m_1)$ such that $f_{n_2}(a_1) = 0$. Replace the first $n_2$ digits of $b_1$ with those of $a_1$. Let $b_2$ be the resulting sequence, which is again clearly in $B$. Note that the first $n_1$ digits of $b_2$ are identical to those of $(0, 0, \ldots)$; thus, $f_{n_1}(b_2) = f_{n_1}(0, 0, \ldots) = 0$. Similarly, since the first $n_2$ digits of $b_2$ are identical to those of $a_1$, $f_{n_2}(b_2) = f_{n_2}(a_1) = 0$. Inductively find a sequence $b_n$ in $B$. Finally, $\lambda = \lim_{n\to\infty} b_n$ defines a sequence in $B$ such that $f_{n_k}(\lambda) = 0$ for all $k$. This is a contradiction. □

Theorem 2.4. $INF(\theta)$ and $FIN(\theta)$ are not SD for every $\theta \in (0,1]$.

Proof. Let $B$ be the set of sequences $\lambda$ such that $\lim_{n\to\infty} |\lambda \cap \{1, 2, \ldots, n\}|/n = 0$ and $|\lambda| = \infty$, and $A$ be that of sequences $\lambda$ with $|\lambda| < \infty$.

Let $\{X_n\}_{n=1}^{\infty}$ be a binary process with values in $\{-1, 1\}$ from $INF(\theta)$. Then there is a sequence $\{Y_n\}_{n=1}^{\infty}$ in $B$ satisfying

$$X_n = U_n \cdot Y_n + V_n \cdot (1 - Y_n), \tag{2.10}$$

where $U_n$ has a value 1 with probability $(1+\theta)/2$ and otherwise $-1$, and $V_n$ takes the values 1 and $-1$ with probability 1/2 each. Suppose $INF(\theta)$ is SD from $FIN(\theta)$ with the sequence of testing functions $\{f_n\}_{n=1}^{\infty}$; hence, $\lim_{n\to\infty} f_n(X_1, \ldots, X_n) = 0$ if $\mu \in FIN(\theta)$ and $\lim_{n\to\infty} f_n(X_1, \ldots, X_n) = 1$ if $\mu \in INF(\theta)$. Then, define the sequence of testing functions for $FIN(\theta)$ against $INF(\theta)$ as follows:

$$g_n(Y_1, \ldots, Y_n) = f_n(X_1, \ldots, X_n) = f'_n((Y_1, U_1, V_1), \ldots, (Y_n, U_n, V_n)), \tag{2.11}$$

where $f'_n$ is obtained by substituting $X_n$ with $U_n \cdot Y_n + V_n \cdot (1 - Y_n)$ in $f_n$. It follows that $g_n$ converges to 1 when $\{Y_n\}_{n=1}^{\infty}$ is from $B$. Likewise, it can be shown that $\lim_{n\to\infty} f_n(X_1, \ldots, X_n) = 0$ if $\mu \in FIN(\theta)$, and $\{g_n\}_{n=1}^{\infty}$ converges to 0 if $\{Y_n\}_{n=1}^{\infty} \in A$. Hence, $B$ and $A$ are SD, which contradicts Lemma 2.3. □
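The substitution in (2.10) and (2.11) can be sketched in code. This is a hedged illustration only: it assumes $V_n$ is a fair $\pm 1$ coin, represents the indicator sequence $\{Y_n\}$ as a list of 0s and 1s, and uses hypothetical helper names `couple` and `induced_test`.

```python
import random


def couple(ys, theta, rng):
    """Build X_n = U_n * Y_n + V_n * (1 - Y_n) from a 0/1 sequence ys, as in (2.10)."""
    xs = []
    for y in ys:
        u = 1 if rng.random() < (1.0 + theta) / 2.0 else -1  # biased coin U_n
        v = 1 if rng.random() < 0.5 else -1                   # fair coin V_n (assumed)
        xs.append(u * y + v * (1 - y))
    return xs


def induced_test(f_n, ys, theta, rng):
    """g_n(Y_1..Y_n): apply a test f_n for the X's to the coupled sequence, as in (2.11)."""
    return f_n(couple(ys, theta, rng))


if __name__ == "__main__":
    rng = random.Random(0)
    ys = [1, 0, 1, 1, 0]
    # With theta = 1 the biased coin always shows +1 at the positions where Y_n = 1.
    print(couple(ys, 1.0, rng))
```

The induced test depends on the auxiliary coins as well as on the indicator sequence, which is why the argument in the proof is stated in terms of almost sure limits.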

Devroye and Lugosi (2002) recently considered a similar testing problem in mixture models and showed that the finite mixture classes are not SD from the mixture classes with an infinite number of mixture components. Their proof is similar to that of Lemma 2.3 in the sense that it finds a probability measure which contradicts the existence of testing functions.

Acknowledgements

The author is grateful to Amir Dembo and anonymous referees for many suggestions.

References

Cover, T., 1973. On determining the irrationality of the mean of a random variable. Ann. Statist. 1, 862–871.

Dembo, A., Peres, Y., 1994. A topological criterion for hypothesis testing. Ann. Statist. 22, 106–117.

Devroye, L., Lugosi, G., 2002. Almost sure classification of densities. J. Nonparametric Statist. 14, 675–698.

Fisher, L., Van Ness, J.W., 1969. Distinguishability of probability measures. Ann. Math. Statist. 40, 381–399.

Hoeffding, W., 1963. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13–30.

Hoeffding, W., Wolfowitz, J., 1958. Distinguishability of sets of distributions (the case of independent and identically distributed chance variables). Ann. Math. Statist. 29, 700–718.

Kakutani, S., 1948. On equivalence of infinite product measures. Ann. Math. 49, 214–224.

Kulkarni, S.R., Zeitouni, O., 1996. A general classification rule for probability measures. Ann. Statist. 23, 1393–1407.

Lim, J., 2003. Testing stochastic processes: stationarity, independence, and ergodicity. Technical Report, Department of Statistics, Stanford University.

Nobel, A., 2003. Hypothesis testing for families of ergodic processes. Preprint.

Shields, P.C., 1991. Cutting and stacking: a method for constructing stationary processes. IEEE Trans. Inform. Theory 37, 1605–1617.
