
Chapter 3

Inference for Zero-Inflated Poisson Distribution

3.0 Introduction

In Chapter 2 we discussed inference for θ in the Zero-Inflated Power Series Distribution (ZIPSD). The Zero-Inflated Poisson Distribution (ZIPD) is a particular case of the ZIPSD. In this chapter we develop inference for the ZIPD and for the Zero-Inflated Truncated Poisson Distribution (ZITPD). A number of researchers have worked on the zero-inflated Poisson distribution. Yip (1988) described an inflated Poisson distribution for the number of insects per leaf.

Lambert (1992) considered the zero-inflated Poisson regression model. Van Den Broek (1995) discussed a score test of the Poisson distribution against the ZIP distribution. Xie et al. (2001) reported the use of the ZIP distribution in statistical process control and studied the performance of various tests of the Poisson distribution against a zero-inflated Poisson alternative. Gupta et al. (2004) discussed a score test for the zero-inflated generalized Poisson regression model. Thas et al. (2005) discussed smooth tests for the zero-inflated Poisson distribution. Castillo et al. (2005) studied overdispersed and underdispersed Poisson distributions.

The ZIPD has two parameters: π, which governs the inflation of zeros, and θ, the parameter of the Poisson distribution. In this chapter we focus on inference about θ. No test appears to have been reported for the parameter θ of the Poisson component of the ZIPD. We provide the maximum likelihood estimators, the Fisher information matrix and the moment estimators of the parameters. Three asymptotic tests for θ, based on the full likelihood, the conditional likelihood and the moment estimator, are proposed, and their performance is studied for the ZIPD. Asymptotic confidence intervals for θ are also provided.

The rest of the chapter is organized as follows. In Section 3.1 we report the maximum likelihood estimators of both parameters of the ZIPD and the corresponding asymptotic variances, using the full likelihood and the conditional likelihood approaches. In Section 3.2 we provide three asymptotic tests for the Poisson parameter θ. In Section 3.3 the performance of the three tests is studied by simulation; the C program written for this purpose is given in Appendix 1. The study shows that the asymptotic test based on the full likelihood estimator and the one based on the conditional likelihood estimator perform similarly. In Section 3.4 the asymptotic confidence intervals based on the three estimation approaches are given. Section 3.5 is devoted to the ZITPD, a member of the ZIPSD family for which the theory is analogous to that for the ZIPD, and Section 3.6 to three tests for the parameter θ of the ZITPD.

3.1 Zero-Inflated Poisson Distribution

Full Likelihood Function Approach

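Let $X_1, X_2, \ldots, X_n$ be a random sample from the ZIP distribution. The likelihood function is

$$L(\theta,\pi;\mathbf{x}) = \prod_{i=1}^{n}\left(1-\pi+\pi e^{-\theta}\right)^{1-a_i}\left(\frac{\pi e^{-\theta}\theta^{x_i}}{x_i!}\right)^{a_i}, \qquad \theta > 0,\ 0 < \pi < 1, \tag{3.1.1}$$

where $a_i = 1$ if $x_i > 0$ and $a_i = 0$ otherwise, so that $\sum_{i=1}^{n}a_i = n-n_0$, with $n_0$ the number of zero observations.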

The corresponding log likelihood function is

$$\log L(\theta,\pi;\mathbf{x}) = n_0\log\left(1-\pi+\pi e^{-\theta}\right) + \log\pi\sum_{i=1}^{n}a_i - \theta\sum_{i=1}^{n}a_i + \log\theta\sum_{i=1}^{n}a_i x_i - \sum_{i=1}^{n}a_i\log x_i! \tag{3.1.2}$$

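If $\hat\theta$ and $\hat\pi$ denote the MLEs of θ and π, then

$$\hat\pi = \frac{n-n_0}{n\left(1-e^{-\hat\theta}\right)},$$

and $\hat\theta$ is the solution of

$$\frac{1}{n-n_0}\sum_{i=1}^{n}x_i = \frac{\hat\theta}{1-e^{-\hat\theta}},$$

whose left-hand side is the mean of the non-zero observations.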
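The MLE equations above can be solved with any one-dimensional root finder. The Python sketch below is an illustration under stated assumptions, not the program used in this thesis: the function name `zip_mle` is ours, it uses bisection (the text does not prescribe a method), and for the traffic accident data analysed later in this chapter it treats the class "≥ 3" as exactly 3, which reproduces the sample mean 0.2031 quoted there.

```python
import math


def zip_mle(counts):
    """Full-likelihood MLEs (theta_hat, pi_hat) of the ZIP model (hypothetical helper).

    `counts` maps each observed count x to its frequency.
    theta_hat solves  sum(x) / (n - n0) = theta / (1 - exp(-theta))  by bisection,
    then pi_hat = (n - n0) / (n * (1 - exp(-theta_hat))).
    """
    n = sum(counts.values())
    n0 = counts.get(0, 0)
    n_pos = n - n0
    target = sum(x * f for x, f in counts.items()) / n_pos  # mean of non-zero counts
    if target <= 1.0:
        raise ValueError("the mean of the non-zero counts must exceed 1")

    def g(theta):
        # theta / (1 - e^{-theta}) increases from 1 (as theta -> 0) without bound
        return theta / -math.expm1(-theta) - target

    # theta / (1 - e^{-theta}) >= theta, so g(target) >= 0 and the root lies in (0, target]
    lo, hi = 1e-12, target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    theta_hat = 0.5 * (lo + hi)
    pi_hat = n_pos / (n * -math.expm1(-theta_hat))
    return theta_hat, pi_hat


traffic = {0: 4499, 1: 766, 2: 136, 3: 21}  # "≥ 3" treated as 3 (assumption)
theta_hat, pi_hat = zip_mle(traffic)
print(round(theta_hat, 4), round(pi_hat, 4))
```

With these frequencies the estimates should be close to the values θ̂ = 0.3637 and π̂ = 0.5583 reported in the traffic accident example. Bisection is used because θ/(1 − e^{−θ}) is monotone, so the root is bracketed by (0, x̄₊] without a starting value.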

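In the following we find the elements of the Fisher information matrix. We have

$$\frac{\partial\log L}{\partial\pi} = -\frac{n_0\left(1-e^{-\theta}\right)}{1-\pi+\pi e^{-\theta}} + \frac{1}{\pi}\sum_{i=1}^{n}a_i$$

and

$$\frac{\partial^2\log L}{\partial\pi^2} = -\frac{n_0\left(1-e^{-\theta}\right)^2}{\left(1-\pi+\pi e^{-\theta}\right)^2} - \frac{1}{\pi^2}\sum_{i=1}^{n}a_i.$$

Therefore,

$$E\left(-\frac{\partial^2\log L}{\partial\pi^2}\right) = \frac{\left(1-e^{-\theta}\right)^2E(n_0)}{\left(1-\pi+\pi e^{-\theta}\right)^2} + \frac{1}{\pi^2}\sum_{i=1}^{n}E(a_i).$$

We note that $E(n_0) = n\left(1-\pi+\pi e^{-\theta}\right)$ and $E\left(\sum_{i=1}^{n}a_i\right) = n\pi\left(1-e^{-\theta}\right)$. Hence,

$$E\left(-\frac{\partial^2\log L}{\partial\pi^2}\right) = \frac{n\left(1-\pi+\pi e^{-\theta}\right)\left(1-e^{-\theta}\right)^2}{\left(1-\pi+\pi e^{-\theta}\right)^2} + \frac{n\pi\left(1-e^{-\theta}\right)}{\pi^2} = \frac{n\left(1-e^{-\theta}\right)^2}{1-\pi+\pi e^{-\theta}} + \frac{n\left(1-e^{-\theta}\right)}{\pi},$$

that is,

$$E\left(-\frac{\partial^2\log L}{\partial\pi^2}\right) = \frac{n\left(1-e^{-\theta}\right)}{\pi\left(1-\pi+\pi e^{-\theta}\right)}. \tag{3.1.3}$$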

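Now

$$\frac{\partial\log L}{\partial\theta} = -\frac{n_0\pi e^{-\theta}}{1-\pi+\pi e^{-\theta}} - \sum_{i=1}^{n}a_i + \frac{1}{\theta}\sum_{i=1}^{n}a_i x_i,$$

$$\frac{\partial^2\log L}{\partial\theta^2} = \frac{n_0\pi(1-\pi)e^{-\theta}}{\left(1-\pi+\pi e^{-\theta}\right)^2} - \frac{1}{\theta^2}\sum_{i=1}^{n}a_i x_i.$$

Therefore, since $E\left(\sum_{i=1}^{n}a_i x_i\right) = n\pi\theta$,

$$E\left(-\frac{\partial^2\log L}{\partial\theta^2}\right) = n\pi\left[\frac{1}{\theta} - \frac{(1-\pi)e^{-\theta}}{1-\pi+\pi e^{-\theta}}\right]. \tag{3.1.4}$$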

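Further,

$$\frac{\partial^2\log L}{\partial\pi\,\partial\theta} = -\frac{n_0e^{-\theta}}{\left(1-\pi+\pi e^{-\theta}\right)^2}.$$

Hence

$$E\left(-\frac{\partial^2\log L}{\partial\pi\,\partial\theta}\right) = \frac{ne^{-\theta}}{1-\pi+\pi e^{-\theta}}. \tag{3.1.5}$$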

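Thus the elements of the Fisher information matrix are

$$I_{11} = \frac{n\left(1-e^{-\theta}\right)}{\pi\left(1-\pi+\pi e^{-\theta}\right)}, \qquad I_{22} = n\pi\left[\frac{1}{\theta} - \frac{(1-\pi)e^{-\theta}}{1-\pi+\pi e^{-\theta}}\right], \qquad I_{12} = I_{21} = \frac{ne^{-\theta}}{1-\pi+\pi e^{-\theta}}.$$

Therefore, the asymptotic variance of $\hat\theta$ is

$$AV_{\hat\theta}(\pi,\theta) = \left(I_{22} - \frac{I_{12}^2}{I_{11}}\right)^{-1}, \tag{3.1.6}$$

and its estimator is $AV_{\hat\theta}(\hat\pi,\hat\theta)$.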
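As a numerical companion to Eqs. (3.1.3) to (3.1.6), the Python sketch below evaluates the information elements and the asymptotic variance of θ̂; the function names are ours. Eliminating I₁₂²/I₁₁ by hand (our own simplification, not stated in the text) gives AV = 1/{nπ[1/θ − e^{−θ}/(1 − e^{−θ})]}, using the identity (1 − π)(1 − e^{−θ}) + e^{−θ} = 1 − π + πe^{−θ}; the script compares the two forms.

```python
import math


def zip_fisher_info(theta, pi, n):
    """Fisher information elements I11, I22, I12 of the ZIP model, Eqs. (3.1.3)-(3.1.5)."""
    e = math.exp(-theta)
    p0 = 1.0 - pi + pi * e  # P(X = 0)
    i11 = n * (1.0 - e) / (pi * p0)
    i22 = n * pi * (1.0 / theta - (1.0 - pi) * e / p0)
    i12 = n * e / p0
    return i11, i22, i12


def av_theta_hat(theta, pi, n):
    """Asymptotic variance of theta_hat, (I22 - I12**2 / I11)**(-1), Eq. (3.1.6)."""
    i11, i22, i12 = zip_fisher_info(theta, pi, n)
    return 1.0 / (i22 - i12 ** 2 / i11)


def av_theta_hat_closed(theta, pi, n):
    """Equivalent closed form obtained by simplifying Eq. (3.1.6) algebraically."""
    e = math.exp(-theta)
    return 1.0 / (n * pi * (1.0 / theta - e / (1.0 - e)))


for theta, pi, n in [(2.0, 0.5, 100), (0.3637, 0.5583, 5422), (5.0, 0.3, 50)]:
    print(theta, pi, n, av_theta_hat(theta, pi, n), av_theta_hat_closed(theta, pi, n))
```

The closed form shows that, for fixed θ, the variance depends on π only through nπ, the expected number of observations generated by the Poisson component.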

Conditional Likelihood Function Approach

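The conditional likelihood function, based on the zero-truncated Poisson distribution of the non-zero observations, is

$$L^{*}(\theta;\mathbf{x}) = \prod_{i=1}^{n}\left(\frac{e^{-\theta}\theta^{x_i}}{x_i!\left(1-e^{-\theta}\right)}\right)^{a_i}, \qquad \theta > 0. \tag{3.1.7}$$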

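The corresponding log likelihood function, with the sums running over the $n-n_0$ non-zero observations, is

$$\log L^{*}(\theta;\mathbf{x}) = \log\theta\sum_{i=1}^{n-n_0}x_i - (n-n_0)\theta - (n-n_0)\log\left(1-e^{-\theta}\right) - \sum_{i=1}^{n-n_0}\log x_i!,$$

so that

$$\frac{\partial\log L^{*}}{\partial\theta} = \frac{1}{\theta}\sum_{i=1}^{n-n_0}x_i - (n-n_0) - \frac{(n-n_0)e^{-\theta}}{1-e^{-\theta}}.$$

Equating $\partial\log L^{*}/\partial\theta$ to zero, the MLE $\tilde\theta$ is the solution of

$$\bar{x} = \frac{\tilde\theta}{1-e^{-\tilde\theta}}, \tag{3.1.8}$$

where $\bar{x} = \frac{1}{n-n_0}\sum_{i=1}^{n-n_0}x_i$ is the mean of the non-zero observations. This is the same equation as the one defining $\hat\theta$, so $\tilde\theta = \hat\theta$; the two approaches differ only in their asymptotic variances.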

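Now consider

$$\frac{\partial^2\log L^{*}}{\partial\theta^2} = -\frac{1}{\theta^2}\sum_{i=1}^{n-n_0}x_i + \frac{(n-n_0)e^{-\theta}}{\left(1-e^{-\theta}\right)^2}.$$

Given $n-n_0$, each non-zero observation has mean $\theta/\left(1-e^{-\theta}\right)$. Therefore,

$$E\left(-\frac{\partial^2\log L^{*}}{\partial\theta^2}\right) = \frac{1}{\theta^2}E\left(\sum_{i=1}^{n-n_0}x_i\right) - \frac{(n-n_0)e^{-\theta}}{\left(1-e^{-\theta}\right)^2} = \frac{n-n_0}{\theta\left(1-e^{-\theta}\right)} - \frac{(n-n_0)e^{-\theta}}{\left(1-e^{-\theta}\right)^2} = \frac{n-n_0}{1-e^{-\theta}}\left[\frac{1}{\theta} - \frac{e^{-\theta}}{1-e^{-\theta}}\right].$$

The asymptotic variance of $\tilde\theta$ is

$$AV_{\tilde\theta}(\theta) = \left\{\frac{n-n_0}{1-e^{-\theta}}\left[\frac{1}{\theta} - \frac{e^{-\theta}}{1-e^{-\theta}}\right]\right\}^{-1}. \tag{3.1.9}$$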
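A small Python transcription of Eq. (3.1.9), taking the number of non-zero observations n − n₀ directly; the helper names are ours. Since E(n − n₀) = nπ(1 − e^{−θ}), replacing n − n₀ by this expectation should make (3.1.9) coincide with the full-likelihood variance (I₂₂ − I₁₂²/I₁₁)^{−1} of Eq. (3.1.6); the sketch evaluates the latter directly from Eqs. (3.1.3) to (3.1.5) to illustrate this.

```python
import math


def av_theta_tilde(theta, n_pos):
    """Asymptotic variance of the conditional MLE theta_tilde, Eq. (3.1.9).

    n_pos is the number of non-zero observations, n - n0.
    """
    e = math.exp(-theta)
    info = n_pos / (1.0 - e) * (1.0 / theta - e / (1.0 - e))
    return 1.0 / info


def av_theta_hat(theta, pi, n):
    """Full-likelihood variance (I22 - I12**2 / I11)**(-1), from Eqs. (3.1.3)-(3.1.6)."""
    e = math.exp(-theta)
    p0 = 1.0 - pi + pi * e
    i11 = n * (1.0 - e) / (pi * p0)
    i22 = n * pi * (1.0 / theta - (1.0 - pi) * e / p0)
    i12 = n * e / p0
    return 1.0 / (i22 - i12 ** 2 / i11)


theta, pi, n = 2.0, 0.5, 100
expected_pos = n * pi * (1.0 - math.exp(-theta))  # E(n - n0)
print(av_theta_tilde(theta, expected_pos), av_theta_hat(theta, pi, n))
```

In a given sample, n − n₀ fluctuates around its expectation, which is why the two variance estimates, and hence the tests built on them, need not agree exactly in finite samples.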

Moment Estimator of ZIP Distribution

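For the zero-inflated Poisson distribution the mean and variance are

$$E(X) = \pi\theta \quad\text{and}\quad Var(X) = \pi\theta\left(1+\theta(1-\pi)\right). \tag{3.1.10}$$

Therefore,

$$E(\bar{X}) = \pi\theta \quad\text{and}\quad Var(\bar{X}) = \frac{\pi\theta\left(1+\theta(1-\pi)\right)}{n} = \sigma^2(\pi,\theta), \text{ say}. \tag{3.1.11}$$

Equating the sample mean $\bar{X}$ and the sample variance $S^2$ to their population counterparts, let

$$\bar{X} = \pi\theta \tag{3.1.12}$$

and

$$S^2 = \pi\theta\left(1+\theta(1-\pi)\right) = \pi\theta\left(1+\theta-\pi\theta\right) = \bar{X}\left(1+\theta-\bar{X}\right) = \bar{X} + \theta\bar{X} - \bar{X}^2.$$

Hence $\theta\bar{X} = S^2 - \bar{X} + \bar{X}^2$, so that

$$\hat\theta = \frac{S^2 - \bar{X} + \bar{X}^2}{\bar{X}} = \frac{S^2}{\bar{X}} - \left(1-\bar{X}\right).$$

Now $\hat\pi = \bar{X}/\hat\theta$, and hence

$$\hat\pi = \frac{\bar{X}}{S^2/\bar{X} - \left(1-\bar{X}\right)} = \frac{\bar{X}^2}{S^2 - \bar{X} + \bar{X}^2}. \tag{3.1.13}$$

Solving Eq. (3.1.12) and Eq. (3.1.13) thus gives the moment estimators of π and θ.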
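The moment estimators are explicit, so a sketch is short. The Python function below (name ours) uses the divisor n in S², which is our assumption; with the traffic accident frequencies and the class "≥ 3" treated as 3, this choice reproduces the moment estimates π̂ = 0.5617 and θ̂ = 0.3615 quoted in the example of Section 3.4.

```python
def zip_moment_estimates(counts):
    """Moment estimators (theta, pi) of the ZIP model, Eqs. (3.1.12)-(3.1.13).

    `counts` maps each observed count to its frequency; S^2 uses divisor n (assumption).
    """
    n = sum(counts.values())
    mean = sum(x * f for x, f in counts.items()) / n
    s2 = sum(f * (x - mean) ** 2 for x, f in counts.items()) / n
    theta = s2 / mean - (1.0 - mean)            # S^2 / X-bar - (1 - X-bar)
    pi = mean ** 2 / (s2 - mean + mean ** 2)    # equals X-bar / theta
    return theta, pi


theta_m, pi_m = zip_moment_estimates({0: 4499, 1: 766, 2: 136, 3: 21})
print(round(theta_m, 4), round(pi_m, 4))
```

Note that the formulas yield a positive θ̂ only when S² > X̄(1 − X̄), so for samples that are not overdispersed enough the moment estimators are not usable.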

3.2 Tests for the Parameter θ of ZIP Distribution

Let us consider the problem of testing $H_0: \theta = \theta_0$. We assume that π is unknown.

a) Test Based On θ̂

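The test statistic for testing $H_0: \theta = \theta_0$ against $H_1: \theta \neq \theta_0$ is

$$Z_7 = \frac{\hat\theta - \theta_0}{\sqrt{AV_{\hat\theta}(\hat\pi,\theta_0)}}, \tag{3.2.1}$$

where $AV_{\hat\theta}(\hat\pi,\theta_0)$ is an estimate of the asymptotic variance of $\hat\theta$. The test $\psi_7$ rejects $H_0$ if $|Z_7| > z_{1-\alpha/2}$.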

b) Test Based On $\tilde\theta$

The test statistic here is

$$Z_8 = \frac{\tilde\theta - \theta_0}{\sqrt{AV_{\tilde\theta}(\theta_0)}}, \tag{3.2.2}$$

where $AV_{\tilde\theta}(\theta_0)$ is as defined in Eq. (3.1.9). The test $\psi_8$ rejects $H_0$ if $|Z_8| > z_{1-\alpha/2}$.
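The Python sketch below computes Z₇ and Z₈ for a frequency table; names are ours. Because the equation defining θ̃ is the same as the one defining θ̂, a single root (found here by bisection, our choice) serves both, π̂ follows from the MLE formula for π, and the Fisher information elements (3.1.3) to (3.1.5) are evaluated at (π̂, θ₀) as in (3.2.1).

```python
import math


def zip_z_tests(counts, theta0):
    """Return (Z7, Z8) of Eqs. (3.2.1) and (3.2.2) for H0: theta = theta0 (sketch)."""
    n = sum(counts.values())
    n_pos = n - counts.get(0, 0)
    target = sum(x * f for x, f in counts.items()) / n_pos

    # theta_hat (= theta_tilde): root of theta / (1 - e^{-theta}) = target, by bisection
    lo, hi = 1e-12, target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < target:
            lo = mid
        else:
            hi = mid
    theta_hat = 0.5 * (lo + hi)
    pi_hat = n_pos / (n * -math.expm1(-theta_hat))

    e0 = math.exp(-theta0)
    # Z7: information elements of Eqs. (3.1.3)-(3.1.5) evaluated at (pi_hat, theta0)
    p0 = 1.0 - pi_hat + pi_hat * e0
    i11 = n * (1.0 - e0) / (pi_hat * p0)
    i22 = n * pi_hat * (1.0 / theta0 - (1.0 - pi_hat) * e0 / p0)
    i12 = n * e0 / p0
    z7 = (theta_hat - theta0) / math.sqrt(1.0 / (i22 - i12 ** 2 / i11))

    # Z8: conditional asymptotic variance of Eq. (3.1.9) evaluated at theta0
    av8 = 1.0 / (n_pos / (1.0 - e0) * (1.0 / theta0 - e0 / (1.0 - e0)))
    z8 = (theta_hat - theta0) / math.sqrt(av8)
    return z7, z8


z7, z8 = zip_z_tests({0: 4499, 1: 766, 2: 136, 3: 21}, theta0=0.5)
print(z7, z8, abs(z7) > 1.96, abs(z8) > 1.96)
```

Each test rejects H₀ at the 5% level when the corresponding |Z| exceeds 1.96; the two statistics share a numerator and differ only in the variance estimate.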

c) Test Based On Sample Mean

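The test statistic is

$$Z_9 = \frac{\sqrt{n}\left(\bar{X} - \hat\pi_0\theta_0\right)}{\sqrt{AV_{\bar{X}}(\hat\pi_0,\theta_0)}}, \tag{3.2.3}$$

where $\hat\pi_0\theta_0$ is the mean of $X$ under $H_0$, with π replaced by its estimate $\hat\pi_0$.

The power of the test is given by

$$\beta_{\psi_9}(\pi,\theta) = \sum_{k=0}^{n}\left[1 - \Phi(\hat{B}_k) + \Phi(\hat{A}_k)\right]P(n_0 = k),$$

where

$$\hat{B}_k = \frac{\hat\pi_0\theta_0 - \pi\theta + z_{1-\alpha/2}\sqrt{AV_{\bar{X}}(\hat\pi_0,\theta_0)}}{\sqrt{AV_{\bar{X}}(\pi,\theta)}}, \qquad \hat{A}_k = \frac{\hat\pi_0\theta_0 - \pi\theta - z_{1-\alpha/2}\sqrt{AV_{\bar{X}}(\hat\pi_0,\theta_0)}}{\sqrt{AV_{\bar{X}}(\pi,\theta)}},$$

and

$$P(n_0 = k) = \binom{n}{k}P_0^{k}\left(1-P_0\right)^{n-k}, \qquad P_0 = 1-\pi+\pi e^{-\theta}.$$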

In the following we report the performance of the three tests developed in Section 3.2, based on simulation experiments.

3.3 Simulation Study

A simulation study is carried out to investigate the power of the three tests proposed in Section 3.2. We generate 25000 samples of sizes 50 and 100 for different values of θ and π. For each generated sample the test statistics are calculated, and the percentage of samples in which the absolute value of the test statistic exceeds $z_{1-\alpha/2}$ is computed; this is an estimate of the power of the respective test. C programs for computing the power are given in Appendix 1. The results for $\theta_0$ = 2 and 5 and π = 0.3, 0.4, 0.5, 0.6, 0.7 are presented in Table 3.3.1 and Table 3.3.2.
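The C programs of Appendix 1 are not reproduced here. As a hedged illustration of the same Monte Carlo idea, the Python sketch below estimates the rejection rate of ψ₈ for given (θ, π). Knuth's Poisson generator, the bisection root finder, the seed, the treatment of degenerate samples and the small number of replications (far fewer than the 25000 used in the text) are all our own choices, and the function names are hypothetical.

```python
import math
import random


def rpois(rng, lam):
    """Poisson(lam) variate by Knuth's multiplication method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def theta_tilde(positives):
    """Conditional MLE of Eq. (3.1.8) from the non-zero counts, by bisection."""
    target = sum(positives) / len(positives)
    lo, hi = 1e-12, target
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rejection_rate_psi8(theta, pi, theta0, n, reps, seed=1, z=1.959964):
    """Monte Carlo estimate of P(|Z8| > z) when the data are ZIP(pi, theta)."""
    rng = random.Random(seed)
    e0 = math.exp(-theta0)
    rejections = 0
    for _ in range(reps):
        # with probability pi draw from Poisson(theta), otherwise a structural zero
        sample = [rpois(rng, theta) if rng.random() < pi else 0 for _ in range(n)]
        positives = [x for x in sample if x > 0]
        if len(positives) < 2 or max(positives) == 1:
            continue  # degenerate sample: counted as "no rejection" in this sketch
        av8 = 1.0 / (len(positives) / (1.0 - e0) * (1.0 / theta0 - e0 / (1.0 - e0)))
        if abs(theta_tilde(positives) - theta0) / math.sqrt(av8) > z:
            rejections += 1
    return rejections / reps


print(rejection_rate_psi8(2.0, 0.5, 2.0, 100, 2000))  # size: data generated under H0
print(rejection_rate_psi8(3.0, 0.5, 2.0, 100, 500))   # power at theta = 3
```

With θ = θ₀ the first rate estimates the size and should be near the nominal 5%; the ψ8 entries of Table 3.3.1 at θ = 2.00 lie between about 4.4% and 5.1%. Increasing `reps` reduces the Monte Carlo error, roughly as 1/√reps.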

Table 3.3.1: Power (in %) of the tests ψ7, ψ8 and ψ9 for θ0 = 2

(columns: π, θ, then ψ7, ψ8, ψ9 for n = 50, then ψ7, ψ8, ψ9 for n = 100)

π θ ψ7 ψ8 ψ9 ψ7 ψ8 ψ9

0.3 2.00 6.7517 4.7758 2.7999 5.8318 4.7558 2.8519

0.3 2.20 9.4676 7.6117 7.0357 11.7195 10.1956 8.6397

0.3 2.40 18.3913 16.2833 14.0954 28.8708 26.3829 20.8112

0.3 2.60 31.7907 29.0588 24.5390 52.6419 49.8180 40.2304

0.3 2.80 48.6461 45.5942 38.7065 74.3890 72.0291 61.4575

0.3 3.00 64.4254 61.7055 53.4699 89.1324 87.7885 78.7289

0.3 3.20 77.4689 75.4130 66.8533 95.9282 95.3722 90.3964

0.3 3.40 86.6245 85.1526 78.4729 98.7920 98.5601 96.0442

0.3 3.60 92.9443 92.0523 86.4885 99.7120 99.6600 98.5481

0.3 3.80 96.2442 95.8282 91.9203 99.9080 99.8960 99.5680

0.3 4.00 98.0921 97.8641 95.2882 99.9720 99.9720 99.8600

0.3 4.20 99.1800 99.0960 97.5401 99.9920 99.9920 99.9720

0.3 4.40 99.5240 99.4400 98.7441 99.9960 99.9960 99.9800

0.3 4.60 99.7440 99.7280 99.3640 99.9960 99.9960 99.9960

0.4 2.00 6.7277 4.7078 3.6279 6.1678 4.9038 3.8718

0.4 2.20 10.6396 8.6397 8.8716 14.2834 12.1195 11.1996

0.4 2.40 23.0191 20.3152 19.0112 36.9785 33.1787 28.5829

0.4 2.60 39.9304 36.5665 33.4067 63.9894 60.0576 52.8299

0.4 2.80 59.4496 56.1178 50.8460 85.3646 83.0367 75.6170

0.4 3.00 75.7850 73.1731 66.8533 95.6602 94.7362 89.9724

0.4 3.20 87.4805 85.4926 80.3728 99.0040 98.7081 96.6841

0.4 3.40 94.1242 93.2163 89.4284 99.8080 99.7560 99.1240

0.4 3.60 97.5041 97.0121 94.7802 99.9600 99.9560 99.8320

0.4 3.80 98.9200 98.7361 97.4801 99.9960 99.9960 99.9400

0.4 4.00 99.6600 99.5720 98.9200 99.9960 99.9920 99.9920

0.4 4.20 99.8640 99.8320 99.5280 99.9960 99.9960 99.9920

0.4 4.40 99.9560 99.9520 99.8360 99.9960 99.9960 99.9960

0.4 4.60 99.9920 99.9920 99.9360 99.9960 99.9960 99.9960

Table 3.3.1 (continued)

π θ ψ7 ψ8 ψ9 ψ7 ψ8 ψ9

0.5 2.00 6.3997 4.4278 4.1158 6.4037 4.7518 4.5678

0.5 2.20 11.9155 9.7636 11.2596 16.9313 14.2954 14.9474

0.5 2.40 26.7789 23.2991 24.2550 43.1343 39.0824 37.3265

0.5 2.60 47.9021 43.5023 43.1423 73.1571 69.5892 64.2814

0.5 2.80 68.3293 64.6854 62.5335 91.7403 90.2124 85.4846

0.5 3.00 84.4246 81.8647 78.4849 98.2281 97.8001 95.7122

0.5 3.40 97.5081 97.0201 95.0802 99.9600 99.9480 99.8480

0.5 3.60 99.2800 99.0480 98.2361 99.9920 99.9920 99.9760

0.5 3.80 99.7520 99.6960 99.3360 99.9960 99.9960 99.9960

0.5 4.00 99.9240 99.9000 99.7400 99.9960 99.9960 99.9960

0.5 4.20 99.9880 99.9840 99.9480 99.9960 99.9960 99.9960

0.5 4.60 99.9960 99.9960 99.9960 99.9960 99.9960 99.9960

0.6 2.00 6.7397 4.9918 5.6798 6.7477 5.1118 5.9078

0.6 2.20 13.2795 10.6196 14.0674 19.2192 15.7234 18.3193

0.6 2.40 30.9228 26.6269 30.5428 49.5980 44.9342 45.2742

0.6 2.60 55.5178 50.3940 51.9859 80.2368 77.0009 74.5450

0.6 2.80 77.0409 72.8491 72.4451 95.6282 94.4362 92.2843

0.6 3.00 90.3084 87.9525 86.7725 99.3440 99.1880 98.5721

0.6 3.20 96.7641 95.8202 94.6482 99.9560 99.9480 99.7520

0.6 3.40 99.0880 98.7521 98.2561 99.9960 99.9920 99.9760

0.6 3.60 99.8080 99.6880 99.5880 99.9960 99.9960 99.9960

0.6 3.80 99.9280 99.8960 99.8680 99.9960 99.9960 99.9960

0.6 4.00 99.9840 99.9840 99.9560 99.9960 99.9960 99.9960

0.6 4.60 99.9960 99.9960 99.9960 99.9960 99.9960 99.9960

0.7 2.00 6.9957 5.0078 7.3237 6.7437 4.9598 8.0317

0.7 2.20 14.7754 11.3755 17.7233 21.4991 17.5553 22.6991

0.7 2.40 35.9826 30.5268 36.9425 56.7097 51.0980 54.2618

0.7 2.60 61.2855 55.1498 60.5616 86.5325 83.2407 82.6607

0.7 2.80 82.8047 78.5529 80.6288 97.8161 96.9681 96.2362

0.7 3.00 94.0002 92.1123 92.3723 99.8440 99.7360 99.4560

0.7 3.20 98.4481 97.7401 97.5961 99.9840 99.9680 99.9480

0.7 3.40 99.7040 99.5520 99.3800 99.9960 99.9960 99.9920

0.7 4.20 99.9960 99.9960 99.9960 99.9960 99.9960 99.9960

Table 3.3.2: Power (in %) of the tests ψ7, ψ8 and ψ9 for θ0 = 5

(columns: π, θ, then ψ7, ψ8, ψ9 for n = 50, then ψ7, ψ8, ψ9 for n = 100)

π θ ψ7 ψ8 ψ9 ψ7 ψ8 ψ9

0.3 5.0 5.1958 5.1958 0.2080 5.0918 5.0918 0.0360

0.3 5.2 7.0437 7.0437 0.5920 8.1997 8.1997 0.1160

0.3 5.4 11.3635 11.3635 1.3679 16.5073 16.5073 0.3480

0.3 5.6 18.5953 18.5953 2.7079 30.9268 30.9268 1.1720

0.3 5.8 28.3509 28.3509 5.1558 47.9861 47.9861 3.4959

0.3 6.0 40.6664 40.6664 9.0676 65.5334 65.5334 7.7597

0.3 6.2 52.0179 52.0179 14.5554 79.5808 79.5808 15.6594

0.3 6.4 63.4295 63.4295 22.4071 89.2244 89.2244 26.4509

0.3 6.6 73.6651 73.6651 31.4627 94.5922 94.5922 39.5944

0.3 6.8 81.6967 81.6967 41.1264 97.6641 97.6641 54.1418

0.3 7.0 87.9165 87.9165 50.6580 99.0280 99.0280 67.3053

0.3 7.4 95.3282 95.3282 69.3932 99.8640 99.8640 87.1605

0.3 7.8 98.2761 98.2761 83.2647 99.9840 99.9840 96.2801

0.4 5.0 5.1038 5.1038 0.3160 4.9798 4.9798 0.0760

0.4 5.2 6.9997 6.9997 0.9600 8.8436 8.8436 0.2280

0.4 5.4 13.2875 13.2875 2.0759 20.6672 20.6672 0.8200

0.4 5.6 22.9911 22.9911 4.8118 38.6905 38.6905 3.1599

0.4 5.8 35.4506 35.4506 9.4436 59.2656 59.2656 8.0877

0.4 6.0 48.9980 48.9980 16.5113 76.3049 76.3049 17.2193

0.4 6.2 63.4255 63.4255 25.3870 88.8404 88.8404 31.6307

0.4 6.4 74.8450 74.8450 36.2106 95.4562 95.4562 48.6061

0.4 6.6 84.0966 84.0966 48.2621 98.3121 98.3121 65.7494

0.4 6.8 91.0884 91.0884 60.1696 99.5120 99.5120 79.2608

0.4 7.0 94.5642 94.5642 70.6332 99.8640 99.8640 88.5685

0.4 7.4 98.6121 98.6121 86.3045 99.9920 99.9920 97.5961

0.4 7.8 99.6800 99.6800 94.4162 99.9960 99.9960 99.6680

Table 3.3.2 (continued)

π θ ψ7 ψ8 ψ9 ψ7 ψ8 ψ9

0.5 5.0 5.1678 5.1678 0.5480 5.0998 5.0998 0.2480

0.5 5.4 14.5514 14.5514 3.9718 24.2870 24.2750 2.1519

0.5 5.6 26.8429 26.8429 8.4197 46.2861 46.2741 6.4917

0.5 5.8 41.5583 41.5583 15.5554 68.5693 68.5653 16.4753

0.5 6.0 58.0337 58.0337 26.8549 85.2366 85.2326 32.3307

0.5 6.4 83.4007 83.4007 52.9939 98.1681 98.1641 70.2972

0.5 6.6 90.7804 90.7804 65.3494 99.5640 99.5640 84.1566

0.5 6.8 95.5162 95.5162 76.5489 99.8960 99.8960 93.0443

0.5 7.0 98.1321 98.1321 85.3406 99.9640 99.9640 97.3321

0.5 7.4 99.6160 99.6160 95.0602 99.9960 99.9960 99.7480

0.5 7.8 99.9360 99.9360 98.6801 99.9960 99.9960 99.9680

0.6 5.0 5.1198 5.1198 0.9800 5.3198 5.2918 0.5040

0.6 5.2 8.1597 8.1597 2.6879 11.3315 11.2196 0.9520

0.6 5.4 16.8233 16.8233 6.5157 27.9189 27.7509 4.3798

0.6 5.6 31.0148 31.0148 13.5235 53.3099 53.1419 12.5555

0.6 5.8 48.1021 48.1021 24.6070 76.1050 75.9490 28.3869

0.6 6.0 65.0414 65.0414 39.0224 90.7324 90.6724 50.5500

0.6 6.4 89.2924 89.2924 69.0652 99.3080 99.2920 86.0726

0.6 6.6 94.8562 94.8562 80.3968 99.8800 99.8800 94.8162

0.6 7.0 99.2560 99.2560 94.1722 99.9960 99.9960 99.5320

0.6 7.4 99.9200 99.9200 98.5601 99.9960 99.9960 99.9760

0.6 7.8 99.9920 99.9920 99.7120 99.9960 99.9960 99.9960

0.7 5.0 5.0118 5.0118 1.4759 4.8318 4.7998 0.9760

0.7 5.2 8.8476 8.8476 4.5638 12.0595 11.9555 2.2119

0.7 5.4 18.9792 18.9792 11.0596 31.5747 31.3587 8.1797

0.7 5.6 35.0946 35.0946 21.9191 59.4336 59.2376 22.3911

0.7 5.8 54.0098 54.0098 36.5465 81.7927 81.6407 44.0662

0.7 6.0 71.7411 71.7411 54.1778 94.3002 94.2482 68.4733

0.7 6.4 93.3803 93.3803 82.4527 99.8000 99.7960 95.0602

0.7 6.6 97.3321 97.3321 90.9324 99.9760 99.9760 98.5601

0.7 6.8 99.1560 99.1560 95.5162 99.9960 99.9960 99.7120

0.7 7.0 99.7520 99.7520 98.0401 99.9960 99.9960 99.9240

0.7 7.4 99.9680 99.9680 99.7400 99.9960 99.9960 99.9960

0.7 7.8 99.9960 99.9960 99.9680 99.9960 99.9960 99.9960

From the simulation study reported in Tables 3.3.1 and 3.3.2, we observe the following.

(i) When θ is small, the test based on the full likelihood approach is more powerful than the one based on the conditional likelihood approach. However, the probability of Type I error of the former test is also higher than that of the latter: at θ = θ0 = 2 the rejection rate of $\psi_7$ ranges from about 5.8% to 7.0%, against about 4.4% to 5.1% for $\psi_8$. Therefore, the test based on the conditional likelihood approach can be considered as an alternative to the one based on the full likelihood approach. The conditional test is also easier to compute.

(ii) For large values of θ, the two tests perform equally well (for θ0 = 5 in Table 3.3.2, $\psi_7$ and $\psi_8$ have virtually the same size and power). We therefore recommend the conditional likelihood approach when θ is large, where it serves as an alternative to the full likelihood approach. When θ is large, the proportion of zeros arising from the Poisson component is relatively low, so these zeros can be ignored when making inference about θ. For smaller values of θ, however, ignoring them affects the inference about θ.

3.4 Asymptotic Confidence Interval for the Parameter θ

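An asymptotic confidence interval for θ based on the test $\psi_7$ is

$$\left(\hat\theta - z_{1-\alpha/2}\sqrt{AV_{\hat\theta}(\hat\pi,\hat\theta)},\ \hat\theta + z_{1-\alpha/2}\sqrt{AV_{\hat\theta}(\hat\pi,\hat\theta)}\right),$$

where $AV_{\hat\theta}(\hat\pi,\hat\theta)$ is an estimate of the asymptotic variance of $\hat\theta$. An asymptotic confidence interval for θ based on the test $\psi_8$ is

$$\left(\tilde\theta - z_{1-\alpha/2}\sqrt{AV_{\tilde\theta}(\tilde\theta)},\ \tilde\theta + z_{1-\alpha/2}\sqrt{AV_{\tilde\theta}(\tilde\theta)}\right),$$

where $AV_{\tilde\theta}(\tilde\theta)$ is an estimate of the asymptotic variance of $\tilde\theta$, as given in Eq. (3.1.9). An asymptotic confidence interval for θ based on the test $\psi_9$ is

$$\left(\frac{\bar{X}}{\hat\pi} - z_{1-\alpha/2}\sqrt{AV_{\theta}(\hat\pi,\theta)},\ \frac{\bar{X}}{\hat\pi} + z_{1-\alpha/2}\sqrt{AV_{\theta}(\hat\pi,\theta)}\right),$$

where

$$AV_{\theta}(\hat\pi,\theta) = \frac{\theta\left(1+\theta(1-\hat\pi)\right)}{n\hat\pi},$$

with θ replaced by its moment estimate $\bar{X}/\hat\pi$ in practice.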

In the following we estimate the coverage probability of these three confidence intervals by simulation. The results are presented in Table 3.4.1.

Table 3.4.1: Coverage probability of the three asymptotic confidence intervals

(columns: θ, method, then π = 0.3, 0.4, 0.5, 0.6, 0.7 for n = 50, then π = 0.3, 0.4, 0.5, 0.6, 0.7 for n = 100)

θ Method 0.3 0.4 0.5 0.6 0.7 0.3 0.4 0.5 0.6 0.7

2 M 0.923 0.944 0.947 0.945 0.939 0.874 0.964 0.964 0.959 0.948

2 S 0.931 0.935 0.935 0.932 0.929 0.942 0.938 0.938 0.938 0.928

2 C 0.951 0.955 0.952 0.948 0.947 0.952 0.951 0.953 0.952 0.948

3 M 0.983 0.984 0.982 0.979 0.974 0.991 0.990 0.988 0.986 0.980

3 S 0.948 0.949 0.948 0.946 0.945 0.949 0.949 0.948 0.949 0.946

3 C 0.948 0.949 0.949 0.948 0.948 0.951 0.951 0.950 0.951 0.949

4 M 0.995 0.995 0.994 0.991 0.989 0.997 0.998 0.997 0.995 0.992

4 S 0.951 0.950 0.947 0.949 0.949 0.949 0.950 0.950 0.948 0.948

4 C 0.952 0.951 0.947 0.948 0.951 0.950 0.951 0.950 0.948 0.948

5 M 0.998 0.998 0.997 0.996 0.995 0.999 0.999 0.999 0.998 0.996

5 S 0.949 0.949 0.952 0.947 0.947 0.950 0.949 0.949 0.947 0.949

5 C 0.949 0.949 0.951 0.947 0.947 0.950 0.949 0.949 0.948 0.949

10 M 0.999 0.996 0.999 0.999 0.992 0.999 0.999 0.999 0.999 0.999

10 S 0.953 0.949 0.949 0.949 0.949 0.949 0.951 0.951 0.949 0.950

10 C 0.953 0.949 0.949 0.949 0.949 0.949 0.951 0.951 0.949 0.950

M: coverage probability using the method of moments; S: coverage probability using the full likelihood function; C: coverage probability using the conditional likelihood function.

It is observed from Table 3.4.1 that the estimated coverage probability of the confidence interval based on $\tilde\theta$ is higher for small values of θ. Further investigation revealed that this comes at the cost of an increase in the length of the confidence interval. For large values of θ, the confidence intervals based on $\hat\theta$ and $\tilde\theta$ perform equally well. The interval based on the moment estimator, however, does not perform satisfactorily: investigations show that its higher coverage is due to its greater length.

Example:

Traffic Accident Research: Kuan et al. (1991) discuss data from the California Department of Motor Vehicles master driver license file. The variable of interest is the number of accidents per driver. Here $n = 5422$, $n_0 = 4499$, $\bar{x} = 0.2031$, $\hat\pi = 0.5583$, $\hat\theta = \tilde\theta = 0.3637$, $AV_{\hat\pi}(\hat\pi,\hat\theta) = 4.4415$ and $z_{1-\alpha/2} = 1.96$. The Poisson distribution does not fit these data well (p-value = 0.00), whereas the zero-inflated Poisson distribution fits well (p-value = 0.298). We therefore assume that the data follow the ZIP distribution. The 95% asymptotic confidence intervals (ACI) for θ using the full likelihood and conditional likelihood approaches are (0.3406, 0.3868) and (0.3118, 0.4157), respectively. The ACI based on moments is (0.3385, 0.3845); here the moment estimates of π and θ are $\hat\pi = 0.5617$ and $\hat\theta = 0.3615$.

Number of accidents, number of drivers:

0 4499

1 766

2 136

≥ 3 21

Total 5422

Result:

In the light of the results on point estimation of θ reported by Yip (1988) and of the present study, inference based on the conditional likelihood approach is as good as that based on the full likelihood approach. In view of its computational simplicity, the conditional likelihood approach is recommended. This work is published in the journal Statistical Methodology, 4 (2007), 393-406.
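The Python sketch below recomputes two of the three intervals of the traffic accident example from the frequency table above: the conditional-likelihood interval, using Eq. (3.1.9), and the moment interval, using AV_θ(π̂, θ) = θ(1 + θ(1 − π̂))/(nπ̂) evaluated at the moment estimates. Treating the class "≥ 3" as 3 and using the divisor n in S² are our assumptions, and the function name is hypothetical.

```python
import math


def traffic_intervals(counts, z=1.96):
    """Conditional-likelihood and moment intervals for theta (sketch).

    Returns ((lo, hi) based on Eq. (3.1.9), (lo, hi) based on the psi_9 interval).
    """
    n = sum(counts.values())
    n_pos = n - counts.get(0, 0)
    total = sum(x * f for x, f in counts.items())

    # conditional MLE theta_tilde: root of theta / (1 - e^{-theta}) = total / n_pos
    target = total / n_pos
    lo, hi = 1e-12, target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < target:
            lo = mid
        else:
            hi = mid
    th = 0.5 * (lo + hi)
    e = math.exp(-th)
    se_c = math.sqrt(1.0 / (n_pos / (1.0 - e) * (1.0 / th - e / (1.0 - e))))
    ci_conditional = (th - z * se_c, th + z * se_c)

    # moment estimates (S^2 with divisor n, an assumption) and the psi_9 interval
    mean = total / n
    s2 = sum(f * (x - mean) ** 2 for x, f in counts.items()) / n
    th_m = s2 / mean - (1.0 - mean)
    pi_m = mean / th_m
    se_m = math.sqrt(th_m * (1.0 + th_m * (1.0 - pi_m)) / (n * pi_m))
    ci_moment = (th_m - z * se_m, th_m + z * se_m)
    return ci_conditional, ci_moment


ci_c, ci_m = traffic_intervals({0: 4499, 1: 766, 2: 136, 3: 21})
print([round(v, 4) for v in ci_c], [round(v, 4) for v in ci_m])
```

Run as written, the two intervals should agree with the reported (0.3118, 0.4157) and (0.3385, 0.3845) to about three decimal places.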

In the following we develop inference for the zero-inflated truncated Poisson distribution.

3.5 Zero-Inflated Truncated Poisson Distribution

Truncated samples from discrete distributions arise in numerous situations where zero counts are not observed. As an example, consider the distribution of the number of children per family in developing nations, where records are maintained only if there is at least one child in the family. The number of childless families remains unknown, so the resulting sample is truncated with the zero class missing. For a continuous distribution, a sample of this type would be described as singly left truncated. In other situations, samples from discrete distributions may be censored on the right.

In this section we consider the zero-inflated Poisson distribution truncated on the right beyond a known support point t, so that its support is {0, 1, ..., t}. Moments, maximum likelihood estimators and the Fisher information matrix are provided for both the full and the conditional likelihood. In Section 3.6 we provide three tests for the parameter of the ZITPD.

Consider the truncated Poisson distribution (TPD) with support {0, 1, ..., t}. Its probability mass function is

$$P(X=x) = \frac{e^{-\theta}\theta^{x}}{x!\left(1-P(X>t)\right)} = \frac{e^{-\theta}\theta^{x}/x!}{\sum_{y=0}^{t}e^{-\theta}\theta^{y}/y!} = \frac{\theta^{x}}{x!\,A(\theta)}, \qquad x = 0,1,2,\ldots,t,$$

where $P(X>t)$ refers to the untruncated Poisson distribution and

$$A(\theta) = \sum_{y=0}^{t}\frac{\theta^{y}}{y!}.$$

Using this truncated distribution, we define the zero-inflated truncated Poisson (ZITP) distribution with the same support. Its probability mass function is

$$P(X=x) = \begin{cases} 1-\pi+\dfrac{\pi}{A(\theta)}, & x = 0,\\[2mm] \dfrac{\pi\theta^{x}}{x!\,A(\theta)}, & x = 1,2,\ldots,t, \end{cases} \qquad \theta > 0,\ 0 < \pi < 1. \tag{3.5.1}$$

Moments of ZITP distribution

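The moment generating function of the ZITPD is

$$M_X(s) = (1-\pi) + \pi\frac{A(\theta e^{s})}{A(\theta)}.$$

The mean and variance are

$$E(X) = \pi\theta\frac{A'(\theta)}{A(\theta)}, \qquad E(X^2) = \pi\theta\frac{\theta A''(\theta) + A'(\theta)}{A(\theta)},$$

$$Var(X) = \pi\theta\frac{\theta A''(\theta) + A'(\theta)}{A(\theta)} - \pi^2\theta^2\frac{A'(\theta)^2}{A(\theta)^2}. \tag{3.5.2}$$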
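A direct numerical check of (3.5.1) and (3.5.2): the Python sketch below (names ours) evaluates A(θ) and its derivatives as finite sums, using A^{(k)}(θ) = Σ_{j=0}^{t−k} θ^j/j!, and compares the moment formulas with brute-force sums over the support {0, ..., t}. The parameter values are arbitrary illustrations.

```python
import math


def a_deriv(theta, t, k=0):
    """k-th derivative of A(theta) = sum_{y=0}^{t} theta**y / y!, i.e. sum_{j=0}^{t-k} theta**j / j!."""
    return sum(theta ** j / math.factorial(j) for j in range(t - k + 1))


def zitp_pmf(x, theta, pi, t):
    """P(X = x) for the ZITP distribution, Eq. (3.5.1)."""
    a = a_deriv(theta, t)
    if x == 0:
        return 1.0 - pi + pi / a
    if 1 <= x <= t:
        return pi * theta ** x / (math.factorial(x) * a)
    return 0.0


def zitp_mean_var(theta, pi, t):
    """Mean and variance of the ZITP distribution from Eq. (3.5.2)."""
    a, a1, a2 = (a_deriv(theta, t, k) for k in range(3))
    mean = pi * theta * a1 / a
    second = pi * theta * (theta * a2 + a1) / a  # E(X^2)
    return mean, second - mean ** 2


theta, pi, t = 1.3, 0.6, 5
probs = [zitp_pmf(x, theta, pi, t) for x in range(t + 1)]
brute_mean = sum(x * p for x, p in enumerate(probs))
brute_var = sum(x * x * p for x, p in enumerate(probs)) - brute_mean ** 2
print(sum(probs), zitp_mean_var(theta, pi, t), (brute_mean, brute_var))
```

The identities rest on θA′(θ) = Σ y θ^y/y! and θA′(θ) + θ²A″(θ) = Σ y² θ^y/y!, with sums over y = 0, ..., t.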

Estimation of the parameters using full likelihood function

Let $X_1, X_2, \ldots, X_n$ be a random sample from the ZITP distribution (3.5.1), where t is the known support point defined above. Then the likelihood function is

$$L(\theta,\pi;\mathbf{x}) = \prod_{i=1}^{n}\left(1-\pi+\frac{\pi}{A(\theta)}\right)^{1-a_i}\left(\frac{\pi\theta^{x_i}}{x_i!\,A(\theta)}\right)^{a_i}, \qquad \theta > 0,\ 0 < \pi < 1.$$

The corresponding log likelihood function is

$$\log L(\theta,\pi;\mathbf{x}) = n_0\log\left(1-\pi+\frac{\pi}{A(\theta)}\right) + \log\pi\sum_{i=1}^{n}a_i + \log\theta\sum_{i=1}^{n}a_i x_i - \sum_{i=1}^{n}a_i\log x_i! - \log A(\theta)\sum_{i=1}^{n}a_i. \tag{3.5.3}$$

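To find the MLEs of θ and π, we differentiate Eq. (3.5.3) with respect to π and θ and equate the derivatives to zero. This gives

$$\hat\pi = \frac{(n-n_0)A(\hat\theta)}{n\left(A(\hat\theta)-1\right)} \tag{3.5.4}$$

and

$$\frac{1}{\hat\theta}\sum_{i=1}^{n}a_i x_i = \frac{n_0\hat\pi A'(\hat\theta)}{A(\hat\theta)^2\left(1-\hat\pi+\hat\pi/A(\hat\theta)\right)} + \frac{A'(\hat\theta)}{A(\hat\theta)}\sum_{i=1}^{n}a_i = \frac{A'(\hat\theta)}{A(\hat\theta)}\left[\frac{n_0\hat\pi}{(1-\hat\pi)A(\hat\theta)+\hat\pi} + (n-n_0)\right],$$

that is,

$$\sum_{i=1}^{n}a_i x_i = \frac{\hat\theta A'(\hat\theta)}{A(\hat\theta)}\cdot\frac{(n-n_0)(1-\hat\pi)A(\hat\theta) + n\hat\pi}{(1-\hat\pi)A(\hat\theta)+\hat\pi}. \tag{3.5.5}$$

Substituting $\hat\pi$ from Eq. (3.5.4) into the above equation, we have

$$\sum_{i=1}^{n}a_i x_i = \frac{(n-n_0)\hat\theta A'(\hat\theta)}{A(\hat\theta)-1}, \quad\text{i.e.}\quad \sum_{i=1}^{n}a_i x_i\left(A(\hat\theta)-1\right) - (n-n_0)\hat\theta A'(\hat\theta) = 0. \tag{3.5.6}$$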

This is a non-linear equation in $\hat\theta$, so we solve it numerically. Let

$$h(\theta) = \sum_{i=1}^{n}a_i x_i\left(A(\theta)-1\right) - (n-n_0)\theta A'(\theta)$$

and

$$h'(\theta) = \sum_{i=1}^{n}a_i x_i A'(\theta) - (n-n_0)\left(\theta A''(\theta) + A'(\theta)\right).$$

Using the Newton-Raphson iterative formula

$$\hat\theta_{i+1} = \hat\theta_i - \frac{h(\hat\theta_i)}{h'(\hat\theta_i)}, \qquad i = 0,1,2,\ldots,$$

with a suitable initial value $\hat\theta_0$, we obtain $\hat\theta$. Substituting this value of $\hat\theta$ in Eq. (3.5.4) gives $\hat\pi$.
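The Newton-Raphson scheme above can be sketched in Python as follows; names are ours, and the starting value 2(x̄₊ − 1), where x̄₊ is the mean of the non-zero counts, is our own choice, motivated by θ/(1 − e^{−θ}) ≈ 1 + θ/2 for small θ. When t is large, A(θ) is numerically e^θ and the result should agree with the ZIP estimate obtained from Section 3.1.

```python
import math


def a_deriv(theta, t, k=0):
    """k-th derivative of A(theta) = sum_{y=0}^{t} theta**y / y!."""
    return sum(theta ** j / math.factorial(j) for j in range(t - k + 1))


def zitp_mle(counts, t, tol=1e-12, max_iter=100):
    """Full-likelihood MLEs (theta_hat, pi_hat) of the ZITP model via Newton-Raphson."""
    n = sum(counts.values())
    m = n - counts.get(0, 0)                      # n - n0
    s = sum(x * f for x, f in counts.items())     # sum of a_i x_i
    if s <= m:
        raise ValueError("the mean of the non-zero counts must exceed 1")
    theta = 2.0 * (s / m - 1.0)                   # rough starting value (our choice)
    for _ in range(max_iter):
        a, a1, a2 = (a_deriv(theta, t, k) for k in range(3))
        h = s * (a - 1.0) - m * theta * a1        # h(theta), from Eq. (3.5.6)
        dh = s * a1 - m * (theta * a2 + a1)       # h'(theta)
        step = h / dh
        theta -= step
        if abs(step) < tol:
            break
    a = a_deriv(theta, t)
    pi = m * a / (n * (a - 1.0))                  # Eq. (3.5.4)
    return theta, pi


data = {0: 4499, 1: 766, 2: 136, 3: 21}  # traffic data, "≥ 3" treated as 3 (assumption)
print(zitp_mle(data, t=40), zitp_mle(data, t=5))
```

A root exists only when the mean of the non-zero counts lies between 1 and t, because θA′(θ)/(A(θ) − 1) is the mean of a Poisson variable restricted to {1, ..., t}.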

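In the following we find the elements of the Fisher information matrix. Here we have

$$\frac{\partial\log L}{\partial\pi} = -\frac{n_0\left(1-1/A(\theta)\right)}{1-\pi+\pi/A(\theta)} + \frac{1}{\pi}\sum_{i=1}^{n}a_i,$$

$$\frac{\partial^2\log L}{\partial\pi^2} = -\frac{n_0\left(1-1/A(\theta)\right)^2}{\left(1-\pi+\pi/A(\theta)\right)^2} - \frac{1}{\pi^2}\sum_{i=1}^{n}a_i,$$

so that, using $E(n_0) = n\left(1-\pi+\pi/A(\theta)\right)$ and $E\left(\sum_{i=1}^{n}a_i\right) = n\pi\left(1-1/A(\theta)\right)$,

$$E\left(-\frac{\partial^2\log L}{\partial\pi^2}\right) = \frac{n\left(1-1/A(\theta)\right)^2}{1-\pi+\pi/A(\theta)} + \frac{n\left(1-1/A(\theta)\right)}{\pi} = \frac{n\left(1-1/A(\theta)\right)}{\pi\left(1-\pi+\pi/A(\theta)\right)}.$$

Hence

$$I_{11} = \frac{n\left(A(\theta)-1\right)}{\pi\left(A(\theta)-\pi A(\theta)+\pi\right)}. \tag{3.5.7}$$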

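Now, differentiating $\partial\log L/\partial\pi$ with respect to θ,

$$\frac{\partial^2\log L}{\partial\theta\,\partial\pi} = -\frac{n_0A'(\theta)}{A(\theta)^2}\left[\frac{1}{1-\pi+\pi/A(\theta)} + \frac{\pi\left(1-1/A(\theta)\right)}{\left(1-\pi+\pi/A(\theta)\right)^2}\right] = -\frac{n_0A'(\theta)}{A(\theta)^2\left(1-\pi+\pi/A(\theta)\right)^2},$$

so that

$$I_{12} = E\left(-\frac{\partial^2\log L}{\partial\theta\,\partial\pi}\right) = \frac{nA'(\theta)}{A(\theta)\left(A(\theta)-\pi A(\theta)+\pi\right)}. \tag{3.5.8}$$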

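Further, differentiating Eq. (3.5.3) twice with respect to θ, we get

$$\frac{\partial^2\log L}{\partial\theta^2} = -\frac{n_0\pi}{1-\pi+\pi/A(\theta)}\cdot\frac{A(\theta)^2A''(\theta) - 2A(\theta)A'(\theta)^2}{A(\theta)^4} - \frac{n_0\pi^2A'(\theta)^2}{A(\theta)^4\left(1-\pi+\pi/A(\theta)\right)^2} - \frac{1}{\theta^2}\sum_{i=1}^{n}a_i x_i - \frac{A(\theta)A''(\theta) - A'(\theta)^2}{A(\theta)^2}\sum_{i=1}^{n}a_i.$$

Therefore, using $E\left(\sum_{i=1}^{n}a_i x_i\right) = n\pi\theta A'(\theta)/A(\theta)$ together with the expectations of $n_0$ and $\sum_{i=1}^{n}a_i$ given above,

$$I_{22} = E\left(-\frac{\partial^2\log L}{\partial\theta^2}\right) = \frac{n\pi\left(A(\theta)^2A''(\theta) - 2A(\theta)A'(\theta)^2\right)}{A(\theta)^4} + \frac{n\pi^2A'(\theta)^2}{A(\theta)^4\left(1-\pi+\pi/A(\theta)\right)} + \frac{n\pi A'(\theta)}{\theta A(\theta)} + n\pi\left(1-\frac{1}{A(\theta)}\right)\frac{A(\theta)A''(\theta) - A'(\theta)^2}{A(\theta)^2}.$$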

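The asymptotic variances of $\hat\pi$ and $\hat\theta$ are

$$AV_{\hat\pi}(\pi,\theta) = \left(I_{11} - \frac{I_{12}^2}{I_{22}}\right)^{-1}, \qquad AV_{\hat\theta}(\pi,\theta) = \left(I_{22} - \frac{I_{12}^2}{I_{11}}\right)^{-1}. \tag{3.5.9}$$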
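The Python sketch below (names ours) codes I₁₁, I₁₂ and I₂₂ as written above, together with the variances of Eq. (3.5.9). As a consistency check, when t is large A(θ), A′(θ) and A″(θ) all approach e^θ, so the elements should reduce to the ZIP expressions (3.1.3) to (3.1.5); the test compares them for t = 60.

```python
import math


def a_deriv(theta, t, k=0):
    """k-th derivative of A(theta) = sum_{y=0}^{t} theta**y / y!."""
    return sum(theta ** j / math.factorial(j) for j in range(t - k + 1))


def zitp_fisher_info(theta, pi, t, n):
    """I11, I22, I12 of the ZITP model (Eqs. (3.5.7), (3.5.8) and the I22 expression)."""
    a, a1, a2 = (a_deriv(theta, t, k) for k in range(3))
    d = a - pi * a + pi                      # A(theta) - pi A(theta) + pi
    p0 = 1.0 - pi + pi / a                   # P(X = 0)
    i11 = n * (a - 1.0) / (pi * d)
    i12 = n * a1 / (a * d)
    i22 = (n * pi * (a ** 2 * a2 - 2.0 * a * a1 ** 2) / a ** 4
           + n * pi ** 2 * a1 ** 2 / (a ** 4 * p0)
           + n * pi * a1 / (theta * a)
           + n * pi * (1.0 - 1.0 / a) * (a * a2 - a1 ** 2) / a ** 2)
    return i11, i22, i12


def zitp_asymptotic_variances(theta, pi, t, n):
    """(AV of pi_hat, AV of theta_hat) from Eq. (3.5.9)."""
    i11, i22, i12 = zitp_fisher_info(theta, pi, t, n)
    return 1.0 / (i11 - i12 ** 2 / i22), 1.0 / (i22 - i12 ** 2 / i11)


print(zitp_asymptotic_variances(0.36, 0.56, 5, 5422))
```

Keeping the four terms of I₂₂ separate mirrors the derivation, which makes it easier to trace each term back to its source in the second derivative.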

Conditional Likelihood Function Approach

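The conditional likelihood function, based on the non-zero observations, is

$$L^{*}(\theta;\mathbf{x}) = \prod_{i=1}^{n}\left(\frac{\theta^{x_i}}{x_i!\left(A(\theta)-1\right)}\right)^{a_i}, \qquad \theta > 0. \tag{3.5.10}$$

The corresponding log likelihood function is

$$\log L^{*}(\theta;\mathbf{x}) = \log\theta\sum_{i=1}^{n-n_0}x_i - (n-n_0)\log\left(A(\theta)-1\right) - \sum_{i=1}^{n-n_0}\log x_i!. \tag{3.5.11}$$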

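The corresponding MLE $\tilde\theta$ is the solution of

$$\bar{x} = \frac{\tilde\theta A'(\tilde\theta)}{A(\tilde\theta)-1}, \tag{3.5.12}$$

where $\bar{x}$ is the mean of the $n-n_0$ non-zero observations.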

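Now consider

$$\frac{\partial^2\log L^{*}}{\partial\theta^2} = -\frac{1}{\theta^2}\sum_{i=1}^{n-n_0}x_i - (n-n_0)\frac{\left(A(\theta)-1\right)A''(\theta) - A'(\theta)^2}{\left(A(\theta)-1\right)^2},$$

so that

$$E\left(-\frac{\partial^2\log L^{*}}{\partial\theta^2}\right) = \frac{(n-n_0)A'(\theta)}{\theta\left(A(\theta)-1\right)} + (n-n_0)\frac{\left(A(\theta)-1\right)A''(\theta) - A'(\theta)^2}{\left(A(\theta)-1\right)^2} = \frac{n-n_0}{A(\theta)-1}\left[\frac{A'(\theta)}{\theta} + \frac{\left(A(\theta)-1\right)A''(\theta) - A'(\theta)^2}{A(\theta)-1}\right]. \tag{3.5.13}$$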

Therefore, the asymptotic variance of $\tilde\theta$ differs from that of the estimator of θ based on the full likelihood approach. It is given by

$$AV_{\tilde\theta}(\theta) = \left\{(n-n_0)\left[\frac{A'(\theta)}{\theta\left(A(\theta)-1\right)} + \frac{\left(A(\theta)-1\right)A''(\theta) - A'(\theta)^2}{\left(A(\theta)-1\right)^2}\right]\right\}^{-1}. \tag{3.5.14}$$
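A Python transcription of Eq. (3.5.14), with names of our own choosing. For large t it should agree with the ZIP result (3.1.9), since A(θ) − 1 then behaves like e^θ − 1; the test compares the two, and also checks that the variance stays positive for a small truncation point.

```python
import math


def a_deriv(theta, t, k=0):
    """k-th derivative of A(theta) = sum_{y=0}^{t} theta**y / y!."""
    return sum(theta ** j / math.factorial(j) for j in range(t - k + 1))


def av_theta_tilde_zitp(theta, t, n_pos):
    """Asymptotic variance of theta_tilde for the ZITP model, Eq. (3.5.14).

    n_pos is the number of non-zero observations, n - n0 (requires t >= 2).
    """
    a, a1, a2 = (a_deriv(theta, t, k) for k in range(3))
    info = n_pos * (a1 / (theta * (a - 1.0))
                    + ((a - 1.0) * a2 - a1 ** 2) / (a - 1.0) ** 2)
    return 1.0 / info


print(av_theta_tilde_zitp(2.0, 60, 40), av_theta_tilde_zitp(1.0, 3, 40))
```

The bracketed quantity in (3.5.14) is the variance of a Poisson variable restricted to {1, ..., t}, divided by θ², which is why it remains positive for every t ≥ 2.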

Moment Estimator of ZITP Distribution

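The mean is

$$E(X) = \pi\theta\frac{A'(\theta)}{A(\theta)}, \tag{3.5.15}$$

and

$$E(X^2) = \pi\theta\frac{\theta A''(\theta)+A'(\theta)}{A(\theta)}, \qquad Var(X) = \pi\theta\frac{\theta A''(\theta)+A'(\theta)}{A(\theta)} - \pi^2\theta^2\frac{A'(\theta)^2}{A(\theta)^2} = \sigma^2(\pi,\theta), \text{ say}. \tag{3.5.16}$$

Equating the first two sample moments to their population counterparts,

$$\bar{x} = \pi\theta\frac{A'(\theta)}{A(\theta)}, \tag{3.5.17}$$

$$\frac{1}{n}\sum_{i=1}^{n}x_i^2 = \pi\theta\frac{\theta A''(\theta)+A'(\theta)}{A(\theta)}. \tag{3.5.18}$$

Solving Eq. (3.5.17) and Eq. (3.5.18), we get the moment estimators of π and θ.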
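Dividing Eq. (3.5.18) by Eq. (3.5.17) eliminates π and leaves the single equation θA″(θ)/A′(θ) = m₂/x̄ − 1, where m₂ = (1/n)Σxᵢ²; π then follows from Eq. (3.5.17). The Python sketch below solves this equation by bisection (our choice), which is reasonable because the left side equals the mean of a Poisson(θ) variable restricted to {0, ..., t − 1} and therefore increases with θ. Names are ours; for large t the result should reproduce the ZIP moment estimators of Section 3.1.

```python
import math


def a_deriv(theta, t, k=0):
    """k-th derivative of A(theta) = sum_{y=0}^{t} theta**y / y!."""
    return sum(theta ** j / math.factorial(j) for j in range(t - k + 1))


def zitp_moment_estimates(counts, t):
    """Moment estimators (theta, pi) of the ZITP model from Eqs. (3.5.17)-(3.5.18)."""
    n = sum(counts.values())
    m1 = sum(x * f for x, f in counts.items()) / n
    m2 = sum(x * x * f for x, f in counts.items()) / n
    r = m2 / m1 - 1.0  # required value of theta * A''(theta) / A'(theta)
    if not 0.0 < r < t - 1:
        raise ValueError("no moment solution for these data and this t")

    def f(theta):
        return theta * a_deriv(theta, t, 2) / a_deriv(theta, t, 1) - r

    lo, hi = 1e-12, 1.0
    while f(hi) < 0.0:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    pi = m1 * a_deriv(theta, t) / (theta * a_deriv(theta, t, 1))  # from Eq. (3.5.17)
    return theta, pi


data = {0: 4499, 1: 766, 2: 136, 3: 21}  # traffic data, "≥ 3" treated as 3 (assumption)
print(zitp_moment_estimates(data, 40), zitp_moment_estimates(data, 5))
```

Because this works with raw moments, it needs no choice of divisor for the sample variance.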

3.6 Tests for the Parameter θ of ZITP Distribution

Suppose we want to test $H_0: \theta = \theta_0$ against $H_1: \theta \neq \theta_0$, assuming π is unknown.

a) Test based on θ̂

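$$Z_{10} = \frac{\hat\theta - \theta_0}{\sqrt{AV_{\hat\theta}(\hat\pi_0,\theta_0)}}, \tag{3.6.1}$$

where $AV_{\hat\theta}(\hat\pi_0,\theta_0)$ is defined in Eq. (3.5.9). The test $\psi_{10}$ rejects $H_0$ if $|Z_{10}| > z_{1-\alpha/2}$.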

b) Test Based On $\tilde\theta$

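The test statistic here is

$$Z_{11} = \frac{\tilde\theta - \theta_0}{\sqrt{AV_{\tilde\theta}(\theta_0)}}, \tag{3.6.2}$$

where $AV_{\tilde\theta}(\theta_0)$ is as defined in Eq. (3.5.14). The test $\psi_{11}$ rejects $H_0$ if $|Z_{11}| > z_{1-\alpha/2}$.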

c) Test Based On Sample Mean

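The test statistic is

$$Z_{12} = \frac{\sqrt{n}\left(\bar{X} - \bar{X}_0\right)}{\sqrt{AV_{\bar{X}}(\hat\pi_0,\theta_0)}}, \tag{3.6.3}$$

where

$$\bar{X}_0 = \hat\pi_0\theta_0\frac{A'(\theta_0)}{A(\theta_0)}$$

is the mean of $X$ under $H_0$ (Eq. (3.5.15)), with π replaced by its estimate $\hat\pi_0$.

The power of the test is given by

$$\beta_{\psi_{12}}(\pi,\theta) = \sum_{k=0}^{n}\left[1 - \Phi(\hat{B}_k) + \Phi(\hat{A}_k)\right]P(n_0 = k),$$

where

$$\hat{B}_k = \frac{\bar{X}_0 - \pi\theta\frac{A'(\theta)}{A(\theta)} + z_{1-\alpha/2}\sqrt{AV_{\bar{X}}(\hat\pi_0,\theta_0)}}{\sqrt{AV_{\bar{X}}(\pi,\theta)}}, \qquad \hat{A}_k = \frac{\bar{X}_0 - \pi\theta\frac{A'(\theta)}{A(\theta)} - z_{1-\alpha/2}\sqrt{AV_{\bar{X}}(\hat\pi_0,\theta_0)}}{\sqrt{AV_{\bar{X}}(\pi,\theta)}},$$

and

$$P(n_0 = k) = \binom{n}{k}P_0^{k}\left(1-P_0\right)^{n-k}, \qquad P_0 = 1-\pi+\frac{\pi}{A(\theta)}.$$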

Example

Let us consider the traffic accident data of Kuan et al. (1991). The data show an excess of zero counts, and the frequency of $X \geq 3$ is 21. Such data are generally modelled by the Poisson distribution, but the Poisson distribution does not fit these data well. We therefore fit the ZIPD, which has two parameters π and θ. They are estimated from

$$\hat\pi = \frac{1-n_0/n}{1-e^{-\hat\theta}},$$

where $\hat\theta$ is the solution of the non-linear equation

$$\frac{\hat\theta}{1-e^{-\hat\theta}} = \frac{1}{n-n_0}\sum_{i=1}^{n}x_i.$$

In this problem $n_0 = 4499$ and $n = 5422$. Using these values we get $\hat\pi = 0.55832$ and $\hat\theta = 0.363701$, and with these estimates we fit the ZIPD to the data.

Table 3.6.1: Goodness of fit using ZIPD

(columns: number of accidents, observed number of drivers $O_i$, expected frequency $E_i$, $(O_i-E_i)^2/E_i$)

0 4499 4499.000 0.000

1 766 765.304 0.001

2 136 139.171 0.072

≥ 3 21 16.872 1.010

Total 5422 5420.347 1.083

Degrees of freedom: 1

$\chi^2_{(1,\,0.01)}$ = 6.635

p-value: 0.29808

Now we fit the ZITPD (truncated at 6 and above) to the same data; the parameter estimates are $\hat\pi = 0.559258$ and $\hat\theta = 0.363511$.

Table 3.6.2: Goodness of fit using ZITPD

(columns: number of accidents, observed number of drivers $O_i$, expected frequency $E_i$, $(O_i-E_i)^2/E_i$)

0 4499 4499.0000 0.0000000

1 766 765.2829 0.0006720

2 136 139.1927 0.0732322

3 21 16.8779

4 0 1.5349

5 0 0.1116

Classes 3, 4 and 5 pooled: 0.3308002

Total 5422 5422 0.4047044

Degrees of freedom: 1

$\chi^2_{(1,\,0.01)}$ = 6.635

p-value: 0.524669

From Table 3.6.1, the calculated chi-square value (1.083) is less than the table value $\chi^2_{(1,\,0.01)} = 6.635$. Therefore we do not reject the hypothesis and conclude that the ZIPD fits well; the p-value is 0.29808. From Table 3.6.2, the calculated chi-square value (0.4047044) is also less than 6.635, and the p-value is 0.524669. Since the p-value for the ZIPD is smaller than that for the ZITPD, we prefer the ZITPD for modelling these data.

The ZIP distribution has been shown to be useful for modelling the outcomes of manufacturing processes that produce many defect-free products. When there are several types of defects, a multivariate ZIP model can help detect specific process equipment problems and reduce multiple types of defects simultaneously. In the next chapter we introduce the bivariate ZIPSD and the bivariate ZIPD and discuss inference for the parameters involved in these models.