a new estimator of population mean in stratified sampling


A New Estimator of Population Mean in Stratified Sampling
Javid Shabbir (a) & Sat Gupta (b)
(a) Department of Statistics, Quaid-I-Azam University, Islamabad, Pakistan
(b) Department of Mathematical Sciences, University of North Carolina at Greensboro, North Carolina, USA
Published online: 22 Sep 2006.

To cite this article: Javid Shabbir & Sat Gupta (2006) A New Estimator of Population Mean in Stratified Sampling, Communications in Statistics - Theory and Methods, 35:7, 1201-1209, DOI: 10.1080/03610920600629112

To link to this article: http://dx.doi.org/10.1080/03610920600629112


Communications in Statistics - Theory and Methods, 35: 1201–1209, 2006
Copyright © Taylor & Francis Group, LLC
ISSN: 0361-0926 print/1532-415X online
DOI: 10.1080/03610920600629112

Sampling Theory

A New Estimator of Population Mean in Stratified Sampling

JAVID SHABBIR¹ AND SAT GUPTA²

¹Department of Statistics, Quaid-I-Azam University, Islamabad, Pakistan
²Department of Mathematical Sciences, University of North Carolina at Greensboro, North Carolina, USA

Kadilar and Cingi (2005) have suggested a new ratio estimator in stratified sampling. The efficiency of this estimator is compared with the traditional combined ratio estimator on the basis of mean square error (MSE). We propose another estimator by utilizing a simple transformation introduced by Bedi (1996). The proposed estimator is found to be more efficient than the traditional combined ratio estimator as well as the Kadilar and Cingi (2005) ratio estimator.

Keywords Bias; MSE; Ratio estimator; Stratified sampling.

Mathematics Subject Classification Primary 62D05; Secondary 62F10.

1. Introduction

Let a finite population having $N$ distinct and identifiable units be divided into $L$ strata. Let $n_h$ be the size of the sample drawn from the $h$th stratum of size $N_h$ by using simple random sampling without replacement. Let

$$\sum_{h=1}^{L} n_h = n \quad \text{and} \quad \sum_{h=1}^{L} N_h = N.$$

Received August 12, 2005; Accepted November 23, 2005.
Address correspondence to Sat Gupta, Department of Mathematical Sciences, University of North Carolina at Greensboro, North Carolina, USA; E-mail: [email protected]

Let $y$ and $x$ be the response and auxiliary variables, respectively, assuming values $y_{hi}$ and $x_{hi}$ for the $i$th unit in the $h$th stratum. Let the stratum means be

$$\bar{Y}_h = \frac{1}{N_h}\sum_{i=1}^{N_h} y_{hi} \quad \text{and} \quad \bar{X}_h = \frac{1}{N_h}\sum_{i=1}^{N_h} x_{hi},$$

respectively. Let

$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h \quad \text{and} \quad \bar{x}_{st} = \sum_{h=1}^{L} W_h \bar{x}_h,$$

where

$$\bar{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi} \quad \text{and} \quad \bar{x}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} x_{hi}$$

are the stratum sample means and $W_h = N_h/N$. To estimate $\bar{Y} = \sum_{h=1}^{L} W_h \bar{Y}_h$, we assume that $\bar{X} = \sum_{h=1}^{L} W_h \bar{X}_h$ is known.
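As a quick numerical check of the weighted-mean definitions above, the following sketch recomputes the population means from the stratum sizes and stratum means listed for Example 1 in Table 1. The function names are illustrative and are not taken from the paper; the same helper applies to sample means when the stratum sample means are passed in instead.

```python
# Stratum sizes and stratum population means copied from Table 1 (Example 1).
N_h = [106, 106, 94, 171, 204, 173]
Ybar_h = [1536, 2212, 9384, 5588, 967, 404]
Xbar_h = [24375, 27421, 72409, 74365, 26441, 9844]


def stratum_weights(sizes):
    """Return W_h = N_h / N for each stratum."""
    total = sum(sizes)
    return [size / total for size in sizes]


def stratified_mean(weights, means):
    """Return the weighted mean sum_h W_h * mean_h."""
    return sum(w * m for w, m in zip(weights, means))


W_h = stratum_weights(N_h)
Ybar = stratified_mean(W_h, Ybar_h)
Xbar = stratified_mean(W_h, Xbar_h)
print(f"Ybar = {Ybar:.2f}, Xbar = {Xbar:.2f}")
```

Up to the rounding of the stratum means, the two results should agree with the totals $\bar{Y} = 2930$ and $\bar{X} = 37600$ printed in Table 1.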

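A commonly used estimator for $\bar{Y}$ is the traditional combined ratio estimator, defined as

$$\hat{\bar{Y}}_{CR} = \bar{y}_{st}\left(\frac{\bar{X}}{\bar{x}_{st}}\right). \tag{1}$$

The bias of $\hat{\bar{Y}}_{CR}$, to a first degree of approximation, is given by

$$\mathrm{Bias}(\hat{\bar{Y}}_{CR}) \cong \frac{1}{\bar{X}}\sum_{h=1}^{L} W_h^2 \gamma_h \left[R S_{xh}^2 - \rho_h S_{yh} S_{xh}\right], \tag{2}$$

where $\gamma_h = \left(\frac{1}{n_h} - \frac{1}{N_h}\right)$, $R = \bar{Y}/\bar{X}$, and $\rho_h$, $S_{yh}$, and $S_{xh}$ are, respectively, the population correlation coefficient between $y$ and $x$, the population standard deviation of $y$, and the population standard deviation of $x$ in stratum $h$.

The MSE of $\hat{\bar{Y}}_{CR}$, to a first degree of approximation, is given by

$$\mathrm{MSE}(\hat{\bar{Y}}_{CR}) \cong \sum_{h=1}^{L} W_h^2 \gamma_h \left[S_{yh}^2 + R^2 S_{xh}^2 - 2R\rho_h S_{yh} S_{xh}\right]. \tag{3}$$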
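Expressions (2) and (3) are plain weighted sums over strata, so they can be evaluated directly from stratum summaries. The sketch below does this for the Example 1 values as printed (rounded) in Table 1; the function and variable names are illustrative assumptions, not the authors' code.

```python
# Example 1 stratum summaries, copied as printed in Table 1 (rounded values).
W2 = [0.015, 0.015, 0.012, 0.04, 0.057, 0.041]        # W_h^2
GAMMA = [0.102, 0.049, 0.016, 0.009, 0.138, 0.4942]   # 1/n_h - 1/N_h
S_X = [49189, 57461, 160757, 285603, 45403, 18794]    # S_xh
S_Y = [6425, 11552, 29907, 28643, 2390, 946]          # S_yh
RHO = [0.82, 0.86, 0.90, 0.99, 0.71, 0.89]            # rho_h
X_BAR = 37600
R = 0.07793


def bias_combined_ratio(w2, gamma, s_x, s_y, rho, r, x_bar):
    """First-order bias of the combined ratio estimator, equation (2)."""
    total = sum(
        w * g * (r * sx**2 - p * sy * sx)
        for w, g, sx, sy, p in zip(w2, gamma, s_x, s_y, rho)
    )
    return total / x_bar


def mse_combined_ratio(w2, gamma, s_x, s_y, rho, r):
    """First-order MSE of the combined ratio estimator, equation (3)."""
    return sum(
        w * g * (sy**2 + r**2 * sx**2 - 2 * r * p * sy * sx)
        for w, g, sx, sy, p in zip(w2, gamma, s_x, s_y, rho)
    )


bias_cr = bias_combined_ratio(W2, GAMMA, S_X, S_Y, RHO, R, X_BAR)
mse_cr = mse_combined_ratio(W2, GAMMA, S_X, S_Y, RHO, R)
print(f"Bias = {bias_cr:.2f}, MSE = {mse_cr:.2f}")
```

With these rounded inputs the output should land close to the Example 1 entries for the combined ratio estimator in Table 4 (absolute bias 13.69, MSE 223496.77); the bias itself comes out negative.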

In this paper, we propose another estimator that performs better than the traditional combined ratio estimator and its modification proposed by Kadilar and Cingi (2005).

2. Kadilar and Cingi Estimator

Following Searls (1964), Kadilar and Cingi (2005) have suggested this modification of the combined ratio estimator:

$$\hat{\bar{Y}}_{KC} = K\hat{\bar{Y}}_{CR}, \tag{4}$$

where $K$ is a constant. The bias and MSE of $\hat{\bar{Y}}_{KC}$, to a first degree of approximation, are given by

$$\mathrm{Bias}(\hat{\bar{Y}}_{KC}) = (K - 1)\bar{Y} + K\,\mathrm{Bias}(\hat{\bar{Y}}_{CR}) \tag{5}$$


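and

$$\mathrm{MSE}(\hat{\bar{Y}}_{KC}) = (K - 1)^2\bar{Y}^2 + K^2\,\mathrm{MSE}(\hat{\bar{Y}}_{CR}). \tag{6}$$

The optimum value of $K$ so as to minimize $\mathrm{MSE}(\hat{\bar{Y}}_{KC})$ is given by

$$K^* = \frac{\bar{Y}^2}{\bar{Y}^2 + \mathrm{MSE}(\hat{\bar{Y}}_{CR})}.$$

The corresponding optimum MSE is given by

$$\mathrm{MSE}(\hat{\bar{Y}}_{KC})_{opt} = \frac{\bar{Y}^2\,\mathrm{MSE}(\hat{\bar{Y}}_{CR})}{\bar{Y}^2 + \mathrm{MSE}(\hat{\bar{Y}}_{CR})}. \tag{7}$$

It is obvious from (7) that

$$\mathrm{MSE}(\hat{\bar{Y}}_{KC})_{opt} < \mathrm{MSE}(\hat{\bar{Y}}_{CR}).$$

Hence the estimator $\hat{\bar{Y}}_{KC}$ is more efficient than the combined ratio estimator $\hat{\bar{Y}}_{CR}$.

We would like to propose a similar modification, but of a different estimator, that is based on a transformed auxiliary variable.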
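The optimum constant and the resulting MSE in (6) and (7) can be checked numerically. The sketch below plugs in the Example 1 values $\bar{Y} = 2930$ (Table 1) and $\mathrm{MSE}(\hat{\bar{Y}}_{CR}) = 223496.77$ (Table 4); the function names are illustrative only.

```python
Y_BAR = 2930          # population mean of y, Table 1
MSE_CR = 223496.77    # MSE of the combined ratio estimator, Table 4


def optimal_k(y_bar, mse_cr):
    """Return K* = Ybar^2 / (Ybar^2 + MSE_CR), the minimizer of (6)."""
    return y_bar**2 / (y_bar**2 + mse_cr)


def mse_kc(k, y_bar, mse_cr):
    """Evaluate equation (6) for a given constant K."""
    return (k - 1) ** 2 * y_bar**2 + k**2 * mse_cr


k_star = optimal_k(Y_BAR, MSE_CR)
mse_kc_opt = mse_kc(k_star, Y_BAR, MSE_CR)
print(f"K* = {k_star:.6f}, optimum MSE = {mse_kc_opt:.2f}")
```

Evaluating (6) at $K^*$ should reproduce the closed form (7), and the result should be close to the value 217825.95 reported for Example 1 in Table 4.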

3. Proposed Estimator

Consider the following estimator, which is an adaptation of the estimator by Ray and Singh (1981):

$$\hat{\bar{Y}}_{RS} = \left[\bar{y} + b\left(\bar{X}^{\alpha} - \bar{x}^{\alpha}\right)\right]\left(\frac{\bar{X}}{\bar{x}}\right)^{\eta}, \tag{8}$$

where $\alpha$ and $\eta$ are constants and $b$ is the sample regression coefficient. Kadilar and Cingi (2004) considered a special case of this estimator by setting $\alpha = \eta = 1$. The modified estimator is given by

$$\hat{\bar{Y}}^{*}_{RS} = \left[\bar{y} + b\left(\bar{X} - \bar{x}\right)\right]\left(\frac{\bar{X}}{\bar{x}}\right). \tag{9}$$

In stratified sampling, the dual of this estimator is given by

$$\hat{\bar{Y}} = \left[\bar{y}_{st} + b\left(\bar{X} - \bar{x}_{st}\right)\right]\left(\frac{\bar{x}_{st}}{\bar{X}}\right). \tag{10}$$

This estimator can be further modified by using an appropriate transformation of the auxiliary variable. We consider the transformation by Bedi (1996). Let $Z_i = x_i + X$ $(i = 1, \ldots, N)$, where $X$ denotes the population total for the auxiliary variable. In stratified sampling, the above transformation becomes $Z_{hi} = x_{hi} + X$.



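Also $\bar{z}_h = \bar{x}_h + X$ and $\bar{Z}_h = \bar{X}_h + X$, so $\bar{z}_{st} = \sum_{h=1}^{L} W_h(\bar{x}_h + X) = \bar{x}_{st} + N\bar{X}$ and $\bar{Z} = \sum_{h=1}^{L} W_h(\bar{X}_h + X) = (N + 1)\bar{X}$. Now let

$$\delta_0 = \frac{\bar{y}_{st} - \bar{Y}}{\bar{Y}}, \quad \delta_1 = \frac{\bar{x}_{st} - \bar{X}}{\bar{X}}.$$

Then $E(\delta_0) = E(\delta_1) = 0$, $E(\delta_0^2) = \sum_{h=1}^{L} W_h^2\gamma_h C_{yh}^2$, $E(\delta_1^2) = \sum_{h=1}^{L} W_h^2\gamma_h C_{xh}^2$, and $E(\delta_0\delta_1) = \sum_{h=1}^{L} W_h^2\gamma_h\rho_h C_{yh}C_{xh}$, where $C_{yh} = S_{yh}/\bar{Y}$ and $C_{xh} = S_{xh}/\bar{X}$.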
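The two identities for the transformed variable, $\bar{z}_{st} = \bar{x}_{st} + N\bar{X}$ and $\bar{Z} = (N + 1)\bar{X}$, follow because the weights sum to one and $X = N\bar{X}$. The sketch below confirms them on a small made-up population; the data and the choice of sampled units are hypothetical and serve only to illustrate the algebra.

```python
# Hypothetical toy population: auxiliary values x_hi in two strata.
strata = [[2.0, 4.0, 6.0], [10.0, 20.0]]

N = sum(len(s) for s in strata)           # population size
X_total = sum(sum(s) for s in strata)     # population total X
X_bar = X_total / N                       # population mean of x
W = [len(s) / N for s in strata]          # stratum weights W_h


def mean(values):
    return sum(values) / len(values)


# Bedi's transformation applied unit by unit: z_hi = x_hi + X.
z_strata = [[x + X_total for x in s] for s in strata]
Z_bar = sum(w * mean(z) for w, z in zip(W, z_strata))

# An arbitrary (non-random) sample: first two units, then first unit.
sample_x = [strata[0][:2], strata[1][:1]]
sample_z = [[x + X_total for x in s] for s in sample_x]
x_st = sum(w * mean(s) for w, s in zip(W, sample_x))
z_st = sum(w * mean(s) for w, s in zip(W, sample_z))

print(Z_bar, (N + 1) * X_bar)
print(z_st, x_st + N * X_bar)
```

Each printed pair should agree up to floating-point rounding, whatever units are placed in the sample.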

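Using Bedi's transformation, the estimator $\hat{\bar{Y}}$ in (10) can be modified to

$$\hat{\bar{Y}}_M = \left[\bar{y}_{st} + b\left(\bar{X} - \bar{x}_{st}\right)\right]\left(\frac{\bar{z}_{st}}{\bar{Z}}\right). \tag{11}$$

The estimator $\hat{\bar{Y}}_M$ can also be written as

$$\hat{\bar{Y}}_M = \left[\bar{Y}(1 + \delta_0) + b\left(\bar{X} - \bar{X}(1 + \delta_1)\right)\right]\left(\frac{\bar{X}(1 + \delta_1) + N\bar{X}}{(N + 1)\bar{X}}\right)$$

or

$$\hat{\bar{Y}}_M = \left[\bar{Y}(1 + \delta_0) - b\bar{X}\delta_1\right]\left(1 + \frac{\delta_1}{N + 1}\right). \tag{12}$$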
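Forms (11) and (12) are algebraically identical, which can be confirmed numerically. In the sketch below the sample quantities (`y_st`, `x_st`, `b`) are hypothetical values chosen for illustration, while $N$, $\bar{X}$, and $\bar{Y}$ are taken from Example 1; the function names are not from the paper.

```python
def estimator_m(y_st, x_st, b, x_bar, n_pop):
    """Form (11): regression-adjusted mean scaled by z_st / Z_bar."""
    z_st = x_st + n_pop * x_bar          # stratified mean of z
    z_bar = (n_pop + 1) * x_bar          # population mean of z
    return (y_st + b * (x_bar - x_st)) * z_st / z_bar


def estimator_m_expanded(y_st, x_st, b, x_bar, y_bar, n_pop):
    """Form (12): the same estimator written in terms of delta_0, delta_1."""
    d0 = (y_st - y_bar) / y_bar
    d1 = (x_st - x_bar) / x_bar
    return (y_bar * (1 + d0) - b * x_bar * d1) * (1 + d1 / (n_pop + 1))


# Hypothetical sample results; population values from Table 1 (Example 1).
y_st, x_st, b = 3000.0, 38000.0, 0.08
X_BAR, Y_BAR, N = 37600.0, 2930.0, 854

form_11 = estimator_m(y_st, x_st, b, X_BAR, N)
form_12 = estimator_m_expanded(y_st, x_st, b, X_BAR, Y_BAR, N)
print(form_11, form_12)
```

Because the factor $\bar{z}_{st}/\bar{Z} = 1 + \delta_1/(N + 1)$ is very close to one for large $N$, the estimator stays near the regression-adjusted mean $\bar{y}_{st} + b(\bar{X} - \bar{x}_{st})$.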

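We now propose an estimator of the type discussed in (4) by modifying the estimator in (11). The proposed estimator is

$$\hat{\bar{Y}}_P = \lambda\hat{\bar{Y}}_M, \tag{13}$$

where $\lambda$ is a constant to be determined later. By (12) and (13), we have

$$\hat{\bar{Y}}_P = \lambda\left[\bar{Y}\left(1 + \delta_0 + \frac{\delta_1}{N + 1} + \frac{\delta_0\delta_1}{N + 1}\right) - b\bar{X}\left(\delta_1 + \frac{\delta_1^2}{N + 1}\right)\right]$$

or

$$\hat{\bar{Y}}_P - \bar{Y} = (\lambda - 1)\bar{Y} + \lambda\left[\bar{Y}\left(\delta_0 + \frac{\delta_1}{N + 1} + \frac{\delta_0\delta_1}{N + 1}\right) - b\bar{X}\left(\delta_1 + \frac{\delta_1^2}{N + 1}\right)\right]. \tag{14}$$

From (14), the bias of $\hat{\bar{Y}}_P$ is given by

$$\mathrm{Bias}(\hat{\bar{Y}}_P) = E(\hat{\bar{Y}}_P - \bar{Y}) = (\lambda - 1)\bar{Y} + \lambda\bar{Y}E\left(\frac{\delta_0\delta_1}{N + 1}\right) - \lambda\beta\bar{X}E\left(\frac{\delta_1^2}{N + 1}\right), \tag{15}$$

where

$$\beta = \frac{\sum_{h=1}^{L} W_h^2\gamma_h\rho_h S_{yh}S_{xh}}{\sum_{h=1}^{L} W_h^2\gamma_h S_{xh}^2}$$

is the population regression coefficient. Substituting for $\beta$ in (15), the last two terms cancel and we get the bias as

$$\mathrm{Bias}(\hat{\bar{Y}}_P) = (\lambda - 1)\bar{Y}. \tag{16}$$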



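Also from (14), the $\mathrm{MSE}(\hat{\bar{Y}}_P)$ is given by

$$\mathrm{MSE}(\hat{\bar{Y}}_P) = E(\hat{\bar{Y}}_P - \bar{Y})^2 = E\left[(\lambda - 1)\bar{Y} + \lambda\bar{Y}\left(\delta_0 + \frac{\delta_1}{N + 1}\right) - \lambda b\bar{X}\delta_1\right]^2$$

or

$$\mathrm{MSE}(\hat{\bar{Y}}_P) = (\lambda - 1)^2\bar{Y}^2 + \lambda^2\left[\bar{Y}^2E\left(\delta_0^2 + \frac{\delta_1^2}{(N + 1)^2} + \frac{2\delta_0\delta_1}{N + 1}\right) + \beta^2\bar{X}^2E(\delta_1^2)\right] - 2\lambda^2\beta\bar{Y}\bar{X}E\left(\delta_0\delta_1 + \frac{\delta_1^2}{N + 1}\right). \tag{17}$$

Again substitution for $\beta$ gives

$$\mathrm{MSE}(\hat{\bar{Y}}_P) = (\lambda - 1)^2\bar{Y}^2 + \lambda^2\sum_{h=1}^{L} W_h^2\gamma_h\left[S_{yh}^2(1 - \rho_c^2) + \frac{R^2S_{xh}^2}{(N + 1)^2}\right], \tag{18}$$

where

$$\rho_c = \frac{\sum_{h=1}^{L} W_h^2\gamma_h\rho_h S_{yh}S_{xh}}{\sqrt{\sum_{h=1}^{L} W_h^2\gamma_h S_{xh}^2}\,\sqrt{\sum_{h=1}^{L} W_h^2\gamma_h S_{yh}^2}}$$

is the combined correlation coefficient in stratified sampling across all strata.

For $\lambda = 1$, expressions (16) and (18) give the bias and variance of $\hat{\bar{Y}}_M$ as

$$\mathrm{Bias}(\hat{\bar{Y}}_M) = 0 \tag{19}$$

$$\mathrm{Var}(\hat{\bar{Y}}_M) = \sum_{h=1}^{L} W_h^2\gamma_h\left[S_{yh}^2(1 - \rho_c^2) + \frac{R^2S_{xh}^2}{(N + 1)^2}\right]. \tag{20}$$

However, we seek an optimum value of $\lambda$ by minimizing $\mathrm{MSE}(\hat{\bar{Y}}_P)$. Setting $\partial\,\mathrm{MSE}(\hat{\bar{Y}}_P)/\partial\lambda = 0$ in (18), we get

$$\lambda = \frac{\bar{Y}^2}{\bar{Y}^2 + \mathrm{Var}(\hat{\bar{Y}}_M)} = \lambda_{opt} \text{ (say)}, \quad 0 < \lambda_{opt} < 1.$$

We obtain the optimum bias and MSE of $\hat{\bar{Y}}_P$ after substituting the optimum value of $\lambda$ in (16) and (18). These are given by

$$\mathrm{Bias}(\hat{\bar{Y}}_P)_{opt} = -\frac{\bar{Y}\,\mathrm{Var}(\hat{\bar{Y}}_M)}{\bar{Y}^2 + \mathrm{Var}(\hat{\bar{Y}}_M)} \tag{21}$$

and

$$\mathrm{MSE}(\hat{\bar{Y}}_P)_{opt} = \frac{\bar{Y}^2\,\mathrm{Var}(\hat{\bar{Y}}_M)}{\bar{Y}^2 + \mathrm{Var}(\hat{\bar{Y}}_M)}. \tag{22}$$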
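The chain from the combined correlation coefficient to the optimum bias and MSE in (20) to (22) can be traced numerically. The sketch below uses the rounded Example 1 summaries printed in Table 1; variable names are illustrative assumptions, and small differences from the published figures are expected because of rounding in the inputs.

```python
from math import sqrt

# Example 1 stratum summaries, copied as printed in Table 1 (rounded values).
W2 = [0.015, 0.015, 0.012, 0.04, 0.057, 0.041]        # W_h^2
GAMMA = [0.102, 0.049, 0.016, 0.009, 0.138, 0.4942]   # 1/n_h - 1/N_h
S_X = [49189, 57461, 160757, 285603, 45403, 18794]    # S_xh
S_Y = [6425, 11552, 29907, 28643, 2390, 946]          # S_yh
RHO = [0.82, 0.86, 0.90, 0.99, 0.71, 0.89]            # rho_h
N, Y_BAR, R = 854, 2930, 0.07793

# Common factor W_h^2 * gamma_h and the three weighted sums used throughout.
c = [w * g for w, g in zip(W2, GAMMA)]
sum_y = sum(ci * sy**2 for ci, sy in zip(c, S_Y))
sum_x = sum(ci * sx**2 for ci, sx in zip(c, S_X))
sum_xy = sum(ci * p * sy * sx for ci, p, sy, sx in zip(c, RHO, S_Y, S_X))

rho_c = sum_xy / sqrt(sum_x * sum_y)                            # combined correlation
var_m = sum_y * (1 - rho_c**2) + R**2 * sum_x / (N + 1) ** 2    # equation (20)
lam_opt = Y_BAR**2 / (Y_BAR**2 + var_m)                         # optimum lambda
bias_p_opt = (lam_opt - 1) * Y_BAR                              # (16) at lam_opt
mse_p_opt = (lam_opt - 1) ** 2 * Y_BAR**2 + lam_opt**2 * var_m  # (18) at lam_opt

print(f"rho_c = {rho_c:.5f}, lambda_opt = {lam_opt:.5f}")
print(f"|Bias| = {abs(bias_p_opt):.2f}, MSE = {mse_p_opt:.2f}")
```

Evaluating (16) and (18) at $\lambda_{opt}$ should reproduce the closed forms (21) and (22), and the output should be close to $\rho_c = 0.82629$ in Table 1 and to the Example 1 entries for the proposed estimator in Table 4 (absolute bias 73.00, MSE 213877.86).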



4. Efficiency Comparison

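We now compare the proposed estimator $\hat{\bar{Y}}_P$ with the traditional combined ratio estimator $\hat{\bar{Y}}_{CR}$ and the Kadilar and Cingi (2005) estimator $\hat{\bar{Y}}_{KC}$.

(i) From (3) and (22),

$$\mathrm{MSE}(\hat{\bar{Y}}_{CR}) - \mathrm{MSE}(\hat{\bar{Y}}_P)_{opt} = \frac{\bar{Y}^2\left[\mathrm{MSE}(\hat{\bar{Y}}_{CR}) - \mathrm{Var}(\hat{\bar{Y}}_M)\right] + \mathrm{MSE}(\hat{\bar{Y}}_{CR})\,\mathrm{Var}(\hat{\bar{Y}}_M)}{\bar{Y}^2 + \mathrm{Var}(\hat{\bar{Y}}_M)}.$$

Thus $\mathrm{MSE}(\hat{\bar{Y}}_{CR}) - \mathrm{MSE}(\hat{\bar{Y}}_P)_{opt} > 0$ if $\mathrm{MSE}(\hat{\bar{Y}}_{CR}) > \mathrm{Var}(\hat{\bar{Y}}_M)$. This will be true if

$$\sum_{h=1}^{L} W_h^2\gamma_h R^2S_{xh}^2\left[\left(1 - \frac{\rho_h S_{yh}}{RS_{xh}}\right)^2 + \left(\frac{S_{yh}}{RS_{xh}}\right)^2\left(\rho_c^2 - \rho_h^2\right) - \frac{1}{(N + 1)^2}\right] > 0$$

or

$$(N + 1)^2 > \left(\frac{A}{B + C}\right),$$

where

$$A = \sum_{h=1}^{L} W_h^2\gamma_h S_{xh}^2, \quad B = \sum_{h=1}^{L} W_h^2\gamma_h S_{xh}^2\left[\left(1 - \frac{\rho_h S_{yh}}{RS_{xh}}\right)^2\right],$$

and

$$C = \frac{\sum_{h=1}^{L} W_h^2\gamma_h S_{yh}^2\left(\rho_c^2 - \rho_h^2\right)}{R^2}.$$

The condition $(N + 1)^2 > \left(\frac{A}{B + C}\right)$ is likely to hold true always because of the term $(N + 1)$ on the left-hand side.

(ii) From (7) and (22),

$$\mathrm{MSE}(\hat{\bar{Y}}_{KC})_{opt} - \mathrm{MSE}(\hat{\bar{Y}}_P)_{opt} = \frac{\bar{Y}^4\left[\mathrm{MSE}(\hat{\bar{Y}}_{CR}) - \mathrm{Var}(\hat{\bar{Y}}_M)\right]}{\left[\bar{Y}^2 + \mathrm{MSE}(\hat{\bar{Y}}_{CR})\right]\left[\bar{Y}^2 + \mathrm{Var}(\hat{\bar{Y}}_M)\right]} > 0$$

if $\mathrm{MSE}(\hat{\bar{Y}}_{CR}) > \mathrm{Var}(\hat{\bar{Y}}_M)$. But this will be true if $(N + 1)^2 > A/(B + C)$, as in (i).

Hence the proposed estimator $\hat{\bar{Y}}_P$ is more efficient than both the combined ratio estimator $\hat{\bar{Y}}_{CR}$ and the estimator $\hat{\bar{Y}}_{KC}$ by Kadilar and Cingi (2005) if $(N + 1)^2 > A/(B + C)$, a condition likely to hold true almost always.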
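The quantities $A$, $B$, and $C$ in the efficiency condition are again weighted sums over strata. The sketch below evaluates them for the rounded Example 1 summaries in Table 1, taking $\rho_c = 0.82629$ from that table; the variable names are illustrative, and the ratio is sensitive to rounding because $B + C$ involves cancellation between large terms.

```python
# Example 1 stratum summaries, copied as printed in Table 1 (rounded values).
W2 = [0.015, 0.015, 0.012, 0.04, 0.057, 0.041]        # W_h^2
GAMMA = [0.102, 0.049, 0.016, 0.009, 0.138, 0.4942]   # 1/n_h - 1/N_h
S_X = [49189, 57461, 160757, 285603, 45403, 18794]    # S_xh
S_Y = [6425, 11552, 29907, 28643, 2390, 946]          # S_yh
RHO = [0.82, 0.86, 0.90, 0.99, 0.71, 0.89]            # rho_h
RHO_C = 0.82629                                       # combined correlation, Table 1
N, R = 854, 0.07793

c = [w * g for w, g in zip(W2, GAMMA)]                # W_h^2 * gamma_h

A = sum(ci * sx**2 for ci, sx in zip(c, S_X))
B = sum(
    ci * sx**2 * (1 - p * sy / (R * sx)) ** 2
    for ci, sx, sy, p in zip(c, S_X, S_Y, RHO)
)
C = sum(
    ci * sy**2 * (RHO_C**2 - p**2)
    for ci, sy, p in zip(c, S_Y, RHO)
) / R**2

ratio = A / (B + C)
condition_holds = (N + 1) ** 2 > ratio
print(f"A/(B+C) = {ratio:.3f}, (N+1)^2 = {(N + 1) ** 2}, holds: {condition_holds}")
```

With these inputs the ratio should come out near the value 93.295 quoted for Example 1 in Section 5, far below $(N + 1)^2 = 731025$.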

5. Data Description and Results

We use the following three examples for comparison. Data summaries and results are in Tables 1–4 in the Appendix.

Example 1 [Source: Kadilar and Cingi (2005)]. y is the apple production amount in 854 villages of Turkey in 1999, and x is the number of apple trees in 854 villages of Turkey in 1999. The data are stratified by the region of Turkey, and from each stratum villages are selected randomly using the Neyman allocation

$$n_h = n\,\frac{N_h S_h}{\sum_{h=1}^{L} N_h S_h}.$$
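The Neyman allocation can be reproduced from the stratum sizes and standard deviations. The sketch below assumes that $S_h$ is the stratum standard deviation of $y$ ($S_{yh}$ in Table 1), which the text does not state explicitly; under that assumption, rounding to the nearest integer gives the sample sizes $n_h$ listed in Table 1. The function name is illustrative.

```python
def neyman_allocation(n, sizes, std_devs):
    """Return the unrounded n_h = n * N_h * S_h / sum_h(N_h * S_h)."""
    products = [size * sd for size, sd in zip(sizes, std_devs)]
    total = sum(products)
    return [n * p / total for p in products]


# Stratum sizes and S_yh from Table 1 (Example 1); total sample size n = 140.
N_h = [106, 106, 94, 171, 204, 173]
S_yh = [6425, 11552, 29907, 28643, 2390, 946]

raw = neyman_allocation(140, N_h, S_yh)
n_h = [round(value) for value in raw]
print([f"{value:.2f}" for value in raw])
print(n_h, sum(n_h))
```

Strata with large $N_h S_h$ (here strata 3 and 4) receive most of the sample, which is the point of this allocation.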

Example 2 [Source: Murthy (1967, p. 228)]. y is the output of factories in a region; x is fixed capital. From 80 factories, the data have been classified arbitrarily into four strata on the basis of x values. The strata are x ≤ 500, 500 < x ≤ 1000, 1000 < x ≤ 2000, and x > 2000, respectively. We have randomly selected samples from each stratum by using the proportional allocation, $n_h = nN_h/N$, using a total sample size of n = 45.

Example 3 [Source: Murthy (1967, p. 228)]. y is the output of factories in a region; x is the number of workers. From 80 factories, the data have been classified arbitrarily into four strata on the basis of x values. The strata are x < 100, 100 ≤ x < 200, 200 ≤ x < 500, and x ≥ 500, respectively. Again, we use the same procedure of selecting the samples from each stratum as we did in Example 2. We use a total sample size of n = 45.
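Proportional allocation as used in Examples 2 and 3 can be sketched in a few lines. The stratum sizes below are those listed in Tables 2 and 3; rounding to the nearest integer is an assumption about how fractional sizes were handled, and the function name is illustrative.

```python
def proportional_allocation(n, sizes):
    """Return the unrounded n_h = n * N_h / N for each stratum."""
    total = sum(sizes)
    return [n * size / total for size in sizes]


# Stratum sizes from Table 2 (Example 2) and Table 3 (Example 3); n = 45.
example_2 = [round(v) for v in proportional_allocation(45, [19, 32, 14, 15])]
example_3 = [round(v) for v in proportional_allocation(45, [25, 23, 16, 16])]
print(example_2, sum(example_2))
print(example_3, sum(example_3))
```

Under that rounding assumption, both allocations match the $n_h$ rows of Tables 2 and 3 and sum to 45.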

From the data summaries in Tables 1–3, it is easy to verify that the condition $(N + 1)^2 > A/(B + C)$ is comfortably satisfied for all three examples.

Condition verification:

(i) Example 1: $(N + 1)^2 = 731025 > A/(B + C) = 93.295$
(ii) Example 2: $(N + 1)^2 = 6561 > A/(B + C) = 3.911$
(iii) Example 3: $(N + 1)^2 = 6561 > A/(B + C) = 1.800$

Results in Table 4 clearly show gains in efficiency when using the proposed estimator.

6. Concluding Remarks

The new estimator $\hat{\bar{Y}}_P$ proves more efficient than the traditional combined ratio estimator $\hat{\bar{Y}}_{CR}$ and its modification $\hat{\bar{Y}}_{KC}$ proposed by Kadilar and Cingi (2005), since the condition $(N + 1)^2 > A/(B + C)$ is likely to hold true comfortably. For the three examples discussed here, there is not much difference between the Kadilar and Cingi estimator $\hat{\bar{Y}}_{KC}$ and the traditional combined ratio estimator $\hat{\bar{Y}}_{CR}$.

Appendix

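Table 1. Example 1 summaries

| Total | Stratum | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| $N = 854$ | $N_h$ | 106 | 106 | 94 | 171 | 204 | 173 |
| $n = 140$ | $n_h$ | 9 | 17 | 38 | 67 | 7 | 2 |
| $\bar{X} = 37600$ | $\bar{X}_h$ | 24375 | 27421 | 72409 | 74365 | 26441 | 9844 |
| $\bar{Y} = 2930$ | $\bar{Y}_h$ | 1536 | 2212 | 9384 | 5588 | 967 | 404 |
| $S_x = 144794$ | $S_{xh}$ | 49189 | 57461 | 160757 | 285603 | 45403 | 18794 |
| $S_y = 17106$ | $S_{yh}$ | 6425 | 11552 | 29907 | 28643 | 2390 | 946 |
| $\rho = 0.92$ | $\rho_h$ | 0.82 | 0.86 | 0.90 | 0.99 | 0.71 | 0.89 |
| $R = 0.07793$ | $\gamma_h$ | 0.102 | 0.049 | 0.016 | 0.009 | 0.138 | 0.4942 |
| $\rho_c = 0.82629$ | $W_h^2$ | 0.015 | 0.015 | 0.012 | 0.04 | 0.057 | 0.041 |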




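Table 2. Example 2 summaries

| Total | Stratum | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| $N = 80$ | $N_h$ | 19 | 32 | 14 | 15 |
| $n = 45$ | $n_h$ | 11 | 18 | 8 | 8 |
| $\bar{X} = 1126.46$ | $\bar{X}_h$ | 349.684 | 706.594 | 1539.57 | 2620.53 |
| $\bar{Y} = 5182.64$ | $\bar{Y}_h$ | 2967.95 | 4657.63 | 6537.21 | 7843.67 |
| $S_x = 845.61$ | $S_{xh}$ | 109.449 | 109.222 | 277.181 | 370.972 |
| $S_y = 1835.66$ | $S_{yh}$ | 757.089 | 669.127 | 416.113 | 645.688 |
| $\rho = 0.9413$ | $\rho_h$ | 0.9364 | 0.9260 | 0.9835 | 0.9692 |
| $R = 4.6008$ | $\gamma_h$ | 0.0383 | 0.0243 | 0.0536 | 0.0583 |
| $\rho_c = 0.77693$ | $W_h^2$ | 0.05641 | 0.16 | 0.03063 | 0.03516 |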


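Table 3. Example 3 summaries

| Total | Stratum | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| $N = 80$ | $N_h$ | 25 | 23 | 16 | 16 |
| $n = 45$ | $n_h$ | 14 | 13 | 9 | 9 |
| $\bar{X} = 284.75$ | $\bar{X}_h$ | 71.0 | 140.696 | 362.937 | 749.5 |
| $\bar{Y} = 5182.64$ | $\bar{Y}_h$ | 3156.64 | 4766.22 | 6334.19 | 7795.31 |
| $S_x = 270.495$ | $S_{xh}$ | 14.6116 | 28.0364 | 91.3823 | 174.463 |
| $S_y = 1835.66$ | $S_{yh}$ | 740.012 | 515.697 | 501.399 | 653.09 |
| $\rho = 0.9144$ | $\rho_h$ | 0.8167 | 0.8231 | 0.9582 | 0.9805 |
| $R = 18.201$ | $\gamma_h$ | 0.03143 | 0.03344 | 0.04861 | 0.04861 |
| $\rho_c = 0.67079$ | $W_h^2$ | 0.09766 | 0.08266 | 0.04 | 0.04 |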

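Table 4. Results based on data in Tables 1, 2, and 3

| Estimator | Example 1: $\lvert\mathrm{Bias}\rvert$ | Example 1: MSE | Example 1: Eff | Example 2: $\lvert\mathrm{Bias}\rvert$ | Example 2: MSE | Example 2: Eff | Example 3: $\lvert\mathrm{Bias}\rvert$ | Example 3: MSE | Example 3: Eff |
|---|---|---|---|---|---|---|---|---|---|
| $\hat{\bar{Y}}_{CR}$ | 13.69 | 223496.77 | 100.00 | 0.99 | 4232.68 | 100.00 | 3.73 | 16456.65 | 100.00 |
| $\hat{\bar{Y}}_{KC}$ | 87.69 | 217825.95 | 102.60 | 0.17 | 4232.01 | 100.02 | 0.55 | 16446.57 | 100.06 |
| $\hat{\bar{Y}}_{P}$ | 73.00 | 213877.86 | 104.50 | 0.31 | 1633.77 | 259.07 | 0.40 | 2057.73 | 799.75 |

Efficiency = Eff = $\dfrac{\mathrm{MSE}(\hat{\bar{Y}}_{CR})}{\mathrm{MSE}(\hat{\bar{Y}}_{z})_{opt}} \times 100$, where $z = KC, P$.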
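The efficiency column of Table 4 is a direct ratio of the MSE entries, so it can be recomputed from the table itself. The sketch below does this for all three examples; the dictionary layout and function name are illustrative choices.

```python
# MSE values copied from Table 4 for each example and estimator.
MSE_TABLE = {
    "Example 1": {"CR": 223496.77, "KC": 217825.95, "P": 213877.86},
    "Example 2": {"CR": 4232.68, "KC": 4232.01, "P": 1633.77},
    "Example 3": {"CR": 16456.65, "KC": 16446.57, "P": 2057.73},
}


def efficiency(mse_cr, mse_z):
    """Return Eff = MSE(CR) / MSE(z) * 100, the definition under Table 4."""
    return 100.0 * mse_cr / mse_z


eff = {
    example: {name: efficiency(values["CR"], mse) for name, mse in values.items()}
    for example, values in MSE_TABLE.items()
}
for example, row in eff.items():
    print(example, {name: round(value, 2) for name, value in row.items()})
```

The recomputed ratios should match the Eff columns of Table 4 to two decimal places, up to rounding of the printed MSE values.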



Acknowledgments

The authors would like to thank the referees for their constructive suggestions that helped improve the presentation of the paper. Also, the first author wishes to acknowledge with thanks the facilities made available by UNCG during his visiting assignment in the summer of 2005.

References

Bedi, P. K. (1996). Efficient utilization of auxiliary information at estimation stage. Biomet. J. 38(8):973–976.

Kadilar, C., Cingi, H. (2004). Ratio estimator in simple random sampling. Appl. Math. Comput. 151:893–904.

Kadilar, C., Cingi, H. (2005). A new ratio estimator in stratified sampling. Comm. Statist. Theory Meth. 34:1–6.

Murthy, M. N. (1967). Sampling Theory and Methods. India: Statistical Publishing Society.

Ray, S. K., Singh, R. K. (1981). Difference cum product type estimators. J. Ind. Statist. Asso. 19:147–151.

Searls, D. T. (1964). Utilization of known coefficient of kurtosis in the estimation procedure of variance. J. Am. Statist. Asso. 59:1225–1226.
