This article was downloaded by: [Pennsylvania State University] on 09 September 2013, at 12:32. Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.
Communications in Statistics - Theory and Methods
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20

On Improvement in Variance Estimation Using Auxiliary Information
Javid Shabbir (a) & Sat Gupta (b)
(a) Department of Statistics, Quaid-i-Azam University, Islamabad, Pakistan
(b) Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, North Carolina, USA
Published online: 28 Aug 2007.

To cite this article: Javid Shabbir & Sat Gupta (2007) On Improvement in Variance Estimation Using Auxiliary Information, Communications in Statistics - Theory and Methods, 36:12, 2177-2185, DOI: 10.1080/03610920701215092

To link to this article: http://dx.doi.org/10.1080/03610920701215092
Communications in Statistics—Theory and Methods, 36: 2177–2185, 2007
Copyright © Taylor & Francis Group, LLC
ISSN: 0361-0926 print/1532-415X online
DOI: 10.1080/03610920701215092
Survey Sampling
On Improvement in Variance Estimation Using Auxiliary Information
JAVID SHABBIR1 AND SAT GUPTA2
1 Department of Statistics, Quaid-i-Azam University, Islamabad, Pakistan
2 Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, North Carolina, USA
Kadilar and Cingi (2006) have introduced an estimator for the population variance using an auxiliary variable in simple random sampling. We propose a new ratio-type exponential estimator for population variance which is always more efficient than the usual ratio and regression estimators suggested by Isaki (1983) and by Kadilar and Cingi (2006). Efficiency comparison is carried out both mathematically and numerically.
Keywords: Auxiliary variable; Bias; Efficiency; Exponential ratio-type estimator; Mean square error (MSE); Variance (Var).

Mathematics Subject Classification: Primary 62D05; Secondary 62F10.
1. Introduction and Notation
Let $\Omega$ be a finite population consisting of $N$ units from which a sample of size $n$ is to be drawn by simple random sampling without replacement (SRSWOR). Let $y$ and $x$ denote the study and auxiliary variables having sample means $\bar y$ and $\bar x$, and population means $\bar Y$ and $\bar X$, respectively. To estimate $S_y^2 = \sum_{i=1}^{N}(y_i-\bar Y)^2/(N-1)$, it is assumed that $S_x^2 = \sum_{i=1}^{N}(x_i-\bar X)^2/(N-1)$ is known. Let $s_y^2 = \sum_{i=1}^{n}(y_i-\bar y)^2/(n-1)$ and $s_x^2 = \sum_{i=1}^{n}(x_i-\bar x)^2/(n-1)$ be the sample variances of $y$ and $x$, respectively. Let $C_y = S_y/\bar Y$ and $C_x = S_x/\bar X$ denote the coefficients of variation of $y$ and $x$, respectively, and let $\rho_{yx}$ and $C_{yx} = S_{yx}/(\bar Y\bar X)$ be the correlation coefficient and the coefficient of covariation between $y$ and $x$, respectively.
Received June 30, 2006; Accepted November 24, 2006
Address correspondence to Sat Gupta, Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, NC 27402, USA; E-mail: [email protected]
Define $e_0 = (s_y^2-S_y^2)/S_y^2$ and $e_1 = (s_x^2-S_x^2)/S_x^2$, so that $E(e_0)=E(e_1)=0$, $E(e_0^2)=\theta\beta^{*}_{2(y)}$, $E(e_1^2)=\theta\beta^{*}_{2(x)}$, and $E(e_0e_1)=\theta\lambda^{*}_{22}$, where $\beta_{2(y)}=\mu_{40}/\mu_{20}^2$ and $\beta_{2(x)}=\mu_{04}/\mu_{02}^2$ are the coefficients of kurtosis of $y$ and $x$, respectively. Let $\beta^{*}_{2(y)}=\beta_{2(y)}-1$, $\beta^{*}_{2(x)}=\beta_{2(x)}-1$, and $\lambda^{*}_{22}=\lambda_{22}-1$, where $\lambda_{pq}=\mu_{pq}/(\mu_{20}^{p/2}\mu_{02}^{q/2})$, $\mu_{pq}=\sum_{i=1}^{N}(y_i-\bar Y)^p(x_i-\bar X)^q/N$, and $\theta=1/n$. We ignore the finite population correction (fpc) term for ease of computation.

We discuss below some of the known estimators of population variance.

(i) Conventional variance estimator. For the usual unbiased variance estimator $s_y^2$ (the sample variance), the variance is given by

$$\mathrm{Var}(s_y^2)=\theta S_y^4\beta^{*}_{2(y)} \quad (1)$$
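For readers who want to check these moment definitions numerically, here is a small sketch (assuming NumPy; the finite population below is entirely synthetic and hypothetical, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population of N units (illustration only).
N, n = 500, 50
x = rng.gamma(shape=2.0, scale=10.0, size=N)   # auxiliary variable
y = 3.0 * x + rng.normal(0.0, 5.0, size=N)     # study variable

def mu_pq(y, x, p, q):
    """Central product moment mu_pq = sum((y_i - Ybar)^p (x_i - Xbar)^q) / N."""
    return np.mean((y - y.mean()) ** p * (x - x.mean()) ** q)

beta2_y = mu_pq(y, x, 4, 0) / mu_pq(y, x, 2, 0) ** 2   # kurtosis of y
beta2_x = mu_pq(y, x, 0, 4) / mu_pq(y, x, 0, 2) ** 2   # kurtosis of x
lam22   = mu_pq(y, x, 2, 2) / (mu_pq(y, x, 2, 0) * mu_pq(y, x, 0, 2))

theta = 1.0 / n                              # fpc ignored, as in the text
S2y = np.var(y, ddof=1)                      # S_y^2 with divisor N - 1
var_s2y = theta * S2y**2 * (beta2_y - 1.0)   # Eq. (1): Var(s_y^2)

print(beta2_y, beta2_x, lam22, var_s2y)
```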
(ii) Isaki ratio estimator. Isaki (1983) introduced the following ratio estimator for population variance:

$$\hat S_R^2 = s_y^2\left(\frac{S_x^2}{s_x^2}\right) \quad (2)$$

Its bias and MSE, to first order of approximation, are given by

$$\mathrm{Bias}(\hat S_R^2)=\theta S_y^2\left[\beta^{*}_{2(x)}-\lambda^{*}_{22}\right] \quad (3)$$

and

$$\mathrm{MSE}(\hat S_R^2)=\theta S_y^4\left[\beta^{*}_{2(y)}+\beta^{*}_{2(x)}-2\lambda^{*}_{22}\right] \quad (4)$$
(iii) Isaki regression estimator. Isaki (1983) also suggested the following regression estimator for population variance:

$$\hat S_{Reg}^2 = s_y^2 + b\left(S_x^2-s_x^2\right), \quad (5)$$

where $b$ is the sample regression coefficient whose population counterpart is $\beta = S_y^2\lambda^{*}_{22}/\bigl(S_x^2\beta^{*}_{2(x)}\bigr)$. The variance of $\hat S_{Reg}^2$, to first order of approximation, is given by

$$\mathrm{Var}(\hat S_{Reg}^2)=\theta S_y^4\beta^{*}_{2(y)}\left(1-\frac{\lambda^{*2}_{22}}{\beta^{*}_{2(y)}\beta^{*}_{2(x)}}\right)=\mathrm{Var}(s_y^2)\left(1-\rho^{2}_{(s_y^2,s_x^2)}\right), \quad (6)$$

where $\rho_{(s_y^2,s_x^2)}=\lambda^{*}_{22}/\bigl(\sqrt{\beta^{*}_{2(y)}}\sqrt{\beta^{*}_{2(x)}}\bigr)$ (see Garcia and Cebrian, 1996).
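The text does not spell out how $b$ is computed in practice; the sketch below makes the common choice of plugging sample analogues of $\lambda^{*}_{22}$ and $\beta^{*}_{2(x)}$ into the formula for $\beta$ — an assumption on our part, not a prescription from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population (illustration only).
N, n = 500, 50
x = rng.gamma(shape=2.0, scale=10.0, size=N)
y = 3.0 * x + rng.normal(0.0, 5.0, size=N)
S2x = np.var(x, ddof=1)                       # known

idx = rng.choice(N, size=n, replace=False)    # SRSWOR sample
ys, xs = y[idx], x[idx]
s2y, s2x = np.var(ys, ddof=1), np.var(xs, ddof=1)

# Sample analogues of lambda*_22 and beta*_2(x); plugging them into
# beta = S_y^2 lam*_22 / (S_x^2 beta*_2(x)) gives the sample coefficient b.
dy, dx = ys - ys.mean(), xs - xs.mean()
mu20, mu02 = np.mean(dy**2), np.mean(dx**2)
lam22_hat  = np.mean(dy**2 * dx**2) / (mu20 * mu02)
b2x_hat    = np.mean(dx**4) / mu02**2
b = s2y * (lam22_hat - 1.0) / (s2x * (b2x_hat - 1.0))

S2_Reg = s2y + b * (S2x - s2x)                # Eq. (5)
print(S2_Reg)
```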
(iv) Kadilar and Cingi estimator. Shabbir and Yaab (2003) introduced the following ratio-type estimator for the population mean:

$$\bar Y_{SY} = \left[\omega_1\bar y + \omega_2\left(\frac{\bar y\bar X}{\bar x}\right)\right]\alpha, \quad (7)$$

where $\alpha = \dfrac{1+\theta C_{yx}}{1+\theta C_x^2}$, such that $\omega_1+\omega_2=1$.
Kadilar and Cingi (2006) used this idea to suggest the following ratio-type estimator for population variance:

$$\hat S_{KC}^2 = \left[w_1 s_y^2 + w_2\left(\frac{s_y^2 S_x^2}{s_x^2}\right)\right]\alpha, \quad (8)$$

where $w_1+w_2=1$. The optimum bias and MSE of the estimator in (8) are given by

$$\mathrm{Bias}(\hat S_{KC}^2)_{opt} = S_y^2\left[(w_1^{*}-1)+w_2^{*}\alpha\theta\beta^{*}_{2(x)}\right] \quad (9)$$

and

$$\mathrm{MSE}(\hat S_{KC}^2)_{opt} = \theta S_y^4\left[z^2\beta^{*}_{2(y)}-2w_2^{*}z\alpha\lambda^{*}_{22}+w_2^{*2}\alpha^2\beta^{*}_{2(x)}\right], \quad (10)$$

respectively, where $z=w_1^{*}+w_2^{*}\alpha$,

$$w_1^{*}=\frac{\beta^{*}_{2(y)}(\alpha-1)+\lambda^{*}_{22}(1-2\alpha)+\beta^{*}_{2(x)}\alpha}{\beta^{*}_{2(y)}(1-\alpha)^2/\alpha+2\lambda^{*}_{22}(1-\alpha)+\beta^{*}_{2(x)}\alpha},$$

and $w_2^{*}=1-w_1^{*}$.

Kadilar and Cingi (2006) show that when $\alpha=1$, the two estimators $\hat S_{KC}^2$ and $\hat S_{Reg}^2$ are equally efficient. Kadilar and Cingi also derive conditions under which their estimator is better than the ratio and regression estimators of Isaki (1983). However, these conditions may not hold true in all cases. This is our motivation to try a different variance estimator whose superiority is not subject to any condition.
2. Proposed Estimator
Bahl and Tuteja (1991) introduced an exponential ratio-type estimator for the population mean, given by

$$\bar Y_{BT} = \bar y\,\exp\!\left(\frac{\bar X-\bar x}{\bar X+\bar x}\right) \quad (11)$$

The estimator $\bar Y_{BT}$ is more efficient than the usual ratio estimator $\bar Y_R$ under certain conditions. Following Bahl and Tuteja (1991) and Kadilar and Cingi (2006), we propose the following estimator for the population variance:

$$\hat S_P^2 = \left[k_1 s_y^2 + k_2\left(S_x^2-s_x^2\right)\right]\exp\!\left(\frac{S_x^2-s_x^2}{S_x^2+s_x^2}\right), \quad (12)$$
where $k_1$ and $k_2$ are suitably chosen constants. There are two choices in terms of how to select $k_1$ and $k_2$. Some authors, such as Kadilar and Cingi (2006), use the constraint $k_1+k_2=1$, while others use an unconstrained selection of $k_1$ and $k_2$. This latter group includes Upadhyaya et al. (1985), Singh (1986, 2000, 2002), Singh et al. (1988), and Dubey and Singh (2001). These authors choose the $k_1$ and $k_2$ which minimize the MSE of the proposed estimator and do not insist on having $k_1+k_2=1$. We use both options here.
Case 1: Unconstrained choice of $k_1$ and $k_2$. Using the notation of Section 1, (12) can be rewritten as

$$\hat S_P^2 = \left[k_1 S_y^2(1+e_0) - k_2 S_x^2 e_1\right]\left[1-\tfrac{1}{2}e_1+\tfrac{3}{8}e_1^2-\cdots\right] \quad (13)$$
The bias and MSE, to first order of approximation, are given by

$$\mathrm{Bias}(\hat S_P^2) \approx (k_1-1)S_y^2 + k_1 S_y^2\theta\left[\tfrac{3}{8}\beta^{*}_{2(x)}-\tfrac{1}{2}\lambda^{*}_{22}\right]+\tfrac{1}{2}k_2 S_x^2\theta\beta^{*}_{2(x)} \quad (14)$$

and

$$\mathrm{MSE}(\hat S_P^2) \approx (k_1-1)^2 S_y^4 + k_1^2 S_y^4\theta\left[\beta^{*}_{2(y)}+\tfrac{1}{4}\beta^{*}_{2(x)}-\lambda^{*}_{22}\right]+k_2^2 S_x^4\theta\beta^{*}_{2(x)} - 2k_1k_2 S_y^2 S_x^2\theta\left(\lambda^{*}_{22}-\tfrac{1}{2}\beta^{*}_{2(x)}\right) \quad (15)$$
Setting $\partial\,\mathrm{MSE}(\hat S_P^2)/\partial k_i = 0$ $(i=1,2)$, we get the optimum values of $k_1$ and $k_2$ as

$$k_1^{*}=\frac{1}{1+\theta\left[\beta^{*}_{2(y)}-\dfrac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right]} \quad\text{and}\quad k_2^{*}=\frac{k_1^{*}S_y^2\left[\lambda^{*}_{22}-\tfrac{1}{2}\beta^{*}_{2(x)}\right]}{S_x^2\beta^{*}_{2(x)}}$$
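With the data summaries reported later in Sec. 4 (and $\theta = 1/n = 0.05$, fpc ignored as in Sec. 1), $k_1^{*}$ and $k_2^{*}$ can be evaluated directly. The numbers below are our own evaluation of the formulas, not values quoted from the paper:

```python
# Moment constants from the Sec. 4 apple data (starred versions: raw - 1).
theta = 0.05                    # 1/n with n = 20 (fpc ignored)
b2y_s = 16.523 - 1.0            # beta*_2(y)
b2x_s = 17.516 - 1.0            # beta*_2(x)
lam_s = 14.398 - 1.0            # lambda*_22
Sy, Sx = 1166.9964, 23029.072

G = b2y_s - lam_s**2 / b2x_s                # beta*_2(y) - lam*_22^2/beta*_2(x)
k1_opt = 1.0 / (1.0 + theta * G)            # optimum k1*
k2_opt = k1_opt * Sy**2 * (lam_s - 0.5 * b2x_s) / (Sx**2 * b2x_s)   # optimum k2*
print(k1_opt, k2_opt)
```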
Substituting the optimum values $k_1^{*}$ and $k_2^{*}$ in (14) and (15), we get the optimum bias and MSE of $\hat S_P^2$ as

$$\mathrm{Bias}(\hat S_P^2)_{opt} \approx \frac{\theta S_y^2\left[\tfrac{1}{8}\beta^{*}_{2(x)}-\beta^{*}_{2(y)}\left(1-\rho^2_{(s_y^2,s_x^2)}\right)\right]}{1+\theta\beta^{*}_{2(y)}\left(1-\rho^2_{(s_y^2,s_x^2)}\right)} \quad (16)$$

and

$$\mathrm{MSE}(\hat S_P^2)_{opt} \approx \frac{\mathrm{Var}(\hat S_{Reg}^2)}{1+\dfrac{\mathrm{Var}(\hat S_{Reg}^2)}{S_y^4}} \quad (17)$$

(see Appendix).
Case 2: $k_1+k_2=1$. The proposed estimator, with this constraint, can be written as

$$\hat S_P^{*2} = \left[k_1 s_y^2 + (1-k_1)\left(S_x^2-s_x^2\right)\right]\exp\!\left(\frac{S_x^2-s_x^2}{S_x^2+s_x^2}\right), \quad (18)$$

where $k_1$ is a suitably chosen constant such that $0\le k_1\le 1$. Using the notation of Sec. 1, (18) can be rewritten as

$$\hat S_P^{*2} = \left[k_1 S_y^2(1+e_0)-(1-k_1)S_x^2 e_1\right]\left[1-\tfrac{1}{2}e_1+\tfrac{3}{8}e_1^2-\cdots\right] \quad (19)$$
The bias and MSE, to first order of approximation, are given by

$$\mathrm{Bias}(\hat S_P^{*2}) \approx (k_1-1)\left(S_y^2-\tfrac{1}{2}\theta S_x^2\beta^{*}_{2(x)}\right)+k_1 S_y^2\theta\left(\tfrac{3}{8}\beta^{*}_{2(x)}-\tfrac{1}{2}\lambda^{*}_{22}\right) \quad (20)$$

and

$$\mathrm{MSE}(\hat S_P^{*2}) \approx E\left[(k_1-1)S_y^2 + k_1 S_y^2\left(e_0-\tfrac{1}{2}e_1\right)+(k_1-1)S_x^2 e_1\right]^2 \quad (21)$$
Simplifying (21), we get

$$\mathrm{MSE}(\hat S_P^{*2}) \approx \theta S_y^4\Biggl[k_1^2\Bigl\{\frac{1}{\theta}+\Bigl(\frac{S_x}{S_y}\Bigr)^{4}\beta^{*}_{2(x)}+\beta^{*}_{2(y)}+\tfrac{1}{4}\beta^{*}_{2(x)}-\lambda^{*}_{22}+2\Bigl(\frac{S_x}{S_y}\Bigr)^{2}\Bigl(\lambda^{*}_{22}-\tfrac{1}{2}\beta^{*}_{2(x)}\Bigr)\Bigr\}$$
$$\qquad -2k_1\Bigl\{\frac{1}{\theta}+\Bigl(\frac{S_x}{S_y}\Bigr)^{4}\beta^{*}_{2(x)}+\Bigl(\frac{S_x}{S_y}\Bigr)^{2}\Bigl(\lambda^{*}_{22}-\tfrac{1}{2}\beta^{*}_{2(x)}\Bigr)\Bigr\}+\frac{1}{\theta}+\Bigl(\frac{S_x}{S_y}\Bigr)^{4}\beta^{*}_{2(x)}\Biggr] \quad (22)$$
The optimum value of $k_1$ that minimizes $\mathrm{MSE}(\hat S_P^{*2})$ is given by

$$k_1=\frac{A_1+A_3}{A_1+A_2+2A_3}=k_1^{**}\ \text{(say)},$$

where

$$A_1=\frac{1}{\theta}+\Bigl(\frac{S_x}{S_y}\Bigr)^{4}\beta^{*}_{2(x)},\qquad A_2=\beta^{*}_{2(y)}+\tfrac{1}{4}\beta^{*}_{2(x)}-\lambda^{*}_{22},\qquad A_3=\Bigl(\frac{S_x}{S_y}\Bigr)^{2}\Bigl(\lambda^{*}_{22}-\tfrac{1}{2}\beta^{*}_{2(x)}\Bigr)$$
Substituting the optimum value $k_1^{**}$ in (22), we get the optimum MSE of $\hat S_P^{*2}$ as

$$\mathrm{MSE}(\hat S_P^{*2})_{opt} \approx \theta S_y^4\left(A_1-\frac{(A_1+A_3)^2}{A_1+A_2+2A_3}\right) \quad (23)$$
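The constants $A_1$, $A_2$, $A_3$ and the resulting $k_1^{**}$ can likewise be evaluated from the Sec. 4 data summaries (our own evaluation, with $\theta = 1/n = 0.05$ and fpc ignored):

```python
# Constrained optimum k1** and the bracketed factor of Eq. (23),
# using the Sec. 4 apple-data moment constants (starred: raw - 1).
theta = 0.05
b2y_s, b2x_s, lam_s = 15.523, 16.516, 13.398
Sy, Sx = 1166.9964, 23029.072
r = (Sx / Sy) ** 2                    # (S_x / S_y)^2

A1 = 1.0 / theta + r**2 * b2x_s
A2 = b2y_s + 0.25 * b2x_s - lam_s
A3 = r * (lam_s - 0.5 * b2x_s)

k1_cc = (A1 + A3) / (A1 + A2 + 2.0 * A3)                  # k1** minimizing (22)
mse_factor = A1 - (A1 + A3) ** 2 / (A1 + A2 + 2.0 * A3)   # bracket of Eq. (23)
print(k1_cc, mse_factor)
```

Note how close $k_1^{**}$ comes to 1 here, which is what Sec. 5 observes about the constrained optimization.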
3. Comparison of Estimators
We first compare the proposed estimator $\hat S_P^2$ with all the other competing estimators considered here.
(i) By (1) and (17), $\mathrm{Var}(s_y^2)-\mathrm{MSE}(\hat S_P^2)_{opt}>0$ if

$$\mathrm{Var}(s_y^2)-\frac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{1+\dfrac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}}>0,$$

i.e., if

$$\mathrm{Var}(s_y^2)\left[1+\frac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}\right]-\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)>0, \quad\text{or}$$

$$\mathrm{Var}(s_y^2)\,\rho^2_{(s_y^2,s_x^2)}+\frac{\left[\mathrm{Var}(s_y^2)\right]^2\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}>0$$
(ii) By (4) and (17), $\mathrm{MSE}(\hat S_R^2)-\mathrm{MSE}(\hat S_P^2)_{opt}>0$ if

$$\mathrm{MSE}(\hat S_R^2)-\frac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{1+\dfrac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}}>0,$$
i.e., if

$$\frac{\mathrm{MSE}(\hat S_R^2)\,\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}+\mathrm{MSE}(\hat S_R^2)-\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)>0.$$

Writing $\mathrm{MSE}(\hat S_R^2)=\mathrm{Var}(s_y^2)\left[1+\dfrac{\beta^{*}_{2(x)}}{\beta^{*}_{2(y)}}-\dfrac{2\lambda^{*}_{22}}{\beta^{*}_{2(y)}}\right]$, the last two terms combine to

$$\mathrm{Var}(s_y^2)\left[\frac{\beta^{*}_{2(x)}}{\beta^{*}_{2(y)}}-\frac{2\lambda^{*}_{22}}{\beta^{*}_{2(y)}}+\rho^2_{(s_y^2,s_x^2)}\right]=\mathrm{Var}(s_y^2)\left[\frac{\beta^{*}_{2(x)}}{\beta^{*}_{2(y)}}-\frac{2\lambda^{*}_{22}}{\beta^{*}_{2(y)}}+\frac{\lambda^{*2}_{22}}{\beta^{*}_{2(y)}\beta^{*}_{2(x)}}\right],$$

so the condition reduces to

$$\frac{\mathrm{MSE}(\hat S_R^2)\,\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}+\mathrm{Var}(s_y^2)\left[\frac{\sqrt{\beta^{*}_{2(x)}}}{\sqrt{\beta^{*}_{2(y)}}}-\rho_{(s_y^2,s_x^2)}\right]^2>0$$
Similarly,

(iii) By (6) and (17),

$$\mathrm{Var}(\hat S_{Reg}^2)-\mathrm{MSE}(\hat S_P^2)_{opt}=\frac{\left[\dfrac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^2}\right]^2}{1+\dfrac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}}>0$$
(iv) By (10) and (17), $\mathrm{MSE}(\hat S_{KC}^2)_{opt}-\mathrm{MSE}(\hat S_P^2)_{opt}>0$ if

$$\frac{\mathrm{Var}(s_y^2)\,\mathrm{MSE}(\hat S_{KC}^2)_{opt}\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}+\theta S_y^4 z^2\beta^{*}_{2(y)}\left[\frac{w_2^{*}\alpha}{z}\,\frac{\sqrt{\beta^{*}_{2(x)}}}{\sqrt{\beta^{*}_{2(y)}}}-\rho_{(s_y^2,s_x^2)}\right]^2>0$$
All four conditions above always hold true, indicating that the proposed estimator is more efficient than the competing estimators.
Now we compare $\mathrm{MSE}(\hat S_P^2)_{opt}$ with $\mathrm{MSE}(\hat S_P^{*2})_{opt}$. Using (17) and (23), $\mathrm{MSE}(\hat S_P^{*2})_{opt}-\mathrm{MSE}(\hat S_P^2)_{opt}>0$ if

$$\mathrm{MSE}(\hat S_P^{*2})_{opt}\left[1+\frac{\mathrm{Var}(\hat S_{Reg}^2)}{S_y^4}\right]-\mathrm{Var}(\hat S_{Reg}^2)>0 \quad\text{or}\quad \mathrm{MSE}(\hat S_P^{*2})_{opt}+\mathrm{Var}(\hat S_{Reg}^2)\left[\frac{\mathrm{MSE}(\hat S_P^{*2})_{opt}}{S_y^4}-1\right]>0$$
When this condition is satisfied, the proposed estimator in the unconstrained case is more efficient than the proposed estimator in the constrained case.
4. Numerical Example and Results
We use the same data as used by Kadilar and Cingi (2006). For these data,

y : level of apple production (1 unit = 100 tonnes), and
x : number of apple trees (1 unit = 100 trees) in 104 villages in the East Anatolia Region in Turkey in 1999.

$N=104$, $n=20$, $\bar Y=625.4$, $\bar X=13931.683$, $C_y=1.866$, $C_x=1.653$, $C_{yx}=2.668$, $S_y=1166.9964$, $S_x=23029.072$, $\rho=0.865$, $\beta_{2(y)}=16.523$, $\beta_{2(x)}=17.516$, $\lambda_{22}=14.398$, $\theta=0.05$, $w_1^{*}=0.1877$, $w_2^{*}=0.81223$.
Efficiencies of different estimators relative to $s_y^2$ are given by

$$\mathrm{RE}=\frac{\mathrm{Var}(s_y^2)}{\mathrm{MSE}(\hat S_i^2)}\times 100, \quad i=y,\,R,\,Reg,\,KC,\,P$$
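As a consistency check, the relative efficiencies of the ratio, regression, and (unconstrained) proposed estimators follow directly from (4), (6), and (17). The sketch below, for $n=20$ $(\theta=0.05)$ with the moment constants above, reproduces values close to the corresponding Table 1 entries:

```python
# Relative efficiencies for n = 20 (theta = 0.05) from the first-order
# MSE formulas; constants from the Sec. 4 apple data (starred: raw - 1).
theta = 0.05
b2y_s, b2x_s, lam_s = 15.523, 16.516, 13.398

rho2 = lam_s**2 / (b2y_s * b2x_s)            # rho^2 between s_y^2 and s_x^2

re_ratio = 100.0 * b2y_s / (b2y_s + b2x_s - 2.0 * lam_s)   # from Eq. (4)
re_reg   = 100.0 / (1.0 - rho2)                            # from Eq. (6)
re_prop  = re_reg * (1.0 + theta * b2y_s * (1.0 - rho2))   # from Eq. (17)

print(re_ratio, re_reg, re_prop)   # close to 296.071, 333.515, 411.130
```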
5. Conclusion
Table 1
Percent RE of different estimators w.r.t. $s_y^2$

  n      $s_y^2$    $\hat S_R^2$   $\hat S_{Reg}^2$   $\hat S_{KC}^2$   $\hat S_P^2$   $\hat S_P^{*2}$
  10     100.00     296.071        333.515            336.060           488.745        333.802
  20     100.00     296.071        333.515            334.787           411.130        333.802
  30     100.00     296.071        333.515            334.293           385.258        333.802
  40     100.00     296.071        333.515            334.030           372.321        333.802
  50     100.00     296.071        333.515            333.867           364.561        333.802
  60     100.00     296.071        333.515            333.756           359.387        333.802

From Table 1, we observe that for this data set, the proposed estimator $\hat S_P^2$ is consistently more efficient than all the other estimators considered here. This was
Table 2
Optimum values of $k_1$ and $k_2$

  Sample size    $k_1^{*}$ (Case 1)    $k_2^{*}$ (Case 1)    $(1-k_1^{**})$ (Case 2)
  n = 10         0.70389               5.6E-08               7.9E-08
  n = 20         0.84178               6.7E-08               7.9E-08
  n = 30         0.70389               7.2E-08               7.9E-08
  n = 40         0.93318               7.5E-08               7.9E-08
  n = 50         0.95389               7.6E-08               7.9E-08
  n = 60         0.96822               7.7E-08               7.9E-08
clearly expected based on the efficiency comparison in the previous section. The Kadilar and Cingi estimator $\hat S_{KC}^2$ performs better than the ratio estimator but is very similar to the regression estimator. Also, one can note that the efficiency of the proposed estimator, even in the constrained case, is comparable with the regression estimator $(\hat S_{Reg}^2)$ and the Kadilar and Cingi estimator $(\hat S_{KC}^2)$, and is better than the ratio estimator $(\hat S_R^2)$. However, the efficiency in the constrained case does not compare well with the efficiency of the unconstrained estimator $\hat S_P^2$ for this data set. Results in Table 2 also suggest that the constrained optimization may not be ideal, since for this optimization the first coefficient is almost 1 and the second coefficient is almost zero.
Appendix
Substituting the optimum values $k_1^{*}$ and $k_2^{*}$ in (15), we have

$$\mathrm{MSE}(\hat S_P^2)_{opt} \approx \frac{S_y^4}{\left[1+\theta\left(\beta^{*}_{2(y)}-\dfrac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)\right]^2}\left[\left\{\theta\left(\beta^{*}_{2(y)}-\frac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)\right\}^2+\theta\left(\beta^{*}_{2(y)}+\frac{\beta^{*}_{2(x)}}{4}-\lambda^{*}_{22}\right)-\frac{\theta}{\beta^{*}_{2(x)}}\left(\lambda^{*}_{22}-\frac{\beta^{*}_{2(x)}}{2}\right)^2\right]$$

$$\mathrm{MSE}(\hat S_P^2)_{opt} \approx \frac{S_y^4}{\left[1+\theta\left(\beta^{*}_{2(y)}-\dfrac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)\right]^2}\left[\left\{\theta\left(\beta^{*}_{2(y)}-\frac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)\right\}^2+\theta\left(\beta^{*}_{2(y)}-\frac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)\right]$$

$$\mathrm{MSE}(\hat S_P^2)_{opt} \approx \frac{S_y^4\,\theta\left(\beta^{*}_{2(y)}-\dfrac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)}{1+\theta\left(\beta^{*}_{2(y)}-\dfrac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)}=\frac{\mathrm{Var}(\hat S_{Reg}^2)}{1+\dfrac{\mathrm{Var}(\hat S_{Reg}^2)}{S_y^4}}$$

The above result is given in (17).
Acknowledgments
The authors are thankful to the referees for their valuable suggestions that helped improve the article.
References
Bahl, S., Tuteja, R. K. (1991). Ratio and product exponential estimators. J. Inform. Opt. Sci. 12(1):159–164.
Dubey, V., Singh, S. K. (2001). An improved regression estimator for estimating population mean. J. Ind. Soc. Agr. Statist. 54(2):179–183.
Garcia, M. R., Cebrian, A. A. (1996). Repeated substitution method: the ratio estimator for the population variance. Metrika 43:101–105.
Isaki, C. T. (1983). Variance estimation using auxiliary information. J. Amer. Statist. Assoc. 78:117–123.
Kadilar, C., Cingi, H. (2006). Improvement in variance estimation using auxiliary information. Hacett. J. Math. Statist. 35(1):111–115.
Shabbir, J., Yaab, Z. (2003). Improvement over transformed auxiliary variables in estimating the finite population mean. Biomed. J. 45(6):723–729.
Singh, H. P. (1986). Estimation of ratio, product and mean using auxiliary information in sample surveys. Alig. J. Statist. 6:32–44.
Singh, G. N. (2000). A general class of ratio-type estimators under super population model. Biomed. J. 42(3):363–375.
Singh, G. N. (2002). Empirical studies of generalized classes of ratio and product type estimators under a linear model. Statist. Trans. 5(4):701–720.
Singh, P., Upadhyaya, L. N., Namjoshi, D. (1988). Estimation of finite population variance. Curr. Sci. 57(24):1331–1334.
Upadhyaya, L. N., Singh, H. P., Vos, J. W. E. (1985). On the estimation of population means and ratios using supplementary information. Statist. Neerlan. 39:309–318.