This article was downloaded by: [Pennsylvania State University] on 09 September 2013, at 12:32. Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.
Communications in Statistics - Theory and Methods
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20

On Improvement in Variance Estimation Using Auxiliary Information
Javid Shabbir (a) & Sat Gupta (b)
(a) Department of Statistics, Quaid-i-Azam University, Islamabad, Pakistan
(b) Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, North Carolina, USA
Published online: 28 Aug 2007.

To cite this article: Javid Shabbir & Sat Gupta (2007) On Improvement in Variance Estimation Using Auxiliary Information, Communications in Statistics - Theory and Methods, 36:12, 2177-2185, DOI: 10.1080/03610920701215092

To link to this article: http://dx.doi.org/10.1080/03610920701215092
Communications in Statistics—Theory and Methods, 36: 2177–2185, 2007
Copyright © Taylor & Francis Group, LLC
ISSN: 0361-0926 print/1532-415X online
DOI: 10.1080/03610920701215092
Survey Sampling
On Improvement in Variance Estimation Using Auxiliary Information
JAVID SHABBIR1 AND SAT GUPTA2
1 Department of Statistics, Quaid-i-Azam University, Islamabad, Pakistan
2 Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, North Carolina, USA
Kadilar and Cingi (2006) have introduced an estimator for the population variance using an auxiliary variable in simple random sampling. We propose a new ratio-type exponential estimator for population variance which is always more efficient than the usual ratio and regression estimators suggested by Isaki (1983) and by Kadilar and Cingi (2006). Efficiency comparison is carried out both mathematically and numerically.
Keywords: Auxiliary variable; Bias; Efficiency; Exponential ratio-type estimator; Mean square error (MSE); Variance (Var).

Mathematics Subject Classification: Primary 62D05; Secondary 62F10.
1. Introduction and Notation
Let $\Omega$ be a finite population consisting of $N$ units from which a sample of size $n$ is to be drawn by simple random sampling without replacement (SRSWOR). Let $y$ and $x$ denote the study and auxiliary variables having sample means $\bar y$ and $\bar x$, and population means $\bar Y$ and $\bar X$, respectively. To estimate $S_y^2 = \sum_{i=1}^{N}(y_i-\bar Y)^2/(N-1)$, it is assumed that $S_x^2 = \sum_{i=1}^{N}(x_i-\bar X)^2/(N-1)$ is known. Let $s_y^2 = \sum_{i=1}^{n}(y_i-\bar y)^2/(n-1)$ and $s_x^2 = \sum_{i=1}^{n}(x_i-\bar x)^2/(n-1)$ be the sample variances of $y$ and $x$, respectively. Let $C_y = S_y/\bar Y$ and $C_x = S_x/\bar X$ denote the coefficients of variation of $y$ and $x$, respectively, and let $\rho_{yx}$ and $C_{yx} = S_{yx}/(\bar Y\bar X)$ be the correlation coefficient and the coefficient of covariation between $y$ and $x$, respectively.
Received June 30, 2006; Accepted November 24, 2006
Address correspondence to Sat Gupta, Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, NC 27402, USA; E-mail: [email protected]
Define $e_0 = (s_y^2-S_y^2)/S_y^2$ and $e_1 = (s_x^2-S_x^2)/S_x^2$, so that $E(e_0)=E(e_1)=0$, $E(e_0^2)=\theta\beta^{*}_{2(y)}$, $E(e_1^2)=\theta\beta^{*}_{2(x)}$, and $E(e_0e_1)=\theta\lambda^{*}_{22}$, where $\beta_{2(y)}=\mu_{40}/\mu_{20}^2$ and $\beta_{2(x)}=\mu_{04}/\mu_{02}^2$ are the coefficients of kurtosis of $y$ and $x$, respectively. Let $\beta^{*}_{2(y)}=\beta_{2(y)}-1$, $\beta^{*}_{2(x)}=\beta_{2(x)}-1$, and $\lambda^{*}_{22}=\lambda_{22}-1$, where $\lambda_{pq}=\mu_{pq}/(\mu_{20}^{p/2}\mu_{02}^{q/2})$, $\mu_{pq}=\sum_{i=1}^{N}(y_i-\bar Y)^p(x_i-\bar X)^q/N$, and $\theta=1/n$. We ignore the finite population correction (fpc) term for ease of computation.

We discuss below some of the known estimators of population variance.

(i) Conventional variance estimator. For the usual unbiased variance estimator $s_y^2$ (the sample variance), the variance is given by

$$\mathrm{Var}(s_y^2)=\theta S_y^4\beta^{*}_{2(y)} \quad (1)$$
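For readers who want to check these moment definitions numerically, here is a small sketch (assuming NumPy; the finite population below is entirely synthetic and hypothetical, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population of N units (illustration only).
N, n = 500, 50
x = rng.gamma(shape=2.0, scale=10.0, size=N)   # auxiliary variable
y = 3.0 * x + rng.normal(0.0, 5.0, size=N)     # study variable

def mu_pq(y, x, p, q):
    """Central product moment mu_pq = sum((y_i - Ybar)^p (x_i - Xbar)^q) / N."""
    return np.mean((y - y.mean()) ** p * (x - x.mean()) ** q)

beta2_y = mu_pq(y, x, 4, 0) / mu_pq(y, x, 2, 0) ** 2   # kurtosis of y
beta2_x = mu_pq(y, x, 0, 4) / mu_pq(y, x, 0, 2) ** 2   # kurtosis of x
lam22   = mu_pq(y, x, 2, 2) / (mu_pq(y, x, 2, 0) * mu_pq(y, x, 0, 2))

theta = 1.0 / n                              # fpc ignored, as in the text
S2y = np.var(y, ddof=1)                      # S_y^2 with divisor N - 1
var_s2y = theta * S2y**2 * (beta2_y - 1.0)   # Eq. (1): Var(s_y^2)

print(beta2_y, beta2_x, lam22, var_s2y)
```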
(ii) Isaki ratio estimator. Isaki (1983) introduced the following ratio estimator for population variance:

$$\hat S_R^2 = s_y^2\left(\frac{S_x^2}{s_x^2}\right) \quad (2)$$

Its bias and MSE, to first order of approximation, are given by

$$\mathrm{Bias}(\hat S_R^2)=\theta S_y^2\left[\beta^{*}_{2(x)}-\lambda^{*}_{22}\right] \quad (3)$$

and

$$\mathrm{MSE}(\hat S_R^2)=\theta S_y^4\left[\beta^{*}_{2(y)}+\beta^{*}_{2(x)}-2\lambda^{*}_{22}\right] \quad (4)$$
(iii) Isaki regression estimator. Isaki (1983) also suggested the following regression estimator for population variance:

$$\hat S_{Reg}^2 = s_y^2 + b\left(S_x^2-s_x^2\right), \quad (5)$$

where $b$ is the sample regression coefficient whose population counterpart is $\beta = S_y^2\lambda^{*}_{22}/\bigl(S_x^2\beta^{*}_{2(x)}\bigr)$. The variance of $\hat S_{Reg}^2$, to first order of approximation, is given by

$$\mathrm{Var}(\hat S_{Reg}^2)=\theta S_y^4\beta^{*}_{2(y)}\left(1-\frac{\lambda^{*2}_{22}}{\beta^{*}_{2(y)}\beta^{*}_{2(x)}}\right)=\mathrm{Var}(s_y^2)\left(1-\rho^{2}_{(s_y^2,s_x^2)}\right), \quad (6)$$

where $\rho_{(s_y^2,s_x^2)}=\lambda^{*}_{22}/\bigl(\sqrt{\beta^{*}_{2(y)}}\sqrt{\beta^{*}_{2(x)}}\bigr)$ (see Garcia and Cebrian, 1996).
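The text does not spell out how $b$ is computed in practice; the sketch below makes the common choice of plugging sample analogues of $\lambda^{*}_{22}$ and $\beta^{*}_{2(x)}$ into the formula for $\beta$ — an assumption on our part, not a prescription from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population (illustration only).
N, n = 500, 50
x = rng.gamma(shape=2.0, scale=10.0, size=N)
y = 3.0 * x + rng.normal(0.0, 5.0, size=N)
S2x = np.var(x, ddof=1)                       # known

idx = rng.choice(N, size=n, replace=False)    # SRSWOR sample
ys, xs = y[idx], x[idx]
s2y, s2x = np.var(ys, ddof=1), np.var(xs, ddof=1)

# Sample analogues of lambda*_22 and beta*_2(x); plugging them into
# beta = S_y^2 lam*_22 / (S_x^2 beta*_2(x)) gives the sample coefficient b.
dy, dx = ys - ys.mean(), xs - xs.mean()
mu20, mu02 = np.mean(dy**2), np.mean(dx**2)
lam22_hat  = np.mean(dy**2 * dx**2) / (mu20 * mu02)
b2x_hat    = np.mean(dx**4) / mu02**2
b = s2y * (lam22_hat - 1.0) / (s2x * (b2x_hat - 1.0))

S2_Reg = s2y + b * (S2x - s2x)                # Eq. (5)
print(S2_Reg)
```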
(iv) Kadilar and Cingi estimator. Shabbir and Yaab (2003) introduced the following ratio-type estimator for the population mean:

$$\bar Y_{SY} = \left[\omega_1\bar y + \omega_2\left(\frac{\bar y\bar X}{\bar x}\right)\right]\alpha, \quad (7)$$

where $\alpha = \dfrac{1+\theta C_{yx}}{1+\theta C_x^2}$, such that $\omega_1+\omega_2=1$.
Kadilar and Cingi (2006) used this idea to suggest the following ratio-type estimator for population variance:

$$\hat S_{KC}^2 = \left[w_1 s_y^2 + w_2\left(\frac{s_y^2 S_x^2}{s_x^2}\right)\right]\alpha, \quad (8)$$

where $w_1+w_2=1$. The optimum bias and MSE of the estimator in (8) are given by

$$\mathrm{Bias}(\hat S_{KC}^2)_{opt} = S_y^2\left[(w_1^{*}-1)+w_2^{*}\alpha\theta\beta^{*}_{2(x)}\right] \quad (9)$$

and

$$\mathrm{MSE}(\hat S_{KC}^2)_{opt} = \theta S_y^4\left[z^2\beta^{*}_{2(y)}-2w_2^{*}z\alpha\lambda^{*}_{22}+w_2^{*2}\alpha^2\beta^{*}_{2(x)}\right], \quad (10)$$

respectively, where $z=w_1^{*}+w_2^{*}\alpha$,

$$w_1^{*}=\frac{\beta^{*}_{2(y)}(\alpha-1)+\lambda^{*}_{22}(1-2\alpha)+\beta^{*}_{2(x)}\alpha}{\beta^{*}_{2(y)}(1-\alpha)^2/\alpha+2\lambda^{*}_{22}(1-\alpha)+\beta^{*}_{2(x)}\alpha},$$

and $w_2^{*}=1-w_1^{*}$.

Kadilar and Cingi (2006) show that when $\alpha=1$, the two estimators $\hat S_{KC}^2$ and $\hat S_{Reg}^2$ are equally efficient. Kadilar and Cingi also derive conditions under which their estimator is better than the ratio and regression estimators of Isaki (1983). However, these conditions may not hold true in all cases. This is our motivation to try a different variance estimator whose superiority is not subject to any condition.
2. Proposed Estimator
Bahl and Tuteja (1991) introduced an exponential ratio-type estimator for the population mean, given by

$$\bar Y_{BT} = \bar y\,\exp\!\left(\frac{\bar X-\bar x}{\bar X+\bar x}\right) \quad (11)$$

The estimator $\bar Y_{BT}$ is more efficient than the usual ratio estimator $\bar Y_R$ under certain conditions. Following Bahl and Tuteja (1991) and Kadilar and Cingi (2006), we propose the following estimator for the population variance:

$$\hat S_P^2 = \left[k_1 s_y^2 + k_2\left(S_x^2-s_x^2\right)\right]\exp\!\left(\frac{S_x^2-s_x^2}{S_x^2+s_x^2}\right), \quad (12)$$
where $k_1$ and $k_2$ are suitably chosen constants. There are two choices in terms of how to select $k_1$ and $k_2$. Some authors, such as Kadilar and Cingi (2006), use the constraint $k_1+k_2=1$, while others use an unconstrained selection of $k_1$ and $k_2$. This latter group includes Upadhyaya et al. (1985), Singh (1986, 2000, 2002), Singh et al. (1988), and Dubey and Singh (2001). These authors choose the $k_1$ and $k_2$ which minimize the MSE of the proposed estimator and do not insist on having $k_1+k_2=1$. We use both options here.
Case 1: Unconstrained choice of $k_1$ and $k_2$. Using the notation of Section 1, (12) can be rewritten as

$$\hat S_P^2 = \left[k_1 S_y^2(1+e_0) - k_2 S_x^2 e_1\right]\left[1-\tfrac{1}{2}e_1+\tfrac{3}{8}e_1^2-\cdots\right] \quad (13)$$
The bias and MSE, to first order of approximation, are given by

$$\mathrm{Bias}(\hat S_P^2) \approx (k_1-1)S_y^2 + k_1 S_y^2\theta\left[\tfrac{3}{8}\beta^{*}_{2(x)}-\tfrac{1}{2}\lambda^{*}_{22}\right]+\tfrac{1}{2}k_2 S_x^2\theta\beta^{*}_{2(x)} \quad (14)$$

and

$$\mathrm{MSE}(\hat S_P^2) \approx (k_1-1)^2 S_y^4 + k_1^2 S_y^4\theta\left[\beta^{*}_{2(y)}+\tfrac{1}{4}\beta^{*}_{2(x)}-\lambda^{*}_{22}\right]+k_2^2 S_x^4\theta\beta^{*}_{2(x)} - 2k_1k_2 S_y^2 S_x^2\theta\left(\lambda^{*}_{22}-\tfrac{1}{2}\beta^{*}_{2(x)}\right) \quad (15)$$
Setting $\partial\,\mathrm{MSE}(\hat S_P^2)/\partial k_i = 0$ $(i=1,2)$, we get the optimum values of $k_1$ and $k_2$ as

$$k_1^{*}=\frac{1}{1+\theta\left[\beta^{*}_{2(y)}-\dfrac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right]} \quad\text{and}\quad k_2^{*}=\frac{k_1^{*}S_y^2\left[\lambda^{*}_{22}-\tfrac{1}{2}\beta^{*}_{2(x)}\right]}{S_x^2\beta^{*}_{2(x)}}$$
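With the data summaries reported later in Sec. 4 (and $\theta = 1/n = 0.05$, fpc ignored as in Sec. 1), $k_1^{*}$ and $k_2^{*}$ can be evaluated directly. The numbers below are our own evaluation of the formulas, not values quoted from the paper:

```python
# Moment constants from the Sec. 4 apple data (starred versions: raw - 1).
theta = 0.05                    # 1/n with n = 20 (fpc ignored)
b2y_s = 16.523 - 1.0            # beta*_2(y)
b2x_s = 17.516 - 1.0            # beta*_2(x)
lam_s = 14.398 - 1.0            # lambda*_22
Sy, Sx = 1166.9964, 23029.072

G = b2y_s - lam_s**2 / b2x_s                # beta*_2(y) - lam*_22^2/beta*_2(x)
k1_opt = 1.0 / (1.0 + theta * G)            # optimum k1*
k2_opt = k1_opt * Sy**2 * (lam_s - 0.5 * b2x_s) / (Sx**2 * b2x_s)   # optimum k2*
print(k1_opt, k2_opt)
```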
Substituting the optimum values $k_1^{*}$ and $k_2^{*}$ in (14) and (15), we get the optimum bias and MSE of $\hat S_P^2$ as

$$\mathrm{Bias}(\hat S_P^2)_{opt} \approx \frac{\theta S_y^2\left[\tfrac{1}{8}\beta^{*}_{2(x)}-\beta^{*}_{2(y)}\left(1-\rho^2_{(s_y^2,s_x^2)}\right)\right]}{1+\theta\beta^{*}_{2(y)}\left(1-\rho^2_{(s_y^2,s_x^2)}\right)} \quad (16)$$

and

$$\mathrm{MSE}(\hat S_P^2)_{opt} \approx \frac{\mathrm{Var}(\hat S_{Reg}^2)}{1+\dfrac{\mathrm{Var}(\hat S_{Reg}^2)}{S_y^4}} \quad (17)$$

(see Appendix).
Case 2: $k_1+k_2=1$. The proposed estimator, with this constraint, can be written as

$$\hat S_P^{*2} = \left[k_1 s_y^2 + (1-k_1)\left(S_x^2-s_x^2\right)\right]\exp\!\left(\frac{S_x^2-s_x^2}{S_x^2+s_x^2}\right), \quad (18)$$

where $k_1$ is a suitably chosen constant such that $0\le k_1\le 1$. Using the notation of Sec. 1, (18) can be rewritten as

$$\hat S_P^{*2} = \left[k_1 S_y^2(1+e_0)-(1-k_1)S_x^2 e_1\right]\left[1-\tfrac{1}{2}e_1+\tfrac{3}{8}e_1^2-\cdots\right] \quad (19)$$
The bias and MSE, to first order of approximation, are given by

$$\mathrm{Bias}(\hat S_P^{*2}) \approx (k_1-1)\left(S_y^2-\tfrac{1}{2}\theta S_x^2\beta^{*}_{2(x)}\right)+k_1 S_y^2\theta\left(\tfrac{3}{8}\beta^{*}_{2(x)}-\tfrac{1}{2}\lambda^{*}_{22}\right) \quad (20)$$

and

$$\mathrm{MSE}(\hat S_P^{*2}) \approx E\left[(k_1-1)S_y^2 + k_1 S_y^2\left(e_0-\tfrac{1}{2}e_1\right)+(k_1-1)S_x^2 e_1\right]^2 \quad (21)$$
Simplifying (21), we get

$$\mathrm{MSE}(\hat S_P^{*2}) \approx \theta S_y^4\Biggl[k_1^2\Bigl\{\frac{1}{\theta}+\Bigl(\frac{S_x}{S_y}\Bigr)^{4}\beta^{*}_{2(x)}+\beta^{*}_{2(y)}+\tfrac{1}{4}\beta^{*}_{2(x)}-\lambda^{*}_{22}+2\Bigl(\frac{S_x}{S_y}\Bigr)^{2}\Bigl(\lambda^{*}_{22}-\tfrac{1}{2}\beta^{*}_{2(x)}\Bigr)\Bigr\}$$
$$\qquad -2k_1\Bigl\{\frac{1}{\theta}+\Bigl(\frac{S_x}{S_y}\Bigr)^{4}\beta^{*}_{2(x)}+\Bigl(\frac{S_x}{S_y}\Bigr)^{2}\Bigl(\lambda^{*}_{22}-\tfrac{1}{2}\beta^{*}_{2(x)}\Bigr)\Bigr\}+\frac{1}{\theta}+\Bigl(\frac{S_x}{S_y}\Bigr)^{4}\beta^{*}_{2(x)}\Biggr] \quad (22)$$
The optimum value of $k_1$ that minimizes $\mathrm{MSE}(\hat S_P^{*2})$ is given by

$$k_1=\frac{A_1+A_3}{A_1+A_2+2A_3}=k_1^{**}\ \text{(say)},$$

where

$$A_1=\frac{1}{\theta}+\Bigl(\frac{S_x}{S_y}\Bigr)^{4}\beta^{*}_{2(x)},\qquad A_2=\beta^{*}_{2(y)}+\tfrac{1}{4}\beta^{*}_{2(x)}-\lambda^{*}_{22},\qquad A_3=\Bigl(\frac{S_x}{S_y}\Bigr)^{2}\Bigl(\lambda^{*}_{22}-\tfrac{1}{2}\beta^{*}_{2(x)}\Bigr)$$
Substituting the optimum value $k_1^{**}$ in (22), we get the optimum MSE of $\hat S_P^{*2}$ as

$$\mathrm{MSE}(\hat S_P^{*2})_{opt} \approx \theta S_y^4\left(A_1-\frac{(A_1+A_3)^2}{A_1+A_2+2A_3}\right) \quad (23)$$
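The constants $A_1$, $A_2$, $A_3$ and the resulting $k_1^{**}$ can likewise be evaluated from the Sec. 4 data summaries (our own evaluation, with $\theta = 1/n = 0.05$ and fpc ignored):

```python
# Constrained optimum k1** and the bracketed factor of Eq. (23),
# using the Sec. 4 apple-data moment constants (starred: raw - 1).
theta = 0.05
b2y_s, b2x_s, lam_s = 15.523, 16.516, 13.398
Sy, Sx = 1166.9964, 23029.072
r = (Sx / Sy) ** 2                    # (S_x / S_y)^2

A1 = 1.0 / theta + r**2 * b2x_s
A2 = b2y_s + 0.25 * b2x_s - lam_s
A3 = r * (lam_s - 0.5 * b2x_s)

k1_cc = (A1 + A3) / (A1 + A2 + 2.0 * A3)                  # k1** minimizing (22)
mse_factor = A1 - (A1 + A3) ** 2 / (A1 + A2 + 2.0 * A3)   # bracket of Eq. (23)
print(k1_cc, mse_factor)
```

Note how close $k_1^{**}$ comes to 1 here, which is what Sec. 5 observes about the constrained optimization.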
3. Comparison of Estimators
We first compare the proposed estimator $\hat S_P^2$ with all the other competing estimators considered here.
(i) By (1) and (17), $\mathrm{Var}(s_y^2)-\mathrm{MSE}(\hat S_P^2)_{opt}>0$ if

$$\mathrm{Var}(s_y^2)-\frac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{1+\dfrac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}}>0,$$

i.e., if

$$\mathrm{Var}(s_y^2)\left[1+\frac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}\right]-\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)>0, \quad\text{or}$$

$$\mathrm{Var}(s_y^2)\,\rho^2_{(s_y^2,s_x^2)}+\frac{\left[\mathrm{Var}(s_y^2)\right]^2\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}>0$$
(ii) By (4) and (17), $\mathrm{MSE}(\hat S_R^2)-\mathrm{MSE}(\hat S_P^2)_{opt}>0$ if

$$\mathrm{MSE}(\hat S_R^2)-\frac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{1+\dfrac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}}>0,$$
i.e., if

$$\frac{\mathrm{MSE}(\hat S_R^2)\,\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}+\mathrm{MSE}(\hat S_R^2)-\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)>0.$$

Writing $\mathrm{MSE}(\hat S_R^2)=\mathrm{Var}(s_y^2)\left[1+\dfrac{\beta^{*}_{2(x)}}{\beta^{*}_{2(y)}}-\dfrac{2\lambda^{*}_{22}}{\beta^{*}_{2(y)}}\right]$, the last two terms combine to

$$\mathrm{Var}(s_y^2)\left[\frac{\beta^{*}_{2(x)}}{\beta^{*}_{2(y)}}-\frac{2\lambda^{*}_{22}}{\beta^{*}_{2(y)}}+\rho^2_{(s_y^2,s_x^2)}\right]=\mathrm{Var}(s_y^2)\left[\frac{\beta^{*}_{2(x)}}{\beta^{*}_{2(y)}}-\frac{2\lambda^{*}_{22}}{\beta^{*}_{2(y)}}+\frac{\lambda^{*2}_{22}}{\beta^{*}_{2(y)}\beta^{*}_{2(x)}}\right],$$

so the condition reduces to

$$\frac{\mathrm{MSE}(\hat S_R^2)\,\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}+\mathrm{Var}(s_y^2)\left[\frac{\sqrt{\beta^{*}_{2(x)}}}{\sqrt{\beta^{*}_{2(y)}}}-\rho_{(s_y^2,s_x^2)}\right]^2>0$$
Similarly,

(iii) By (6) and (17),

$$\mathrm{Var}(\hat S_{Reg}^2)-\mathrm{MSE}(\hat S_P^2)_{opt}=\frac{\left[\dfrac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^2}\right]^2}{1+\dfrac{\mathrm{Var}(s_y^2)\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}}>0$$
(iv) By (10) and (17), $\mathrm{MSE}(\hat S_{KC}^2)_{opt}-\mathrm{MSE}(\hat S_P^2)_{opt}>0$ if

$$\frac{\mathrm{Var}(s_y^2)\,\mathrm{MSE}(\hat S_{KC}^2)_{opt}\left(1-\rho^2_{(s_y^2,s_x^2)}\right)}{S_y^4}+\theta S_y^4 z^2\beta^{*}_{2(y)}\left[\frac{w_2^{*}\alpha}{z}\,\frac{\sqrt{\beta^{*}_{2(x)}}}{\sqrt{\beta^{*}_{2(y)}}}-\rho_{(s_y^2,s_x^2)}\right]^2>0$$
All four conditions above always hold true, indicating that the proposed estimator is more efficient than the competing estimators.
Now we compare $\mathrm{MSE}(\hat S_P^2)_{opt}$ with $\mathrm{MSE}(\hat S_P^{*2})_{opt}$. Using (17) and (23), $\mathrm{MSE}(\hat S_P^{*2})_{opt}-\mathrm{MSE}(\hat S_P^2)_{opt}>0$ if

$$\mathrm{MSE}(\hat S_P^{*2})_{opt}\left[1+\frac{\mathrm{Var}(\hat S_{Reg}^2)}{S_y^4}\right]-\mathrm{Var}(\hat S_{Reg}^2)>0 \quad\text{or}\quad \mathrm{MSE}(\hat S_P^{*2})_{opt}+\mathrm{Var}(\hat S_{Reg}^2)\left[\frac{\mathrm{MSE}(\hat S_P^{*2})_{opt}}{S_y^4}-1\right]>0$$
When this condition is satisfied, the proposed estimator in the unconstrained case is more efficient than the proposed estimator in the constrained case.
4. Numerical Example and Results
We use the same data as used by Kadilar and Cingi (2006). For these data,

y : level of apple production (1 unit = 100 tonnes), and
x : number of apple trees (1 unit = 100 trees) in 104 villages in the East Anatolia Region in Turkey in 1999.

$N=104$, $n=20$, $\bar Y=625.4$, $\bar X=13931.683$, $C_y=1.866$, $C_x=1.653$, $C_{yx}=2.668$, $S_y=1166.9964$, $S_x=23029.072$, $\rho=0.865$, $\beta_{2(y)}=16.523$, $\beta_{2(x)}=17.516$, $\lambda_{22}=14.398$, $\theta=0.05$, $w_1^{*}=0.1877$, $w_2^{*}=0.81223$.
Efficiencies of different estimators relative to $s_y^2$ are given by

$$\mathrm{RE}=\frac{\mathrm{Var}(s_y^2)}{\mathrm{MSE}(\hat S_i^2)}\times 100, \quad i=y,\,R,\,Reg,\,KC,\,P$$
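As a consistency check, the relative efficiencies of the ratio, regression, and (unconstrained) proposed estimators follow directly from (4), (6), and (17). The sketch below, for $n=20$ $(\theta=0.05)$ with the moment constants above, reproduces values close to the corresponding Table 1 entries:

```python
# Relative efficiencies for n = 20 (theta = 0.05) from the first-order
# MSE formulas; constants from the Sec. 4 apple data (starred: raw - 1).
theta = 0.05
b2y_s, b2x_s, lam_s = 15.523, 16.516, 13.398

rho2 = lam_s**2 / (b2y_s * b2x_s)            # rho^2 between s_y^2 and s_x^2

re_ratio = 100.0 * b2y_s / (b2y_s + b2x_s - 2.0 * lam_s)   # from Eq. (4)
re_reg   = 100.0 / (1.0 - rho2)                            # from Eq. (6)
re_prop  = re_reg * (1.0 + theta * b2y_s * (1.0 - rho2))   # from Eq. (17)

print(re_ratio, re_reg, re_prop)   # close to 296.071, 333.515, 411.130
```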
5. Conclusion
Table 1
Percent RE of different estimators w.r.t. $s_y^2$

  n      $s_y^2$    $\hat S_R^2$   $\hat S_{Reg}^2$   $\hat S_{KC}^2$   $\hat S_P^2$   $\hat S_P^{*2}$
  10     100.00     296.071        333.515            336.060           488.745        333.802
  20     100.00     296.071        333.515            334.787           411.130        333.802
  30     100.00     296.071        333.515            334.293           385.258        333.802
  40     100.00     296.071        333.515            334.030           372.321        333.802
  50     100.00     296.071        333.515            333.867           364.561        333.802
  60     100.00     296.071        333.515            333.756           359.387        333.802

From Table 1, we observe that for this data set, the proposed estimator $\hat S_P^2$ is consistently more efficient than all the other estimators considered here. This was
Table 2
Optimum values of $k_1$ and $k_2$

  Sample size    $k_1^{*}$ (Case 1)    $k_2^{*}$ (Case 1)    $(1-k_1^{**})$ (Case 2)
  n = 10         0.70389               5.6E-08               7.9E-08
  n = 20         0.84178               6.7E-08               7.9E-08
  n = 30         0.70389               7.2E-08               7.9E-08
  n = 40         0.93318               7.5E-08               7.9E-08
  n = 50         0.95389               7.6E-08               7.9E-08
  n = 60         0.96822               7.7E-08               7.9E-08
clearly expected based on the efficiency comparison in the previous section. The Kadilar and Cingi estimator $\hat S_{KC}^2$ performs better than the ratio estimator but is very similar to the regression estimator. Also, one can note that the efficiency of the proposed estimator, even in the constrained case, is comparable with the regression estimator $(\hat S_{Reg}^2)$ and the Kadilar and Cingi estimator $(\hat S_{KC}^2)$, and is better than the ratio estimator $(\hat S_R^2)$. However, the efficiency in the constrained case does not compare well with the efficiency of the unconstrained estimator $\hat S_P^2$ for this data set. Results in Table 2 also suggest that the constrained optimization may not be ideal, since for this optimization the first coefficient is almost 1 and the second coefficient is almost zero.
Appendix
Substituting the optimum values $k_1^{*}$ and $k_2^{*}$ in (15), we have

$$\mathrm{MSE}(\hat S_P^2)_{opt} \approx \frac{S_y^4}{\left[1+\theta\left(\beta^{*}_{2(y)}-\dfrac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)\right]^2}\left[\left\{\theta\left(\beta^{*}_{2(y)}-\frac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)\right\}^2+\theta\left(\beta^{*}_{2(y)}+\frac{\beta^{*}_{2(x)}}{4}-\lambda^{*}_{22}\right)-\frac{\theta}{\beta^{*}_{2(x)}}\left(\lambda^{*}_{22}-\frac{\beta^{*}_{2(x)}}{2}\right)^2\right]$$

$$\mathrm{MSE}(\hat S_P^2)_{opt} \approx \frac{S_y^4}{\left[1+\theta\left(\beta^{*}_{2(y)}-\dfrac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)\right]^2}\left[\left\{\theta\left(\beta^{*}_{2(y)}-\frac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)\right\}^2+\theta\left(\beta^{*}_{2(y)}-\frac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)\right]$$

$$\mathrm{MSE}(\hat S_P^2)_{opt} \approx \frac{S_y^4\,\theta\left(\beta^{*}_{2(y)}-\dfrac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)}{1+\theta\left(\beta^{*}_{2(y)}-\dfrac{\lambda^{*2}_{22}}{\beta^{*}_{2(x)}}\right)}=\frac{\mathrm{Var}(\hat S_{Reg}^2)}{1+\dfrac{\mathrm{Var}(\hat S_{Reg}^2)}{S_y^4}}$$

The above result is given in (17).
Acknowledgments
The authors are thankful to the referees for their valuable suggestions that helped improve the article.
References
Bahl, S., Tuteja, R. K. (1991). Ratio and product exponential estimators. J. Inform. Opt. Sci. 12(1):159–164.
Dubey, V., Singh, S. K. (2001). An improved regression estimator for estimating population mean. J. Ind. Soc. Agr. Statist. 54(2):179–183.
Garcia, M. R., Cebrian, A. A. (1996). Repeated substitution method: the ratio estimator for the population variance. Metrika 43:101–105.
Isaki, C. T. (1983). Variance estimation using auxiliary information. J. Amer. Statist. Assoc. 78:117–123.
Kadilar, C., Cingi, H. (2006). Improvement in variance estimation using auxiliary information. Hacett. J. Math. Statist. 35(1):111–115.
Shabbir, J., Yaab, Z. (2003). Improvement over transformed auxiliary variables in estimating the finite population mean. Biomed. J. 45(6):723–729.
Singh, H. P. (1986). Estimation of ratio, product and mean using auxiliary information in sample surveys. Alig. J. Statist. 6:32–44.
Singh, G. N. (2000). A general class of ratio-type estimators under super population model. Biomed. J. 42(3):363–375.
Singh, G. N. (2002). Empirical studies of generalized classes of ratio and product type estimators under a linear model. Statist. Trans. 5(4):701–720.
Singh, P., Upadhyaya, L. N., Namjoshi, D. (1988). Estimation of finite population variance. Curr. Sci. 57(24):1331–1334.
Upadhyaya, L. N., Singh, H. P., Vos, J. W. E. (1985). On the estimation of population means and ratios using supplementary information. Statist. Neerlan. 39:309–318.